posts
How to use CRaC with Java applications

How to use CRaC with Java applications

Nov 6, 2023
Dmitry Chuyko
13.8

Liberica JDK with Coordinated Restore at Checkpoint (CRaC) support will enable the developers to bring their applications to a new level, with startup and warmup times reduced to mere milliseconds! But like with any new technology, you need to familiarize yourself with CRaC before integrating it into enterprise development.

This article provides a step-by-step tutorial on using CRaC with Java projects, including Spring Boot applications.

Prerequisites

  • Liberica JDK 17 with CRaC support (Although CRaC support is available for Liberica JDK 17 and 21, in this article, we assume JDK 17 with CRaC to be your default JDK). You can download the Java builds with CRaC here.
  • Spring Boot 3.2 that supports CRaC
  • Your favorite IDE

Note that with deb and rpm packages, CRaC will work out of the box because they grant the necessary access rights to the criu executable under the hood. In the case of tar.gz packages, we have to grant these rights manually.

If you downloaded the tar.gz file, run the following commands for Liberica JDK 17.0.9 with CRaC:

$ sudo chown root:root jdk-17.0.9-crac/lib/criu
$ sudo chmod u+s jdk-17.0.9-crac/lib/criu

Check the owner and permissions of the criu executable:

$ ls -al jdk-17.0.9-crac/lib/criu
$ -rwsrwxr-x 1 root root 8478552 Oct 30 17:31 jdk-17.0.9-crac/lib/criu

Simple Java application

Create a simple Java application as follows:

public class Example {
    public static void main(String args[]) throws InterruptedException {
        // This is a part of the saved state
        long startTime = System.currentTimeMillis();
        for(int counter: IntStream.range(1, 100).toArray()) {
            Thread.sleep(1000);
            long currentTimeMillis = System.currentTimeMillis();
            System.out.println("Counter: " + counter + "(passed " + (currentTimeMillis-startTime) + " ms)");
            startTime = currentTimeMillis;
        }
    }
}

Note that as CRaC preserves the exact state of the running application, the snapshot may contain secrets or other kinds of sensitive information, so you should consider and eliminate possible security risks when working with this feature. 

Verify that you are using Liberica JDK with CRaC by checking the Java version, and then compile and start the application with CRaC support:

$ java --version
openjdk version "17.0.9" 2023-10-17 LTS
OpenJDK Runtime Environment (build 17.0.9+14-LTS)
OpenJDK 64-Bit Server VM (build 17.0.9+14-LTS, mixed mode, sharing)
$ javac Example.java
$ java -XX:CRaCCheckpointTo=checkpoint-dir Example

The -XX:CRaCCheckpointTo=checkpoint-dir option points to the directory where the JVM data will be stored upon the checkpoint.

Make a checkpoint with jcmd:

$ jcmd Example JDK.checkpoint
86221:
CR: Checkpoint ...

The application console output:

Counter: 1(passed 1007 ms)
Counter: 2(passed 1010 ms)
Counter: 3(passed 1001 ms)
Counter: 4(passed 1000 ms)
Counter: 5(passed 1000 ms)
Counter: 6(passed 1000 ms)
Nov 01, 2023 5:05:36 PM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
Killed

Here comes the most interesting part — let’s try to restore the application. Run the following command (please note that only the -XX:CRaCRestoreFrom option is passed as the java argument):

$ java -XX:CRaCRestoreFrom=checkpoint-dir

Output:

$ java -XX:CRaCRestoreFrom=checkpoint-dir
Output:
Counter: 7(passed 91124 ms)
Counter: 8(passed 1000 ms)
Counter: 9(passed 1001 ms)
Counter: 10(passed 1000 ms)
Counter: 11(passed 1000 ms)
Counter: 12(passed 1001 ms)
Counter: 13(passed 1000 ms)
Counter: 14(passed 1000 ms)

The passed time in milliseconds in line Counter: 7(passed 91124 ms) shows that the application was interrupted and then restored.

Solving the possible issues

To demonstrate the potential issues, let’s write another application:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ExampleWithCRaC {
    private ScheduledExecutorService executor;
    private long startTime = System.currentTimeMillis();
    private int counter = 0;

    public static void main(String args[]) throws InterruptedException {
        ExampleWithCRaC exampleWithCRaC = new ExampleWithCRaC();
        exampleWithCRaC.startTask();
    }

    private void startTask() throws InterruptedException {
        executor = Executors.newScheduledThreadPool(1);
        executor.scheduleAtFixedRate(() -> {
            long currentTimeMillis = System.currentTimeMillis();
            System.out.println("Counter: " + counter + "(passed " + (currentTimeMillis-startTime) + " ms)");
            startTime = currentTimeMillis;
            counter++;
        }, 1, 1, TimeUnit.SECONDS);
        Thread.sleep(1000*30);
        executor.shutdown();
    }
}

Start the application and make a checkpoint:

$ java -XX:CRaCCheckpointTo=checkpoint-dir ExampleWithCRaC
$ jcmd ExampleWithCRaC JDK.checkpoint
96828:
CR: Checkpoint ...

The output will be:

Counter: 0(passed 1007 ms)
Counter: 1(passed 999 ms)
Counter: 2(passed 1000 ms)
Counter: 3(passed 1000 ms)
Counter: 4(passed 1000 ms)
Counter: 5(passed 1000 ms)
Counter: 6(passed 1000 ms)
Counter: 7(passed 1000 ms)
Counter: 8(passed 1000 ms)
Counter: 9(passed 1000 ms)
Counter: 10(passed 1000 ms)
Nov 01, 2023 5:41:54 PM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
Killed

Now, let’s try to restore the application:

$ java -XX:CRaCRestoreFrom=checkpoint-dir
Counter: 11(passed 64673 ms)

The application started and then finished unexpectedly — we expected to get some iterations of the counter, but instead, more than 30 seconds passed, and the app finished.

So the application should handle the checkpoint event. This can be achieved with classes from the jdk.crac package shipped with Liberica JDK with CRaC.

Let’s modify our example and restart the counter threads. The jdk.crac.Resource interface allows us to handle the checkpoint and restore events by implementing the beforeCheckpoint and afterRestore methods.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import jdk.crac.Context;
import jdk.crac.Core;
import jdk.crac.Resource;

public class ExampleWithCRaCRestore {
    private ScheduledExecutorService executor;
    private long startTime = System.currentTimeMillis();
    private int counter = 0;

    class ExampleWithCRaCRestoreResource implements Resource {
        @Override
        public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
            executor.shutdown();
            System.out.println("Handle checkpoint");
        }

        @Override
        public void afterRestore(Context<? extends Resource> context) throws Exception {
            System.out.println(this.getClass().getName() + " restore.");
            ExampleWithCRaCRestore.this.startTask();
        }
    }

    public static void main(String args[]) throws InterruptedException {
        ExampleWithCRaCRestore exampleWithCRaC = new ExampleWithCRaCRestore();
        Core.getGlobalContext().register(exampleWithCRaC.new ExampleWithCRaCRestoreResource());
        exampleWithCRaC.startTask();
    }

    private void startTask() throws InterruptedException {
        executor = Executors.newScheduledThreadPool(1);
        executor.scheduleAtFixedRate(() -> {
            long currentTimeMillis = System.currentTimeMillis();
            System.out.println("Counter: " + counter + "(passed " + (currentTimeMillis-startTime) + " ms)");
            startTime = currentTimeMillis;
            counter++;
        }, 1, 1, TimeUnit.SECONDS);
        Thread.sleep(1000*30);
        executor.shutdown();
    }

}

Let’s try one more time. Compile the application and make a checkpoint: 

$ javac ExampleWithCRaCRestore.java
$ java -XX:CRaCCheckpointTo=checkpoint-dir ExampleWithCRaCRestore
Counter: 0(passed 1007 ms)
Counter: 1(passed 999 ms)
Counter: 2(passed 1000 ms)
Counter: 3(passed 1000 ms)
Counter: 4(passed 1000 ms)
Counter: 5(passed 1000 ms)
Counter: 6(passed 1000 ms)
Nov 01, 2023 5:54:04 PM jdk.internal.crac.LoggerContainer info
INFO: Starting checkpoint
Handle checkpoint
Killed

Now, run

$ jcmd ExampleWithCRaCRestore JDK.checkpoint
100389:
CR: Checkpoint ...

Finally, let’s restore our application:

$ java -XX:CRaCRestoreFrom=checkpoint-dir
ExampleWithCRaCRestore$ExampleWithCRaCRestoreResource restore.
Counter: 7(passed 61407 ms)
Counter: 8(passed 1000 ms)
Counter: 9(passed 1000 ms)
Counter: 10(passed 1000 ms)
Counter: 11(passed 1000 ms)

The issue is solved!

Reference Spring Boot application 

First of all, let's try to use CRaC with the previous Spring Boot version. We will use the Spring Boot Petclinic application as an example. The source code is available here.

After pulling the project, build the jar file with mvn package and use the following command:

$ java -XX:CRaCCheckpointTo=cr -jar ./target/spring-petclinic-3.2.0-SNAPSHOT.jar


              |\      _,,,--,,_
             /,`.-'`'   ._  \-;;,_
  _______ __|,4-  ) )_   .;.(__`'-'__     ___ __    _ ___ _______
 |       | '---''(_/._)-'(_\_)   |   |   |   |  |  | |   |       |
 |    _  |    ___|_     _|       |   |   |   |   |_| |   |       | __ _ _
 |   |_| |   |___  |   | |       |   |   |   |       |   |       | \ \ \ \
 |    ___|    ___| |   | |      _|   |___|   |  _    |   |      _|  \ \ \ \
 |   |   |   |___  |   | |     |_|       |   | | |   |   |     |_    ) ) ) )
 |___|   |_______| |___| |_______|_______|___|_|  |__|___|_______|  / / / /
 ==================================================================/_/_/_/

:: Built with Spring Boot :: 3.2.0


2023-11-24T02:24:08.205-08:00  INFO 17053 --- [           main] o.s.s.petclinic.PetClinicApplication     : Starting PetClinicApplication v3.2.0-SNAPSHOT using Java 21.0.1 with PID 17053 (.../target/spring-petclinic-3.2.0-SNAPSHOT.jar started ...)
...

Checkpoint the running application:

$ jcmd spring-petclinic JDK.checkpoint
17053:
An exception during a checkpoint operation:
jdk.crac.CheckpointException
        Suppressed: jdk.crac.impl.CheckpointOpenSocketException: sun.nio.ch.ServerSocketChannelImpl[/[0:0:0:0:0:0:0:0]:8080]
                at java.base/jdk.internal.crac.JDKSocketResourceBase.lambda$beforeCheckpoint$0(JDKSocketResourceBase.java:44)
                at java.base/jdk.crac.Core.checkpointRestore1(Core.java:174)
                at java.base/jdk.crac.Core.checkpointRestore(Core.java:299)
                at java.base/jdk.crac.Core.checkpointRestoreInternal(Core.java:312)

As you can see, CRaC failed to perform a checkpoint due to jdk.crac.CheckpointException(jdk.crac.impl.CheckpointOpenSocketException).

To use CRaC with new Spring Boot and Spring Framework we need to add dependency on org.crac/crac package to pom.xml.

<dependency>
    <groupId>org.crac</groupId>
    <artifactId>crac</artifactId>
    <version>1.4.0</version>
</dependency>

Let's check the difference of our updated pom.xml:

$ git diff
diff --git a/pom.xml b/pom.xml
index 287a08a..f403155 100644
--- a/pom.xml
+++ b/pom.xml
@@ -36,6 +36,11 @@
   </properties>

   <dependencies>
+    <dependency>
+        <groupId>org.crac</groupId>
+        <artifactId>crac</artifactId>
+        <version>1.4.0</version>
+    </dependency>
     <!-- Spring and Spring Boot dependencies -->
     <dependency>
       <groupId>org.springframework.boot</groupId>

Now, let's check the dependency and used version for Spring Framework using Maven dependency command:

$ mvn dependency:tree  | grep "spring-boot:jar"
[INFO] |  +- org.springframework.boot:spring-boot:jar:3.2.0:compile

$ mvn dependency:tree  | grep "spring-core:jar"
[INFO] |  +- org.springframework:spring-core:jar:6.1.1:compile

As a result, Spring Boot 3.2 with Spring Framework 6.1.1 will be used for building the Petclinic app. Build it as usual with mvn clean package and start:

java -XX:CRaCCheckpointTo=cr -jar ./target/spring-petclinic-3.2.0-SNAPSHOT.jar


              |\      _,,,--,,_
             /,`.-'`'   ._  \-;;,_
  _______ __|,4-  ) )_   .;.(__`'-'__     ___ __    _ ___ _______
 |       | '---''(_/._)-'(_\_)   |   |   |   |  |  | |   |       |
 |    _  |    ___|_     _|       |   |   |   |   |_| |   |       | __ _ _
 |   |_| |   |___  |   | |       |   |   |   |       |   |       | \ \ \ \
 |    ___|    ___| |   | |      _|   |___|   |  _    |   |      _|  \ \ \ \
 |   |   |   |___  |   | |     |_|       |   | | |   |   |     |_    ) ) ) )
 |___|   |_______| |___| |_______|_______|___|_|  |__|___|_______|  / / / /
 ==================================================================/_/_/_/

:: Built with Spring Boot :: 3.2.0


2023-11-24T02:28:52.989-08:00  INFO 17325 --- [           main] o.s.s.petclinic.PetClinicApplication     : Starting PetClinicApplication v3.2.0-SNAPSHOT using Java 21.0.1 with PID 17325 (.../target/spring-petclinic-3.2.0-SNAPSHOT.jar started ..)
2023-11-24T02:28:52.995-08:00  INFO 17325 --- [           main] o.s.s.petclinic.PetClinicApplication     : No active profile set, falling back to 1 default profile: "default"
2023-11-24T02:28:53.948-08:00  INFO 17325 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data JPA repositories in DEFAULT mode.
2023-11-24T02:28:53.992-08:00  INFO 17325 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 37 ms. Found 2 JPA repository interfaces.
2023-11-24T02:28:54.678-08:00  INFO 17325 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port 8080 (http)
2023-11-24T02:28:54.686-08:00  INFO 17325 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2023-11-24T02:28:54.686-08:00  INFO 17325 --- [           main] o.apache.catalina.core.StandardEngine    : Starting Servlet engine: [Apache Tomcat/10.1.16]
2023-11-24T02:28:54.718-08:00  INFO 17325 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2023-11-24T02:28:54.719-08:00  INFO 17325 --- [           main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 1658 ms
2023-11-24T02:28:55.024-08:00  INFO 17325 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Starting...
2023-11-24T02:28:55.184-08:00  INFO 17325 --- [           main] com.zaxxer.hikari.pool.HikariPool        : HikariPool-1 - Added connection conn0: url=jdbc:h2:mem:be9e0d83-4345-497c-84ce-2df90b5740bb user=SA
2023-11-24T02:28:55.185-08:00  INFO 17325 --- [           main] com.zaxxer.hikari.HikariDataSource       : HikariPool-1 - Start completed.
2023-11-24T02:28:55.330-08:00  INFO 17325 --- [           main] o.hibernate.jpa.internal.util.LogHelper  : HHH000204: Processing PersistenceUnitInfo [name: default]
2023-11-24T02:28:55.374-08:00  INFO 17325 --- [           main] org.hibernate.Version                    : HHH000412: Hibernate ORM core version 6.3.1.Final
2023-11-24T02:28:55.400-08:00  INFO 17325 --- [           main] o.h.c.internal.RegionFactoryInitiator    : HHH000026: Second-level cache disabled
2023-11-24T02:28:55.550-08:00  INFO 17325 --- [           main] o.s.o.j.p.SpringPersistenceUnitInfo      : No LoadTimeWeaver setup: ignoring JPA class transformer
2023-11-24T02:28:56.297-08:00  INFO 17325 --- [           main] o.h.e.t.j.p.i.JtaPlatformInitiator       : HHH000489: No JTA platform available (set 'hibernate.transaction.jta.platform' to enable JTA platform integration)
2023-11-24T02:28:56.299-08:00  INFO 17325 --- [           main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2023-11-24T02:28:56.544-08:00  INFO 17325 --- [           main] o.s.d.j.r.query.QueryEnhancerFactory     : Hibernate is in classpath; If applicable, HQL parser will be used.
2023-11-24T02:28:57.751-08:00  INFO 17325 --- [           main] o.s.b.a.e.web.EndpointLinksResolver      : Exposing 13 endpoint(s) beneath base path '/actuator'
2023-11-24T02:28:57.837-08:00  INFO 17325 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path ''
2023-11-24T02:28:57.850-08:00  INFO 17325 --- [           main] o.s.s.petclinic.PetClinicApplication     : Started PetClinicApplication in 5.204 seconds (process running for 5.568)

Now, we can perform the checkpoint:

$ jcmd spring-petclinic JDK.checkpoint
17325:
CR: Checkpoint ...

The application output is as follows:

2023-11-24T02:29:39.847-08:00  INFO 17325 --- [Attach Listener] jdk.crac                                 : Starting checkpoint
2023-11-24T02:29:39.876-08:00  INFO 17325 --- [Attach Listener] o.s.b.j.HikariCheckpointRestoreLifecycle : Evicting Hikari connections
Killed

We can check that the cr directory is created and contains a lot of files:

$ ls cr
...  core-17415.img  core-17419.img  files.img      mm-17325.img       pstree.img    tty-info.img
...  core-17416.img  cppath          fs-17325.img   pagemap-17325.img  seccomp.img
...  core-17417.img  dump4.log       ids-17325.img  pages-1.img        stats-dump
...  core-17418.img  fdinfo-2.img    inventory.img  perfdata           timens-0.img

These files represent the dumped HotSpot JVM memory with all the necessary information about restoring the application.

Finally, let's restore the Petclinic app:

$ java -XX:CRaCRestoreFrom=cr
2023-11-24T02:30:59.012-08:00  WARN 17325 --- [l-1 housekeeper] com.zaxxer.hikari.pool.HikariPool        : HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m33s725ms577µs950ns).
2023-11-24T02:30:59.015-08:00  INFO 17325 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Restarting Spring-managed lifecycle beans after JVM restore
2023-11-24T02:30:59.019-08:00  INFO 17325 --- [Attach Listener] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port 8080 (http) with context path ''
2023-11-24T02:30:59.020-08:00  INFO 17325 --- [Attach Listener] o.s.c.support.DefaultLifecycleProcessor  : Spring-managed lifecycle restart completed (restored JVM running for 50 ms)

You can verify that the application is running successfully on localhost:8080.

As we can see in the log file, we now have extremely low startup (restoring) time, only 50 ms.

To conclude, CRaC integrates conveniently with Spring Boot applications, although more complex projects may require additional testing and troubleshooting.

If you have any questions regarding this feature, feel free to reach out, and our engineers will be happy to help!

Contact us

Subcribe to our newsletter

figure

Read the industry news, receive solutions to your problems, and find the ways to save money.

Further reading