What is CRaC? A guide to cutting Java startup and warmup from minutes to milliseconds

Apr 11, 2024
Dmitry Chuyko

In our previous article, we discussed the issues of Java startup. Java services may need tens of seconds or even minutes to reach peak performance, and during this period, they process fewer requests and consume more memory. And the process begins from scratch every time you start your services!

There are several solutions to the problem:

  • Application Class Data Sharing (AppCDS) helps to reduce startup time by creating an archive of pre-initialized JVM and application classes, so that the application doesn’t initialize the classes from scratch but takes ready metadata from the dump.
  • GraalVM Native Image uses ahead-of-time compilation and class initialization to create a single native image with an almost instant startup, but it may not be suitable for projects that rely heavily on the dynamic features of Java.
  • Project Leyden builds on AppCDS: it records class metadata and compiled code during training runs and reuses them to reduce Java startup and warmup during the production run. The project is still in development, but early-access builds are available.
  • Coordinated Restore at Checkpoint (CRaC) is an OpenJDK project that helps to minimize startup and warmup while preserving the benefits of JIT compilation for further performance optimization.

BellSoft recently released Liberica JDK 17 and 21 builds and ready-to-use containers with CRaC support, so you can already download Java with CRaC. In this article, we would like to shed light on this exciting feature and how it can benefit your Java project.

What is Coordinated Restore at Checkpoint (CRaC)?

CRaC offers Java developers a Checkpoint/Restore API to create an image of a running application at an arbitrary point in time (“checkpoint”), and then start the image from the checkpoint file (snapshot), restoring the application’s state from the moment when the checkpoint was made. Essentially, Java with CRaC enables you to pause the application and restart it from the moment it was paused. Additionally, you can distribute numerous replicas of this file, which is especially relevant for deployment on multiple instances.

Coordinated processes for accident-free checkpoint

Coordinated Checkpoint/Restore makes the application aware that it is being paused and restarted. This way, the application can cancel the checkpoint if it deems the moment unsuitable for saving the state (for example, while it is saving user data). In addition, it can perform essential preliminary tasks before the checkpoint, such as closing network connections and open file descriptors. After restore, it can return to normal operation and react to any changes in the environment that have occurred since the checkpoint.
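
The coordination described above can be sketched in plain Java. To stay runnable on any JDK, the sketch declares a local stand-in interface, CheckpointAware, whose two callbacks mirror beforeCheckpoint and afterRestore of the CRaC Resource interface. The stand-in, the ConnectionHolder class, and the simulateCheckpointAndRestore helper are hypothetical names used for illustration only, and no process image is actually written. In a real application, the component would implement org.crac.Resource and register itself with Core.getGlobalContext(), and the JVM would drive the callbacks.

```java
class CoordinatedCheckpointSketch {

    /** Hypothetical stand-in for the two CRaC Resource callbacks. */
    interface CheckpointAware {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    /** Hypothetical component that owns a connection and sometimes saves user data. */
    static class ConnectionHolder implements CheckpointAware {
        boolean connected = true;
        boolean savingUserData = false;
        int reconnects = 0;

        @Override
        public void beforeCheckpoint() {
            if (savingUserData) {
                // Veto: an exception tells the caller that this moment is unsuitable.
                throw new IllegalStateException("user data is being saved");
            }
            connected = false; // a real component would close sockets and files here
        }

        @Override
        public void afterRestore() {
            connected = true;  // a real component would reopen them here
            reconnects++;
        }
    }

    /**
     * Simulates the coordination only; no process image is written.
     * Returns false if the resource cancelled the checkpoint.
     */
    static boolean simulateCheckpointAndRestore(CheckpointAware resource) {
        try {
            resource.beforeCheckpoint();
        } catch (Exception e) {
            return false; // cancelled: the application simply keeps running
        }
        // A CRaC-enabled JVM would save the image here and resume from it later.
        try {
            resource.afterRestore();
        } catch (Exception e) {
            throw new IllegalStateException("restore callback failed", e);
        }
        return true;
    }

    public static void main(String[] args) {
        ConnectionHolder holder = new ConnectionHolder();

        holder.savingUserData = true;
        System.out.println("accepted while saving: " + simulateCheckpointAndRestore(holder));

        holder.savingUserData = false;
        System.out.println("accepted when idle: " + simulateCheckpointAndRestore(holder));
        System.out.println("reconnects: " + holder.reconnects);
    }
}
```

The sketch handles a single resource and does not model how a real CRaC context orders or aggregates notifications across many registered resources.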

CRIU vs CRaC

CRIU (Checkpoint and Restore in Userspace) is a technology for Linux that serves as a foundation for CRaC. CRIU uses the ptrace kernel interface to freeze a running application and later restore it from the saved checkpoint files. The existing OpenJDK CRaC implementation includes CRIU and adds several enhancements and adjustments tailored to Java applications. Namely, CRaC imposes more restrictions on the checkpoint and restore process. For instance, CRIU can save the state of a TCP socket and then restore the connection. With CRaC, all connections must be closed before the checkpoint, which makes the whole process more reliable.

Spring Boot with CRaC

Spring Boot currently integrates with CRaC as a Proof-of-Concept. Thanks to its CRaC support, Liberica JDK, the recommended runtime for Spring, helps developers integrate the functionality smoothly into their Spring Boot projects, enabling much faster startup and warmup with minimal code rewriting. Moreover, Alpaquita Containers with CRaC offer a “plug-and-play” experience with the CRaC API, allowing seamless integration into your workloads and achieving up to 164 times faster startup!

Spring Boot Petclinic and Alpaquita Containers with CRaC support: startup study results

Additionally, in certain cases, Alpaquita Containers with CRaC support may result in final images that have a 10% smaller footprint (you can read more about the experimental setup and conclusions in a dedicated article).

Spring Boot Petclinic and Alpaquita Containers with CRaC: image footprint study results

Looks impressive, doesn’t it? Follow our tutorial on using CRaC with Java in a container and start experimenting with the feature! 

Which applications need CRaC

CRaC is most beneficial for applications characterized by

  • short runs,
  • frequent restarts,
  • deployment to multiple replicas,
  • the possibility of performing a training run.

If operations engineers allocate too little memory to their instances, the application starts up slowly, leading to increased costs, especially if the enterprise uses cloud services and pays for the time its code executes. But if the instances are too large, companies overpay for resources that are never utilized. Additionally, when a Java application goes through the standard startup and warmup processes, the CPU consumption tends to be higher than in the stabilized state.

With CRaC, however, applications start almost instantly and at peak performance, minimizing latency and optimizing resource consumption.

Main considerations when using CRaC

CRaC represents a stateful approach as it preserves the exact state of a running application at a given time, together with information about the Java heap, native memory, JIT-compiled code, settings, etc. As a result, the snapshot may contain sensitive data. Developers should keep that in mind when working with the technology. They should either carefully assess how the snapshot files are created, stored, and accessed, or ensure that the snapshot is taken at a moment when the JVM doesn’t store any sensitive information.
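
One way to approach the second option is to wipe secrets in the callback that runs before the checkpoint and reload them after restore. The following is a minimal sketch under stated assumptions: CheckpointAware is a local stand-in for the CRaC Resource callbacks so that the code runs on any JDK, and TokenHolder and its loader are hypothetical names. It also assumes the secret is held only in a char array owned by this class; copies held elsewhere (for example, in String objects) would still end up in the snapshot.

```java
import java.util.Arrays;
import java.util.function.Supplier;

class SecretWipeSketch {

    /** Hypothetical stand-in for the two CRaC Resource callbacks. */
    interface CheckpointAware {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    /** Hypothetical holder that keeps a secret only between restore and checkpoint. */
    static class TokenHolder implements CheckpointAware {
        private final Supplier<char[]> loader; // assumed to read from a secret store
        private char[] token;

        TokenHolder(Supplier<char[]> loader) {
            this.loader = loader;
            this.token = loader.get();
        }

        boolean hasToken() {
            return token != null;
        }

        @Override
        public void beforeCheckpoint() {
            if (token != null) {
                Arrays.fill(token, '\0'); // overwrite the characters in place
                token = null;             // drop the reference as well
            }
        }

        @Override
        public void afterRestore() {
            token = loader.get(); // fetch a fresh copy in the restored process
        }
    }

    public static void main(String[] args) {
        // The literal below is a placeholder; a real loader would not embed the secret.
        TokenHolder holder = new TokenHolder(() -> "demo-token".toCharArray());

        holder.beforeCheckpoint();
        System.out.println("token present at checkpoint: " + holder.hasToken());

        holder.afterRestore();
        System.out.println("token present after restore: " + holder.hasToken());
    }
}
```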

In addition, developers must consider possible issues with randomness. Since the java.util.Random seed is created upon initialization, the pseudo-random numbers generated after restore will be predictable. One way to mitigate this issue is to create a new seed in an afterRestore() method. However, a better solution is to use java.security.SecureRandom, which provides more reliable random number generation. SecureRandom is integrated with CRaC: its seed (along with that of NativePRNG) is cleaned and random operations are locked before the snapshot is taken. The lock is removed in an afterRestore() method, ensuring secure random number generation after snapshot restoration.
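
The reseeding mitigation for java.util.Random can be sketched as follows. As an assumption made to keep the example runnable on any JDK, CheckpointAware is a local stand-in for the CRaC Resource callbacks, and IdGenerator is a hypothetical class. Two generators created with the same seed stand in for two replicas restored from the same snapshot.

```java
import java.security.SecureRandom;
import java.util.Random;

class ReseedSketch {

    /** Hypothetical stand-in for the two CRaC Resource callbacks. */
    interface CheckpointAware {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    /** Hypothetical generator that replaces its Random with a freshly seeded one after restore. */
    static class IdGenerator implements CheckpointAware {
        private Random random;

        IdGenerator(long seed) {
            this.random = new Random(seed);
        }

        long nextId() {
            return random.nextLong();
        }

        @Override
        public void beforeCheckpoint() {
            // nothing to release in this sketch
        }

        @Override
        public void afterRestore() {
            // Take the new seed from SecureRandom so that replicas diverge.
            random = new Random(new SecureRandom().nextLong());
        }
    }

    public static void main(String[] args) {
        // Same seed in both: this models state copied from a single snapshot.
        IdGenerator a = new IdGenerator(42L);
        IdGenerator b = new IdGenerator(42L);
        System.out.println("same before reseed: " + (a.nextId() == b.nextId()));

        a.afterRestore();
        b.afterRestore();
        System.out.println("same after reseed: " + (a.nextId() == b.nextId()));
    }
}
```

Without the reseed, both generators keep producing identical sequences. After it, a collision is possible in principle but extremely unlikely.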

Download JDK builds with CRaC and start experimenting!

The CRaC API helps you to minimize the startup time of your instances and reduce resource consumption without sacrificing the JIT capabilities for further performance optimization if required. With production-ready containers from BellSoft, there is no need to adjust the JDK or the OS to work with the CRaC API, allowing you to focus on implementing the feature in your workloads.

Visit the Liberica JDK Download Center to get a JDK build with CRaC or select a container image on our Docker Hub repository. And if you have any questions about the functionality, feel free to contact us, and our engineers will be happy to assist you!
