Waiting five seconds for an application to load is bearable, but the irritation grows every time you repeat the process. Imagine loading an app 100+ times a day. Now imagine your bank account is charged for every second of that wait.
Sounds like a nightmare? It becomes a reality when you deploy Java microservices to the cloud and have to restart them constantly. The situation worsens if you use services that charge you for the compute time you consume.
This article looks into ways of cutting Java application startup time and reaching peak performance almost instantly. We hope that you find the solution most suitable for your project.
Why you have problems with Java application startup
When we say that Java applications start slowly, we mean several consecutive processes: JVM startup, application startup, and JVM warmup. Interestingly, the startup part is not the main culprit of our anguish.
How does a Java application start up?
- First of all, the JVM service code and data get loaded and initialized. Then, the initialization of core classes follows. After that, the dependencies of the main class get initialized. The invocation of the main method drives further execution. The JVM startup takes a few milliseconds on modern hardware.
- After that, the application starts. The application classes are loaded and initialized, the dependencies are resolved, and the necessary resources are loaded. At this stage, application-specific initialization may also occur (such as establishing database connections). The application startup takes longer than the JVM startup: for microservices, this phase can last from several seconds up to a minute.
Taken together, JVM and application start yield the time to first operation. But JVM still has a lot to do before reaching the state of peak performance.
Warmup: code interpretation and optimization
During the warmup phase, the JVM compiles and optimizes the code. It is a resource-intensive and lengthy process: the JVM has to execute, profile, and recompile different versions of machine code to select the most performant one. As a rule, the better the resulting code, the more optimization work is required.
Code compilation and optimization take substantially longer than the actual startup and may last several minutes in the case of complex applications. In addition, JIT compilation carries higher overhead: bytecode must be translated to machine code at runtime, and memory consumption grows because the JVM and the compiled code reside in memory together.
The worst thing is that the process starts from scratch every time you launch your program! This may lead to
- Higher cloud costs if your provider charges for every second of operation, and
- Resource overprovisioning, because you must allocate more memory to the application than it needs in a steady state.
So, what can we do to accelerate startup and warmup? Reducing the JVM startup won’t make any difference: there’s no point in trying to win a couple of milliseconds. The application startup can be optimized, as we will see below, but the biggest effort should go into reducing the warmup phase, as it takes the longest. We want to reach peak performance in as little time as possible regardless of the complexity of our application.
One option is to switch off the C2 compiler and skip the lengthy performance optimizations. But we don’t recommend doing that: you may win several seconds of warmup but lose much more in overall throughput. Instead, consider the alternative techniques described below.
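For illustration, switching off C2 can be done by capping tiered compilation at the C1 level. This is a minimal sketch, not a recommendation; `app.jar` is a placeholder for your application:

```shell
# Stop tiered compilation at level 1 (C1 only): the C2 optimizing
# compiler never runs, so warmup is shorter but peak throughput drops.
java -XX:TieredStopAtLevel=1 -jar app.jar
```

This trade-off is exactly why the techniques below are usually a better choice.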
Ways to accelerate JVM startup and warmup
Application Class Data Sharing (AppCDS)
Application Class Data Sharing (AppCDS) is an OpenJDK feature aimed at improving a Java application’s startup time and memory footprint by creating an archive of the classes used by the application.
When the application starts, the JVM needs to find, load, and initialize the necessary classes and then map them into an internal data structure. Depending on the number of required classes (which may be hundreds or thousands), the process may take quite some time. AppCDS enables developers to store this information in an archive, which can be used for subsequent launches or even shared among multiple JVM instances. JEP 350: Dynamic CDS Archives, introduced in JDK 13, further enhances the developer experience by eliminating the need to do trial runs to create a class list.
However, AppCDS only shortens the startup stage and doesn’t affect the warmup stage, so the overall gains are not that significant. In addition, even though dynamic CDS facilitates archive creation, it comes with certain pitfalls:
- The archive will not be created if the application exits abruptly, even if a substantial number of classes have already been loaded;
- Classes not loaded during the execution will not be saved, which means that you have to ensure that all use cases are considered during the trial run.
Ahead-of-time compilation (Native Image)
The AOT compiler translates Java bytecode into OS-specific machine code, performs necessary optimizations, and eliminates unused code dependencies at the build stage. The resulting executable (native image)
- Starts up almost instantly (in about a tenth of a second) because there’s no need to interpret bytecode or search for hot spots;
- Reaches peak performance immediately without warmup because all performance optimizations are done before application execution;
- Doesn’t require JVM to run but includes the necessary runtime components (garbage collectors, thread scheduling, etc.);
- Has lower runtime overhead, since no JIT compiler or bytecode needs to be kept in memory.
Java developers wishing to integrate AOT compilation into their projects can use GraalVM Native Image, part of the GraalVM project. BellSoft also offers a GraalVM CE-based native-image compiler, Liberica Native Image Kit, recommended by Spring. In addition, the most popular Java frameworks offer built-in support for Native Image, including Spring Boot 3.
AOT seems to solve the startup problem, but migrating existing projects to Native Image can be pretty challenging. The AOT compilation happens under the closed-world assumption, meaning that the compiler has to know about all methods and dependencies used at runtime, or else they won’t make it into the native executable. As a result, the application may throw runtime errors or behave unexpectedly. Native Image does not support dynamic features such as Reflection, Serialization, or JNI out of the box, so you have to either rewrite your application or provide the relevant configuration at build time. We discussed some intricacies of development with Native Image in our previous article.
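The configuration mentioned above doesn’t have to be written by hand: GraalVM ships a tracing agent that records dynamic-feature usage during a normal JVM run. A minimal sketch, assuming `app.jar` is your application and `config` is an arbitrary output directory:

```shell
# Run the app on a regular JVM with the tracing agent; it records
# reflection, JNI, proxy, and resource usage as JSON config files.
java -agentlib:native-image-agent=config-output-dir=config -jar app.jar

# Build the native executable, pointing the builder at the recorded config.
native-image -jar app.jar -H:ConfigurationFileDirectories=config app

# The resulting binary runs without a separate JVM installation.
./app
```

As with the AppCDS trial run, the agent only sees the code paths you actually exercise, so untested use cases can still fail at runtime.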
To sum up, you must carefully evaluate the complexity and specifics of your application to determine whether migration to Native Image is worth the trouble.
Project Leyden is an OpenJDK project under development whose main objective is to improve Java applications’ startup time through static images.
Project Leyden will rely on existing JDK components, including the HotSpot JVM, the jlink tool, and AppCDS. The project team is working toward gradually adopting the full closed-world constraint, starting with less strict constraints and delivering incremental optimizations. This approach will enable more projects to benefit from static images.
Static images work under the closed-world assumption, just like GraalVM-derived native images, thus presenting similar advantages and caveats. Static images won’t be able to optimize performance while the application is running, missing out on JIT flexibility.
So we want to preserve the power of the JVM’s JIT compiler, which can optimize performance on the fly, while skipping the lengthy warmup procedure. Could we warm up the application once, save this state, and pick up where we left off on all subsequent starts?
We can do that if we get our JVM hooked on CRaC!
Coordinated Restore at Checkpoint (CRaC) is an OpenJDK project aimed at reducing startup time and delivering high performance immediately for Java applications. Based on the CRIU project for Linux, with enhancements tailored to the specifics of the JVM process, CRaC offers the following functionality:
You start your Java application the usual way and wait until it warms up and reaches stable performance. Then, you take a snapshot of the current JVM state at an arbitrary time (“checkpoint”) and save it to a set of files. When you start your application next time from this set of files (“restore”), you continue the operation from the moment the snapshot was taken, so the performance level will be the same. This way, the application starts almost immediately at a stable performance level without any warmup. At the same time, the JIT compiler is still there, ready for further performance optimization if required.
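The checkpoint/restore cycle described above can be sketched with the flags provided by CRaC-enabled JDK builds. `app.jar` and the snapshot directory are placeholders:

```shell
# Start on a CRaC-enabled JDK, telling the JVM where to store the snapshot.
java -XX:CRaCCheckpointTo=/tmp/checkpoint -jar app.jar

# Once the application is warmed up, trigger the checkpoint from another
# terminal; the JVM writes its state to /tmp/checkpoint and exits.
jcmd app.jar JVM.checkpoint

# Every later start restores from the snapshot, skipping both startup
# and warmup while keeping the JIT compiler available.
java -XX:CRaCRestoreFrom=/tmp/checkpoint
```

Because restore resumes the saved process image, the restored JVM behaves as if it had been running all along, at the same performance level.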
That was the “Restore at Checkpoint” part, but what about “Coordinated”? Coordinated restore means that the application is aware of being checkpointed and restored, so it can run certain before-checkpoint and after-restore actions to keep working smoothly. For instance, it must ensure that all open files, connections, and sockets are closed before the checkpoint, or the snapshot won’t be taken. In addition, it can react to changes in the environment after restore.
What is more, CRaC also enables developers to optimize memory consumption in the cloud by starting the application in a test container, taking a snapshot, and then shipping the application together with the snapshot to the production environment.
Popular Java frameworks have already started adopting CRaC, which enables developers to benefit from the functionality with little to no code changes.
To sum up, reducing Java startup and warmup time always comes at a cost, be it the inability of AppCDS to affect the warmup phase or the lack of adaptive runtime optimization with AOT. Although CRaC seems to be the most promising solution in this regard, developers have to familiarize themselves with it before integrating it into production.
We are sure you have lots of questions about the CRaC feature, so we will delve into the technology in our following article. Subscribe to our newsletter and don’t miss it!