A novel GraalVM optimization for faster native image startup

Jan 23, 2024
Peter Zhelezniakov

Native images generated with the GraalVM native-image compiler already start up in less than a second. BellSoft engineers have now added a new optimization to SubstrateVM that enables native images of multithreaded applications to start up even faster!

The essence of the change: improved monitor enter/exit routines in SubstrateVM

We improved the way monitor enter/exit routines are executed in SubstrateVM, which lies at the heart of GraalVM Native Image.

Monitor enter happens when, for example, a synchronized block is about to be executed. Take a look at the following code snippet:

synchronized (obj) {
    ...
}

At the first line, the monitor associated with the object obj is entered. Upon exit from the synchronized block, the monitor is exited.

When a monitor associated with an object is entered, a lock object is read from the object header. Then, an identifier of the current thread is written into it to indicate that the lock is acquired by the current thread. As several threads can compete to enter the monitor, we write the owner thread ID using a compare-and-swap (CAS) operation, which ensures that only one thread succeeds in acquiring the lock while the others fail.
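To make the CAS step concrete, here is a minimal sketch in plain Java. It is not SubstrateVM code: the class OwnerLock, its methods, and the convention that 0 means "no owner" are assumptions made for illustration, and blocking and reentrancy are left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not SubstrateVM's actual lock class.
final class OwnerLock {
    // Assumption for this sketch: 0 means "no owner"; real thread IDs are nonzero.
    static final long NO_OWNER = 0L;

    private final AtomicLong owner = new AtomicLong(NO_OWNER);

    // Tries to record threadId as the owner. The CAS succeeds for exactly one
    // of the competing threads; every other caller gets false.
    boolean tryAcquire(long threadId) {
        return owner.compareAndSet(NO_OWNER, threadId);
    }

    // Clears the owner field, but only if threadId really holds the lock.
    void release(long threadId) {
        if (!owner.compareAndSet(threadId, NO_OWNER)) {
            throw new IllegalMonitorStateException("thread " + threadId + " does not own the lock");
        }
    }

    long owner() {
        return owner.get();
    }
}
```

A real monitor would park or spin the losing threads instead of returning false; the sketch only shows why a single CAS is enough to pick one winner.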

Lock objects are created lazily. This makes sense, since only a few objects in a program are ever used for synchronization. When we synchronize on an object for the first time, the lock object has to be created. But in this case, we know it is not yet shared with other threads, since we have just created it, so we don't have to use CAS to write the thread ID into it.
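The idea can be sketched as follows. All names here (LazyLockSketch, Lock, slot) are hypothetical, and the sketch assumes that the freshly created lock is published into the header slot with a CAS; the source does not describe how publication works in SubstrateVM. The point is only that the owner field of a brand-new lock can be initialized with a plain write.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names throughout; a sketch of the idea, not SubstrateVM code.
final class LazyLockSketch {
    static final class Lock {
        final AtomicLong owner;

        // The owner is set while the lock is still private to the creating
        // thread, so a plain initialization is enough and no CAS is needed.
        Lock(long initialOwner) {
            owner = new AtomicLong(initialOwner);
        }
    }

    // Stand-in for the lock slot in the object header; null until first use.
    private final AtomicReference<Lock> slot = new AtomicReference<>();

    // Non-blocking, non-reentrant enter; returns true if threadId now owns the lock.
    boolean enter(long threadId) {
        Lock existing = slot.get();
        if (existing == null) {
            Lock fresh = new Lock(threadId); // created already owned by us
            // Assumption of this sketch: publication into the slot uses a CAS.
            if (slot.compareAndSet(null, fresh)) {
                return true;
            }
            existing = slot.get(); // another thread installed its lock first
        }
        // Shared lock: the owner must be written with CAS (0 means unowned).
        return existing.owner.compareAndSet(0L, threadId);
    }

    boolean hasLock() {
        return slot.get() != null;
    }

    long owner() {
        Lock lock = slot.get();
        return lock == null ? 0L : lock.owner.get();
    }
}
```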

Another optimization occurs when a monitor is exited. SubstrateVM guarantees that monitor entries and exits are properly ordered, which means we never exit a monitor that has not been previously entered. When a monitor is exited, we can be sure that a lock object exists, and simply read it from the object header. As a result, we can skip calling into the generic lock acquisition method which checks whether the lock exists and creates a new one if it does not.
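A simplified sketch of the exit path is shown below. The names (MonitorExitSketch, getOrCreateLock, getOrCreateCalls) are invented for illustration, and the code is single-threaded for brevity. The generic lookup runs on enter, while exit reads the existing lock directly.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, single-threaded for brevity; not SubstrateVM code.
final class MonitorExitSketch {
    static final class Lock {
        final AtomicLong owner = new AtomicLong(0L); // 0 means unowned
    }

    private Lock lock;    // stand-in for the lock slot in the object header
    int getOrCreateCalls; // counts trips through the generic path

    // Generic path: checks whether the lock exists and creates it if not.
    private Lock getOrCreateLock() {
        getOrCreateCalls++;
        if (lock == null) {
            lock = new Lock();
        }
        return lock;
    }

    void enter(long threadId) {
        Lock l = getOrCreateLock();
        if (!l.owner.compareAndSet(0L, threadId)) {
            throw new IllegalStateException("contended; blocking is omitted in this sketch");
        }
    }

    void exit(long threadId) {
        // Fast path: enter always precedes exit, so the lock must already
        // exist and can be read directly without the existence check.
        Lock l = lock;
        if (!l.owner.compareAndSet(threadId, 0L)) {
            throw new IllegalMonitorStateException("thread " + threadId + " does not own the lock");
        }
    }
}
```

After one enter/exit pair, getOrCreateCalls is 1, not 2: the exit never touched the generic path.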

Optimization results

We tested two Liberica NIK builds, with and without the improvement. For our study, we used the Spring Petclinic demo on Ubuntu Linux with an 8-core Intel CPU.

The first test represents the uncontended case: it is single-threaded and reads data using various combinations of input streams, which is a synchronized operation according to the specification. The results demonstrated a 34% improvement as follows:

Reading data using various combinations of input streams
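For illustration, an uncontended workload of this kind might look like the sketch below. This is not the benchmark used for the measurements above; the class name and the chosen stream combinations are assumptions. It relies on ByteArrayInputStream.read() being declared synchronized, so every single-byte read enters and exits a monitor that no other thread competes for.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical workload sketch; not the benchmark behind the reported numbers.
final class UncontendedReadSketch {
    // Reads the stream one byte at a time and returns the number of bytes read.
    static long readAll(InputStream in) {
        try {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the same data through three stream combinations on one thread.
    static long run(int size) {
        byte[] data = new byte[size];
        long total = 0;
        total += readAll(new ByteArrayInputStream(data));
        total += readAll(new BufferedInputStream(new ByteArrayInputStream(data)));
        total += readAll(new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data))));
        return total;
    }
}
```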

In contrast, the second test represents the contended scenario: a given number of threads are started, and each of them repeatedly calls the same synchronized method. The results with optimized and unoptimized builds are shown below.

Calling a synchronized method repeatedly

Thanks to the optimization, the application startup time improved by 3.3%.
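A contended workload of the kind described above could be sketched like this. Again, this is an illustrative stand-in with hypothetical names, not the actual test: several threads hammer one synchronized method on a shared object, so they continuously compete for the same monitor.

```java
// Hypothetical workload sketch; not the benchmark behind the reported numbers.
final class ContendedCounterSketch {
    private long counter;

    // Every call enters and exits the monitor of the shared instance.
    synchronized void increment() {
        counter++;
    }

    synchronized long value() {
        return counter;
    }

    // Starts the given number of threads, each calling increment() repeatedly,
    // and returns the final counter value once all of them have finished.
    static long run(int threads, int callsPerThread) {
        ContendedCounterSketch shared = new ContendedCounterSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    shared.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.value();
    }
}
```

Because increment() is synchronized, no updates are lost: run(4, 10000) returns 40000 regardless of how the threads interleave.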

The patch will be integrated into all GraalVM distributions soon 

The improvement has already been merged into the GraalVM master branch, and it is available in all Liberica Native Image Kit (NIK) builds starting from January 2024.
