We are happy to announce the general availability of Liberica Native Image Kit (NIK) version 23.0 with several enhancements and new features.
New features and enhancements
Summary of improvements in Liberica NIK 23.0:
- Add JFR ThreadCPULoad event implementation;
- Implement thread user CPU time in the LinuxThreadCpuTimeSupport.getThreadCpuTime() method;
- Fix URLConnection.getContentType() returning null for HTML resources;
- Remove type checks from JNI-to-Java call stubs, which can break compatibility;
- Add support for TrayIcon on macOS and Windows. Note that if you run a TrayIcon app on macOS, you need to add the Info.plist file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleIdentifier</key>
<string>nativeimage.trayicon.sample</string>
</dict>
</plist>
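For reference, a tray application of the kind this note applies to might look like the following minimal sketch. It uses only the standard java.awt.SystemTray and java.awt.TrayIcon APIs; the class name TrayIconSample, the tooltip text, and the generated 16x16 icon are illustrative assumptions and are not taken from Liberica NIK.

```java
import java.awt.AWTException;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.MenuItem;
import java.awt.PopupMenu;
import java.awt.SystemTray;
import java.awt.TrayIcon;
import java.awt.image.BufferedImage;

// Hypothetical sample class; the name and the icon are assumptions for illustration.
final class TrayIconSample {

    /** Adds an icon to the system tray; returns false if no tray is available. */
    static boolean install() {
        // Check for a headless environment first, so no AWT peers are created without a display.
        if (GraphicsEnvironment.isHeadless() || !SystemTray.isSupported()) {
            return false;
        }

        // Draw a simple icon in memory instead of loading an image resource.
        BufferedImage image = new BufferedImage(16, 16, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        g.setColor(Color.ORANGE);
        g.fillOval(2, 2, 12, 12);
        g.dispose();

        PopupMenu menu = new PopupMenu();
        MenuItem exit = new MenuItem("Exit");
        menu.add(exit);

        TrayIcon icon = new TrayIcon(image, "Native Image tray sample", menu);
        icon.setImageAutoSize(true);

        SystemTray tray = SystemTray.getSystemTray();
        // Remove the icon and terminate when the user picks "Exit".
        exit.addActionListener(e -> {
            tray.remove(icon);
            System.exit(0);
        });

        try {
            tray.add(icon);
            return true;
        } catch (AWTException e) {
            // The desktop refused the icon (for example, the tray disappeared).
            return false;
        }
    }

    public static void main(String[] args) {
        if (!install()) {
            System.err.println("System tray is not available on this platform");
        }
    }
}
```

The headless check comes first so that the same binary exits cleanly on a server without a desktop session.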
But the most impactful new feature is the integration of the ParallelGC implementation. Read on to find out more about this project, in which BellSoft engineers took an active part, and the benefits it will bring to AOT-compiled programs.
Integration of ParallelGC
Two major advantages of GraalVM are its enhanced JIT compiler and AOT compilation, which gives applications (native images) almost instant startup, a property that is critical for cloud services such as Lambdas. GraalVM CE will become even more powerful: native images will have significantly lower latency thanks to a new ParallelGC integrated into Liberica NIK (a GraalVM CE-based native-image compiler) as an experimental feature.
Justification
Until now, GraalVM Native Image offered only SerialGC, a single-threaded stop-the-world garbage collector, in Community Edition, while G1 (Garbage-First) GC was available in Enterprise Edition. SerialGC freezes all application threads while it performs a collection (hence “stop-the-world”) and itself runs in a single thread, which increases latency. Latency is a critical performance indicator that greatly affects user experience, which justifies adding to GraalVM CE a lower-latency GC that makes full use of available multiprocessor hardware and reduces pause times.
It should be noted that when the Graal compiler turns JVM-based code into a platform-dependent native executable, the executable runs on SubstrateVM, which is a partial JVM implementation. The garbage collector in SubstrateVM is written in Java, so porting the existing HotSpot ParallelGC, which is implemented in C++, would be hard. Therefore, the developers focused on extending SubstrateVM with a framework that enables the parallelization of GC work. The following section explains how this was achieved.
Implementation of ParallelGC in SubstrateVM
As opposed to SerialGC, ParallelGC uses several threads to scan objects, which speeds up the GC procedure. The default setting is one thread per core and the maximum is eight per core; the number of threads can be regulated with the -XX:ParallelGCThreads option. To activate ParallelGC, use the --gc=parallel flag.
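To try the two options on something concrete, a small allocation-heavy program is enough. The sketch below is only an illustration: the class AllocationWorkload, its parameters, and the thread count of 4 in the comments are assumptions, and the name of the generated executable may differ on your system. Only --gc=parallel and -XX:ParallelGCThreads come from this article.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical workload used to exercise the garbage collector.
//
// Possible build and run steps (file names and the value 4 are assumptions):
//   javac AllocationWorkload.java
//   native-image --gc=parallel AllocationWorkload
//   ./allocationworkload -XX:ParallelGCThreads=4
final class AllocationWorkload {

    /**
     * Allocates {@code iterations} byte arrays of {@code blockSize} bytes and keeps
     * only the most recent {@code liveBlocks} of them reachable, so that older
     * blocks become garbage. Returns the total number of bytes allocated.
     */
    static long churn(int iterations, int liveBlocks, int blockSize) {
        byte[][] live = new byte[liveBlocks][];
        long allocated = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] block = new byte[blockSize];
            block[0] = (byte) i;           // touch the block so it is really used
            live[i % liveBlocks] = block;  // overwrite an old slot, dropping its block
            allocated += block.length;
        }
        return allocated;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long bytes = churn(2_000_000, 10_000, 1024);
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("Allocated " + bytes / (1024 * 1024) + " MiB in " + millis
                + " ms on " + Runtime.getRuntime().availableProcessors() + " processors");
    }
}
```

Building the same class once without --gc=parallel and once with it gives two executables whose run times can be compared on the same machine.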
The parallelization of GC in SubstrateVM is being carried out in several steps. The first step was to parallelize the phase that computes the transitive closure, which contributes most to the GC pause time. For this purpose, the work is partitioned into tasks that are performed by individual worker threads.
The worker threads are started at application startup, which slightly increases the startup time. ParallelGC reuses most of the SerialGC code and allocates memory in a thread-local fashion to minimize interference between worker threads.
Our ParallelGC keeps a counter of busy worker threads: a worker thread decrements the counter when it blocks and increments it when it wakes up. When the counter drops to zero, all work has been done. The last active worker thread signals the completion of the parallel phase before blocking, and the rest of the garbage collection routine executes in a single thread. Worker threads remain blocked until they are woken up by the next garbage collection cycle.
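The busy-counter protocol can be illustrated with a small, self-contained sketch. This is not the SubstrateVM code: it models the heap as an array of reference lists, uses one shared queue instead of thread-local structures, and creates its worker threads per call rather than at startup, so the workers exit at the end instead of blocking until the next cycle. The class name ParallelMarkSketch and all of its members are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical model of a parallel transitive-closure phase with a busy-worker counter.
// Each instance is meant for a single marking pass.
final class ParallelMarkSketch {
    private final int[][] edges;                 // edges[i] = objects referenced by object i
    private final AtomicIntegerArray marked;     // 1 = reachable, 0 = not visited yet
    private final Deque<Integer> queue = new ArrayDeque<>(); // pending objects, guarded by lock
    private final Object lock = new Object();
    private int busyWorkers;                     // workers that are not blocked, guarded by lock
    private boolean done;                        // set by the last active worker, guarded by lock

    ParallelMarkSketch(int[][] edges) {
        this.edges = edges;
        this.marked = new AtomicIntegerArray(edges.length);
    }

    /** Marks everything reachable from the roots and returns the number of marked objects. */
    int mark(int[] roots, int workerCount) throws InterruptedException {
        synchronized (lock) {
            for (int root : roots) {
                if (marked.compareAndSet(root, 0, 1)) {
                    queue.add(root);
                }
            }
            busyWorkers = workerCount;           // every worker starts out busy
            done = false;
        }
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(this::workerLoop, "gc-worker-" + i);
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        int count = 0;
        for (int i = 0; i < marked.length(); i++) {
            count += marked.get(i);
        }
        return count;
    }

    private void workerLoop() {
        while (true) {
            int obj;
            synchronized (lock) {
                while (queue.isEmpty()) {
                    if (done) {
                        return;
                    }
                    busyWorkers--;               // about to block: no longer busy
                    if (busyWorkers == 0) {      // nobody is scanning, so no new work can appear
                        done = true;
                        lock.notifyAll();        // signal completion of the parallel phase
                        return;
                    }
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    busyWorkers++;               // woke up: busy again
                }
                obj = queue.poll();
            }
            // Scan the object outside the lock; the CAS ensures each object is queued once.
            for (int ref : edges[obj]) {
                if (marked.compareAndSet(ref, 0, 1)) {
                    synchronized (lock) {
                        queue.add(ref);
                        lock.notify();           // wake one blocked worker
                    }
                }
            }
        }
    }
}
```

A worker only decrements the counter while holding the lock and seeing an empty queue, so a count of zero means that no thread is still scanning an object and the closure is complete.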
Results and future plans
To validate the ParallelGC implementation, we used the HyperAlloc benchmark developed by the Amazon Corretto team, a workload simulating application characteristics that affect garbage collector latency. Experiment prerequisites:
- Intel i7-8565U CPU,
- 8 cores at 2 GHz,
- Ubuntu 20.04.
The figures below demonstrate the results obtained using 8 worker threads and different heap sizes.
SubstrateVM GC pause times with 32 GB of heap memory
SubstrateVM GC pause times with 2 GB of heap memory
SubstrateVM GC pause times with 512 MB of heap memory
The graphs above show a 10-40% reduction in GC pause time compared to SerialGC. An improvement over SerialGC was also obtained with smaller numbers of GC threads.
The next step within our project is to make it possible to execute more GC phases in parallel. The most promising candidates for implementation are incremental collections, root set scanning, and post-GC cleanup.
The graph below shows the scalability of ParallelGC in SubstrateVM.
SubstrateVM GC pause times with different worker thread counts
Conclusion
Integrating ParallelGC into GraalVM CE will enable companies to benefit from Native Image technology and, at the same time, improve essential KPIs. BellSoft engineers will continue contributing to the enhancement of GraalVM technology.
Right now, ParallelGC is available for macOS and Linux as an experimental feature not intended for production use. We encourage you to try it out and report any bugs discovered in the process via our GitHub page. For that purpose, download the new version of Liberica NIK 23.0, which is already available in the Download Center.