
HotSpot vs. OpenJ9: performance comparison

Jun 28, 2022
Bob Booshehri

Which Java Virtual Machine to choose, HotSpot or OpenJ9? Both are tunable open-source JVM implementations. HotSpot is a well-established JVM implementation initially developed by Sun Microsystems. OpenJ9, developed by IBM, is not as widespread in the industry but has gained popularity in recent years.

OpenJ9 claims great performance in terms of startup time, latency, throughput, and memory footprint, based on a study with the DayTrader7 benchmark application in which three fine-tuned OpenJ9 configurations are compared to HotSpot with default settings.

BellSoft’s engineers decided to check whether HotSpot can be configured to deliver comparable or better performance. A summary of their testing follows.

  1. Experiment setup
  2. Results
    1. Startup
    2. Footprint
    3. Latency and throughput
  3. Conclusion

Experiment setup

We used the DayTrader7 application from the original study as a benchmark. It is not a microservice application but a small monolith running on a web server, which makes this configuration particularly representative. As platforms, we used a server-class machine running Linux and a desktop-class machine running Windows. The JDK binaries were OpenJDK 11: an AdoptOpenJDK build with three OpenJ9 flag configurations, and Liberica JDK with default and tuned HotSpot.

The focus was on startup time, footprint, latency, and throughput. Apache JMeter 5.4.1 was used to measure the latter three metrics.

Our goal was not to generate synthetic performance numbers with HotSpot fine-tuning tricks, but rather to evaluate the standard settings of this JVM implementation.

Results

Startup

There are several parameters that help reduce application startup time with HotSpot. The first one is Application Class Data Sharing, or AppCDS. It allows placing application classes in a shared archive, thus accelerating startup. This feature appeared in HotSpot 1.5 and has been improved in subsequent versions; for example, it is now possible to generate archives automatically. The following flag is needed to activate AppCDS:

-XX:SharedArchiveFile=app-cds.jsa

Note that you must create the AppCDS archive explicitly. The parameter above lets you specify the name of the archive file used for class storage; without it, the classes are stored in the JDK installation directory, which is undesirable. Once the archive is created, you can launch the application with it: the JVM maps the archive into its memory and has most of the classes it requires readily available. The AppCDS documentation can be found on Oracle’s website.
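
As a sketch, the classic three-step AppCDS procedure on JDK 11 looks roughly like this. The jar name app.jar is hypothetical; DayTrader7 actually runs inside a web server, so the real class path is more involved:

```shell
# Step 1: trial run that records which classes the application loads
java -XX:DumpLoadedClassList=classes.lst -jar app.jar

# Step 2: dump the recorded classes into a shared archive
java -Xshare:dump -XX:SharedClassListFile=classes.lst \
     -XX:SharedArchiveFile=app-cds.jsa -cp app.jar

# Step 3: launch the application with the archive mapped into memory
java -XX:SharedArchiveFile=app-cds.jsa -jar app.jar
```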

In addition, the -XX:TieredStopAtLevel=1 parameter runs the application with only the client (C1) compiler and no profiling, which makes code compilation faster. This configuration is suitable when startup acceleration is a top priority.

Below you will find the startup time comparison of OpenJ9 and HotSpot. Startup time was calculated as the difference between the moment of launch and the “Application daytrader7 started in X seconds“ output.

Tested configurations:

  • OpenJ9 (1) (heap 256 MB): -Xmx256m
  • OpenJ9 (2) (heap 256 MB, class cache): -Xmx256m -Xshareclasses:name=mvn
  • OpenJ9 (3) (heap 256 MB, class cache, fast compilation): -Xmx256m -Xshareclasses:name=mvn -Xtune:virtualized -Xscmx200m
  • HotSpot (heap 256 MB): -Xmx256m
  • HotSpot (heap 256 MB, AppCDS, C1): -Xmx256m -XX:SharedArchiveFile=app-cds.jsa -XX:TieredStopAtLevel=1
Startup time:

  • OpenJ9 (1) (heap 256 MB): 6.986 s
  • OpenJ9 (2) (heap 256 MB, class cache): 6.064 s
  • OpenJ9 (3) (heap 256 MB, class cache, fast compilation): 6.159 s
  • HotSpot (heap 256 MB): 6.366 s
  • HotSpot (heap 256 MB, AppCDS, C1): 4.972 s

Startup time results (lower is better)

Liberica JDK with the aforementioned parameters demonstrated the best result (4.972 s), whereas OpenJ9 with no additional parameters apart from the heap size limit gave the worst outcome (6.986 s).

Footprint

For our experiment, we used the daytrader7.jmx test plan offered by the DayTrader7 developers. Test plan parameters were adjusted according to the machine class: four threads for the server-class machine and six for the desktop-class one.

The test plan is a script comprising many HTTP requests that imitate a user’s work in an online stock trading system: logging in and out, looking up a portfolio or stock quotes, and buying or selling shares.

Before measuring statistics, one virtual machine warm-up run was performed. After that, 5–7 iterations on average were made, 120–300 s each with 30 s breaks, and the mean was calculated. Before measuring each configuration (the warm-up run), a daytrader database was created and filled with 15 thousand users. The database was reset prior to each iteration within a configuration measurement.
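
For reference, such a JMeter test plan is typically executed in non-GUI mode along these lines. The property name THREADS is an assumption; it depends on how daytrader7.jmx is parameterized:

```shell
# Headless JMeter run: -n = non-GUI, -t = test plan, -l = results log
# -J sets a JMeter property the plan may read (property name is hypothetical)
jmeter -n -t daytrader7.jmx -l results.jtl -JTHREADS=4
```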

The default HotSpot consumes more memory than OpenJ9 when tested under load. However, the same -XX:SharedArchiveFile=app-cds.jsa and -XX:TieredStopAtLevel=1 parameters used to decrease startup time provided a 30% footprint reduction. This is because the C1 compiler uses less memory in the code cache, as no additional tiers of compiled code are produced, and AppCDS minimizes footprint thanks to class metadata shared by different JVMs.
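
The study does not prescribe a specific measurement tool, but as a sketch, a JVM’s resident set size can be sampled on Linux like this (the process match pattern is an assumption):

```shell
# Print the resident set size (in KB) of the process matching the pattern
ps -o rss= -p "$(pgrep -f daytrader7)"
```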

Tested configurations:

  • OpenJ9 (1) (heap 1G): -Xmx1G
  • OpenJ9 (2) (heap 1G, class cache): -Xmx1G -Xshareclasses:name=mvn
  • OpenJ9 (3) (heap 1G, class cache, fast compilation): -Xmx1G -Xshareclasses:name=mvn -Xtune:virtualized -Xscmx200m
  • HotSpot (heap 1G): -Xmx1G
  • HotSpot (heap 1G, AppCDS, C1): -Xmx1G -XX:SharedArchiveFile=app-cds.jsa -XX:TieredStopAtLevel=1
Memory consumption (at ~126 s):

  • OpenJ9 (1) (heap 1G): 418.6 MB
  • OpenJ9 (2) (heap 1G, class cache): 423.7 MB
  • OpenJ9 (3) (heap 1G, class cache, fast compilation): 389.6 MB
  • HotSpot (heap 1G): 612.8 MB
  • HotSpot (heap 1G, AppCDS, C1): 424.7 MB

Footprint results (lower is better)

The test plan included a combination of various queries executing complex operations. Red dots on the graph mark the startup time. The maximum heap size was set to 256 MB (the -Xmx256m flag). The indicators were measured at ~126 s; this point was chosen because different HotSpot and OpenJ9 configurations have different startup times. After the application starts, memory consumption increases under load and then stabilizes. As the graph shows, OpenJ9 exhibits spikes in memory consumption prior to stabilization. This memory is required for the warm-up, and it may become a problem, because the application consumes more memory than expected in its stable state. If we set the memory limit to what the app requires after the warm-up, stabilization time may increase and peak performance drops.

Memory consumption prior to stabilization

These spikes deserve separate research into optimization strategies. For example, Liberica Lite, a lightweight version of our Liberica JDK, can release memory for workloads not under load, and OpenJ9 has recently received JIT as a Service, which offloads the peak compilation memory consumption of many instances to a single compilation service.

In addition, Liberica Lite enables the creation of microcontainers, drastically reducing resource consumption. This is especially valuable for cloud computing, where small containers mean cost efficiency. A container based on Liberica Lite and Alpine Linux is only 42.72 MB, the tiniest on the market so far!

Returning to our study, the goal was to evaluate the application’s memory consumption in a stabilized state. After 126 s, all VM configurations demonstrated stable values that could be measured. The graph shows that the configured HotSpot is equal or superior to two of the OpenJ9 configurations.

Latency and throughput

The maximum heap size was set to 1 GB in this set of experiments (-Xmx1G). This way, we could analyze the app behavior within the limits of a typical small cloud node.

The default HotSpot configuration demonstrates the best throughput (5% higher than the best OpenJ9 configuration) and latency (69% lower than the best OpenJ9 configuration) but falls behind OpenJ9 in memory consumption. The -XX:SharedArchiveFile=app-cds.jsa -XX:TieredStopAtLevel=1 flags improved the footprint but not throughput and latency. However, HotSpot offers a wide selection of garbage collectors, and we can use the one most suitable for our case. Here, we switched to the Serial Garbage Collector (the -XX:+UseSerialGC parameter), which provides less overhead: Serial GC is the simplest garbage collector implementation, working in a single thread and thus reducing threading overhead. It is most suitable for single-processor machines or applications with small heaps.
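
These percentages can be cross-checked against the result tables; a quick sanity check with awk, using the table values:

```shell
# Throughput: default HotSpot (897.9 tps) vs best OpenJ9 (854.1 tps)
awk 'BEGIN { printf "throughput: +%.1f%%\n", (897.9 - 854.1) / 854.1 * 100 }'
# Latency: default HotSpot (120 ms) vs best OpenJ9 (386 ms)
awk 'BEGIN { printf "latency: -%.1f%%\n", (386 - 120) / 386 * 100 }'
```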

In addition, to decrease the footprint, we set the initial heap size to 80 MB (the -Xms80m flag).

Tested configurations:

  • OpenJ9 (1) (heap 1G): -Xmx1G
  • OpenJ9 (2) (heap 1G, class cache): -Xmx1G -Xshareclasses:name=mvn
  • OpenJ9 (3) (heap 1G, class cache, fast compilation): -Xmx1G -Xshareclasses:name=mvn -Xtune:virtualized -Xscmx200m
  • HotSpot (max heap 1G, initial heap 80 MB): -Xmx1G -Xms80m
  • HotSpot (max heap 1G, initial heap 80 MB, AppCDS, C1): -Xmx1G -Xms80m -XX:SharedArchiveFile=app-cds.jsa -XX:TieredStopAtLevel=1
  • HotSpot (max heap 1G, initial heap 80 MB, SerialGC): -Xmx1G -Xms80m -XX:SharedArchiveFile=app-cds.jsa -XX:TieredStopAtLevel=1 -XX:+UseSerialGC
Latency at the ~99.998th percentile:

  • OpenJ9 (1) (heap 1G): 2091 ms
  • OpenJ9 (2) (heap 1G, class cache): 386 ms
  • OpenJ9 (3) (heap 1G, class cache, fast compilation): 641 ms
  • HotSpot (max heap 1G, initial heap 80 MB): 120 ms
  • HotSpot (max heap 1G, initial heap 80 MB, SerialGC): 123 ms
  • HotSpot (max heap 1G, initial heap 80 MB, AppCDS, C1): 1852 ms

Latency by percentile distribution (lower is better)

The indicators were measured at the ~99.998th percentile. As the graph shows, the default HotSpot configuration produced one of the best results and made the top three (lib-def-xms80m), while the configuration with the -XX:SharedArchiveFile=app-cds.jsa -XX:TieredStopAtLevel=1 parameters showed the opposite (lib-xms80m). Still, even that configuration had lower latency than OpenJ9 (1), which showed the worst results. HotSpot with SerialGC (lib-xms80m-serialgc) demonstrated the best combination of latency and memory.

Throughput:

  • OpenJ9 (1) (heap 1G): 854.1 tps
  • OpenJ9 (2) (heap 1G, class cache): 779.1 tps
  • OpenJ9 (3) (heap 1G, class cache, fast compilation): 758.5 tps
  • HotSpot (max heap 1G, initial heap 80 MB): 897.9 tps
  • HotSpot (max heap 1G, initial heap 80 MB, SerialGC): 592.9 tps
  • HotSpot (max heap 1G, initial heap 80 MB, AppCDS, C1): 594.1 tps

Throughput results (higher is better)

The graph above shows that the default HotSpot with G1 has the best throughput but a larger footprint. You can choose the configuration with Serial GC or AppCDS to prioritize latency or memory footprint. You can also use the Parallel GC, which uses multiple threads for garbage collection and helps improve throughput.
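
For illustration, switching collectors is a single flag; a sketch of a Parallel GC launch (the jar name is hypothetical, and this configuration was not among those measured):

```shell
# Parallel GC: multi-threaded collection, typically favoring throughput
java -Xmx1G -Xms80m -XX:+UseParallelGC -jar app.jar
```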

Conclusion

The experiment results are similar on all tested platforms and demonstrate that Liberica JDK with configured HotSpot is comparable to or better than OpenJ9. It is worth noting that the default HotSpot gives much lower latency and higher throughput than OpenJ9 on a server-class node, which is beneficial for long-term cost reduction. At the same time, the significant memory footprint of this configuration can be minimized by switching to another base image (Alpine Linux uses about half as much RAM as CentOS) or by introducing native image technology into the project.

BellSoft provides Native Image Kit, a utility for accelerating startup and lowering memory consumption. If you want to learn more about this technology, watch a special episode of JRush, our series of always-free web conferences for senior Java developers, where we discuss the trends and best practices of Java development with leading experts in the field.
