
Alpaquita Linux performance — the race is on!

Oct 14, 2022
Dmitry Chuyko

BellSoft recently introduced Alpaquita Linux, a distribution tailor-made for cloud-native Java applications. Coming from a family of musl-based distros, Alpaquita boasts an impressive and unmatched advantage: two standard C library implementations, optimized musl (musl perf) and glibc. libc is a core OS component that provides the most widely used functions and serves as a bridge between the kernel and user programs. musl has a cleaner codebase than glibc, but the latter is more efficient in specific cases. We improved musl to eliminate existing performance issues and also made a glibc-based variant available for companies that want to take advantage of the offering but are unwilling to migrate away from glibc. In addition, Alpaquita Linux is

  • Performant with kernel optimizations, musl-perf, and additional mallocs
  • Secure with signed modules, kernel hardening, timely security patches, and security advisory
  • Supported by the team of engineers who built the product. LTS releases, 24/7 commercial support, and regular updates make it the ultimate solution for enterprises
  • Java-friendly with tools for Java development and full compatibility with other BellSoft products. It is a part of the Alpaquita Cloud Native Platform — a small container with the OS, Liberica JDK, and Liberica NIK

Now it’s high time we took it for a ride! Alpaquita’s base image is only 3.22 MB, but we will prove that it is as fast as a thoroughbred!

To make the race more compelling, we will make Alpaquita compete with other popular Linux distributions and utilize industry-standard benchmarks as race tracks.

Curious to know which Linux distros are the fastest and which bring up the rear? Find the results below!

  1. Methodology
  2. Results
    1. Startup
    2. Memory bandwidth
    3. String operations: glibc vs musl vs musl perf
    4. Throughput
    5. Performance of malloc implementations
    6. Java coupling with DaCapo
  3. Conclusion

Methodology

We ran the tests on the following machine, which is similar to many instances used in the cloud (for instance, on AWS and Azure) or servers from providers like Hetzner:

  • Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  • 4 cores
  • Full virtualization
  • Type 1 hypervisor: KVM
  • Type 2 hypervisor: QEMU

A single VM was running on the machine as a workload, so the machine was dedicated to performance measurement. The following command was used to start the VM under QEMU:

qemu-system-x86_64 \
    -cpu host -enable-kvm \
    -smp 4 -m 4096 \
    -device e1000,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp::5515-:22 \
    -display none -daemonize \
    -hda <image_file>

We utilized both musl- and glibc-based distributions to compare the performance of the two libc implementations. As Alpaquita Linux comes in two variants, with glibc and with musl, we tested both, running the musl variant with both the stock library (musl def) and the optimized one (musl perf) to see whether there is a difference in results. Another musl-based distro used in the experiments was Alpine Linux, which is the foundation for Alpaquita, but with stock musl and without performance optimizations. Among glibc-based systems, we selected CentOS, RHEL, and Debian as the most popular Linux distributions for servers and the cloud. A more detailed Linux server/cloud comparison can be found in our overview.

Tested Linux distributions:

  • Alpaquita Linux Stream v22 with glibc
  • Alpaquita Linux Stream v22 with musl def (stock musl library)
  • Alpaquita Linux Stream v22 with musl perf (optimized BellSoft musl)
  • Alpine Linux v3.16
  • CentOS Stream v9
  • RHEL v8
  • Debian v11

Rationale behind the chosen versions:

  • Alpaquita Stream releases are similar in nature to CentOS Stream releases. Alpaquita Linux Stream 22 is therefore not an LTS release; the LTS release will be out later this year as Alpaquita LTS v22, with subsequent updates to v22.1 and so on
  • Alpine Linux v3.16 and CentOS Stream v9 are the latest releases
  • Debian v11 is the latest stable version
  • RHEL v9 was still in beta when the study was performed, so we utilized the latest stable version 8 at that time

Results

Startup

The system startup time was measured at different stages:

  • initramfs init: initialization of a root filesystem providing early userspace
  • mounted root: the root filesystem is mounted
  • login: the system allows logging in through the console
  • iface is up: the network interface is up and running, so the system is reachable over the network
  • “network”: network services are connected

We used Alpaquita with glibc and stock musl for this test. Note that the result deviation is about 4%. 

Startup time (s)

OS load stage  | alpaquita-22-glibc | alpaquita-22-musl | alpine-3.16 | centos-9 | debian-11 | redhat-8
initramfs init | 0.14 | 0.14 | 0.20 | 0.75 | 0.64 | —
mounted root   | 1.04 | 1.03 | 1.44 | 2.00 | 1.42 | 1.68
login          | 1.57 | 1.51 | 6.82 | 6.18 | 2.07 | 6.00
iface is up    | 1.95 | 1.86 | 3.41 | 3.94 | 1.96 | 3.77
"network"      | 2.98 | 2.89 | 4.47 | 5.35 | 2.98 | 5.14

Startup time results

As you can see on the graph, both Alpaquita Linux configurations (glibc and musl) show the best results at all stages, followed closely by glibc-based Debian. At the login stage, which was the toughest stretch for the other distros, musl-based Alpaquita loaded 77% faster than CentOS! Note that there is no data for Red Hat at the initramfs init stage: Red Hat uses an older kernel version and doesn’t print out the timestamp for the first stage in dmesg.

On the whole, Alpaquita Linux starts up the fastest, while CentOS and Red Hat demonstrate the worst results.

Memory bandwidth

We used the Stream benchmark to measure memory bandwidth, i.e., the rate at which data can be moved to and from main memory. Stream is a popular RAM benchmark measuring the sustainable main memory bandwidth in MB/s and the computation rate for simple vector kernels. It uses four kernels exercising different memory operations:

  • Copy: transfer rate measurement without arithmetic operations
  • Scale: adding a simple arithmetic operation
  • Triad: chained/overlapped/fused multiply/add operations
  • Add or Sum: adding a third operand
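The four kernels boil down to simple vector loops. Here is a minimal C sketch of them (illustrative only; the real STREAM benchmark uses large arrays, many repetitions, and careful timing to report MB/s):

```c
#include <stddef.h>

/* Simplified versions of the four STREAM kernels. */

void stream_copy(double *c, const double *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i];                 /* Copy: c = a, no arithmetic */
}

void stream_scale(double *b, const double *c, double s, size_t n) {
    for (size_t i = 0; i < n; i++)
        b[i] = s * c[i];             /* Scale: one multiply per element */
}

void stream_add(double *c, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];          /* Add: third operand enters the loop */
}

void stream_triad(double *a, const double *b, const double *c,
                  double s, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];      /* Triad: fused multiply/add pattern */
}
```

Because each loop touches every element exactly once, the measured rate is dominated by memory traffic rather than computation, which is exactly what the benchmark is after.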

We used the Stream benchmark implementation provided by Phoronix via the Phoronix Test Suite.

Stream (MB/s)

name  | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8
Copy  | 21618.3 | 19071.5 | 21649.8 | 19198.8 | 21628.4 | 20522.1 | 20468.5
Scale | 14745.8 | 14707.5 | 14744.8 | 14745.8 | 14748.7 | 14787.4 | 14752.6
Triad | 16437   | 16441.1 | 16442.5 | 16474.9 | 16446.3 | 16482   | 16445.6
Add   | 16432.3 | 16428.2 | 16434.6 | 16466.6 | 16439.3 | 16473.8 | 16440.2

Stream results

All distributions showed similar performance in the Scale, Triad, and Add operations, but the most remarkable results were obtained with Copy. Alpaquita with optimized musl (musl-perf) shows the best performance (21649.8 MB/s), followed closely by CentOS (21628.4 MB/s) and Alpaquita with glibc (21618.3 MB/s). This indicates the superior performance of BellSoft’s musl over the stock library implementation (musl-def).

Note that musl perf is 100% compatible with the stock musl, so if you use a musl-based distro, the migration won’t cause any issues.

String operations: glibc vs musl vs musl perf

Many companies are hesitant to use a musl-based Linux in production due to inferior musl performance compared to glibc. To overcome the limitations of the stock musl implementation, we developed musl perf with enhanced performance. The following tests with string operations evaluate the efficiency of the three libc variants.

We ran three series of hand-crafted tests with various string lengths: 36 chars, 132 chars, and 4100 chars, because performance may vary significantly depending on the string size. Why does it matter? Firstly, ISA-specific optimizations differ for various string lengths. Secondly, strings of about 30 characters are the most common, but the bigger the string, the longer the processing, so the performance divergence shows more clearly.

String operations 36 chars (ns/op)

name          | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8
strlen()      | 20.98 | 23.13 | 21.09 | 24.34 | 21.01 | 21.53 | 21.89
strnlen()     | 21.19 | 26.09 | 21.10 | 26.94 | 21.97 | 22.52 | 22.17
wcslen()      | 20.23 | 21.67 | 20.76 | 22.98 | 21.50 | 21.23 | 21.91
wcsnlen()     | 21.47 | 23.68 | 24.17 | 24.26 | 20.44 | 21.85 | 22.16
memcmp()      | 21.46 | 33.43 | 21.52 | 30.52 | 21.78 | 20.90 | 21.93
memset()      | 20.50 | 20.52 | 21.02 | 21.85 | 20.89 | 20.69 | 21.65
memmove()     | 20.59 | 27.69 | 21.21 | 28.08 | 21.10 | 21.69 | 21.68
memmove_fw()  | 20.88 | 64.47 | 21.55 | 65.46 | 20.64 | 21.02 | 21.70
memcpy()      | 20.84 | 31.80 | 21.74 | 30.89 | 19.94 | 20.90 | 21.80
strcmp()      | 21.15 | 29.47 | 21.70 | 35.43 | 22.25 | 21.47 | 22.23
strncmp()     | 21.47 | 45.18 | 21.57 | 49.76 | 22.82 | 21.21 | 22.46
strcpy()      | 21.60 | 24.86 | 22.09 | 26.07 | 21.37 | 21.46 | 24.39
strncpy()     | 22.21 | 28.91 | 22.91 | 28.64 | 23.44 | 22.06 | 24.97
strchr()      | 21.65 | 26.25 | 21.90 | 27.26 | 21.60 | 22.02 | 22.42
strrchr()     | 21.42 | 43.01 | 23.33 | 41.14 | 21.07 | 23.07 | 22.69
memmove_nop() | 18.74 | 67.05 | 20.79 | 20.42 | 19.26 | 19.17 | 20.45
memcpy_r()    | 20.51 | 30.38 | 21.33 | 31.71 | 20.85 | 20.41 | 21.71

Results of String operations, 36 chars

String operations 132 chars (ns/op)

name          | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.15 | alpine-3.16 | centos-9 | debian-11 | redhat-8
strlen()      | 21.97 | 29.58  | 22.97 | 30.62  | 29.84  | 22.33 | 22.05 | 23.06
strnlen()     | 22.89 | 39.96  | 22.73 | 37.10  | 35.38  | 23.62 | 22.30 | 23.33
wcslen()      | 22.24 | 36.09  | 22.34 | 29.16  | 29.64  | 23.00 | 22.21 | 23.05
wcsnlen()     | 23.25 | 34.88  | 30.66 | 35.06  | 36.88  | 24.52 | 22.25 | 23.55
memcmp()      | 22.70 | 89.81  | 22.49 | 85.20  | 68.50  | 23.81 | 22.35 | 23.60
memset()      | 21.01 | 22.25  | 21.29 | 22.20  | 22.68  | 22.21 | 20.74 | 21.96
memmove()     | 20.91 | 28.56  | 21.24 | 27.70  | 27.93  | 21.94 | 21.39 | 22.22
memmove_fw()  | 21.44 | 78.32  | 21.43 | 77.39  | 77.94  | 21.93 | 21.17 | 22.30
memcpy()      | 21.05 | 33.56  | 21.30 | 34.45  | 33.44  | 22.09 | 21.05 | 22.27
strcmp()      | 22.86 | 72.97  | 23.10 | 75.26  | 81.95  | 77.49 | 22.83 | 23.70
strncmp()     | 23.57 | 139.10 | 23.12 | 137.56 | 138.50 | 93.24 | 23.50 | 23.79
strcpy()      | 27.80 | 36.92  | 28.79 | 33.12  | 39.50  | 28.27 | 27.48 | 24.53
strncpy()     | 29.36 | 38.39  | 29.89 | 37.76  | 38.37  | 30.63 | 27.84 | 25.96
strchr()      | 23.12 | 44.19  | 23.33 | 36.67  | 37.81  | 23.46 | 23.40 | 23.91
strrchr()     | 23.51 | 107.08 | 24.39 | 110.72 | 103.69 | 24.99 | 23.95 | 24.37
memmove_nop() | 19.16 | 19.73  | 19.57 | 92.80  | 20.41  | 19.89 | 19.94 | 20.42
memcpy_r()    | 21.50 | 33.68  | 21.73 | 33.66  | 34.44  | 23.46 | 21.23 | 22.28

Results of String operations, 132 chars

String operations 4100 chars (ns/op)

name          | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8
strlen()      | 42.60 | 300.76  | 43.87  | 299.58  | 44.01 | 48.62 | 50.80
strnlen()     | 53.48 | 438.14  | 46.88  | 393.59  | 57.60 | 51.24 | 52.32
wcslen()      | 44.65 | 373.87  | 43.50  | 376.22  | 44.90 | 56.57 | 56.27
wcsnlen()     | 56.67 | 476.98  | 297.37 | 476.68  | 56.65 | 48.36 | 55.69
memcmp()      | 71.57 | 1458.78 | 72.71  | 1455.40 | 71.07 | 72.89 | 74.41
memset()      | 86.89 | 90.98   | 86.45  | 92.00   | 62.55 | 87.03 | 88.14
memmove()     | 36.46 | 50.69   | 36.80  | 45.69   | 37.88 | 39.46 | 36.64
memmove_fw()  | 37.36 | 596.58  | 37.08  | 595.35  | 38.68 | 39.21 | 39.03
memcpy()      | 55.86 | 91.37   | 56.08  | 97.52   | 56.11 | 56.88 | 56.53
strcmp()      | 83.81 | 1468.49 | 83.01  | 1461.64 | 86.63 | 82.32 | 81.47
strncmp()     | 82.92 | 3480.92 | 90.03  | 3476.01 | 88.32 | 88.02 | 86.67
strcpy()      | 83.25 | 452.00  | 81.90  | 448.43  | 83.60 | 82.54 | 83.46
strncpy()     | 84.68 | 508.51  | 83.67  | 507.27  | 83.65 | 84.40 | 82.92
strchr()      | 65.75 | 526.52  | 63.13  | 504.89  | 66.62 | 77.08 | 77.93
strrchr()     | 90.48 | 2483.10 | 89.17  | 2445.37 | 90.62 | 90.31 | 90.83
memmove_nop() | 18.62 | 1132.01 | 20.62  | 20.41   | 19.32 | 19.23 | 20.53
memcpy_r()    | 53.79 | 113.54  | 54.47  | 121.52  | 54.42 | 54.25 | 55.31

Results of String operations, 4100 chars

All three graphs show a significant discrepancy between the performance of standard musl-based systems and glibc-based ones: stock musl demonstrated the worst results in all tests. We can also see that the results of the improved musl (Alpaquita with musl-perf) are on par with those of glibc. This means that the optimizations we introduced into musl enable companies to use our compact Alpaquita Linux distribution without a detrimental impact on application performance.

Throughput

To test the throughput of Linux distributions, we used the Nginx benchmark provided by Phoronix. Nginx is a highly performant open-source web server that can also be used for reverse proxying, load balancing, caching, and other tasks. Its purpose is to control the workload and protect the backend. Nginx Ingress Controller is commonly used with Kubernetes clusters for routing traffic without creating numerous load balancers.

The Nginx benchmark runs on a single host and measures the number of HTTP requests handled per second with a configurable number of concurrent clients.

Nginx (reqs/s)

name                      | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8
Concurrent Requests: 1    | 21905.96 | 21811.14 | 21866.25 | 21362.84 | 15717.46 | 20528.18 | 16800.98
Concurrent Requests: 20   | 77072.15 | 76141.41 | 76053.68 | 69856.36 | 55991.42 | 55019.88 | 61122.7
Concurrent Requests: 100  | 85526.76 | 85256.34 | 84218.03 | 77749.41 | 61052.06 | 55717.28 | 63137.18
Concurrent Requests: 200  | 85082.81 | 82633.25 | 83891.91 | 76352.81 | 59159.26 | 56608.38 | 62187.37
Concurrent Requests: 500  | 81313.28 | 78921.85 | 80128.55 | 75187.77 | 54158.91 | 58064.64 | 60493.26
Concurrent Requests: 1000 | 79049.85 | 78305.92 | 78541.19 | 72996.01 | 53429.85 | 57546.15 | 59431.69

Nginx results

All three Alpaquita configurations demonstrated the best results across all tests, followed by Alpine. It is also remarkable that glibc-based Alpaquita significantly outperformed the other glibc-based distros, which makes Alpaquita a good choice for companies that do not want to migrate away from glibc but focus on application throughput.

Performance of malloc implementations

Linux memory allocators (mallocs) can accelerate or slow down applications. For instance, in the case of musl, switching to mimalloc or jemalloc may solve some performance issues. As far as the JVM is concerned, it allocates user objects in the Java heap and collects garbage, but the JVM code itself uses the system allocator, so choosing the right malloc is also essential for Java apps.

Alpaquita Linux comes with three additional malloc implementations for different environments:

  • mimalloc is a small allocator used in large, scalable services with low latency requirements
  • rpmalloc is the tiniest allocator, with lock-free thread caching
  • jemalloc helps developers solve fragmentation issues and supports scalable concurrency

To test the behavior of memory allocators in various Linux distributions, we used mimalloc-bench, originally developed for mimalloc but later adapted for other implementations as well. The tests measure the number of operations performed in one second.

  • espresso: a programmable logic array analyzer, in the context of cache-aware memory allocation
  • barnes: a hierarchical n-body particle solver simulating the gravitational forces between 163,840 particles
  • alloc-test: simulates intensive allocation workloads with a Pareto size distribution
  • cache-thrash: part of the Hoard benchmarking suite, designed to exercise heap cache locality
  • cache-scratch: introduced with the Hoard allocator to test for passive false sharing of cache lines
  • mstress: simulates real-world server-like allocation patterns, using N threads with allocations in powers of 2, where objects can migrate between threads and some have long lifetimes
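At their core, such allocator stress tests are tight allocate/free loops. A simplified C sketch in the spirit of these workloads (illustrative only; the real mimalloc-bench tests are multi-threaded and far more varied, and the function name here is ours):

```c
#include <stdlib.h>

/* Minimal allocator churn: repeatedly allocate power-of-2 sized
 * blocks (16 B .. 2 KiB) and free them. Returns the total number
 * of bytes requested, just to keep the work observable. */
long malloc_churn(int rounds) {
    long total = 0;
    for (int r = 0; r < rounds; r++) {
        void *blocks[16];
        for (int i = 0; i < 16; i++) {
            size_t sz = (size_t)1 << (4 + (i % 8));
            blocks[i] = malloc(sz);
            if (blocks[i])
                total += (long)sz;
        }
        for (int i = 0; i < 16; i++)
            free(blocks[i]);
    }
    return total;
}
```

A loop like this can be timed under different allocators without recompiling, typically by preloading the allocator's shared library via LD_PRELOAD (the exact library path depends on the installed package).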

Alpaquita configurations with mallocs:

  • alpaquita-22-musl-perf-je — Alpaquita with musl-perf + jemalloc
  • alpaquita-22-musl-perf-mi — Alpaquita with musl-perf + mimalloc
  • alpaquita-22-musl-perf-rp — Alpaquita with musl-perf + rpmalloc

malloc performance (operations/s)

name             | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf-je | alpaquita-22-musl-perf-mi | alpaquita-22-musl-perf | alpaquita-22-musl-perf-rp | alpine-3.16 | centos-9 | debian-11 | redhat-8
espresso         | 0.19  | 0.13  | 0.21  | 0.22  | 0.13  | 0.22  | 0.12  | 0.20  | 0.20  | 0.20
barnes           | 0.36  | 0.37  | 0.37  | 0.38  | 0.37  | 0.37  | 0.37  | 0.38  | 0.36  | 0.36
alloc-test-1     | 0.22  | 0.14  | 0.27  | 0.29  | 0.14  | 0.25  | 0.14  | 0.23  | 0.24  | 0.23
alloc-test-4     | 0.22  | 0.02  | 0.26  | 0.28  | 0.02  | 0.25  | 0.01  | 0.22  | 0.23  | 0.22
cache-thrash-1   | 0.71  | 0.71  | 0.71  | 0.71  | 0.71  | 0.71  | 0.72  | 0.72  | 0.64  | 0.64
cache-thrash-16  | 2.63  | 2.63  | 2.63  | 2.70  | 2.70  | 2.63  | 2.70  | 2.70  | 2.38  | 2.38
cache-thrash-4   | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 2.38  | 2.38
cache-scratch-1  | 0.71  | 0.72  | 0.71  | 0.71  | 0.71  | 0.71  | 0.72  | 0.72  | 0.64  | 0.64
cache-scratch-16 | 2.70  | 2.63  | 2.63  | 2.63  | 2.70  | 2.63  | 2.70  | 2.70  | 0.86  | 1.20
cache-scratch-4  | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 2.70  | 0.15  | 0.15
mstress-1        | 0.17  | 0.13  | 0.23  | 0.39  | 0.13  | 0.16  | 0.12  | 0.17  | 0.17  | 0.17
mstress-16       | 0.14  | 0.07  | 0.17  | 0.24  | 0.07  | 0.13  | 0.06  | 0.13  | 0.14  | 0.13
mstress-4        | 0.171 | 0.124 | 0.231 | 0.368 | 0.124 | 0.160 | 0.118 | 0.166 | 0.170 | 0.170

malloc benchmark results

The above graph indicates that no memory allocator excels in all conditions. For instance,

  • the glibc malloc and the three alternative mallocs used with musl perf (mimalloc, rpmalloc, jemalloc) showed the best results in espresso and alloc-test
  • the Debian and Red Hat default mallocs demonstrated the worst results in all cache-scratch and cache-thrash tests, whereas the other allocators performed similarly, with musl perf consistently among the leaders
  • mimalloc was a clear leader in all mstress tests

To sum up, the variety of malloc implementations and the ability to choose between glibc and musl give Alpaquita users additional flexibility in system configuration. The choice of a malloc implementation should be driven by the workloads it will serve.

Java coupling with DaCapo

The DaCapo benchmark suite is a set of real-world Java applications with different memory loads, used to evaluate system/CPU performance. The results are measured in milliseconds required to complete a workload.

We used Alpaquita with different malloc implementations in this benchmark to assess whether they improve the situation in cases when stock musl demonstrates inferior performance to glibc.

  • h2: executes a JDBCbench-like in-memory benchmark, executing a number of transactions against a model of a banking application
  • fop: takes an XSL-FO file, parses it and formats it, generating a PDF file
  • pmd: analyzes a set of Java classes for a range of source code problems
  • xalan: transforms XML documents into HTML
  • avrora: simulates a number of programs run on a grid of AVR microcontrollers
  • jython: interprets the pybench Python benchmark
  • luindex: uses Lucene to index a set of documents
  • sunflow: renders a set of images using ray tracing
  • lusearch: uses Lucene to do a text search of keywords over a corpus of data comprising the works of Shakespeare and the King James Bible
  • tradebeans: runs the daytrader benchmark via Java Beans to a Geronimo backend, with an in-memory h2 as the underlying database

DaCapo (ms)

name       | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf-je | alpaquita-22-musl-perf-mi | alpaquita-22-musl-perf | alpaquita-22-musl-perf-rp | alpine-3.16 | centos-9 | redhat-8
h2         | 3119 | 3270 | 3098 | 3099 | 3264 | 3104 | 3217 | 3235 | 3725
fop        | 1115 | 1232 | 1107 | 1104 | 1249 | 1120 | 1230 | 1167 | 986
pmd        | 1577 | 1716 | 1587 | 1592 | 1746 | 1658 | 1757 | 1611 | 1571
xalan      | 2018 | 2133 | 2057 | 2003 | 2143 | 2061 | 2152 | 1977 | 1706
avrora     | 3675 | 3765 | 3589 | 3646 | 3738 | 3628 | 3844 | 3232 | 3327
jython     | 4454 | 4705 | 4342 | 4412 | 4711 | 4390 | 4609 | 4554 | 3904
luindex    | 876  | 977  | 862  | 860  | 959  | 868  | 953  | 895  | 826
sunflow    | 2349 | 2438 | 2216 | 2288 | 2421 | 2332 | 2468 | 2386 | 2162
lusearch   | 1328 | 1451 | 1331 | 1335 | 1423 | 1436 | 1559 | 1411 | 1203
tradebeans | 3649 | 3886 | 3539 | 3539 | 3917 | 3613 | 4039 | 3735 | 3159

DaCapo results

With the exception of h2, where Red Hat’s glibc takes the lead, musl perf (Alpaquita) and stock musl (both Alpaquita and Alpine) demonstrated results on par with or better than glibc. Remarkably, Java applications benefit from musl at least as much as from glibc in these scenarios. Companies with Java-based projects should consider testing their applications in a musl environment and noting the changes in performance. However, it is also important to measure other metrics, such as latency and footprint, because libc implementations may behave differently with specific workloads.

Conclusion

The benchmarking results show that 

  • Alpaquita Linux outperforms other Linux distributions for startup and throughput
  • In cases where glibc demonstrated superior results to stock musl, Alpaquita Linux with optimized musl has similar or superior results compared to glibc-based distros
  • glibc-based Alpaquita gives better results for startup and throughput as compared to other glibc-based distros and is equal to them in other cases
  • There is no perfect malloc for all use cases, but the fact that Alpaquita Linux has three additional mallocs means greater variability in performance tuning
  • musl-based distributions demonstrated better results in the DaCapo benchmark suite, which indicates better musl suitability for specific workloads. But further testing with additional metrics is required to choose an optimal solution

It turns out that Alpaquita won the race hands down! It is fast, reliable, and incredibly flexible, providing the best options for enterprises:

  • Optimized musl for those who want to save cloud resources but are unwilling to deal with performance issues of stock musl
  • glibc package for cases when migration to another libc is undesirable or complicated. For cases when startup and throughput are crucial, it can even be more advantageous than traditional glibc-based systems
  • Three additional memory allocators for various use case scenarios

And, of course,

  • Commercial support with LTS releases and strict update schedule
  • Tools for Java development
  • A unified solution for cloud-native Java deployment with Liberica Lite and Liberica NIK on board — Alpaquita Cloud Native Platform

So give Alpaquita Linux free rein, put your application in the saddle, and see that it is a total game-changer!
