
Alpaquita Linux performance — the race is on!

Oct 14, 2022
Dmitry Chuyko

BellSoft recently introduced Alpaquita Linux, a Linux distribution tailor-made for cloud-native Java applications. Coming from a family of musl-based distros, Alpaquita has an unusual advantage: it ships with two standard C library implementations, optimized musl (musl perf) and glibc. Libc is a core OS component that provides the most widely used functions and serves as a bridge between the kernel and user programs. musl has a cleaner codebase than glibc, but the latter is more efficient in specific cases. We improved musl to eliminate existing performance issues and made a glibc-based distro available for companies that want to take advantage of the offering but are unwilling to migrate. In addition, Alpaquita Linux is:

  • Performant with kernel optimizations, musl-perf, and additional mallocs
  • Secure with signed modules, kernel hardening, timely security patches, and security advisory
  • Supported by a team of engineers who made the product. LTS releases, 24/7 commercial support, and regular updates make it the ultimate solution for enterprise
  • Java-friendly with tools for Java development and full compatibility with other BellSoft products. It is a part of the Alpaquita Cloud Native Platform — a small container with the OS, Liberica JDK, and Liberica NIK

Now, it’s high time we give it a ride! Alpaquita’s base image is only 3.22MB, but we will prove that it is as fast as a thoroughbred horse!

To make the race more compelling, we will make Alpaquita compete with other popular Linux distributions and utilize industry-standard benchmarks as race tracks.

Curious to know which Linux distros are the fastest and which bring up the rear? Find the results below!

Methodology

We ran the tests on the following machine, which is similar to many cloud instances (for example, on AWS and Azure) and to servers from providers like Hetzner:

  • Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  • 4 cores
  • Full virtualization
  • Type 1 hypervisor: KVM
  • Type 2 hypervisor: QEMU

A single VM was running on the machine as a workload, so the machine was dedicated to performance measurement. The following command was used to start QEMU:

qemu-system-x86_64 -cpu host -enable-kvm -smp 4 -m 4096 -device e1000,netdev=net0 -netdev user,id=net0,hostfwd=tcp::5515-:22 -display none -daemonize -hda <image_file>

We utilized both musl- and glibc-based distributions to compare the performance of two libc implementations. As Alpaquita Linux has two variations, with glibc and musl, we tested both of them plus the stock musl-based variant to see whether there is a difference in results. Another musl-based distro used in experiments was Alpine Linux, which is the foundation for Alpaquita, but with stock musl and without performance optimizations. Regarding glibc-based systems, we selected CentOS, RHEL, and Debian as the most popular Linux distributions for servers and the cloud. A more detailed Linux Server/Cloud comparison can be found in our overview.

Tested Linux distributions:

  • Alpaquita Linux Stream v22 with glibc
  • Alpaquita Linux Stream v22 with musl def (stock musl library)
  • Alpaquita Linux Stream v22 with musl perf (optimized BellSoft musl)
  • Alpine Linux v3.16
  • CentOS Stream v9
  • RHEL v8
  • Debian v11

Rationale behind the chosen versions:

  • Alpaquita Stream releases are similar in nature to CentOS Stream releases. Alpaquita Linux Stream 22 is therefore not an LTS release; the LTS release will be out later this year as Alpaquita LTS v22, with subsequent updates to v22.1 and so on
  • Alpine Linux v3.16 and CentOS Stream v9 are the latest releases
  • Debian v11 is the latest stable version
  • RHEL v9 was still in beta when the study was performed, so we utilized the latest stable version 8 at that time

Results

Startup

The system startup time was measured at different stages:

  • initramfs init: initialization of a root filesystem providing early userspace
  • mounted root: the root filesystem is mounted
  • login: the system allows logging in through the console
  • iface is up: the interface is up and running, which allows working with the console
  • “network”: network services are connected

We used Alpaquita with glibc and stock musl for this test. Note that the result deviation is about 4%. 

Startup time (s)

| OS load stage | alpaquita-22-glibc | alpaquita-22-musl | alpine-3.16 | centos-9 | debian-11 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- |
| initramfs init | 0.14 | 0.14 | 0.20 | 0.75 | 0.64 | 0 |
| mounted root | 1.04 | 1.03 | 1.44 | 2.00 | 1.42 | 1.68 |
| login | 1.57 | 1.51 | 6.82 | 6.18 | 2.07 | 6.00 |
| iface is up | 1.95 | 1.86 | 3.41 | 3.94 | 1.96 | 3.77 |
| "network" | 2.98 | 2.89 | 4.47 | 5.35 | 2.98 | 5.14 |

Startup time results

As the graph shows, both Alpaquita Linux configurations (glibc and musl) show the best results at all stages, followed closely by glibc-based Debian. At the login stage, which was the toughest stretch for other distros, musl-based Alpaquita needed about 76% less time than CentOS (1.51 s versus 6.18 s)! Note that there is no data for Red Hat at the initramfs init stage: Red Hat uses an older kernel version and doesn’t print out the timestamp for the first stage in dmesg.

On the whole, Alpaquita Linux starts up the fastest, while CentOS and Red Hat demonstrate the worst results.

Memory bandwidth

We used the Stream benchmark to measure memory bandwidth, i.e., the rate at which data can be moved to and from main memory. Stream is a popular RAM benchmark measuring the sustainable main memory bandwidth in MB/s and the computation rate for simple vector kernels. It uses four kernels for different memory operations:

  • Copy: transfer rate measurement without arithmetic operations
  • Scale: adding a simple arithmetic operation
  • Triad: chained/overlapped/fused multiply/add operations
  • Add or Sum: adding a third operand
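The four kernels can be sketched in a few lines of Python. This is only an illustration of what each kernel computes and how a MB/s figure is derived from bytes moved and elapsed time; it is not the Phoronix implementation, and the function name `run_stream`, the array size, and the scalar `q` are assumptions made for this example. Interpreter overhead dominates here, so the numbers it prints are not comparable with the results in this article.

```python
import time

BYTES_PER_DOUBLE = 8  # the kernels are modelled on 64-bit floating-point arrays


def run_stream(n=200_000, q=3.0):
    """Run the four kernels once each; return (rates in MB/s, a, b, c)."""
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n

    def copy():   # c = a, no arithmetic
        for i in range(n):
            c[i] = a[i]

    def scale():  # b = q * c, one multiplication
        for i in range(n):
            b[i] = q * c[i]

    def add():    # c = a + b, a third operand is involved
        for i in range(n):
            c[i] = a[i] + b[i]

    def triad():  # a = b + q * c, multiply and add combined
        for i in range(n):
            a[i] = b[i] + q * c[i]

    # (name, kernel, number of arrays read or written per iteration)
    kernels = [("Copy", copy, 2), ("Scale", scale, 2),
               ("Add", add, 3), ("Triad", triad, 3)]

    rates = {}
    for name, kernel, arrays_touched in kernels:
        start = time.perf_counter()
        kernel()
        elapsed = max(time.perf_counter() - start, 1e-9)
        bytes_moved = arrays_touched * BYTES_PER_DOUBLE * n
        rates[name] = bytes_moved / elapsed / 1e6  # 1 MB taken as 10**6 bytes
    return rates, a, b, c


if __name__ == "__main__":
    rates, *_ = run_stream()
    for name, rate in rates.items():
        print(f"{name:5s} {rate:10.1f} MB/s")
```

Copy and Scale touch two arrays per iteration while Add and Triad touch three, which is why the byte count differs between the two groups.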

We used the Stream benchmark implementation provided by Phoronix via the Phoronix Test Suite.

Stream (MB/s)

| name | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Copy | 21618.3 | 19071.5 | 21649.8 | 19198.8 | 21628.4 | 20522.1 | 20468.5 |
| Scale | 14745.8 | 14707.5 | 14744.8 | 14745.8 | 14748.7 | 14787.4 | 14752.6 |
| Triad | 16437 | 16441.1 | 16442.5 | 16474.9 | 16446.3 | 16482 | 16445.6 |
| Add | 16432.3 | 16428.2 | 16434.6 | 16466.6 | 16439.3 | 16473.8 | 16440.2 |

Stream results

All distributions showed similar performance with Scale, Triad, and Add operations, but the most remarkable results were obtained with Copy operations. We can see that Alpaquita with optimized musl (musl-perf) shows the best performance (21649.8 MB/s), followed closely by CentOS (21628.4 MB/s) and Alpaquita with glibc (21618.3 MB/s). This indicates that BellSoft musl outperforms the standard library implementation (musl-def).

Note that musl perf is 100% compatible with the stock musl, so if you use a musl-based distro, the migration won’t cause any issues.

String operations: glibc vs musl vs musl perf

Many companies are hesitant about using a musl-based Linux in production due to inferior musl performance compared to glibc. To overcome the boundaries of a stock musl implementation, we developed musl perf with enhanced performance. The following tests with String operations evaluate the efficiency of three libc variants.

We ran three series of hand-crafted tests with String lengths of 36, 132, and 4100 chars, because performance may vary significantly depending on the String size. Why does it matter? Firstly, ISA-specific optimizations differ for various String lengths. Secondly, Strings of about 30 characters are more common, but the bigger the String, the longer the processing, so we can see the performance divergence more clearly.
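To show how a ns/op figure of this kind is obtained, here is a minimal timing harness. It is a sketch only: the hand-crafted tests themselves are not published in this article, and the harness below times Python-level byte-string operations that are only loosely analogous to the libc functions in the tables. The helper names `ns_per_op` and `make_cases` are hypothetical, and the lengths are the ones used in the table titles.

```python
import time


def ns_per_op(operation, iterations=100_000):
    """Return the mean wall-clock time of one call to `operation`, in ns."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        operation()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


def make_cases(length):
    """Build a few byte-string operations over a buffer of `length` bytes."""
    data = b"a" * (length - 1) + b"z"
    same = b"a" * (length - 1) + b"z"  # equal content, separate object
    return {
        "find": lambda: data.find(b"z"),    # scan to the last byte
        "compare": lambda: data == same,    # full-length comparison
        "copy": lambda: bytearray(data),    # copy the whole buffer
    }


if __name__ == "__main__":
    for length in (36, 132, 4100):
        for name, operation in make_cases(length).items():
            print(f"{length:5d} chars  {name:8s} {ns_per_op(operation):8.1f} ns/op")
```

Averaging over many iterations matters because a single call at these sizes takes far less time than the clock can resolve reliably.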

String operations 36 chars (ns/op)

| name | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| strlen() | 20.98 | 23.13 | 21.09 | 24.34 | 21.01 | 21.53 | 21.89 |
| strnlen() | 21.19 | 26.09 | 21.10 | 26.94 | 21.97 | 22.52 | 22.17 |
| wcslen() | 20.23 | 21.67 | 20.76 | 22.98 | 21.50 | 21.23 | 21.91 |
| wcsnlen() | 21.47 | 23.68 | 24.17 | 24.26 | 20.44 | 21.85 | 22.16 |
| memcmp() | 21.46 | 33.43 | 21.52 | 30.52 | 21.78 | 20.90 | 21.93 |
| memset() | 20.50 | 20.52 | 21.02 | 21.85 | 20.89 | 20.69 | 21.65 |
| memmove() | 20.59 | 27.69 | 21.21 | 28.08 | 21.10 | 21.69 | 21.68 |
| memmove_fw() | 20.88 | 64.47 | 21.55 | 65.46 | 20.64 | 21.02 | 21.70 |
| memcpy() | 20.84 | 31.80 | 21.74 | 30.89 | 19.94 | 20.90 | 21.80 |
| strcmp() | 21.15 | 29.47 | 21.70 | 35.43 | 22.25 | 21.47 | 22.23 |
| strncmp() | 21.47 | 45.18 | 21.57 | 49.76 | 22.82 | 21.21 | 22.46 |
| strcpy() | 21.60 | 24.86 | 22.09 | 26.07 | 21.37 | 21.46 | 24.39 |
| strncpy() | 22.21 | 28.91 | 22.91 | 28.64 | 23.44 | 22.06 | 24.97 |
| strchr() | 21.65 | 26.25 | 21.90 | 27.26 | 21.60 | 22.02 | 22.42 |
| strrchr() | 21.42 | 43.01 | 23.33 | 41.14 | 21.07 | 23.07 | 22.69 |
| memmove_nop() | 18.74 | 67.05 | 20.79 | 20.42 | 19.26 | 19.17 | 20.45 |
| memcpy_r() | 20.51 | 30.38 | 21.33 | 31.71 | 20.85 | 20.41 | 21.71 |

Results of String operations, 36 chars

String operations 132 chars (ns/op)

| name | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.15 | alpine-3.16 | centos-9 | debian-11 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| strlen() | 21.97 | 29.58 | 22.97 | 30.62 | 29.84 | 22.33 | 22.05 | 23.06 |
| strnlen() | 22.89 | 39.96 | 22.73 | 37.10 | 35.38 | 23.62 | 22.30 | 23.33 |
| wcslen() | 22.24 | 36.09 | 22.34 | 29.16 | 29.64 | 23.00 | 22.21 | 23.05 |
| wcsnlen() | 23.25 | 34.88 | 30.66 | 35.06 | 36.88 | 24.52 | 22.25 | 23.55 |
| memcmp() | 22.70 | 89.81 | 22.49 | 85.20 | 68.50 | 23.81 | 22.35 | 23.60 |
| memset() | 21.01 | 22.25 | 21.29 | 22.20 | 22.68 | 22.21 | 20.74 | 21.96 |
| memmove() | 20.91 | 28.56 | 21.24 | 27.70 | 27.93 | 21.94 | 21.39 | 22.22 |
| memmove_fw() | 21.44 | 78.32 | 21.43 | 77.39 | 77.94 | 21.93 | 21.17 | 22.30 |
| memcpy() | 21.05 | 33.56 | 21.30 | 34.45 | 33.44 | 22.09 | 21.05 | 22.27 |
| strcmp() | 22.86 | 72.97 | 23.10 | 75.26 | 81.95 | 77.49 | 22.83 | 23.70 |
| strncmp() | 23.57 | 139.10 | 23.12 | 137.56 | 138.50 | 93.24 | 23.50 | 23.79 |
| strcpy() | 27.80 | 36.92 | 28.79 | 33.12 | 39.50 | 28.27 | 27.48 | 24.53 |
| strncpy() | 29.36 | 38.39 | 29.89 | 37.76 | 38.37 | 30.63 | 27.84 | 25.96 |
| strchr() | 23.12 | 44.19 | 23.33 | 36.67 | 37.81 | 23.46 | 23.40 | 23.91 |
| strrchr() | 23.51 | 107.08 | 24.39 | 110.72 | 103.69 | 24.99 | 23.95 | 24.37 |
| memmove_nop() | 19.16 | 19.73 | 19.57 | 92.80 | 20.41 | 19.89 | 19.94 | 20.42 |
| memcpy_r() | 21.50 | 33.68 | 21.73 | 33.66 | 34.44 | 23.46 | 21.23 | 22.28 |

Results of String operations, 132 chars

String operations 4100 chars (ns/op)

| name | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| strlen() | 42.60 | 300.76 | 43.87 | 299.58 | 44.01 | 48.62 | 50.80 |
| strnlen() | 53.48 | 438.14 | 46.88 | 393.59 | 57.60 | 51.24 | 52.32 |
| wcslen() | 44.65 | 373.87 | 43.50 | 376.22 | 44.90 | 56.57 | 56.27 |
| wcsnlen() | 56.67 | 476.98 | 297.37 | 476.68 | 56.65 | 48.36 | 55.69 |
| memcmp() | 71.57 | 1458.78 | 72.71 | 1455.40 | 71.07 | 72.89 | 74.41 |
| memset() | 86.89 | 90.98 | 86.45 | 92.00 | 62.55 | 87.03 | 88.14 |
| memmove() | 36.46 | 50.69 | 36.80 | 45.69 | 37.88 | 39.46 | 36.64 |
| memmove_fw() | 37.36 | 596.58 | 37.08 | 595.35 | 38.68 | 39.21 | 39.03 |
| memcpy() | 55.86 | 91.37 | 56.08 | 97.52 | 56.11 | 56.88 | 56.53 |
| strcmp() | 83.81 | 1468.49 | 83.01 | 1461.64 | 86.63 | 82.32 | 81.47 |
| strncmp() | 82.92 | 3480.92 | 90.03 | 3476.01 | 88.32 | 88.02 | 86.67 |
| strcpy() | 83.25 | 452.00 | 81.90 | 448.43 | 83.60 | 82.54 | 83.46 |
| strncpy() | 84.68 | 508.51 | 83.67 | 507.27 | 83.65 | 84.40 | 82.92 |
| strchr() | 65.75 | 526.52 | 63.13 | 504.89 | 66.62 | 77.08 | 77.93 |
| strrchr() | 90.48 | 2483.10 | 89.17 | 2445.37 | 90.62 | 90.31 | 90.83 |
| memmove_nop() | 18.62 | 1132.01 | 20.62 | 20.41 | 19.32 | 19.23 | 20.53 |
| memcpy_r() | 53.79 | 113.54 | 54.47 | 121.52 | 54.42 | 54.25 | 55.31 |

Results of String operations, 4100 chars

All three graphs show a significant discrepancy between the performance of standard musl-based systems and glibc-based ones: stock musl demonstrated the worst results in nearly all tests, and the gap widens as the String gets longer. We can also see that the results of improved musl (Alpaquita with musl-perf) are on par with those of glibc in almost every operation. It means that the optimizations we introduced into musl will enable companies to use our compact distribution Alpaquita Linux without a detrimental impact on application performance.
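The size of the gap is easiest to see as a ratio. The sketch below divides the stock musl timing by the musl perf timing for a few rows of the 4100-character table; the values are copied from the alpaquita-22-musl-def and alpaquita-22-musl-perf columns, while the names `TIMINGS_4100`, `slowdown`, and `slowdown_table` are made up for this example.

```python
# ns/op at 4100 chars, copied from the table: (stock musl, musl perf)
TIMINGS_4100 = {
    "strlen()": (300.76, 43.87),
    "memcmp()": (1458.78, 72.71),
    "strncmp()": (3480.92, 90.03),
    "strrchr()": (2483.10, 89.17),
}


def slowdown(stock_ns, perf_ns):
    """How many times longer stock musl takes than musl perf."""
    return stock_ns / perf_ns


def slowdown_table(timings):
    """Map each function name to its slowdown factor, rounded to one decimal."""
    return {name: round(slowdown(stock, perf), 1)
            for name, (stock, perf) in timings.items()}


if __name__ == "__main__":
    for name, factor in slowdown_table(TIMINGS_4100).items():
        print(f"{name:10s} stock musl takes {factor}x as long as musl perf")
```

The same helper can be pointed at the 36- and 132-character rows to see how the factor grows with String length.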

Throughput

To test the throughput of Linux distributions, we used the Nginx benchmark provided by Phoronix. Nginx is a highly performant open-source web server that can also be used for reverse proxying, load balancing, caching, and other tasks. Its purpose is to control the workload and protect the backend. Nginx Ingress Controller is commonly used with Kubernetes clusters for routing traffic without creating numerous load balancers.

The Nginx benchmark runs on a single host and measures the number of HTTP requests handled per second with a configurable number of concurrent clients.
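The measurement idea, a fixed number of requests issued by a configurable number of concurrent clients and divided by the elapsed time, can be sketched as follows. This is not the Phoronix benchmark: `measure_throughput` is a hypothetical helper, and `fake_request` is a stand-in that sleeps instead of performing an HTTP round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(handle_request, total_requests, concurrency):
    """Issue `total_requests` calls from `concurrency` workers.

    Returns (completed requests, requests per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: handle_request(),
                                range(total_requests)))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(results), len(results) / elapsed


def fake_request():
    """Hypothetical stand-in for one HTTP round trip to the server."""
    time.sleep(0.001)
    return 200


if __name__ == "__main__":
    for clients in (1, 20, 100):
        done, rate = measure_throughput(fake_request, 500, clients)
        print(f"Concurrent Requests: {clients:4d}  {rate:10.1f} reqs/s ({done} done)")
```

In a real run the stand-in would be replaced by an actual request to the server under test, and the client count would be swept over the values shown in the table.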

Nginx (reqs/s)

| name | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf | alpine-3.16 | centos-9 | debian-11 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Concurrent Requests: 1 | 21905.96 | 21811.14 | 21866.25 | 21362.84 | 15717.46 | 20528.18 | 16800.98 |
| Concurrent Requests: 20 | 77072.15 | 76141.41 | 76053.68 | 69856.36 | 55991.42 | 55019.88 | 61122.7 |
| Concurrent Requests: 100 | 85526.76 | 85256.34 | 84218.03 | 77749.41 | 61052.06 | 55717.28 | 63137.18 |
| Concurrent Requests: 200 | 85082.81 | 82633.25 | 83891.91 | 76352.81 | 59159.26 | 56608.38 | 62187.37 |
| Concurrent Requests: 500 | 81313.28 | 78921.85 | 80128.55 | 75187.77 | 54158.91 | 58064.64 | 60493.26 |
| Concurrent Requests: 1000 | 79049.85 | 78305.92 | 78541.19 | 72996.01 | 53429.85 | 57546.15 | 59431.69 |

Nginx results

All three Alpaquita configurations demonstrated the best results across all tests, followed by Alpine. It is also remarkable that glibc-based Alpaquita outperformed other glibc-based distros significantly, which means that Alpaquita may be a good choice for companies that do not want to migrate from glibc but put a focus on good application throughput.

Performance of malloc implementations

Linux memory allocators (mallocs) can accelerate or slow down applications. For instance, in the case of musl, switching to mimalloc or jemalloc may solve some performance issues. The JVM allocates user objects in the Java heap and collects garbage itself, but the JVM's own code uses the system allocator, so choosing a suitable malloc is essential for Java apps.

Alpaquita Linux comes with three additional malloc implementations for different environments:

  • mimalloc is a small allocator used in large scalable services with low latency
  • rpmalloc is the tiniest allocator, with lock-free thread caching
  • jemalloc enables the developers to solve fragmentation issues and supports scalable concurrency
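This article does not describe how an alternative allocator is selected at run time. One common mechanism on Linux is the dynamic loader's LD_PRELOAD environment variable, and the sketch below assumes that mechanism applies here; that assumption, the library path, and the helper names `preload_env` and `run_with_allocator` are all hypothetical and should be checked against the distribution's documentation.

```python
import os
import subprocess


def preload_env(allocator_lib, base_env=None):
    """Return a copy of the environment with LD_PRELOAD set to `allocator_lib`."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = allocator_lib
    return env


def run_with_allocator(command, allocator_lib):
    """Run `command` (a list of arguments) with the given allocator preloaded."""
    return subprocess.run(command, env=preload_env(allocator_lib), check=False)


if __name__ == "__main__":
    # Hypothetical path: the real location depends on how the allocator
    # package is installed on the system.
    env = preload_env("/path/to/libmimalloc.so")
    print("LD_PRELOAD =", env["LD_PRELOAD"])
```

Keeping the environment construction separate from the process launch makes it easy to compare the same command under several allocators.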

To test the work of memory allocators in various Linux distributions, we used mimalloc-bench, a suite originally developed for mimalloc but later adapted for other implementations as well. The tests measure the number of operations performed in one second.

  • espresso: a programmable logic array analyzer in the context of cache-aware memory allocation
  • barnes: a hierarchical n-body particle solver, simulating the gravitational forces between 163840 particles
  • alloc-test: simulates intensive allocation workloads with a Pareto size distribution
  • cache-thrash: part of Hoard benchmarking suite, designed to exercise heap cache locality
  • cache-scratch: introduced with the Hoard allocator to test for passive-false sharing of cache lines
  • mstress: simulates real-world server-like allocation patterns, using N threads with allocations in powers of 2, where objects can migrate between threads and some have long lifetimes

Alpaquita configurations with mallocs:

  • alpaquita-22-musl-perf-je — Alpaquita with musl-perf + jemalloc
  • alpaquita-22-musl-perf-mi — Alpaquita with musl-perf + mimalloc
  • alpaquita-22-musl-perf-rp — Alpaquita with musl-perf + rpmalloc

malloc performance (operation, 1/s)

| name | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf-je | alpaquita-22-musl-perf-mi | alpaquita-22-musl-perf | alpaquita-22-musl-perf-rp | alpine-3.16 | centos-9 | debian-11 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| espresso | 0.19 | 0.13 | 0.21 | 0.22 | 0.13 | 0.22 | 0.12 | 0.20 | 0.20 | 0.20 |
| barnes | 0.36 | 0.37 | 0.37 | 0.38 | 0.37 | 0.37 | 0.37 | 0.38 | 0.36 | 0.36 |
| alloc-test1 | 0.22 | 0.14 | 0.27 | 0.29 | 0.14 | 0.25 | 0.14 | 0.23 | 0.24 | 0.23 |
| alloc-test-4 | 0.22 | 0.02 | 0.26 | 0.28 | 0.02 | 0.25 | 0.01 | 0.22 | 0.23 | 0.22 |
| cache-thrash-1 | 0.71 | 0.71 | 0.71 | 0.71 | 0.71 | 0.71 | 0.72 | 0.72 | 0.64 | 0.64 |
| cache-thrash-16 | 2.63 | 2.63 | 2.63 | 2.70 | 2.70 | 2.63 | 2.70 | 2.70 | 2.38 | 2.38 |
| cache-thrash-4 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 2.38 | 2.38 |
| cache-scratch-1 | 0.71 | 0.72 | 0.71 | 0.71 | 0.71 | 0.71 | 0.72 | 0.72 | 0.64 | 0.64 |
| cache-scratch-16 | 2.70 | 2.63 | 2.63 | 2.63 | 2.70 | 2.63 | 2.70 | 2.70 | 0.86 | 1.20 |
| cache-scratch-4 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 | 0.15 | 0.15 |
| mstress-1 | 0.17 | 0.13 | 0.23 | 0.39 | 0.13 | 0.16 | 0.12 | 0.17 | 0.17 | 0.17 |
| mstress-16 | 0.14 | 0.07 | 0.17 | 0.24 | 0.07 | 0.13 | 0.06 | 0.13 | 0.14 | 0.13 |
| mstress-4 | 0.171 | 0.124 | 0.231 | 0.368 | 0.124 | 0.160 | 0.118 | 0.166 | 0.170 | 0.170 |

malloc benching results

The above graph indicates that no memory allocator excels in all conditions. For instance,

  • the glibc malloc and three optimized musl mallocs (mimalloc, rpmalloc, jemalloc) showed the best results in espresso and alloc-test
  • Debian and Red Hat default mallocs demonstrated the worst results in all cache-scratch and cache-thrash tests, whereas other allocators demonstrated similar performance, with musl perf showing stable superior performance in all cases
  • mimalloc was a clear leader in all mstress tests

To sum up, the variety of malloc implementations and the possibility to choose between glibc and musl give Alpaquita additional flexibility in system configuration. The choice of a malloc implementation should be driven by the workloads it will be used with.

Learn more about choosing a suitable malloc

Java coupling with DaCapo

The DaCapo benchmark suite is a set of real-world Java applications with different memory loads used to evaluate system/CPU performance. The results are measured in the ms required to complete a workload.

We used Alpaquita with different malloc implementations in this benchmark to assess whether they improve the situation in cases when stock musl demonstrates inferior performance to glibc.

  • h2: executes a JDBCbench-like in-memory benchmark, executing a number of transactions against a model of a banking application
  • fop: takes an XSL-FO file, parses it and formats it, generating a PDF file
  • pmd: analyzes a set of Java classes for a range of source code problems
  • xalan: transforms XML documents into HTML
  • avrora: simulates a number of programs run on a grid of AVR microcontrollers
  • jython: interprets the pybench Python benchmark
  • luindex: uses Lucene to index a set of documents
  • sunflow: renders a set of images using ray tracing
  • lusearch: uses Lucene to do a text search of keywords over a corpus of data comprising the works of Shakespeare and the King James Bible
  • tradebeans: runs the daytrader benchmark via Java Beans to a Geronimo backend with an in-memory h2 as the underlying database

DaCapo (ms)

| name | alpaquita-22-glibc | alpaquita-22-musl-def | alpaquita-22-musl-perf-je | alpaquita-22-musl-perf-mi | alpaquita-22-musl-perf | alpaquita-22-musl-perf-rp | alpine-3.16 | centos-9 | redhat-8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| h2 | 3119 | 3270 | 3098 | 3099 | 3264 | 3104 | 3217 | 3235 | 3725 |
| fop | 1115 | 1232 | 1107 | 1104 | 1249 | 1120 | 1230 | 1167 | 986 |
| pmd | 1577 | 1716 | 1587 | 1592 | 1746 | 1658 | 1757 | 1611 | 1571 |
| xalan | 2018 | 2133 | 2057 | 2003 | 2143 | 2061 | 2152 | 1977 | 1706 |
| avrora | 3675 | 3765 | 3589 | 3646 | 3738 | 3628 | 3844 | 3232 | 3327 |
| jython | 4454 | 4705 | 4342 | 4412 | 4711 | 4390 | 4609 | 4554 | 3904 |
| luindex | 876 | 977 | 862 | 860 | 959 | 868 | 953 | 895 | 826 |
| sunflow | 2349 | 2438 | 2216 | 2288 | 2421 | 2332 | 2468 | 2386 | 2162 |
| lusearch | 1328 | 1451 | 1331 | 1335 | 1423 | 1436 | 1559 | 1411 | 1203 |
| tradebeans | 3649 | 3886 | 3539 | 3539 | 3917 | 3613 | 4039 | 3735 | 3159 |

DaCapo results

In all sets except h2, where the leadership belongs to glibc in Red Hat, musl perf (Alpaquita) and stock musl (both Alpaquita and Alpine) demonstrated the best results as compared to glibc. Remarkably, Java applications benefit from musl more than from glibc in these scenarios. Companies with Java-based projects are advised to test their applications in a musl environment and note the changes in performance. However, it is important to measure other metrics, such as latency and footprint, because libc implementations may behave differently with specific workloads.

Conclusion

The benchmarking results show that 

  • Alpaquita Linux outperforms other Linux distributions for startup and throughput
  • In cases where glibc demonstrated superior results to stock musl, Alpaquita Linux with optimized musl has similar or superior results compared to glibc-based distros
  • glibc-based Alpaquita gives better results for startup and throughput as compared to other glibc-based distros and is equal to them in other cases
  • There is no perfect malloc for all use cases, but the fact that Alpaquita Linux has three additional mallocs means greater variability in performance tuning
  • musl-based distributions demonstrated better results in the DaCapo benchmark suite, which indicates better musl suitability for specific workloads. But further testing with additional metrics is required to choose an optimal solution

It turns out, Alpaquita won the race hands down! It is fast, reliable, and incredibly flexible, providing the best options for enterprises:

  • Optimized musl for those who want to save cloud resources but are unwilling to deal with performance issues of stock musl
  • glibc package for cases when migration to another libc is undesirable or complicated. For cases when startup and throughput are crucial, it can even be more advantageous than traditional glibc-based systems
  • Three additional memory allocators for various use case scenarios

And, of course,

  • Commercial support with LTS releases and strict update schedule
  • Tools for Java development
  • A unified solution for cloud-native Java deployment with Liberica Lite and Liberica NIK on board — Alpaquita Cloud Native Platform

So let Alpaquita Linux loose with your application riding a perfect seat and see that it is a total game-changer!

Download Alpaquita Linux for free
