JRush Episode 4: Fresh Java on Modern Arm Servers

 

Transcript:

(00:01) ...finances for something really useful, not just for heating the world, which is probably already overheated by global warming. We see that Java moves constantly: the release schedule has become more frequent and predictable, and we see many exciting things, but the releases are not overloaded. That's about Java. What about me and our company? I have worked at BellSoft for a few years, and BellSoft is a unique company because it produces Liberica JDK, the default OpenJDK distribution for Spring
(01:03) Boot containers. It also produces Alpaquita Linux and Liberica Native Image Kit, which combine perfectly into the Alpaquita Cloud Native Platform (ACNP) that you can see in our live stream. As for what I do: I participate in the development of OpenJDK, and I also speak at events. The company's overall contribution to OpenJDK is very significant. Prior to BellSoft I worked at Oracle, also in the domain of OpenJDK development, and before that I used to work at Deutsche Bank, so the topic of financial savings and optimizations is very, very close
(01:56) to me. So let's look at the nice and beautiful mobile phones of the past. This whole area develops so rapidly that you won't be able to capture a picture that stays accurate for even half a year. We now have slightly different devices that are much more modern and much more powerful, and they are able to perform tasks that used to be performed only by server-class machines. The same happens with servers: new hardware develops rapidly, and the software that runs on that hardware has many optimizations, as we will see
(02:47) in a while. But it all started many years ago: Arm as a company and Arm as an architecture started many years ago, and that's what we see now. You still see that company and that architecture under the same name, but with slightly different logos, and the number of devices powered by Arm has grown enormously. An estimate from the end of last year shows that there are 250 billion processors in different devices. Of course, not all of these processors power servers, and not even all of them power mobile phones; many are in something smaller,
(03:40) but it is a huge number. So what is an instruction set? It's a way for the people who write software to communicate with the real hardware it runs on: to tell the hardware what to do so that it brings you a valuable result, the result of your program as it executes. The specification for that special language, of course, has multiple versions; new versions appear as time goes on, and there are major and minor versions. There are also different profiles;
(04:35) among them, the application profile is the most important for us as Java developers and Java users, because it allows us to run the JVM we are used to: the JVM that contains different compilers, JIT or AOT, which perform many optimizations for the managed runtime, etc. There are also names like AArch64 or A64 that denote execution modes and the exact instruction set for those execution modes, because the same CPU can work in different execution modes, and one of them can be selected. AArch64 is the one used for
(05:29) server-side computing. There is also the code name arm64, which is used as the name of the Linux kernel port for that class of machines. It's really simple if you think about it, but you have to know all the acronyms if you work with the technology. There are multiple vendors that produce real CPUs and real cores: they license the ISA or some IP designs from Arm, make an agreement, and then produce real hardware. Arm is even going to produce some
(06:16) hardware itself, but we haven't seen that yet. If we look at how the specification changed over the years, we see some dates here: the dates when a certain spec version was finalized, and real hardware is released a few years after that. You see some hardware listed above the boxes, and inside the boxes there are a few selected features of each spec version. Those features are actually extensions; some are optional and some are mandatory in certain versions. A typical practice that we observe is
(07:12) that when a vendor releases a new core or a new CPU, it implements some spec version plus, possibly, extra extensions from newer versions. That's normal practice, because you can offer good optimizations and extra features if you already know that the next spec will contain certain things. What are the extensions? Some useful ones are related to cryptography, to checksum calculations, and to SIMD processing of data. From the very beginning there was the NEON extension, which was not
(08:08) optional, and a bit later an extension called SVE appeared, the Scalable Vector Extension, which allows machines that implement it in silicon to compete with things like AVX. If you process long rows of data, like streams of numbers, it allows you to compute on top of them quickly and efficiently. That's the kind of task we can imagine when quote streaming is processed, or other financial data that's structured as a long stream of numbers. There are also extensions that help with machine learning tasks, like
(09:02) BFloat16 support. Among that hardware there are some notable points. ThunderX was a machine by Cavium that had 96 cores. Each single core wasn't too powerful, really rather low-end compared to other hardware that existed at that time, but there were a lot of cores. Later generations introduced 256 or even 384 cores, mostly for HPC, but they could also be used as just a powerful server machine. Some designs here are called Neoverse; that's something that
(10:09) Arm develops as a reference design, and vendors can take it and implement it in their hardware. You can also see Apple M1, then M2, and others. The AmpereOne CPU listed here was released only two weeks ago; that's a new one, and we'll see some specs later. You see that the newer the machine, the newer the spec version it implements. There is a very recent version of the specification called Armv9, and there are already some CPUs that implement it. Most of them are mobile ones, by the way, but Neoverse N2 is a server
(11:05) class core. Well, I mentioned that there are different vendors; here are just a few of them, and there are many more. You see familiar logos here, and this is really necessary for many of the vendors to stay in this space. I mentioned Neoverse: this is an implementation managed by Arm, Arm commits to the development of this design, and we'll see real hardware that uses it in a while. So how did it all start? Just a few years ago, machines with many Arm cores were very unusual; it was rare to see one or to touch one,
(12:06) so I was really happy to do that in person, to touch a real server, because we had seen no real ones, and it had many, many cores. It's not an up-to-date model now; newer ones look like this. This one is based on AmpereOne, and you see that it contains many building blocks, like recent DDR5 memory or special processing cards. It looks gorgeous and it's a really powerful thing. Such machines consume more and more power, but at the same time they have more cores and more processing power in them.
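The many-core Arm servers described in the talk raise a practical question: is a given JVM actually running natively on them, and how many cores can it use? The following is a minimal sketch, not taken from the talk, using only the standard `os.arch` system property and `Runtime.availableProcessors()`. The class name `HardwareInfo` and the `family` mapping are hypothetical, and the mapping covers common `os.arch` values only (an assumption, not an exhaustive table).

```java
import java.util.Locale;

/**
 * Prints the CPU architecture this JVM was built for and the number of
 * processors it may use. Sketch only: the os.arch mapping below covers
 * common values and is an assumption, not an exhaustive table.
 */
public class HardwareInfo {

    /** Maps a raw os.arch value to a coarse architecture family name. */
    public static String family(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "aarch64":
            case "arm64":
                return "arm64";
            case "amd64":
            case "x86_64":
                return "x86-64";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        // os.arch describes the JVM build, so an x86-64 JDK running under
        // emulation on Arm hardware still reports an x86-64 value here.
        String osArch = System.getProperty("os.arch");
        // Recent JDKs also take container CPU limits into account here.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("os.arch=" + osArch + " (" + family(osArch)
                + "), available processors=" + cpus);
    }
}
```

On a native Arm JDK this typically reports `aarch64`; seeing an x86 value on Arm hardware suggests the JVM is running under emulation, which the talk notes is many times slower than a native port.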
(13:04) So you see the hardware is real. What about software? The software ecosystem for Arm is now simply the software ecosystem that exists in the world, because vendors of, I would say, all major software technologies blog about how to port their products or their open source code to that architecture. It has become a primary target, along with x86, and the same is true for OpenJDK: you see that it is, of course, listed here, and we'll talk about that in more detail. You can check for more software technologies;
(13:59) that ecosystem also includes diagnostic tools related to Java, which is important because we really like to measure performance and analyze it. That performance was greatly improved around JDK 11, when BellSoft implemented many intrinsics, so-called natively optimized functions, in OpenJDK. Besides performance, there is a question of correctness when we write programs, especially concurrent programs: sometimes x86 forgives programmers' mistakes, which isn't true
(14:45) for Arm. There are really interesting observations: when the program is incorrect and we execute it on out-of-order hardware, we can see real failures. I mentioned the Ampere Altra family, and there are multiple generations of it already. The latest generation contains up to 192 cores in one server, and if you build a rack, you can run more than 7,000 virtual machines in that rack; that's a huge number. Another vendor is Amazon. At Amazon there are three generations of Graviton processors, and two of them
(15:44) actually implement Neoverse designs, and you see how they developed. Every few years the spec is updated and the processors are replaced by a newer generation, and the production process becomes finer and finer. You see that frequencies stay about the same because of physical limitations, as does the number of cores. In the last line there's some projected software performance. How can we observe that in the wild? The last generation became 25% faster, as Amazon says, or about 10 times faster than the first generation. Well, they can run benchmarks,
(16:35) like Java benchmarks. Here's an arbitrary selection, and we can compare two generations of Graviton processors, for example, and really see that 25% improvement. At the same time, we can compare generations of OpenJDK, like OpenJDK 20 versus OpenJDK 8, and again, even for a trivial set of benchmarks, we see about the same level of improvement. If we combine newer software with newer hardware, we see around 50%. So it was a rough experiment, but we get some improvements, we get some
(17:19) development. What about costs? When the Amazon hardware was released, people simply compared the costs of certain workloads that can be run on those services, and it turned out to be an x86 massacre, because it was so much cheaper to run those workloads on Arm hardware. Because of that, the share of Arm computing in the clouds grows very rapidly: it allows us to save money and be more efficient, and cloud providers expect even more growth in just a few years. What about development hardware? Well, we have machines powered
(18:15) by Apple silicon, and machines running Windows on Arm or Linux, which are now probably even more powerful than x86 machines, as we see with Apple. There are now software ports for all these systems that allow you to run OpenJDK natively. Why is that important? Because if you run OpenJDK natively rather than under some emulation or virtualization, it works many times faster. So of course we need correct and efficient ports for each native combination of Arm and a given operating system. There are many optimizations in JDK
(19:05) 11 and later, but I won't go into them in detail. There are also ongoing projects in OpenJDK, and they are developed with the Arm target in mind, so all these new features already work perfectly on Arm servers on the day they are released. Then we use all that and prepare our applications for deployment. We typically build containers, and now we can build small containers for musl-based operating systems like Alpine, and all of that works on Arm servers as well. You can take good base images and build
(19:48) very, very slim containers. When you prepare images locally on Arm desktops or laptops, you can target both Arm and x86, because there is a cross-build process. So it's now easy to develop for both targets, which really matters if you're multi-cloud or your cloud doesn't provide some kind of instance in certain regions; that really happens. By the way, as for the clouds that offer Arm hardware right now: I already mentioned Amazon, and Ampere Altra machines can
(20:32) be found in Oracle Cloud and in Azure, so there is a very wide offering of Arm machines. So we see that hardware and software work perfectly together and allow you to pay less and get more, and you can get even better results from newer Java versions. Just beware of the correctness of your programs: that has always been the responsibility of programmers, so think carefully about your migration strategy or consult specialists. Thank you.
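The closing warning about correctness, together with the earlier remark that x86 forgives some mistakes that Arm does not, usually concerns data shared between threads without proper synchronization. Below is a minimal sketch, not from the talk, of safely publishing a value through a `volatile` flag; the class and method names are hypothetical. Without `volatile` on `ready`, the Java Memory Model allows the reader to observe `ready == true` yet still read a stale `payload`. The JIT may reorder such accesses on any platform, but x86's stronger hardware ordering tends to hide the bug, while on weakly ordered, out-of-order Arm cores it is more likely to surface as the real failures mentioned in the talk.

```java
/**
 * Publishing data from one thread to another through a volatile flag.
 * Without 'volatile' on 'ready', a reader could see ready == true
 * and still read a stale 'payload'.
 */
public class SafePublication {
    private int payload;             // plain field, written before the flag
    private volatile boolean ready;  // volatile write/read creates happens-before

    public void publish(int value) {
        payload = value;  // 1. write the data
        ready = true;     // 2. volatile write: publishes the write above
    }

    /** Spins until the flag is set, then returns the published payload. */
    public int awaitPayload() {
        while (!ready) {          // volatile read: sees all writes made before 'ready = true'
            Thread.onSpinWait();  // spin-loop hint to the CPU (JDK 9+)
        }
        return payload;           // guaranteed to see the value from publish()
    }

    public static void main(String[] args) throws InterruptedException {
        SafePublication box = new SafePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(42);
        reader.join();  // join() makes the reader's write to seen[0] visible here
        System.out.println("reader saw " + seen[0]); // prints "reader saw 42"
    }
}
```

Dropping `volatile` turns this into a data race: besides reordering, the JIT may also hoist the read of `ready` out of the loop so the reader never terminates. Classes from `java.util.concurrent` (for example `CountDownLatch` or `AtomicReference`) give the same guarantees with less hand-written code.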

Summary

The video discusses the benefits of using ARM-based hardware for developing and running Java applications. The speaker highlights the significant improvements in performance, cost-effectiveness, and development tools since the first generation of Graviton processors. They also emphasize the importance of software updates and new hardware in achieving improved results.
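The talk reports roughly 25% from a newer Graviton generation and about the same from OpenJDK 20 versus OpenJDK 8, and "around 50%" when both are combined. A small worked example of why that is plausible: if the two gains are independent (an assumption) and "25% faster" means 1.25 times the throughput, they compound multiplicatively rather than add, so 1.25 × 1.25 = 1.5625, about 56%. The class `CombinedSpeedup` is a hypothetical illustration; the input factors are the talk's approximate figures, not new measurements.

```java
import java.util.Locale;

/**
 * Combines independent relative speedups multiplicatively.
 * Assumption: each factor is a throughput ratio (1.25 = "25% faster")
 * applying independently to the same workload.
 */
public class CombinedSpeedup {

    /** Returns the product of the given speedup factors (1.0 if none). */
    public static double combine(double... factors) {
        double total = 1.0;
        for (double f : factors) {
            if (f <= 0) {
                throw new IllegalArgumentException("speedup factor must be positive: " + f);
            }
            total *= f;
        }
        return total;
    }

    public static void main(String[] args) {
        double hardware = 1.25; // ~25%: newer vs older Graviton generation (figure from the talk)
        double software = 1.25; // ~25%: OpenJDK 20 vs OpenJDK 8 (assumed equal, per the talk)
        double total = combine(hardware, software);
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.println(String.format(Locale.ROOT,
                "combined speedup: %.4fx (%.2f%% faster)", total, (total - 1) * 100));
        // prints "combined speedup: 1.5625x (56.25% faster)"
    }
}
```

Adding the percentages (25% + 25% = 50%) slightly underestimates the combined effect; real workloads rarely benefit from both changes fully independently, so treat the result as an upper-bound estimate.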

About Dmitry

Dmitry Chuyko is a Senior Performance Architect at BellSoft, an OpenJDK committer, and a public speaker. Prior to joining BellSoft, Dmitry worked on the HotSpot JVM at Oracle, and before that he had many years of programming experience in Java. He is currently focused on optimizing HotSpot for x86 and Arm, and was previously involved in delivering JEP 386 (the Alpine Linux port), which enables the creation of the smallest JDK containers.
