Why Test Performance When Scaling K8s

Transcript

HotSpot JVM in containers

It's not that complex — there are multiple talks, including mine, on how to tune it, right?

But you must remember that your defaults depend on your environment. They depend on the actual resources the JVM sees on board, right? That can really change your garbage collector, for example — this is the most famous case: you fall back from G1 to Serial GC.

That happens if you just use the default options in your containers. And who here sets options for their containers? Yeah, that's fair, I believe it. And that's not really the problem — that's just the situation. It's fine, they don't break.

If we don't set options, things don't break. But we can make them more efficient: we can improve the performance and get really different numbers, even from fairly trivial applications. I'll show an example, and you can do the very same with your own service. So, what happens with our containers in Kubernetes?
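To make that fallback concrete, here is a minimal pod-spec sketch — not from the talk, with a placeholder image name and sizes. With limits this small, HotSpot ergonomics see fewer than two CPUs and less than about 2 GB of memory and silently pick Serial GC, so keeping G1 takes an explicit flag:

```yaml
# Illustrative fragment only; image name and sizes are placeholders.
containers:
  - name: petclinic
    image: example/petclinic:latest      # hypothetical image
    resources:
      limits:
        cpu: "1"                         # < 2 CPUs: ergonomics would choose Serial GC
        memory: 1Gi                      # < ~1.8 GB has the same effect
    env:
      - name: JAVA_TOOL_OPTIONS          # picked up by the JVM at startup
        value: "-XX:+UseG1GC"            # explicitly keep G1 in a small pod
```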

JVM containers in K8s

We declare some deployment, and our container images get started somewhere. But we have control: what resources do we provide for the running pods, and what options do we choose for the JVM? We can tune it when we start the application, or we can tune it in advance — but we can't do both. Sometimes we have options. For services that are not so important, we can say: okay, they really need that amount of memory, but it's fine if they run slower. Right? So there are different so-called quality-of-service settings, where we regulate requests and limits for our pods. But please remember that setting your memory limit lower than your actual -Xmx is a very bad idea.
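A minimal sketch of that relationship, with placeholder sizes: the container limit stays above -Xmx so there is headroom for metaspace, thread stacks, and native buffers, and setting requests equal to limits puts the pod into the Guaranteed QoS class.

```yaml
# Hypothetical Deployment fragment; values are examples, not recommendations.
containers:
  - name: petclinic
    image: example/petclinic:latest   # placeholder image
    resources:
      requests:
        cpu: "2"
        memory: 2Gi                   # requests == limits => Guaranteed QoS class
      limits:
        cpu: "2"
        memory: 2Gi
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-Xmx1536m"            # heap well below the 2Gi container limit
```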

My second favorite story here is from a European bank. They called us and said: we have a problem with containers.

Our deployment is constantly going down, and we have this startup problem — which isn't really a startup problem for a single service. Starting and warming up for eight or thirty seconds is not the issue; the issue is that it happens constantly. So now it's a problem. And those restarts were caused by setting their limits lower than -Xmx. It works for some time, right? It works, and then the memory consumption reaches the limit, and bang, your service is gone.


Scaling your apps

An even more interesting situation: days like Black Friday, when you need to scale. Sometimes you have already built your nice fleet of microservices, like the company I mentioned at the beginning, and then Black Friday comes and they need to sell a lot. If you can prepare for such scaling in advance, do prepare. Some lessons are probably trivial: don't rely on pods that won't actually be available to you, right? Rely on real ones. Split your services: some of them need to be highly available, some do not, right?

That's a question of SLOs. And you have to plan a scaling strategy in advance, because if you scale enormously, you pay enormously. The cloud provider will probably cover your scaling — okay, we have the budget, we scale up to the whole budget, no problem, we will satisfy all the requests — but it will be too expensive. So you have to limit how far you scale, you have to prepare a strategy; sometimes you need to scale in advance, or in steps, slightly ahead of the load. And then you have to scale back down — in the very best scenario, to zero. That's why lambdas were made: to scale to zero. All of this should be controlled by monitoring, right? How much we spend on the cloud is one of the SLOs and indicators we have, and solutions that provide visibility into cloud spending can be useful here.
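One common way to encode such limits — not something from the talk, just a minimal sketch with invented numbers — is a standard HorizontalPodAutoscaler, where maxReplicas is exactly that spending cap and minReplicas keeps a warm baseline:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: petclinic            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: petclinic
  minReplicas: 2             # warm baseline; never scales to zero by itself
  maxReplicas: 20            # the explicit cost ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # scale out before pods are saturated
```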

When we scale, we have limitations. If we're trying to scale some service, we obviously depend on its dependencies. If the service uses a database that can only satisfy 500 requests per second, you won't get more than 500 requests per second out of the service — if every request needs the database. Clear, right? And in reality we sometimes underutilize resources, such as databases that could be co-located for different services. Meanwhile, pods still break after all of that, and we'll see their performance characteristics.

Test application

Let's talk about the favorite application of all Spring and Spring Boot talks: Spring PetClinic. It looks like this. Do you use Spring here? Yes, some people use Spring, and Spring Boot. And this application comes in different versions. You probably know that Spring Boot is now at version 3 — 3.3, actually — and it supports nice optimizations such as native images, and Spring AOT in general, which lower the overhead of such a sophisticated, reflection-oriented dependency-injection framework as Spring. And there is version 2.7, which is older but still great and widely used; applications based on 2.7.x typically run on Java 8 or 11. Not necessarily, but it happens. Anyway, we'll create a container for this application.

Many ways to build your container image

There are many ways to do it — to containerize this application. This survey is a few years old, I think, and it's still true. Very true. We expected the share of buildpacks to rise; it hasn't, and now we know why, by the way, but that's a different story. So there are more or less complex ways to pack your resulting build into a container, and they are all correct. But sometimes they all seem complex — too complex. And our cloud providers typically say: “Please bring your applications to our cloud, here are some tools to help you.” But these tools are not magic tools that containerize your application.

They only help you deal with the boilerplate of cloud vendor settings. They don't create the right Java options for your container, because they know nothing about your application; they just try to replicate your previous environment in that new, unknown containerized jungle. But it's simple to create your own container, right? And it will be quite effective. Still, even in this simple situation, we have to plan our resources well in advance. As you can see here, we set the RAM percentage. That means we know — we have tested — that this value works correctly for the pod sizes we plan to run on, so it doesn't break under the test. And why do we set it? Because we want to use those resources. We don't want to use only a quarter of the memory, right? If you set nothing, you will only use 25 percent of your pod's memory for the Java heap — which of course works badly with the default GC, because automatic garbage collection requires extra memory to work fast.
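The flag in question is -XX:MaxRAMPercentage. The slide bakes it into the Dockerfile; a roughly equivalent sketch expressed as a pod environment variable, with made-up sizes, would be:

```yaml
containers:
  - name: petclinic
    image: example/petclinic:latest          # placeholder image
    resources:
      limits:
        memory: 2Gi
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-XX:MaxRAMPercentage=75.0"   # ~1.5Gi heap instead of the 25% default (~0.5Gi)
```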

Cloud Native Buildpacks

Well, the other means of solving the same problem — turning your application into a container — is buildpacks.

You can run them standalone, or you can use the standard Maven and Gradle commands to build your container. And if you build such a container using the standard Spring Boot way of building containers, you get some optimized things in there. You get extra options, and you get — not Alpaquita, but a properly prepared and updated Linux. It will have Liberica JDK inside, by the way, because we are the default JDK provider for Spring Boot. And if you create native images, it will use Liberica Native Image Kit to compile them.

Buildpacks are great because you need nothing — nothing but Docker, plus a base Java to run Maven or Gradle, and even that can be downloaded if you use the wrapper script. Under the hood it will pull different containers, a so-called stack, to build your application. You can run your build on JDK 17 but target JDK 21, and the right build container will be downloaded automatically, giving you a reproducible environment. The other part of the stack is the run stack: a very slim Linux with everything necessary to run your application. But you also get a starter script, which looks at your environment before starting the JVM and adds some wise options.

It will say: okay, we have certain settings in place, like the number of Tomcat threads, and it will calculate how much memory can go to the Java heap once the reserved memory is subtracted, to provide the best possible performance for this container. It will also add diagnostic flags, so that when the application or the pod exits you can see how much memory was used for native buffers and so on, and then you can tune those settings and create an even better container.
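If the image comes from the Paketo-style buildpacks that Spring Boot uses, those knobs are plain environment variables read by the launcher at startup. A minimal sketch, assuming the Paketo Java buildpack variable names and using example values:

```yaml
containers:
  - name: petclinic
    image: example/petclinic:buildpack    # image produced by the buildpack build (placeholder tag)
    resources:
      limits:
        memory: 1Gi
    env:
      - name: BPL_JVM_THREAD_COUNT        # expected thread count, e.g. the Tomcat pool size
        value: "100"
      - name: BPL_JVM_HEAD_ROOM           # percentage of memory left untouched for native use
        value: "5"
      - name: JAVA_TOOL_OPTIONS           # anything else is passed to the JVM verbatim
        value: "-XX:+UseG1GC"
```

The memory calculator then derives the heap size from the container limit minus thread stacks, metaspace, code cache, and the configured head room.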

Some settings are listed here. You can provide any general options you would like, right?

But some of them will be derived from your settings, there are some defaults, and all of that can be overridden. So in the end we have a set of containers. We have builders as separate images.

Resulting images

We have base images and we have resulting images. Here are some with a native image, some with the JVM; the two lower ones are built with buildpacks, and the upper one is the base for the manual Dockerfile image I showed previously. Anyway, you can create a very small container — 100 or 150 megabytes with your application. It's not a problem nowadays. Just think about it and build your CI/CD pipeline so that it produces an efficient container, so you don't waste network traffic, disk storage, and operating system memory. So we came to the idea of optimizing containers before optimizing applications.

Load testing setup

We don't have to change the code, but we can make applications work better just by providing wise settings.

Just think about it a little — it's not that hard. So let's create a performance testing environment. You can use any tools you like, and in serious companies there should be a performance management process in place anyway, so you can reuse it for your real business purposes. So we can put a load harness on the cluster. And inside the cluster, let's choose the right size for the pod. Assume we have some hardware resources — a bare-metal server in this case — and we can divide them up using different deployment settings; that's easy to achieve in Kubernetes. I selected some plausible pod sizes simply because, in this case, Amazon offers those sizes for autoscaling. So you don't have to test every size in one-megabyte steps.

Results for different configurations

You only need to care about realistic pod sizes. Let's say our application should respond — must respond — under 100 milliseconds, so it's business critical. And these are fresh numbers with the current Spring Boot release: we can actually run it standalone, without any containers, without anything, on this bare-metal instance, just limiting its memory. And we must check that, because numbers differ: once you run in your container environment, they really can shift — downwards, of course. We can also scale the application vertically, but that requires careful tuning of JVM options, because the right options are not determined by the JVM version alone. For example, for the older version we have to tune the Tomcat threads. We have to select our garbage collector, especially for very large heaps. If we have, I don't know, 100 instances, each running with a 100-megabyte heap, that's simple, right? Even 1,000 instances with 1,000-megabyte heaps are not that complex individually. But if we have one instance running with a terabyte heap, we have to tune our JVM options — especially if we don't have ZGC, right? This application can also be built as a native image, which in the current experiments shows lower numbers than the classical JVM with warmup; previously it was the reverse, so you have to check. And we are going to walk the simple way: horizontal scaling, to fulfill more requests — like we prepared for Black Friday.
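Purely as an illustration of which knobs move when a single pod grows vertically — none of these numbers come from the talk; the property and flag names are standard Spring Boot and HotSpot ones:

```yaml
containers:
  - name: petclinic
    image: example/petclinic:latest       # placeholder image
    resources:
      limits:
        cpu: "16"
        memory: 64Gi
    env:
      - name: SERVER_TOMCAT_THREADS_MAX   # relaxed binding of server.tomcat.threads.max
        value: "400"                      # a bigger pod can afford a bigger request-thread pool
      - name: JAVA_TOOL_OPTIONS
        value: "-Xmx48g -XX:+UseZGC"      # a concurrent collector keeps pauses short on a large heap
```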

Latency

That's the setup. As for the latency requirement — yes, there's always a distribution of latencies by percentiles, but this is the good picture, the picture after warmup. We have already warmed up the application, because the classical JVM has rather poor latency at the very start.

But, by the way, there is a tool to avoid that, to fight the bad latency at the start — what is it called? CRaC, yes! I expected “native image” as an answer, because that's the second option here.

Yes, that's also about the latency part. We can use CRaC with modern JDKs and modern Spring Boot versions that support Coordinated Restore at Checkpoint. And by the way, we provide such JDK builds, so you can try them right now. We don't just provide JDK builds; we also provide ready-made containers, so you can easily run the checkpoint, run the restore, and experiment. So, we can put the same single instance inside Docker — containerize it.

Wow, same performance. Surprise. Half a year ago the numbers were different — they were lower; now it's okay. Let's put it not just in a container but inside the Kubernetes cluster, put an ingress in front, and see what happens. Okay, about the same performance. Not exactly the same, by the way, but surprisingly close. So let's try to chop it up and use more resources.

Scalability in K8s

Because we are here for horizontal scaling. We still have spare resources, but with default settings the application doesn't scale any further. So let's use those resources for horizontal scaling. What do you expect — better or worse? Wow, it's better. Yeah. In some previous experiments such scaling made things worse, and in your case it can make things worse; you have to measure. There is no reliable expectation here — you can only try it. But it's measurable, right? It's the same load testing as for the single application; we just changed the address. In any case, I'm going to deploy it in a cluster.
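For reference, the knob being turned in these experiments is nothing more exotic than the replica count and the per-pod limits — a sketch with invented numbers, keeping the same cluster total:

```yaml
spec:
  replicas: 4               # was 1: four smaller pods instead of one big one
  template:
    spec:
      containers:
        - name: petclinic
          resources:
            limits:
              cpu: "2"      # was "8" for the single big pod
              memory: 4Gi   # was 16Gi, so the total resources stay the same
```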

We're going to perform some stress testing. So let's continue and create more, but smaller, pods. What's the expectation? Better. Yes, it is better. But we immediately start seeing slower startups in the cluster: we lowered the resources for our pods — especially the CPU — and that means much slower startup. Here again, CRaC will help us bypass this slow-startup problem. Well, we do scale, and we can move into the territory that many people actually start from: please, use the cheapest instances.

“You are only allowed to use super cheap instances without additional permission, OK?” — that's what development teams are told. Half a CPU for the JVM, which is multi-threaded in any case. Practically no RPS. I, as a developer, wasn't even able to wait for the cluster to start. It's just not applicable here.

But I would say the overall performance is worse here because of worse JIT compilation quality, right? With less CPU, compilation takes longer, and so on. We can scale the native-image case the same way, and again we get comparable but lower numbers — at scale, I mean. So please don't forget about all the other good practices

Good scaling practices

Related to your Kubernetes deployments — and do the same for performance. Try what happens before you go to production; go to production with partial deployments if you can, right? To see what happens to your indicators, and what will happen to the SLOs and, in the end, our business goals. We have to care about our JVM settings — that's the only way. We just should not leave them empty; something must be written there. We can build better container images — it's very simple. We can use buildpacks and tune only a few very understandable settings to utilize our resources effectively. Simple. There are useful flags to tune your containers and to tune the JVM, which can be either switched on and off or regulated. AUDIENCE MEMBER: Is it documented? DMITRY: Yeah, of course, it's well documented. Your applications must be monitored — that's obvious! And use those monitoring results to improve your performance, your stability, your costs, and your scaling. Some other useful techniques can also be applied here. We also have a nice blog with articles related to Kubernetes and other topics.

Summary

Running the HotSpot JVM in containers requires careful tuning of resource limits, garbage collection, and JVM options to maximize performance and avoid issues like container restarts or poor scaling. Key strategies include setting memory limits above the Xmx value, optimizing startup and latency with tools like CRaC, and leveraging buildpacks to create efficient, lightweight container images tailored to your application's needs. Regular load testing, monitoring, and adjusting configurations based on production data are crucial for achieving optimal scaling, cost efficiency, and performance in Kubernetes environments.

About Dmitry

Dmitry Chuyko is a Senior Performance Architect at BellSoft, an OpenJDK committer, and a public speaker. Prior to joining BellSoft, Dmitry worked on the HotSpot JVM at Oracle, and before that he had many years of Java programming experience. He is currently focused on optimizing HotSpot for x86 and ARM; previously he was involved in delivering JEP 386, which enables the creation of the smallest JDK containers.

Videos
Nov 1, 2024
An Overview of Java Garbage Collectors

Java provides multiple garbage collectors (GCs) tailored to different performance needs. Serial GC is ideal for single-threaded apps but pauses all threads, while Parallel GC uses multiple threads to prioritize throughput.

Oct 24, 2024
5 Tips for Optimizing Java Performance on Kubernetes

If your Java apps in the cloud struggle with high resource consumption, frequent container restarts, or slow response times, these five tips can help enhance their performance. First, set CPU and RAM limits properly based on load testing and account for Kubernetes overhead.

Further watching

Nov 29, 2024
OpenJDK Projects That We Anticipate

OpenJDK is actively evolving, with projects like Leyden, Valhalla, Babylon, and Lilliput aiming to enhance Java's performance and capabilities. Leyden focuses on faster startup and warmup by reusing precompiled code, while Valhalla introduces value objects, primitive classes, and specialized generics for better memory and runtime efficiency.

Nov 22, 2024
Reducing Java Startup Time: 4 Approaches

Java application startup can be significantly accelerated using modern tools. AppCDS stores preloaded classes in a shared archive, cutting startup time by up to 50%, while Project Leyden shifts optimizations to earlier stages with ahead-of-time compilation. GraalVM Native Image creates standalone executables for sub-second startup, and CRaC restores pre-warmed application states for instant readiness.

Nov 15, 2024
Boost The Performance and Security of Your Spring Boot App with Alpaquita Containers

Alpaquita Containers offer a secure, high-performance solution for running Spring Boot applications in the cloud. These lightweight containers, built on Liberica JDK Lite and Alpaquita Linux, optimize memory and disk usage, reducing resource consumption by up to 30%.