Find out the reasons for sky-rocketing cloud expenses
Reducing cloud costs is a hot topic in the modern IT world. In 2021, managing cloud computing expenses was one of the top three major enterprise issues, along with enhancing security and governance. 61% of companies set cloud cost optimization as their top priority1, and this trend is bound to continue. In this article, we will learn how to identify the root causes of suboptimal app performance and inflated resource consumption.
Why are your cloud bills so high?
The realization that your company spends tremendous amounts of money on cloud resources usually comes out of the blue. The application demonstrated optimal peak performance on bare metal, was conveniently containerized, sent to the cloud, and scaled; the SLA was met — what could go wrong? The answer is “everything”.
Several problems could be revealed even without delving deep into the metrics:
- Developers are not directly involved in cost optimization: they write code without knowing the execution environment it will run in. At the same time, EKS administrators or cloud algorithms may not take application characteristics into consideration or even be aware of them. As a result, this awareness gap becomes an abyss for corporate finances.
- An SLA provides the roadmap and the warranty for your project in the cloud. However, if a cloud services consumer doesn’t examine the SLA carefully or validate it against worst-case scenarios, there is a risk of running into unexpected costs if something goes wrong2.
- The fact that an application performs well on bare metal doesn’t mean that the same performance can be expected under different loads in a cluster. As a result, pods shut down unexpectedly, or HotSpot devours much more memory on an instance of the same size.
- Not all services are the same; some of them are less critical than others. However, lightly loaded services consume the same amount of resources as heavily loaded ones if the same scaling strategy is applied to them.
- Developers build applications in containers using default settings. As a result, you get a container that doesn’t meet your business needs, takes up a lot of space, or underperforms.
Let’s examine the last two points more closely.
Default containers are black boxes
Paketo Buildpacks, the most popular buildpacks for containerization (built on the Cloud Native Buildpacks specification), and other similar technologies offer an opportunity to create container images directly from a Maven or Gradle plugin. Developers generate small and performant containers with a couple of commands, thus reducing development time dramatically. However, they rarely configure containers themselves and prefer to use default settings, because if an automatically generated image performs well, why bother?
The problem with out-of-the-box solutions is that they are, in fact, black boxes: developers don’t always know how these solutions work and thus pick ones that don’t meet business requirements. As a result, Paketo containers with default settings, for example, may underutilize memory and CPU. If you want highly performant, secure, resilient, and small containers, you have to configure them manually (see the example after the list below):
- Start with JVM tuning: optimize settings such as -XX:+AlwaysActAsServerClassMachine or -XX:+PerfDisableSharedMem, and select the appropriate Garbage Collector
- Perform load, A/B, and longevity testing, and plan the capacity. Load test results should match the “no container” case
- Keep in mind that every container in the pod must have a memory limit and a memory request
- Align resource requirements with possible node sizing for correct node scheduling
- Technical overhead might be small, but don’t forget to take the K8s overhead into account. For example, Fargate uses an extra 256 MB of RAM
- Consider using Liberica Lite and Alpaquita Linux to minimize the size of your containers without affecting the performance or security
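If you build images with Paketo buildpacks, for instance, a minimal sketch of overriding the defaults could look like this (the application name, builder, and flag values are illustrative rather than recommendations):

# Build the image with an explicit JVM version instead of the buildpack default
pack build my-app:1.0 --builder paketobuildpacks/builder-jammy-base --env BP_JVM_VERSION=17
# Override JVM defaults at run time via the standard JAVA_TOOL_OPTIONS variable
docker run -e JAVA_TOOL_OPTIONS="-XX:+AlwaysActAsServerClassMachine -XX:+PerfDisableSharedMem -XX:+UseParallelGC -XX:MaxRAMPercentage=75.0" my-app:1.0

The same options can be supplied through your deployment manifests instead; the point is that they are chosen deliberately for your workload rather than inherited from the defaults.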
After the deployment, you need to perform continuous health monitoring, daily load testing, and control the metrics: latency, throughput, GC statistics, etc. Find a more detailed description of metrics monitoring in the section “Life after deployment” below.
As you can see, containers require your attention and care. To summarize this section: automatic settings are convenient but suboptimal, so if you want something done right, do it yourself.
Underutilization and overutilization: two sides of the same coin
Technically, all services are created equal, but incorrect memory allocation settings lead to overutilization or underutilization. In the case of overutilization, the service is loaded to the limit and consumes a lot of resources, and therefore:
- The application doesn’t function properly but performs continuous garbage collection instead.
- We have to add one more instance, which in turn automatically doubles the expenses.
As far as underutilization is concerned, the application doesn’t use all of the memory available to it (for example, because the heap limit is set too low), which leads to the same issues: continuous GC and the need to allocate one more instance instead of consuming the existing resources.
These are two different problems with the same root cause: a memory limit that doesn’t correspond to the actual needs of your application.
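As a rough sketch of keeping the two aligned, you can set an explicit container limit and let the JVM derive its heap size from it instead of hard-coding -Xmx (the deployment name and values below are illustrative):

# Set an explicit memory request and limit on the deployment
kubectl set resources deployment/my-app --requests=cpu=500m,memory=768Mi --limits=memory=768Mi
# Inside the container, size the heap as a fraction of the container limit
java -XX:MaxRAMPercentage=75.0 -jar app.jar

Load testing (discussed below) then shows whether the chosen limit leaves enough headroom or simply wastes memory.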
SLO or SLA? Both!
We have already touched upon the SLA between you and your cloud provider in the first section. Now it’s time to talk about your obligations towards customers and/or users. Basically, an SLA is an agreement between a company and a paying user, so if your services are distributed for free, you don’t need an SLA. If that is not the case, make sure that not only lawyers but also your technical team takes part in the process of creating SLAs. Only developers know what it takes, in terms of time and resources, to deliver the services as per the SLA in the case of a cloud-based application.
The same goes for SLOs. If an SLA is an agreement between a company and a customer, an SLO (service level objective) describes the objectives that your developers have to satisfy in order to meet the SLA. SLOs should be precise, realistic, and tightly bound to the SLA (for instance, “99.9% of requests complete in under 200 ms over a 30-day window”). Make an effort to choose the essential metrics, analyze and spell out the objectives with your team, and include realistic and worst-case scenarios in your SLOs and the SLA.
Cloud monitoring: life after deployment
We have summarized the most common issues that lead to high cloud expenses. Suppose you already have a running application in the cloud, and yet you aren’t sure where to look to hunt down the root causes. What exactly is to blame: the JVM, the app, the containers, the SLA? Read on to learn about the metrics that provide you with all the necessary data for the subsequent optimization.
Get Kubernetes metrics
Start with retrieving the Kubernetes metrics to get a clear understanding of what is going on in your clusters. For that purpose, you can use open-source tools such as Metrics Server and various kubectl commands.
First, install the Metrics Server. There is a chance it is already deployed in your cluster, so you can check that by running
kubectl get pods --all-namespaces | grep metrics-server
If Metrics Server is already running, you will get a response listing its pods. Otherwise, run the following command to deploy the latest version of the tool:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Metrics Server allows you to collect data on the availability of resources (CPU, memory, storage) as well as their utilization. The kubectl top command allows you to get the metrics from a particular node or pod, or from all of the cluster’s pods or nodes.
For example, if you run
kubectl top node
you will get the current CPU and memory utilization of all nodes.
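The output looks roughly like this (the node name and values are, of course, illustrative):

NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   250m         12%    1843Mi          48%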
The same applies to
kubectl top pod
The command
kubectl top pod --namespace=kube-system --containers
will return the resource utilization of the pods in the kube-system namespace, broken down by container.
The command
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/<NODE_NAME> | jq
or
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<NAMESPACE>/pods/<POD_NAME> | jq
will return the raw data for a node or a pod from the Metrics API in JSON format.
Another useful tool is kube-state-metrics, which returns the state of nodes and pods: status, capacity, number of replicas as per the deployment, etc.
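Assuming it is deployed as a service named kube-state-metrics in the kube-system namespace (adjust the names to your setup), you can peek at its output directly:

kubectl port-forward svc/kube-state-metrics 8080:8080 -n kube-system
curl -s localhost:8080/metrics | grep kube_pod_status_phase

In most environments, however, these metrics are scraped by Prometheus and visualized in Grafana rather than read by hand.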
Finally, you can use kubectl logs to retrieve detailed logs from the containers in your pods.
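For example (the pod and container names are placeholders):

# Logs of the current container instance for the last hour
kubectl logs my-pod -c my-container --since=1h
# Logs of the previous, terminated instance of the same container
kubectl logs my-pod -c my-container --previous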
Analyze GC data
Java applications perform memory management by means of a Garbage Collector (GC). GC tuning can have a significant impact on performance. But first, you need to get a clear understanding of what is going on with garbage collection in the application. For that purpose, GC logs are of great help.
To begin with, enable GC logs:
-Xlog:gc*:<gc.log file path>:time
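A complete launch command with the file path filled in might look like this (the path and application name are illustrative, and the extra decorators are optional):

java -Xlog:gc*:file=/var/log/app/gc.log:time,uptime,level,tags -jar app.jar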
After that, you can aggregate these logs using the centralized log management and APM systems or analyze individual logs. GC logs are generated in a text format and can be processed by scripts and desktop and online GC log analysis tools.
JDK Flight Recorder (JFR) is another useful monitoring tool. We have already discussed how to use it to pinpoint memory issues and identify the root cause of Stop-the-World pauses. JFR records GC events that can be processed by JDK Mission Control or APMs, and some basic indicators are also exposed via JMX.
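For example, you can start a recording at application launch or attach to a running process with jcmd (the file names, durations, and the process id are placeholders):

# Start a time-limited recording when the application starts
java -XX:StartFlightRecording=duration=10m,filename=startup.jfr -jar app.jar
# Or start and dump a recording on an already running JVM
jcmd <pid> JFR.start name=rec settings=profile
jcmd <pid> JFR.dump name=rec filename=rec.jfr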
GC throughput is the ratio of time the application spends doing business logic to the overall running time, which also includes GC and other JVM service work. For example, if the application runs for 100 seconds and spends 2 of them in GC, GC throughput is 98%. The CPU resource is utilized properly when CPU load is close to 100% and GC throughput is close to 100% as well.
Measure app performance under load
Remember that applications may behave differently in the cloud and under high loads. Never rely on baseline performance alone; know the CPU and RAM limits and requirements. In summary, you should do the following:
- Perform load testing to understand the app’s behavior under both normal conditions and possible peak load (see the example command after this list)
- Perform longevity testing to validate the stability of the application and its consistent quality
- Identify peak rates and latency. Any system may fail unexpectedly, but you should try to prevent it. Keep in mind that peak rates assume high resource utilization
- Identify CPU and RAM requirements per service
- Use mock services to simplify the testing and give your team reference points to go by
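As a simple starting point, wrk (also mentioned below) can produce throughput figures and a latency distribution under a fixed load; the URL, duration, and connection counts are placeholders:

wrk -t4 -c100 -d60s --latency http://my-service:8080/api/orders

Compare the reported numbers against the figures you obtained outside the container to spot regressions early.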
Retrieve and understand VM statistics
JVM tuning involves GC optimization, but is not limited to it. In short, JVM performance tuning is a complex task, which will be the subject of a separate article. But before optimizing any settings, you have to learn how to retrieve JVM metrics and perform JVM monitoring.
You can collect performance metrics observed on the client side using load testing tools such as JMeter or wrk, or study internal metrics captured on the back end and collected by an APM. These metrics can be gathered at various levels (see the example after the list):
- machine
- JVM
- web server, BigData system, message queue, etc.
- application
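For the JVM level, a quick way to sample metrics without an APM is to use the standard JDK tools (the process id is a placeholder):

# Heap occupancy and GC activity, sampled every 5 seconds
jstat -gcutil <pid> 5000
# One-off snapshot of heap usage
jcmd <pid> GC.heap_info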
Latency is often critical for customers. It is measured and targeted at certain percentiles (50%, 95%, 99%, 99.99%, 99.9999%) and can additionally be studied from different angles, such as over the whole application lifetime or only after the application has reached peak performance.
Conclusion
In this article, we learned which metrics you should use to identify problems with resource consumption in your cluster and how to retrieve them. The most important message is: no default settings, algorithms, or automation tools will optimize resource consumption the way you want. You have to get your hands dirty, optimize the containers, and configure the JVM yourself.
If, after reading this article, you feel overwhelmed by the amount of work that lies ahead of you, book a free consultation with a BellSoft engineer. Together we will walk through the optimization hardships to reach for the stars!