
Enhancing JVM monitoring with Prometheus on AWS

Jul 21, 2022
Md Kamaruzzaman

This article is the fifth part of our series on cloud-native microservice development in Java™. In the first part, we designed the microservice architecture and developed two microservices for a simple Java e-commerce application.

In the second part, we containerized our applications using the cloud-native buildpack implementation paketo.io, which is natively supported by Spring Boot. We created container images of our microservices using Liberica JDK, a progressive Java runtime from a major OpenJDK contributor.

In the third part, we published the container images in the AWS container registry (Amazon ECR) and deployed them on Amazon's managed Kubernetes service (EKS). We also demonstrated our microservices in action using a Postman collection.

In the fourth part, we configured custom domain names for our microservices. We also secured the web traffic using TLS 1.2+ and applied logging using Fluent Bit and CloudWatch. In addition, we performed monitoring using CloudWatch Container Insights.

However, monitoring of cloud-native microservices is a vast and important topic of its own. This article provides a tutorial on using the Prometheus monitoring tool with microservices running on an EKS Kubernetes cluster.

  1. Kubernetes cluster monitoring and TCO reduction
  2. JVM monitoring with Prometheus on AWS
    1. Enable Prometheus in Spring Boot apps
  3. Amazon Managed Service for Prometheus (AMP)
    1. Set up AMP Workspace
    2. Set up Prometheus Metrics Collector
    3. Install a Prometheus server
    4. Set up IAM Role
  4. Visualizing metrics with Amazon managed Grafana
  5. Conclusion

Kubernetes cluster monitoring and TCO reduction

For C-level executives and higher management, keeping the TCO on the public cloud low is a key criterion. Most of the time, running a Kubernetes cluster incurs significant costs. Choosing the right size for the Kubernetes worker nodes is therefore important: we need to keep costs low, but at the same time have nodes large enough to run the workload efficiently. JVM monitoring gives us an in-depth view of the JVM runtime in Kubernetes, which helps us pick the right node size and save costs. In addition, understanding the behavior of the application gives us grounds for solving the issues we may encounter in the cloud.

Kubernetes also gives us a declarative way to scale our application using pod replication. Replication provides a very convenient way to improve the availability and fault tolerance of our application. The tools we will analyze in this article, Prometheus with Grafana, let us monitor all the pods in our EKS cluster.
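For instance, the desired replica count is simply declared in the Deployment spec, and Kubernetes continuously reconciles toward it (a minimal sketch; the count is hypothetical):

spec:
  replicas: 2  # Kubernetes restores this count automatically if a pod fails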

JVM monitoring with Prometheus on AWS

If we run a Java application (or any JVM-based application) in a container, the application actually runs on the JVM. Thus, if we only monitor the container, we cannot get the full picture of the application. For better observability, we need to monitor the JVM runtime along with the container. One of the limitations of Amazon CloudWatch Container Insights is that it can only collect metrics and perform alerting on the container level, not on the JVM level.

There are several tools we can use to monitor JVM-based microservices in the Kubernetes environment. One of the best among them is Prometheus.

Prometheus is a widely used monitoring system and time series database. It is part of the Cloud Native Computing Foundation (CNCF). With 40k+ GitHub stars, it is also the most popular open-source monitoring tool.

Prometheus offers very efficient storage using a time series database, many integrations, powerful queries using the PromQL query language, great visualization, and alerting.

Unlike CloudWatch, Prometheus is a pull-based monitoring system: it actively collects, or scrapes, monitoring data that the application exposes via a metrics API. In cloud-native microservice development, pull-based monitoring systems like Prometheus can have an advantage over push-based ones, as they generate less traffic in the network.

There are two ways Prometheus scrapes metrics data. With the first approach, the app exposes the metrics API using a client library. The client library also enables developers to expose application-specific business metrics (counters, gauges) to Prometheus. This method is used for monitoring apps/services we develop ourselves. The other option is exporter-based: an exporter running alongside an application exposes its metrics via the API. This method is used to monitor third-party applications such as databases. We will use client library based scraping to monitor our microservices.
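To illustrate the client library approach, here is a minimal sketch of how an application-specific business counter could be registered with Micrometer (the metric name and service class are hypothetical); Spring Boot then publishes it on the Prometheus scrape endpoint alongside the built-in JVM metrics:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class OrderMetrics {

    private final Counter ordersPlaced;

    public OrderMetrics(MeterRegistry registry) {
        // Shows up as orders_placed_total on the Prometheus scrape endpoint
        this.ordersPlaced = Counter.builder("orders.placed")
                .description("Total number of orders placed")
                .register(registry);
    }

    public void recordOrder() {
        ordersPlaced.increment();
    }
}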

To activate Prometheus monitoring on AWS, we need to update our Spring Boot applications.

Enable Prometheus in Spring Boot apps

In Java applications, there are several ways to expose monitoring endpoints. These endpoints are used to collect application metrics (including Prometheus data) and to interact with JMX beans in order to expose health information. In the Spring framework, Spring Boot Actuator is used to expose such production-grade monitoring endpoints in any Spring Boot application. For more information about Spring Boot Actuator, see the official Spring documentation.

If you have only just joined us, feel free to pull the ready microservices from the GitHub repository or use your own application.

In our Customer microservice, we need to add the following Prometheus dependency to the build.gradle file: io.micrometer:micrometer-registry-prometheus.
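The relevant section of build.gradle would then look roughly like this (a sketch; the Actuator starter is assumed to be present already from the earlier parts of the series):

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'io.micrometer:micrometer-registry-prometheus'
}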

We also need to add the following configuration to expose the Prometheus Metrics API:

management:
  endpoints:
    web:
      exposure:
        include: health, prometheus, info, metrics
  endpoint:
    health:
      show-details: always
  metrics:
    tags:
      application: MonitoringCustomerMicroservice

In the next step, we need to make our application running in the pod auto-discoverable so that Prometheus can find and scrape the metrics endpoint. For this, the following “annotations” section is added to the pod template in the “eks-deployment.yaml” file:

template:
  metadata:
    labels:
      app: microservice-customer
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "8080"
      prometheus.io/path: "/customer/actuator/prometheus"

Now, we need to create a Prometheus namespace in our EKS Kubernetes cluster:

kubectl create namespace eks-prometheus-namespace --kubeconfig ~/.kube/config

We now need to create a Docker image and deploy it in the EKS cluster namespace “eks-prometheus-namespace” as described in the previous part of the series.

Once the application is successfully deployed, we can check whether the Prometheus endpoint is running by visiting the Actuator endpoint:

Actuator endpoint

The Prometheus endpoint shows the following Prometheus metrics:

Prometheus endpoint
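The same check can be performed from the command line with curl (the host name is a placeholder):

curl http://<your-service-host>/customer/actuator/prometheus

The response is plain text in the Prometheus exposition format, with lines such as jvm_memory_used_bytes{area="heap",...} followed by the current values.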

Repeat the above steps for the Order microservice to enable Prometheus there as well.

Amazon Managed Service for Prometheus (AMP)

Now that we have enabled Prometheus monitoring in our microservices, it’s time to move back to the cloud.

AWS offers a Prometheus-compatible monitoring and alerting service for containerized applications and infrastructure at scale. With Amazon Managed Service for Prometheus, we can use the open-source Prometheus query language (PromQL) to monitor and alert on the performance of containerized workloads, including JVMs running in containers. One of the biggest advantages of Amazon Managed Service for Prometheus over self-managed Prometheus is that AWS manages and automatically scales ingestion, storage, alerting, and querying as workloads vary. Another advantage is that it is integrated with EKS, ECS, and AWS Distro for OpenTelemetry.

Set up AMP Workspace

An AMP workspace is the conceptual location where the Prometheus metrics of a project/application are ingested, stored, and queried. Thus, AMP workspaces help isolate the monitoring of different projects/applications in Prometheus.

We can create an AMP workspace using either the AWS CLI or the AWS Console. In the Amazon Managed Service for Prometheus console, we can create a workspace as shown below:

AMP Workspace

Once created, we can see the details of the created Prometheus Workspace in the AWS Console:

Prometheus Workspace in the AWS Console
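Alternatively, the same workspace can be created and inspected with the AWS CLI (the alias is hypothetical):

aws amp create-workspace --alias eks-prometheus-workspace

aws amp list-workspaces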

Set up Prometheus Metrics Collector

AMP does not automatically scrape operational metrics. For this, we need to configure a Prometheus metrics collector. The collector scrapes the operational metrics from the containerized Java microservices running in the cluster and sends them to the AMP workspace.

Collecting the metrics

There are several ways to deploy a Prometheus metrics collector in AWS: a Prometheus server or an OpenTelemetry agent (AWS Distro for OpenTelemetry Collector). In this example, we will run a Prometheus server in the cluster to collect the metrics data.

Install a Prometheus server

We will install a new Prometheus server using Helm to ingest the Prometheus metrics data.

Helm is a popular package manager for Kubernetes. It is a very convenient way to find, share, and use software built for Kubernetes.

First, we need to install Helm on our local machine as described in the official documentation.

In the next step, we need to add a new Helm chart repository for Prometheus:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm repo add kube-state-metrics https://kubernetes.github.io/kube-state-metrics

helm repo update

Set up IAM Role

We need to set up IAM Roles for Service Accounts (IRSA) for the ingestion of metrics from the Amazon EKS cluster. The Prometheus server sends the data over HTTPS. The data must be signed using valid AWS credentials and the AWS Signature Version 4 algorithm to authenticate and authorize each client request for the managed service. The requests are sent to an instance of the AWS signing proxy, which forwards them to the managed service.

The AWS signing proxy can be deployed to an Amazon EKS cluster and run under the identity of a Kubernetes service account. With IRSA, one can associate an IAM role with a Kubernetes service account and thus provide AWS permissions to any pod that uses that service account. Using IRSA to configure the AWS signing proxy for ingesting Prometheus metrics into AMP follows the principle of least privilege.

A detailed description of how to set up IAM Roles for Service Accounts for use with AMP can be found in the user guide.
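As a rough sketch of what the guide automates for the ingest side, an IRSA-enabled service account could be created with eksctl (the cluster name is a placeholder; the user guide uses a helper script instead):

eksctl create iamserviceaccount \
  --name amp-iamproxy-ingest-service-account \
  --namespace eks-prometheus-namespace \
  --cluster <your-eks-cluster> \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
  --approve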

After creating two roles (one for ingesting metrics data and the other for querying it), we can verify the roles by searching for them in the AWS Console as shown below:

Verifying the roles

Helm enables us to override the Kubernetes configuration with a values file. We need to create a values file with the name “prometheus_eks_values.yaml”:

serviceAccounts:
  server:
    name: amp-iamproxy-ingest-service-account
    annotations:
      eks.amazonaws.com/role-arn: ${IAM_PROXY_PROMETHEUS_ROLE_ARN}

server:
  remoteWrite:
    - url: https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${WORKSPACE_ID}/api/v1/remote_write
      sigv4:
        region: ${AWS_REGION}
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500

In this file, we have configured Amazon Managed Service for Prometheus as the target to which the Prometheus server running on the EKS Kubernetes cluster will send the scraped monitoring data.

Now, create the Prometheus server using the following Helm command:

helm install prometheus-chart-eks prometheus-community/prometheus -n eks-prometheus-namespace -f prometheus_eks_values.yaml
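Once the release is installed, we can verify that the Prometheus server pods are up and running:

kubectl get pods -n eks-prometheus-namespace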

Visualizing metrics with Amazon managed Grafana

Grafana is an open-source analytics and visualization web UI. Connected to a supported data source, it provides dashboards with charts, graphs, and alerts. Grafana is helpful in many use cases (logging, monitoring, tracing) with various tools, and it is commonly used in tandem with Prometheus to provide a dashboard for Prometheus data on AWS.

Amazon provides a fully managed Grafana service, Amazon Managed Grafana.

Below are the steps to set up Amazon Managed Grafana to visualize our Prometheus monitoring data.

For user authentication in managed Grafana, we will use AWS SSO. If AWS Organizations is enabled, you can create an AWS SSO user as shown below:

Creating an AWS SSO user

The next step is to set up the managed Grafana workspace. A Grafana workspace is a virtual Grafana server that serves as a unified dashboard for different data sources.

From “Amazon Grafana” > “Workspaces”, a new workspace can be created by selecting “Create workspace”:

Creating a workspace

Now, provide the workspace name on the following console page. We chose “prometheus-metrics” as the workspace name:

Providing the workspace name

Next, we need to configure our workspace. For authentication access, we choose AWS SSO as the authentication method, so that our previously created AWS SSO account can be used to log in to Grafana. For “Permission Type”, we selected “Service managed” so that AWS automatically provisions the permissions.

Configuring the workspace

On the final console page, we set “IAM permission access settings” to “Current Account”. In the “Data Sources” section, we can select different data sources so that one Grafana dashboard can serve various purposes with various tools (e.g., Prometheus monitoring, AWS CloudWatch monitoring, AWS X-Ray tracing). We selected “Amazon Managed Service for Prometheus” as the data source and “Amazon SNS” as the notification channel.

Once our Grafana workspace is created, we can assign our previously created AWS SSO user to the Grafana workspace as shown below:

Assigning a user to the workspace

IAM permission access settings

Now, we can log in to the Grafana workspace using the workspace URL and AWS SSO. After a successful login, we can add our Prometheus workspace as a “Data Source”:

Adding the data source

In the data source settings, we need to provide our Prometheus query URL endpoint without the trailing “/api/v1/query”. In addition, we enabled “SigV4 auth” in the “Auth” section. In “SigV4 Auth Details”, we selected “eu-central-1” as the region and “AWS SDK Default” as the authentication provider:

Data source settings
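Based on the remote write URL configured earlier, the query URL entered in Grafana follows this pattern (the workspace ID is a placeholder):

https://aps-workspaces.eu-central-1.amazonaws.com/workspaces/<WORKSPACE_ID>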

Next, we need to add a dashboard to monitor our Spring Boot application running on the EKS Kubernetes cluster. On grafana.com, there already exists a dashboard for Spring Boot APM monitoring. We can import it using the import ID “12900”:

Importing a dashboard

After a successful import, we will see the following dashboard with Kubernetes monitoring data, including JVM-specific information like CPU usage or heap used:

Spring Boot APM dashboard

There is also additional JVM monitoring data, including GC information:

JVM monitoring metrics

In addition to JVM monitoring, we can perform in-depth Kubernetes monitoring using Prometheus and Grafana. By adding a Grafana dashboard for Kubernetes, we can see the metrics of the Kubernetes cluster (number of nodes, pods, cluster data). We can also query the metrics over a time period. Here is a snapshot with the number of replicas running in our cluster in the namespace “eks-prometheus-namespace”:

Cluster replicas data
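Such a panel boils down to a PromQL query against the kube-state-metrics data collected by the Prometheus server, for example (a sketch; the exact query used by the dashboard may differ):

kube_deployment_status_replicas{namespace="eks-prometheus-namespace"}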

We can also see the replica history with the following search criteria:

Replica history

The above diagram shows that all our replicas (4 for our microservices and 4 for the Prometheus cluster) are running without any failure.

In a production system, it is critical to set up alerts in case there is an issue with the application. Alerting enables the production support team to react quickly to a production issue. Setting up alerts is essential for fulfilling the SLA/SLO of an application.

Here are some use cases for alerting:

  • CPU usage is over 80%
  • Memory usage is over 80%
  • Pods are restarting every 10 minutes
  • The database is 90% full

When an alert condition is met, a trigger is usually activated to inform the production support or security team. Depending on the alert, one or more of the following channels are used: email, SMS, chat.

Amazon Managed Grafana allows setting up such alerts by specifying alert rules.
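As an illustration, a Prometheus-style alert rule for the heap usage case above might look like this (the threshold, duration, and names are hypothetical; the application label comes from the management.metrics.tags.application setting we configured earlier):

groups:
  - name: jvm-alerts
    rules:
      - alert: JvmHeapUsageHigh
        # Fire when the heap of a monitored JVM stays above 80% for 5 minutes
        expr: sum(jvm_memory_used_bytes{area="heap"}) by (application) / sum(jvm_memory_max_bytes{area="heap"}) by (application) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: JVM heap usage above 80%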

Conclusion

We demonstrated how to implement JVM monitoring in a cloud-native way using the leading cloud provider, AWS, and the most popular open-source monitoring tool, Prometheus. Note that JDK Flight Recorder (JFR) can also be used for JVM monitoring and diagnostics, but it is mainly used for non-cloud-native applications. If you want to learn more about JFR, read our post about JFR with code examples.

In modern cloud-native enterprise software development, monitoring is an integral part of the TCO reduction strategy. If your application underperforms or your cloud bills bloat, you have to know what went wrong, and monitoring is a great way to find out.

If you are looking for other ways to reduce the TCO of Java applications, check out our articles dedicated to the topic.

BellSoft strives to make your Java experience as smooth and profitable as possible:

  • We created the smallest containers on the market for you to save precious cloud resources
  • Our GraalVM-based utility, Liberica Native Image Kit, helps to accelerate the startup of applications up to 1/10s
  • Our High-Powered support costs 5 times less than Oracle’s and is there for you 24/7. Receive help from our experts within 24 hours, off-cycle fixes, and emergency security patches

Start your journey with us by trying out Liberica JDK, a 100% open-source and free OpenJDK distribution with the widest range of supported platforms.
