July 22, 2020
Hunting down code hotspots is probably the most common task for Java profilers. JDK Flight Recorder (JFR) and Mission Control (MC) are free and open source performance/profiling products available with OpenJDK 11. They have a few powerful tools for code execution profiling.
In this article, I would like to walk you through the code profiling features they offer.
What is a code hotspot?
Usually, you are starting a profiler when there is a performance problem, or you want to optimize code to meet specific performance goals. “Performance” is most commonly expressed as execution time (time needed to execute operation) or throughput (number of operations executed per time unit).
If you want to reduce the time spent on a request, it is obvious to focus on code that takes the longest to run. This is what we call “hot code” or “code hotspots.” Profilers are great tools for identifying “code hotspots” and JFR + MC is an excellent profiler combo.
Let’s launch the application and start profiling, right?
Starting profiling with Mission Control
We will work with OpenJDK 11 and Mission Control 7.1, JDK Flight Recorder is integrated into OpenJDK 11. You can get the distribution of OpenJDK and Mission Control for your OS from Liberica if you don’t have them already.
If you stick with Java 8, Flight Recorder support is available with OpenJDK 8u262. You can find more details in this post by BellSoft.
You can use mission control with either a locally running JVM or a remote one. In the latter case, you need JMX port configured on a remote JVM.
Of course, you would need an application you will profile and some load to keep it busy.
Working with Local JVM
Locally running JVM processes are listed in the “JVM Browser” view. You need to identify the JVM you want to profile. Besides, you can either open the JMX console for that JVM or control flight recorder.
Working with remote JVM
With OpenJDK, Flight Recorder is available remotely via JMX. As might be expected, you would need a JMX socket configured on JVM that you would like to profile.
There are two ways to make the JMX socket available:
- JMX could be enabled via JVM command-line arguments. You can find the required configuration options in the official documentation.
- Using the
jcmdcommand, you can start the JMX socket on JVM with no up-front configuration and no restart necessary.
Below is an example of a command to start JMX with minimal configuration, you would need to know the PID of a process and be logged under the same user as the target JVM.
jcmd PID ManagementAgent.start \ jmxremote.authenticate=false \ jmxremote.ssl=false \ jmxremote.port=5555
Once the command is executed, you can connect to your JVM via port 5555 using the instruction in the following pages.
If the JVM you want to profile is behind NAT/firewall (e.g., it is running within Kubernetes), you may need to use port forwarding and additional configuration tweaks to make JMX work.
Having configured the JMX socket, you need to add remote JVM to it in the “JVM Browser” view. Click the button
Before starting the profiler
Before starting profiling and focusing on code, it would be nice to get an overall CPU usage picture on the host where your JVM is running.
- Is the CPU overutilized?
- Do other processes compete for CPU resources?
- How much CPU is the JVM process consuming?
- Which Java threads account for the highest CPU usage?
Answers to these questions will help you to choose the right reports in Mission Control later.
These questions could be resolved with standard system tools such as top and pidstats, but it is also possible to use JMX console in Mission Control.
Mission Control JMX Console and Thread monitor
Before starting Flight Recorder, I always recommend looking at the JMX console built into Mission Control.
With JMX console, you can both monitor the system and process CPU usage. These metrics are available at the overview tab.
You can observe CPU usage per thread as well, which is quite beneficial information for code execution profiling. You need to go to the “Threads” tab and check the “CPU Profiling” box above the table of threads. After this, you could see values in the “Total CPU Usage” column.
Pay attention, though, that Mission Control shows the percent of CPU usage from all cores available on the host. As a single thread cannot consume more than a single core, in that table, 100% / N (where N is the number of cores) is the max value you can see.
What if JMX is not an option?
What if you have console access to JVM, but no way to connect to it via JMX?
You can still record JFR files using jcmd, then copy them to your machine and open in Mission Control. I will elaborate on this scenario later in this article.
Starting JDK Flight Recorder session
Let’s start the Flight Recorder now. Right click on the “Flight Recorder” node in the tree on the “JVM Browser” view and select the “Start Flight Recording …” option.
You will see a dialog with options to start the Flight Recorder session. This screenshot reflects starting Flight Recorder for 2 minutes, with “Profiling - on server” default settings.
Now you can hit the “Next” button and get another dialog allowing you to tweak the most common Flight Recorder options for the next recording session.
Some of these options would affect the reports explained in this article:
- “Method Sampling” — this option controls the frequency of stack trace sampling. “Maximum” means 100 samples per second, which is a good trade-off between performance and data quality.
- “Exceptions” — you can choose whenever to record all Throwables or just Errors. Recording all Throwables may be expensive, but may be useful to pinpoint excessive exception usage in application.
- Thresholds “Synchronization”, “File I/O”, “Socket I/O” — for detailed analysis of “cold” hotspots, I would recommend you set these thresholds to zero. Though it may increase overhead significantly, so do it only if you really need this information.
Now let’s hit “Finish” to start the Flight Recording session. You can click “Next” again, although, in that case, you would have a chance to view/edit Flight Recorder configuration at a lower level.
After starting the Flight Recorder session, Mission Control would wait for the specified duration, then stop Flight Recorder, dump the data file, open it and present several reports.
Making JFR recording without Mission Control
JFR recording could be produced without Mission Control, too. You can use jcmd to create JFR files, then copy them to desktop and open In Mission Control.
Here is a quick instruction to capture JFR recording. You would need to know the PID of the target JVM and execute jcmd under the same user account.
- Start JFR recording with command below
jcmd <PID> JFR.start settings=profile
The result of this command should be something like what we have below. You would need to remember the number of the recording for later.
Started recording 1. No limit (duration/maxsize/maxage) in use. Use JFR.dump recording=1 filename=FILEPATH to copy recording data to file.
- Wait a little for data to collect
- Use the command suggested in step 1 to dump JFR data
jcmd <PID> JFR.dump recording=<N> filename=<FILENAME>
This action would create a JFR file.
- Stop the recording session
jcmd <PID> JFR.stop recording=<N>
Now that you have a JFR file, you can open in Mission Control later.
Method profiling report
As you may remember from the previous article, Flight Recorder collects various types of events and Mission Control builds a number of reports from that data.
We want to identify areas of code that have contributed most to the execution time of our request. The usual profiling method for such tasks is a stack trace sampling.
The idea is simple. During execution at regular intervals, Flight Recorder is recording trace for each thread. Taking samples from a single thread and observing that method X is present on a stack in 50% of the sample, you can assume that the sum of all method X invocation time is 50% of profiling session time.
While the idea is simple, well, in practice, you have to deal with multiple threads and non-ideal sample distribution. Still, this approach remains extremely useful in practice.
Let’s open the “Method profiling” report.
The report itself is pretty barebone, and if you have used a different Java profiler before, it feels lacking details.
They are hidden in a “Stack Trace” view (panel below).
Power of “Stack Trace” view
Stack view allows visualizing a set of stack traces as a tree.
How do stack traces become a tree?
We all know how Java stack traces look like (be it exception stack trace or stack trace from thread dump). Data produced by sampling profiling is just a bunch of stack traces.
Here are steps on how this data is transformed into a tree.
- Convert each stack trace into a string by concatenating frames.
- Now that you have a multiset of strings, calculate each one’s occurrences and compile a histogram.
- You’ve got a histogram, a table with two columns “trace” and “count.” If you sorted this table and group by common prefix, it would become a tree.
In the context menu, you would have the following “Stack view” customization options.
Group Stacktraces From
This option controls how frames are concatenated in step 1 of the algorithm above and what would be root nodes in the tree.
- Last Frame — aka the “hot methods” mode.
- Thread Root — aka the “call tree” mode.
Distinguish Frames By
This option controls which information is stripped from the frame description in step 1.
Take note of the “Line Number” option. Line Numbers add clutter to the tree, but sometimes you want them to be visible to get the exact reference at source code.
The default tree is horizontally compressed, which is good but could confuse fresh eyes. Here you can switch back to a classic tree presentation.
Stack view would work for all reports based on events that incorporate stack trace (including method profiling).
Mission Control could even mix different types of events in the same stack view, but be aware that mixing events sampled by different rules (e.g., method profiling and IO traces) does not make much sense.
Why CPU usage is important
I mention what CPU metrics are important during profiling with JDK Flight Recorder.
If we look at Java thread at any given time, it could be
- running on CPU in Java code
- ready to run in Java code but waiting in OS queue to get CPU
- running on CPU in native (non-Java) code
- ready to run in native code but waiting in OS queue to get CPU
- waiting to do something in native non-Java code
- waiting/blocked at the Java level
JFR samples only threads in categories 1 or 2. There are also separate sampling for 3, 4 and 5, which is not well exposed in any dedicated report in Mission Control at the time of writing.
So, if your server is starving on CPU, JFR will show you a particular picture, but it would be skewed due to mixing 1 and 2 categories. Sampling picture and would not accurately reflect real code computation cost in this situation.
If the server is not CPU starved and threads you are profiling are actively consuming CPU, method sampling is the right tool for you to start with.
If the server is not CPU starved and threads you are profiling are low on CPU, you are likely to have a “cold” hotspot, falling into categories 5 and 6.
Category 5 code you can analyze with the “Native method sampling” event.
Category 6 is either contention or IO, examined with separate reports.
So by looking at CPU, you get an idea which report would be most useful next.
Retrospective thread CPU usage in Mission Control
What if you have a JFR recording and have no idea of CPU usage by threads at runtime? No problem, that data is recorded in a JFR file as well.
You need to go to the “Event Browser” and find the “Thread CPU Load” event under the “Operating System” / “Processor” category.
Here CPU usage for each thread is recorded with 10-second intervals. Remember that the percentage is taken from the total number of cores: in my screenshot, 25% would mean 100% of a single CPU core.
JVM and host OS CPU usages are also recorded. Look under the “CPU Load” event type in the same category.
Limit scope to a subset of threads
Cool, now you can find CPU hungry threads, but the “Method Profiling” report shows all threads. Filtering by a subset of threads would be a good idea.
Mission Control has very flexible and very nonobvious filtering features.
We have to have only a subset of threads in the “Method Profiling” report. Here are the steps.
- Go to the “Threads” view
- Select one or more threads in a list of threads on the left
- Right click and choose “Store Selection” in the context menu
- Go back to “Method Profiling”
- In the filter combo select source “Threads Histogram Selection” and aspect “Thread Name is …”
Now report includes events only for threads you have chosen.
Any filter is shared across all reports, so do not forget to reset it then switching reports.
What do I see in the “Stack Trace” view of the “Threads” report?
While walking through the steps above, you may have noticed that the “Stack Trace” view is available in “Threads”. Moreover, it was reacting to selection in the thread list.
How is the “Stack Trace” on the “Threads” report different from the “Method Profiling” report?
Each report has a scope: a set of events (usually certain types of events) used to calculate the report. When you are working with report UI, a focused subset of events in scope is maintained behind the scene.
A filter could narrow the scope, and this is what we did in the previous section.
Any event may include the stack trace, and many do. The “Stack Trace” view listens to a focused set of events in an active report and visualizes all stack traces available.
In “Method Profiling,” only method profiling samples are in scope.
The “Threads” report includes almost every event bound to some thread (both sampled and non-sampled). So, while technically they could be aggregated together, it does make little sense.
Nevertheless, you can use filters to fix this to make the “Threads” report show the same picture as “Method Profiling”.
- Go to “Event Browser”
- Find “Method Profiling Sample” is event types tree
- Right click on it and choose “Store Selection” in the context menu
- Go back to the “Threads” report
- In filter combo select source “Event Types Tree Selection” and aspect “Event Type is Method Profiling Sample”
Now your “Stack Trace” would include only method profiling data, and you can change select threads without switching reports.
Caveats of JFR method profiling
The way JFR takes stack traces during profiling is precise. The majority of Java profilers use a JVM wide thread dump to get traces of all threads at once.
A thread dump is a Stop the World operation, and doing it 100 times per second could be expensive. JFR manages to avoid Stop the World. Instead of stopping all threads at once, JFR stops them individually (using OS facilities) and captures stack traces from individual threads.
Entering/leaving Stop the World state involves sophisticated protocol between application and system threads in JVM, and JFR skips this overhead altogether.
Yet, this optimization introduces certain nuances data captured by JFR. Only stack traces ending up in Java code (not the native code) are recorded and visible in the “Method profiling” report. Thus, unless threads you are looking at were active CPU consumers, the picture in “Method profiling” would be skewed.
Besides “Method Profiling Sample”, there is the “Method Profiling Sample Native” event type which is capturing threads in states 4, 5 and 6 (see the list mentioned earlier). However, this event type is available only in the “Event Browser” not in the “Method Profiling” or the “Threads” reports. The good news is that the “Stack Trace” view also works in the “Event Browser” report.
Another important caveat for all types of Java sampling profilers is the infamous “safepoint bias.” In short, due to various effects of JVM runtime and JIT compiler, profilers could incorrectly identify the exact hot method or code line. In some edge cases, profilers could be hilariously inaccurate.
Technically, JFR avoids the “safepoint” part of the problem, though it is still biased due to the way Java JIT compiler works.
Here you can read more about a few examples of such hilarious profiler inaccuracy.
While the edge case of the “safepoint bias” effect could make you lose your trust in the profiler, it is rarely a problem in actuality. A profiler’s job is to narrow the scope of search; it doesn’t need to be 100% accurate to be useful. But do not trust it blindly either.
“Cold” code hotspots
A hotspot is a portion of code (e.g., method or line of the source file) responsible for a considerable part of execution time during request processing (or other kinds of workload) related to other code involved.
But how was that time spent? Was that code burning CPU cycles or just sitting cold off the CPU waiting for something?
“Cold” hotspot is a kind of code that consumes time, but does not consume CPU resources. Most typical kinds of “cold” hotspots are
- Blocking IO calls
- Contention points in multi-threaded applications
- Waiting for completions of async tasks
The method sampling approach used by JFR is not suitable for identifying “cold” host spots (it could see only the “CPU hot” code). There are separate reports for blocking IO and contention.
But how would you know what you are dealing with a “cold” hot spot?
Low thread CPU usage for a thread supposed to process requests is a symptom that should trigger your attention.
File I/O report
This report is composed of “File Read” and “File Write” events, capturing blocking file IO operations. Since events include file name, this report shows at the file level how many bytes were read/written to each file and how much time it has taken.
An important caveat here is to consider the JFR recording session’s threshold; only operations exceeding it would be visible here.
The report is relatively simple but could be quite powerful if combined with the filtering capabilities of Mission Control.
Naturally, filtering and “Stack Trace” views are also available here.
Socket I/O report
The Socket I/O report is fairly similar to File I/O, but aggregation is shown by remote address and port.
The taskbar lists three aggregation options available.
- By Host
- By Port
- By Host and Port
Lock instances report
Lock instances reports are based on the “Java Monitor Blocked” event type. These events are produced when a Java thread is blocking trying to acquire the semaphore.
This report is only useful with “old style” synchronized keyword-based thread coordination (and by old I do not mean outdated, “synchronized” has its case in modern Java).
What about “java.util.concurrent” based contention?
JFR equally has events related to “new style” synchronization “Java Thread Park”, though there have been no dedicated reports in Mission Control so far.
We’ve already had a brief interaction with the “Threads” report. It consists of a thread list and a timeline area.
The thread list is simply a list of threads, but the timeline is more interesting.
It has lanes for each thread and could visualize a wide range of JFR events. By default, they are so-called “Java Latency” events, but you can customize that via the context menu (see “Edit Thread Activity Lanes” option).
Typically, individual events on the timeline would be tiny. You can hover to get one’s details under the mouse pointer, but it is not very useful if events are subpixel sized.
You can zoom, though, drag the rectangle over the timeline and hit “Zoom to Selected Range” from the context menu to zoom in.
You can use the time range selected in the “Threads” report as a filter on other reports, too.
In this article, I have shown key features of JDK Flight Recorder / Mission Control most useful to code execution profiling or hunting “code hotspots.”
From its length, you could assume that Mission Control is not the most intuitive tool. This is true, and one should keep in mind a few fundamental principles to use it efficiently.
- Flight Recorder could be started for the command line, which is very helpful in environments where JMX access is impossible or complicated. In addition, it does not require any upfront JVM configuration; if your JDK supports JFR, you can start using it anytime.
- CPU usage is a crucial metric. It is essential you know which kind of hotspot you are looking for. If you cannot monitor CPU usage in real time, you can find this information in the Flight Recorder file.
- CPU consuming (“hot”) hotspots are very different from “cold” ones caused by code spending time in idle state. For a former traditional stack, trace sampling (the “Method Profiling” report) works fairly well. Later instances are more sophisticated. You may need to enable zero thresholds for I/O and contention events to fully picture idle state events in Mission Control.
- “Stack Trace” view is a potent tool you really need to get familiar with. It works for a wide range of events. For sampled events, you are most likely to be using “hot methods” mode, whereas “call tree” is more informative for traced ones.
Performance is a hard topic. No matter how great your tools are, you need to learn JVM and OS’s inner workings. Yet tools are valuable and would save you lots of time.
JDK Flight Recorder and Mission Control are a great combo, and them being completely open source is an additional advantage in my eyes. While the learning curve is steep, getting to know even a few tricks from this tool arsenal could help you greatly when dealing with performance issues in Java.