Earlier in this series, we reviewed in brief the TeX typesetting system as well as the state of Java-based TeX engines. We settled on integrating JMathTeX into our application. In this final part, we’ll profile our application created with Liberica JDK to determine where optimizations will reward us with the most significant performance gains.
Introduction
Previously, we used libraries to translate TeX expressions into SVG elements. You may have noticed there was a delay between completing the expression and seeing it rendered. While not a problem for a document having a few TeX expressions, as the number of expressions increases, the preview will take longer to render. Eventually, we’d have to choose between delayed (asynchronous) output or unresponsive input. In this article, we’ll profile our application while it executes on Liberica JDK so as to understand where best to direct our efforts. This is in line with advice offered by Donald Knuth:
"There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%. [Good programmers] will not be lulled into complacency by such reasoning, [they] will be wise to look carefully at the critical code; but only after that code has been identified." ~ Donald E. Knuth, Structured Programming with go to Statements, ACM Computing Surveys, Vol 6, No. 4, Dec. 1974, p.268.
We don’t know the code’s performance bottlenecks until we measure; spending a lot of time on micro-optimizations is usually wasteful.
There’s a momentary blip during TeX equation rendering that we’d like to eliminate. In effect, we want to determine the “critical” code path that Knuth described. We can accomplish this using a software tool, called a profiler, that gathers information about which method calls take up most of the application’s run time.
Review Profilers
Free and open-source tools to measure Java application performance include:
- BellSoft’s Liberica Mission Control
- NetBeans’ Profiler
- Oracle’s VisualVM
- Andrei Pangin’s async-profiler
- Richard Warburton’s Honest Profiler
Another tool that can help with optimizing code is the Java Microbenchmark Harness (JMH), which is a solid piece of software geared towards building, running, and analyzing application performance down to the nanosecond if needed.
We’ll use async-profiler because it can produce useful results in a few keystrokes.
Install Profiler
Install and configure the profiler as follows:
- Download the latest release for your target architecture.
- Extract the archive into $HOME/bin/async-profiler.
- On Linux, configure the kernel to reveal call stacks to the profiler (relax the perf-event restrictions).
- Run ~/bin/async-profiler/profiler.sh to verify the installation.
The profiler is ready.
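Regarding the Linux kernel-configuration step above, async-profiler’s documentation recommends relaxing two perf-event restrictions so that a non-root user can capture kernel call stacks. The values below are the ones commonly suggested in that documentation, not taken from this article:

```shell
# Allow unprivileged use of perf events and expose kernel symbol addresses.
sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0
```

Note that these settings reset on reboot unless persisted (for example, in /etc/sysctl.conf).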
List Java Processes
Java comes with a program called jps that lists the name and process identifier (pid) for all running Java applications. Here’s example output when run (your output will differ):
$ jps
972229 Launcher
972067 GradleDaemon
639188 Main
978993 Jps
The profiler can invoke the jps command to automatically attach to the most recently launched Java process.
Update Unit Test
Writing a typical JUnit test annotated with @Test won’t give us a clean profile of the application’s performance for a few reasons. First, we don’t care to test the performance of JUnit itself. Second, if we ran the test using Gradle, we’d find that Gradle takes up much of the processing time. Instead, we’ll update TeXProcessorTest so that we can run it from the command line.
We want to test the functionality for converting TeX to an SVG element with as little overhead as possible. Additionally, we should provide an equation that’s a bit more complicated than Euler’s identity, such as the Klein–Gordon equation:
Open the TeXProcessorTest
class then add the following method:
static boolean test_Performance_TeXInput_SvgOutput() {
final var processor = new TeXProcessor( null );
final var svg = processor.apply(
"""
$\\frac{1}{c^2}\\frac{\\partial^2}{\\partial t^2}
\\psi-\\nabla^2\\psi+\\frac{m^2c^2}{\\hbar^2}\\psi=0$
"""
);
return svg.startsWith( "<svg" );
}
This is about as close to the conversion from TeX to SVG as we can get without making the toSvg method public in the TeXProcessor class. Next, introduce a main method that loops for a while so that we can profile our performance test:
@SuppressWarnings( "StatementWithEmptyBody" )
public static void main( final String[] args ) {
final long stop = nanoTime() + NANOSECONDS.convert( 5, MINUTES );
while( nanoTime() < stop && test_Performance_TeXInput_SvgOutput() ) {
}
}
The additional static imports should resolve to:
import static java.lang.System.nanoTime;
import static java.util.concurrent.TimeUnit.MINUTES;
import static java.util.concurrent.TimeUnit.NANOSECONDS;
The unit test is updated and we’re ready to run the profiler.
Run Unit Test
Verify that the unit test can be run by executing the following commands:
cd $HOME/dev/java/mdtexfx
./gradlew clean build test
java -cp build/libs/mdtexfx.jar:build/classes/java/test \
com.mdtexfx.processors.tex.TeXProcessorTest
Stop the unit test by pressing Control+c.
Profile Unit Test
Complete the following steps to profile our test harness:
- Open a new terminal to run the unit test and profiler at the same time.
- Re-run the Java command from the previous section to start converting TeX to SVG.
- Start the profiler as follows:
~/bin/async-profiler/profiler.sh -d 120 jps > profile.txt
The -d 120 command-line argument instructs the profiler to sample the application for 120 seconds; the jps argument tells the profiler to use the Java process command to find the pid of the most recently launched Java application. Using > profile.txt directs the command’s output to a file named profile.txt, which contains the profiling results. Let’s look inside to begin our analysis.
Analyze Results
Open profile.txt, then search for the word “percent” to find the list of application hot spots. The list should begin similar to the following:
ns percent samples top
---------- ------- ------- ---
14979296619 11.38% 1498 java.text.DecimalFormat.subformatNumber
12850404499 9.76% 1285 java_lang_Throwable::fill_in_stack_trace(...)
8279754791 6.29% 828 java.util.Arrays.copyOf
Together, over 27% of the test run is spent formatting numbers, filling in stack traces, and copying arrays. Given that our test doesn’t deliberately throw any exceptions, the stack trace result seems strange. The array copies are likely string copies. Let’s start our sleuthing at the DecimalFormat class because it’s the biggest bottleneck.
Investigate DecimalFormat
Jump to the top of profile.txt, then search for “DecimalFormat” to find one of its call stack invocations, which will look something like the following:
--- 2679876934 ns (2.04%), 268 samples
[ 0] java.text.DecimalFormat.subformatNumber
[ 1] java.text.DecimalFormat.subformat
[ 2] java.text.DecimalFormat.doubleSubformat
[ 3] java.text.DecimalFormat.format
[ 4] java.text.DecimalFormat.format
[ 5] java.text.NumberFormat.format
[ 6] org.jfree.graphics2d.svg.SVGGraphics2D.transformDP
[ 7] org.jfree.graphics2d.svg.SVGGraphics2D.getSVGTransform
[ 8] org.jfree.graphics2d.svg.SVGGraphics2D.drawString
[ 9] be.ugent.caagt.jmathtex.CharBox.draw
[10] be.ugent.caagt.jmathtex.HorizontalBox.draw
[11] be.ugent.caagt.jmathtex.TeXIcon.paintIcon
Return to IntelliJ IDEA, open the SVGGraphics2D class, then search for the transformDP method. Note that the following line invokes Java’s number formatter:
return transformFormat.format(d);
Control-click the transformFormat variable to navigate to its declaration:
private DecimalFormat transformFormat;
From the call stack, this confirms that every time part of a glyph is drawn, the transformDP method performs an expensive operation. The DecimalFormat class is useful for converting floating-point numbers to character strings while taking internationalization into consideration. Since the SVG output is read and parsed by a machine, we pay that computational expense for no practical benefit.
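To make the cost concrete, here is a minimal, hypothetical sketch of a locale-free formatter of the kind that could stand in for a helper such as geomDP or transformDP. The class name, method shape, and two-decimal rounding are assumptions for illustration, not JFreeSVG’s actual fix:

```java
public class SvgNumber {
    // Round to two decimal places without DecimalFormat's locale machinery.
    // Whole numbers are emitted without a decimal point, as typical SVG
    // coordinate output does.
    static String geomDP(final double d) {
        final long scaled = Math.round(d * 100.0);
        return (scaled % 100 == 0)
            ? Long.toString(scaled / 100)
            : Double.toString(scaled / 100.0);
    }

    public static void main(final String[] args) {
        System.out.println(geomDP(2.0));   // 2
        System.out.println(geomDP(1.5));   // 1.5
        System.out.println(geomDP(1.234)); // 1.23
    }
}
```

Double.toString still performs shortest-round-trip work internally; a real fix would substitute an algorithm such as Ryū, but even this sketch sidesteps DecimalFormat’s per-call overhead.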
It so happens that all Java-based SVG transformers suffer from this particular performance problem, meaning that a library swap won’t help much, if at all. What we need is a faster number conversion algorithm. In 2018, Ulf Adams published a paper titled Ryū: fast float-to-string conversion, along with benchmarks showing a 12-fold improvement over the JDK’s number formatter. Moreover, the source code was made available under the permissive Apache 2.0 License.
While we’re looking at the SVGGraphics2D class, notice that the code for getSVGElement around line 2731 creates multiple StringBuilder instances. Similarly, the draw method starting at around line 1177 creates numerous StringBuilder objects, roughly one for each call to getSVGTransform and getSVGPathData. Looking a little deeper, we can see that getSVGPathData appends a lot of strings. This is likely the source of the third performance bottleneck identified by the profiler: copying arrays. For example, the following snippet is fairly representative of the string-copying code:
b.append("Q ").append(geomDP(coords[0]))
.append(" ").append(geomDP(coords[1]))
.append(" ").append(geomDP(coords[2]))
.append(" ").append(geomDP(coords[3]));
A minor improvement would be to append a space character (' ') instead of a space string (" ") to avoid the overhead of checking string lengths. A bigger gain would be to pre-allocate a single internal string buffer rather than continuously create new buffers. At the top of our offender list, though, is that the geomDP and transformDP methods must avoid formatting numbers using Java’s bundled DecimalFormat class.
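Those two suggestions can be sketched as follows. The buffer size of 64 and the method shape are assumptions for illustration, not JFreeSVG’s code:

```java
public class PathBuilder {
    // Builds one quadratic curve command. The buffer is pre-sized once
    // (64 is a guessed capacity), and single characters are appended as
    // chars rather than one-character strings.
    static String quadTo(final double[] coords) {
        final var b = new StringBuilder(64);
        b.append("Q ").append(coords[0])
         .append(' ').append(coords[1])
         .append(' ').append(coords[2])
         .append(' ').append(coords[3]);
        return b.toString();
    }

    public static void main(final String[] args) {
        System.out.println(quadTo(new double[]{ 1.0, 2.5, 3.25, 4.0 }));
        // Q 1.0 2.5 3.25 4.0
    }
}
```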
Our findings have revealed that the JFreeSVG library has room to be optimized. Furthermore, the library has many features that go beyond our needs, writing many strings to the SVG document that are irrelevant for our purposes. Conceptually, we want to convert glyph paths from a font format into paths in an SVG format. We haven’t any ovals, polygons, fill patterns, gradient colours, or special namespaces to worry about, which gives us further optimization possibilities.
We’ll fix this issue after we look into the second bottleneck: filling stack traces.
Investigate Stack Trace Fills
Before we dig into the details, let’s revisit the purpose of an exception. According to the official Java documentation, an exception is:
"an event, which occurs during the execution of a program, that disrupts the normal flow of the program’s instructions."
That is, exceptions are meant to indicate a problem was encountered that wasn’t expected. When opening a file, for example, we anticipate that the file’s contents will be displayed. There are many ways that opening a file could fail: the file was a directory, its permissions were set to write-only, no more operating system file handles were available, it was deleted, the network connection to the file path dropped, there was data corruption, contents were an unknown file format, and so on. All of these are exceptions to the normal flow that prevent displaying the file contents.
Let’s go back to profile.txt. Isolate the cause of the stack trace fills by searching for fill_in_stack_trace, starting from the top of the file. We find the following:
--- 4000181209 ns (3.04%), 400 samples
[ 0] java_lang_Throwable::fill_in_stack_trace(...)
[ 1] java_lang_Throwable::fill_in_stack_trace(...)
[ 2] JVM_FillInStackTrace
[ 3] Java_java_lang_Throwable_fillInStackTrace
[ 4] java.lang.Throwable.fillInStackTrace
[ 5] java.lang.Throwable.fillInStackTrace
[ 6] java.lang.Throwable.<init>
[ 7] java.lang.Exception.<init>
[ 8] java.lang.RuntimeException.<init>
[ 9] be.ugent.caagt.jmathtex.JMathTeXException.<init>
[10] be.ugent.caagt.jmathtex.SymbolNotFoundException.<init>
[11] be.ugent.caagt.jmathtex.SymbolAtom.get
[12] be.ugent.caagt.jmathtex.TeXFormula.processEscape
[13] be.ugent.caagt.jmathtex.TeXFormula.parse
[14] be.ugent.caagt.jmathtex.TeXFormula.<init>
[15] be.ugent.caagt.jmathtex.TeXFormula.<init>
Judging from the call stack, it appears that when processing an escape (a control sequence), an exception is thrown if the symbol cannot be found. What makes these stack traces unusual is that we know all the symbols were found, because the SVG rendered as expected. We’ll have to root around some more to understand why the stack traces occur.
Open the TeXFormula class and jump down to the processEscape method, starting around line 534. Here we find some trivial issues:
- Using a default buffer size (new StringBuffer()) will cause array copies to be performed if parsing extends beyond 16 characters.
- Calling parseString.length() each iteration incurs a bit shift and a conditional; it’s faster to store loop invariants, such as the string’s length, outside the loop.
Neither of these is a silver bullet. When we take a step back, the algorithm for the processEscape method appears to perform the following steps:
- If the parser encounters a backslash (\), process the escape command.
- Create a new (empty) buffer.
- Begin parsing the control sequence.
- If the next character is a valid control sequence character, append it to the buffer.
- Convert the mutable buffer to an immutable string.
- Look up the “atom” in a symbol table.
- If the “atom” cannot be found, see whether it’s a predefined formula.
- Terminate when the end of the control sequence is found.
We know the culprit is the SymbolNotFoundException class because of the profiler output. When we look at the code for SymbolAtom.get(String), we can see where the exception is thrown:
Object obj = symbols.get(name);
if (obj == null) {
throw new SymbolNotFoundException(name);
} else {
return (SymbolAtom)obj;
}
Now it’s a little clearer why the code is slow. For each character appended while reading a control sequence, if the partial sequence does not yet map to a known value, an exception is thrown, which causes the JVM to fill in a complete stack trace. If the input expression contains the control sequence \partial, for example, then twelve stack traces are filled: six for the unmatched atom and six for the unmatched predefined formula.
In short, the main performance problem with the library is that exceptions are being used for normal control flow. Incomplete control sequences while parsing are to be expected and should not be treated as exceptional. To fix this, the parser could read the full control sequence before checking to see if it’s a known value (i.e., an atom or a predefined formula).
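One conventional way to express “absence is normal” in Java is to report the missing symbol through the return type instead of an exception. This is a hypothetical sketch of the pattern, not the fork’s actual code; the symbol table contents are invented placeholders:

```java
import java.util.Map;
import java.util.Optional;

public class SymbolLookup {
    // Placeholder symbol table; the real one maps names to SymbolAtom objects.
    private static final Map<String, String> SYMBOLS =
        Map.of("partial", "PARTIAL", "nabla", "NABLA");

    // Exception-free variant: an absent symbol is an expected outcome,
    // so no stack trace is ever filled during lookup.
    static Optional<String> find(final String name) {
        return Optional.ofNullable(SYMBOLS.get(name));
    }

    public static void main(final String[] args) {
        // Probing prefixes of "partial" no longer fills stack traces.
        System.out.println(find("par").isPresent());     // false
        System.out.println(find("partial").isPresent()); // true
    }
}
```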
Optimize
The low-level technical details of implementing these optimizations are not as important as the concepts and tools we’ve applied to find the bottlenecks in our application. To wit, I forked the JMathTeX library to include the aforementioned optimizations and brought in numerous font improvements courtesy of JLaTeXMath. Let’s swap out the original JMathTeX library for the forked version to see the performance difference.
Replace TeX Engine
First, complete the following steps to replace the TeX engine:
- Clone, build, then replace the engine.
cd $HOME/dev/java
rm mdtexfx/libs/JMathTeX*.jar
git clone https://github.com/DaveJarvis/JMathTeX jmathtex
cd jmathtex
./gradlew clean build
mv ./build/libs/jmathtex.jar ../mdtexfx/libs
- In the IDE, reload the changes to the build file.
- Open the TeXProcessor class.
- Remove the import be.ugent.caagt.jmathtex.TeXFormula; line.
At this point, IntelliJ IDEA will attempt to locate the new TeXFormula class. If errors remain unresolved, invalidate the caches and restart the IDE; the option may be found under the File menu.
Replace the last lines of the toSvg method in the TeXProcessor class, starting from SVGGraphics2D, with the following snippet (note that SVGGraphics2D is from JFreeSVG, while SvgGraphics2D is from the JMathTeX fork):
final var graphics = new SvgGraphics2D();
graphics.initialize( width, height );
icon.paintIcon( sComponent, graphics, 0, 0 );
return graphics.toString();
This swaps JFreeSVG’s SVG renderer for an implementation that uses the Ryū algorithm, among other performance improvements. The IDE should, again, deduce the necessary imports automatically. (As an aside, we can remove org.jfree:jfreesvg from the build file.)
Remember to remove the import for the SVGGraphics2D class to avoid a compiler error.
Replace Fonts
Next, we have to revise the font file name references. Start by opening the HtmlPreview class. There’s some duplication we can address in how the font faces are loaded. Replace the following lines:
format( "data:,%s%s%s%s",
toFontFace( "cmex10" ),
toFontFace( "cmmi10" ),
toFontFace( "cmr10" ),
toFontFace( "cmsy10" )
) );
with the following snippet that brings in the new font names:
toDataUrl(
"jlm_cmex10",
"jlm_cmmi10",
"jlm_cmr10",
"jlm_cmsy10",
"jlm_msam10",
"jlm_msbm10"
)
Then define the toDataUrl method as follows:
private String toDataUrl( final String... names ) {
return format( "data:," + "%s".repeat( names.length ),
Arrays.stream( names ).map( this::toFontFace ).toArray() );
}
This reduces the multiple toFontFace calls to a single mapped function call (this::toFontFace) applied to every value in the names parameter. In so doing, we’ve abstracted away how a list of font names becomes a data: URL. Calling repeat means the computer counts out the correct number of %s format specifiers on our behalf, based on the number of font names given.
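Here is a self-contained illustration of how repeat and format cooperate. The toFontFace body is a stand-in, since the real method emits a CSS @font-face rule:

```java
import java.util.Arrays;

public class DataUrlDemo {
    // Stand-in for the real toFontFace, which builds a CSS @font-face rule.
    static String toFontFace(final String name) {
        return "@font-face{" + name + "}";
    }

    // "%s".repeat(n) yields one format specifier per font name, and the
    // mapped array supplies the matching arguments.
    static String toDataUrl(final String... names) {
        return String.format("data:," + "%s".repeat(names.length),
            Arrays.stream(names).map(DataUrlDemo::toFontFace).toArray());
    }

    public static void main(final String[] args) {
        System.out.println(toDataUrl("a", "b"));
        // data:,@font-face{a}@font-face{b}
    }
}
```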
The font file path has changed; update the toFontFace method by replacing the path in the format call as follows:
format( "/fonts/cm/%s.ttf", name )
The fonts are updated.
Re-run Profiler
Re-build, then re-run the unit test and profiler as before. Open the profile.txt file to review where the CPU spends most of its time:
ns percent samples top
---------- ------- ------- ---
68679123646 50.44% 6868 ...RyuDouble.doubleToString
15170246750 11.14% 1517 java.util.Arrays.copyOf
Previously, the number format call occupied about 11% of the processing time; now the number conversion takes about 50%. That shift is good news: the total run time has dropped, so the greater the share of time spent on the essential work of converting font glyph path data to vector graphic strings, the less is lost to overhead, and the faster TeX expressions are rendered.
Before the optimizations, 1 million TeX expressions took about 13 minutes to transform; after the optimizations, the time decreased to about 7 minutes, roughly twice as fast. Put another way, the software can now render around 2,000 short TeX expressions per second on a 3.3 GHz processor.
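The throughput claim follows from simple arithmetic over the figures above:

```java
public class Throughput {
    public static void main(final String[] args) {
        final int expressions = 1_000_000;
        // Before: about 13 minutes; after: about 7 minutes.
        System.out.println(expressions / (13 * 60)); // before, expressions/second
        System.out.println(expressions / (7 * 60));  // after, expressions/second
    }
}
```

Integer division gives roughly 1,282 expressions per second before and 2,380 after, consistent with the “around 2,000 per second” figure.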
For those who like a challenge, more optimization opportunities abound, including:
- Parallel computation. Although replacing TeX code in the Markdown document with its SVG equivalent mark-up must be performed sequentially, producing those SVG snippets from TeX code could be performed using multiple threads.
- Garbage collection. Even with high-performance garbage collectors, if an application generates many short-lived objects, responsiveness will diminish. All the tools mentioned previously can help pinpoint problems with memory churn. Reducing churn improves performance.
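The parallel idea can be sketched with a parallel stream, assuming the conversion function is thread-safe (JMathTeX’s parser may not be, so a real implementation might need per-thread instances); toSvg here is a stand-in for the expensive conversion:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ParallelRender {
    // Stand-in for the expensive TeX-to-SVG conversion step.
    static String toSvg(final String tex) {
        return "<svg><!-- " + tex + " --></svg>";
    }

    public static void main(final String[] args) {
        final var expressions = List.of("$a$", "$b$", "$c$");
        // Convert across threads; streams preserve encounter order, so the
        // sequential substitution back into the document still lines up.
        final var svgs = expressions.parallelStream()
            .map(ParallelRender::toSvg)
            .collect(Collectors.toList());
        System.out.println(svgs.size()); // 3
    }
}
```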
Review Output
Run the application to confirm the output. The first TeX expression may not appear instantly due to object initialization. After warming up, the application should be able to typeset the document without any noticeable delays, and no caching required. Here’s a screenshot of our application running on the full edition of Liberica JDK:
We have accomplished our goal.
Related Resources
Free and open-source resources related to these articles include:
- KeenWrite — My Java-based desktop text editor with live preview, variable interpolation, TeX expressions, spell checking, R integration, detachable tabs courtesy of TiwulFX Dock, and many more features.
- Typesetting Markdown — My blog series that describes a toolchain to produce beautiful PDF documents from Markdown source files by using Pandoc and ConTeXt.
Together, these resources can help write documents using plain text while keeping the content separate from its final appearance.
Download
You may download the project’s final version.
Summary
Using a profiler provided us with insights into the slow parts of our application. After understanding where and why the code had sub-par performance, we optimized parts of the application that yielded the greatest improvement for the least amount of effort. Regardless of what profiler we use, the overall process remains the same: profile running software, analyze the results, trace the algorithm logic, and then think about how to mitigate the most severe bottlenecks.
Thank you for reading!