TeXnical Writing Part 5: Performance

TeXnical Writing Part 5: Performance


April 23, 2021


BellSoft Blog Disclaimer

Earlier in this series, we reviewed in brief the TeX typesetting system as well as the state of Java-based TeX engines. We settled on integrating JMathTeX into our application. In this final part, we’ll profile our application created with Liberica JDK to determine where optimizations will reward us with the most significant performance gains.

Introduction

Previously, we used libraries to translate TeX expressions into SVG elements. You may have noticed there was a delay between completing the expression and seeing it rendered. While not a problem for a document having a few TeX expressions, as the number of expressions increases, the preview will take longer to render. Eventually, we’d have to choose between delayed (asynchronous) output or unresponsive input. In this article, we’ll profile our application while it executes on Liberica JDK so as to understand where best to direct our efforts. This is in line with advice offered by Donald Knuth:

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%. [Good programmers] will not be lulled into complacency by such reasoning, [they] will be wise to look carefully at the critical code; but only after that code has been identified. ~ Donald E. Knuth, Structured Programming with go to Statements, ACM Computing Surveys, Vol 6, No. 4, Dec. 1974, p.268.

We don’t know the code’s performance bottlenecks until we measure; spending a lot of time on micro-optimizations is usually wasteful.

There’s a momentary blip during TeX equation rendering that we’d like to eliminate. In effect, we want to determine the “critical” path that Knuth described. We can accomplish this using a software tool—called a profiler—that gathers information about what method calls take up most of the application’s run time.

Review Profilers

Free and open-source tools to measure Java application performance include:

Another tool that can help with optimizing code is the Java Microbenchmark Harness (JMH), which is a solid piece of software geared towards building, running, and analyzing application performance down to the nanosecond if needed.

We’ll use async-profiler because it can produce useful results in a few keystrokes.

Install Profiler

Install and configure the profiler as follows:

  1. Download the latest release for your target architecture.
  2. Extract the archive into $HOME/bin/async-profiler.
  3. On Linux, configure the kernel to reveal its call stacks as follows:
  4. Run ~/bin/async-profiler/profiler.sh to verify the installation.

The profiler is ready.

List Java Processes

Java comes with a program called jps that lists the name and process identifier (pid) for all running Java applications. Here’s example output when run (your output will differ):

$ jps
972229 Launcher
972067 GradleDaemon
639188 Main
978993 Jps

The profiler can invoke the jps command to attach to the most recently launched Java process, automatically.

Update Unit Test

Writing a typical JUnit unit test annotated with @Test won’t give us a clean profile of the application’s performance for a few reasons. First, we don’t care to test the performance of JUnit itself. Second, if we ran the test using Gradle, we’d find that Gradle takes up much of processing time. Instead, we’ll update TeXProcessorTest such that we can run it from the command-line.

We want to test the functionality for converting TeX to an SVG element with as little overhead as possible. Additionally, we should provide an equation that’s a bit more complicated than Euler’s identity, such as the Klein–Gordon equation:

Open the TeXProcessorTest class then add the following method:

  static boolean test_Performance_TeXInput_SvgOutput() {
    final var processor = new TeXProcessor( null );
    final var svg = processor.apply(
      """
        $\\frac{1}{c^2}\\frac{\\partial^2}{\\partial t^2}
        \\psi-\\nabla^2\\psi+\\frac{m^2c^2}{\\hbar^2}\\psi=0$
      """
    );

    return svg.startsWith( "<svg" );
  }

This is about as close to the conversion from TeX to SVG as we can get without making the toSvg method public in the TeXProcessor class. Next, introduce a main method that loops for a while so that we can profile our performance test:

  @SuppressWarnings( "StatementWithEmptyBody" )
  public static void main( final String[] args ) {
    final long stop = nanoTime() + NANOSECONDS.convert( 5, MINUTES );
    while( nanoTime() < stop && test_Performance_TeXInput_SvgOutput() ) {
    }
  }

The additional static imports should resolve to:

import static java.lang.System.nanoTime;
import static java.util.concurrent.TimeUnit.MINUTES;
import static java.util.concurrent.TimeUnit.NANOSECONDS;

The unit test is updated and we’re ready to run the profiler.

Run Unit Test

Verify that the unit test can be run by executing the following commands:

cd $HOME/dev/java/mdtexfx
./gradlew clean build test
java -cp build/libs/mdtexfx.jar:build/classes/java/test \
com.mdtexfx.processors.tex.TeXProcessorTest

Stop the unit test by pressing Control+c.

Profile Unit Test

Complete the following steps to profile our test harness:

  1. Open a new terminal to run the unit test and profiler at the same time.
  2. Re-run the Java command from the previous section to start converting TeX to SVG.
  3. Start the profiler as follows:
    ~/bin/async-profiler/profiler.sh -d 120 jps > profile.txt
    

The -d 120command-line argument instructs the profiler to sample the application for 120 seconds; the jps argument tells the profiler to use the Java process command to find the pid of the most recently run Java application. Using > profile.txt directs the command’s output to a file named profile.txt, which contains the profiling results. Let’s look inside to begin our analysis.

Analyze Results

Open profile.txt then search for the word “percent” to find the list of application hot-spots. The list should begin similar to the following:

          ns  percent  samples  top
  ----------  -------  -------  ---
 14979296619   11.38%     1498  java.text.DecimalFormat.subformatNumber
 12850404499    9.76%     1285  java_lang_Throwable::fill_in_stack_trace(...)
  8279754791    6.29%      828  java.util.Arrays.copyOf

Together, over 27% of the test is spent formatting numbers, filling stack traces, and copying arrays. Given that our test doesn’t purposefully throw any stack traces, that particular result seems strange. Array copies are likely string copies. Let’s start our sleuthing at the DecimalFormat class because that’s the biggest bottleneck.

Investigate DecimalFormat

Jump to the top of profile.txt then search for “DecimalFormat” to find one of its call stack invocations, which will look something like the following:

--- 2679876934 ns (2.04%), 268 samples
  [ 0] java.text.DecimalFormat.subformatNumber
  [ 1] java.text.DecimalFormat.subformat
  [ 2] java.text.DecimalFormat.doubleSubformat
  [ 3] java.text.DecimalFormat.format
  [ 4] java.text.DecimalFormat.format
  [ 5] java.text.NumberFormat.format
  [ 6] org.jfree.graphics2d.svg.SVGGraphics2D.transformDP
  [ 7] org.jfree.graphics2d.svg.SVGGraphics2D.getSVGTransform
  [ 8] org.jfree.graphics2d.svg.SVGGraphics2D.drawString
  [ 9] be.ugent.caagt.jmathtex.CharBox.draw
  [10] be.ugent.caagt.jmathtex.HorizontalBox.draw
  [11] be.ugent.caagt.jmathtex.TeXIcon.paintIcon

Return to IntelliJ IDEA, open the SVGGraphics2D class, then search for the transformDP method. Note that the following line invokes Java’s number formatter:

return transformFormat.format(d);            

Control-click the transformFormat variable to navigate to its declaration:

private DecimalFormat transformFormat;

From the call stack, this confirms that every time part of a glyph is drawn, the transformDP method is performing an expensive operation. The DecimalFormat class is useful for converting floating point numbers to character strings when taking internationalization into consideration. Knowing that SVG is read and parsed by a machine, it’s computationally expensive to use Java’s built-in functionality for no practical benefit.

It so happens that all Java-based SVG transformers suffer from this particular performance problem, meaning that a library swap won’t help much, if at all. What we need is a faster number conversion algorithm. In 2018, Ulf Adams published a paper titled Ryū: fast float-to-string conversion, along with benchmarks showing a 12-fold improvement over the JDK’s number formatter. Moreover, the source code was made available under the permissive Apache 2.0 License.

While we’re looking at the SVGGraphics2D class, notice the code for getSVGElement around line 2731 creates multiple StringBuilder class instances. Similarly, the draw method starting at around line 1177 creates numerous StringBuilder class objects, one for nearly each call to getSVGTransform and getSVGPathData. Looking a little deeper, we can see that getSVGPathData is appending a lot of strings. This is likely the source of the third performance bottleneck identified by the profiler: copying arrays. For example, the following snippet is fairly representational of the string-copying code:

b.append("Q ").append(geomDP(coords[0]))
  .append(" ").append(geomDP(coords[1]))
  .append(" ").append(geomDP(coords[2]))
  .append(" ").append(geomDP(coords[3]));

A minor improvement would be to use a space character (' ') instead of a space string (" ") to avoid the overhead of checking string lengths. A bigger gain would be to pre-allocate a single internal string buffer, rather than continuously create new buffers. At the top of our offender list, though, is that the geomDP and transformDP methods must avoid formatting numbers using Java’s bundled DecimalFormat class.

Our findings have revealed that the JFreeSVG library has room to be optimized. Furthermore, the library has many features that go beyond our needs by writing many irrelevant strings to the SVG document. Conceptually, we want to convert glyph paths from a font format to paths in an SVG format. We haven’t any ovals, polygons, fill patterns, gradient colours, or special namespaces to worry about, giving us further optimization possibilities.

We’ll fix this issue after we look into the second bottleneck: filling stack traces.

Investigate Stack Trace Fills

Before we dig into the details, let’s revisit the purpose of an exception. According to the official Java documentation, an exception is:

an event, which occurs during the execution of a program, that disrupts the normal flow of the program’s instructions.

That is, exceptions are meant to indicate a problem was encountered that wasn’t expected. When opening a file, for example, we anticipate that the file’s contents will be displayed. There are many ways that opening a file could fail: the file was a directory, its permissions were set to write-only, no more operating system file handles were available, it was deleted, the network connection to the file path dropped, there was data corruption, contents were an unknown file format, and so on. All of these are exceptions to the normal flow that prevent displaying the file contents.

Let’s go back to profile.txt. Isolate the cause of the stack trace fills by searching for fill_in_stack_trace, starting from the top of the file. We find the following:

--- 4000181209 ns (3.04%), 400 samples
  [ 0] java_lang_Throwable::fill_in_stack_trace(...)
  [ 1] java_lang_Throwable::fill_in_stack_trace(...)
  [ 2] JVM_FillInStackTrace
  [ 3] Java_java_lang_Throwable_fillInStackTrace
  [ 4] java.lang.Throwable.fillInStackTrace
  [ 5] java.lang.Throwable.fillInStackTrace
  [ 6] java.lang.Throwable.<init>
  [ 7] java.lang.Exception.<init>
  [ 8] java.lang.RuntimeException.<init>
  [ 9] be.ugent.caagt.jmathtex.JMathTeXException.<init>
  [10] be.ugent.caagt.jmathtex.SymbolNotFoundException.<init>
  [11] be.ugent.caagt.jmathtex.SymbolAtom.get
  [12] be.ugent.caagt.jmathtex.TeXFormula.processEscape
  [13] be.ugent.caagt.jmathtex.TeXFormula.parse
  [14] be.ugent.caagt.jmathtex.TeXFormula.<init>
  [15] be.ugent.caagt.jmathtex.TeXFormula.<init>

Judging from the call stack, it appears as though when processing an escape—a control sequence—if the symbol cannot be found then an exception is thrown. What makes a stack trace unusual is that we know all the symbols were found because the SVG rendered as expected. We’ll have to root around some more to understand the reason for the stack trace.

Open the TeXFormula class and jump down to the processEscape method starting around line 534. Here we find some trivial issues:

  • Using a default buffer size (new StringBuffer()) will cause array copies to be performed if parsing extends beyond 16 characters.
  • Calling parseString.length() each iteration incurs a bitshift and conditional; it’s faster to store loop invariants—such as the string’s length—outside the loop.

Neither of these are a silver bullet. When we take a step back, the algorithm for the processEscape method appears to be performing the following steps:

  1. If the parser encounters a backslash (\), process the escape command.
  2. Create a new (empty) buffer.
  3. Begin parsing the control sequence.
  4. If the next character is a valid control sequence character, append it to a buffer.
  5. Convert the mutable buffer to an immutable string.
  6. Look up the “atom” in a symbol table.
  7. If the “atom” cannot be found, see if it’s a predefined formula.
  8. Terminate when the end of the control sequence is found.

We know the culprit is the SymbolNotFoundException class because of the profiler output. When we look at the code for SymbolAtom.get(String) we can see where the exception is being thrown:

Object obj = symbols.get(name);
if (obj == null) {
  throw new SymbolNotFoundException(name);
} else {
  return (SymbolAtom)obj;
}

Now it’s a little clearer why the code is slow. For each character in a control sequence, if the control sequence does not yet map to a known value, an exception is thrown, which causes the JVM to fill in a complete stack trace. If the input expression has the control sequence \partial, for example, then twelve stack traces are filled: six for the unmatched atom and six for the unmatched predefined formula.

In short, the main performance problem with the library is that exceptions are being used for normal control flow. Incomplete control sequences while parsing are to be expected and should not be treated as exceptional. To fix this, the parser could read the full control sequence before checking to see if it’s a known value (i.e., an atom or a predefined formula).

Optimize

The low-level technical details regarding how to implement these optimizations is not as important as the concepts and tools we’ve applied to find the bottlenecks in our application. To wit, I forked the JMathTeX library to include the aforementioned optimizations and brought in numerous font improvements courtesy of JLaTeXMath. Let’s swap out the original JMathTeX library with the forked version to see the performance difference.

Replace TeX Engine

First, complete the following steps to replace the TeX engine:

  1. Clone, build, then replace the engine.
    cd $HOME/dev/java
    rm Gradlepaths.app}}/libs/JMathTeX*.jar
    git clone https://github.com/DaveJarvis/JMathTeX jmathtex
    cd jmathtex
    ./gradlew clean build
    mv ./build/libs/jmathtex.jar ../mdtexfx/libs
    
  2. In the IDE, reload the changes to the build file.
  3. Open the TeXProcessor class.
  4. Remove import be.ugent.caagt.jmathtex.TeXFormula;.

At this point IntelliJ IDEA will attempt to locate the new TeXFormula class. If errors continue to be unresolved, invalidate caches and restart the IDE, which may be found under the File menu.

Replace the last lines of the toSvg method in the TeXProcessor class, starting from SVGGraphics2D with the following snippet (note that SVGGraphics2D is from JFreeSVG while SvgGraphics2D is from the JMathTeX fork):

final var graphics = new SvgGraphics2D();
graphics.initialize( width, height );
icon.paintIcon( sComponent, graphics, 0, 0 );
return graphics.toString();

This swaps JFreeSVG’s SVG renderer for an implementation that uses the Ryū algorithm, among other performance improvements. The IDE should deduce the necessary imports, again, automatically. (Aside, we can remove org.jfree:jfreesvg from the build file.)

Remember to remove the import for the SVGGraphics2D class to avoid a compiler error.

Replace Fonts

Next, we have to revise the font file name references. Start by opening the HtmlPreview class. There’s some duplication we can address in how the font faces are loaded. Replace the following lines:

format( "data:,%s%s%s%s",
  toFontFace( "cmex10" ),
  toFontFace( "cmmi10" ),
  toFontFace( "cmr10" ),
  toFontFace( "cmsy10" )
) );

with the following snippet that brings in the new font names:

toDataUrl(
  "jlm_cmex10",
  "jlm_cmmi10",
  "jlm_cmr10",
  "jlm_cmsy10",
  "jlm_msam10",
  "jlm_msbm10"
)

Then define the toDataUrl method as follows:

private String toDataUrl( final String... names ) {
  return format( "data:," + "%s".repeat( names.length ),
                 Arrays.stream( names ).map( this::toFontFace ).toArray() );
}

This reduces the multiple toFontFace calls into a mapped function call (this::toFontFace) that is applied to all values provided by the names parameter. In so doing, we’ve abstracted away how a list of font names becomes a data: URL. Calling repeat means the computer will count the correct number of %s format specifiers on our behalf based on the number of font names given.

The font file path has changed; update the toFontFace method by replacing the path in the format call as follows:

format( "/fonts/cm/%s.ttf", name )

The fonts are updated.

Re-run Profiler

Re-build, then re-run the unit test and profiler as before. Open the profile.txt file to review where the CPU is spending most of its time:

          ns  percent  samples  top
  ----------  -------  -------  ---
 68679123646   50.44%     6868  ...RyuDouble.doubleToString
 15170246750   11.14%     1517  java.util.Arrays.copyOf

Previously, the number format call occupied about 11% of the processing time, but now it takes about 50%. The more time spent converting font glyph path data to vector graphic strings, the faster TeX expressions are rendered.

Before the optimizations, 1 million TeX expressions took about 13 minutes to transform; after the optimizations, the time decreased to about 7 minutes, roughly twice as fast. Put another way, the software can now render around 2,000 short TeX expressions per second on a 3.3 GHz processor.

For those who like a challenge, more optimization opportunities abound, including:

  • Parallel computation. Although replacing TeX code in the Markdown document with its SVG equivalent mark-up must be performed sequentially, producing those SVG snippets from TeX code could be performed using multiple threads.
  • Garbage collection. Even with high-performance garbage collectors, if an application generates many short-lived objects, responsiveness will diminish. All the tools mentioned previously can help pinpoint problems with memory churn. Reducing churn improves performance.

Review Output

Run the application to confirm the output. The first TeX expression may not appear instantly due to object initialization. After warming up, the application should be able to typeset the document without any noticeable delays, and no caching required. Here’s a screenshot of our application running on the full edition of Liberica JDK:

Output

We have accomplished our goal.

Free and open-source resources related to these articles include:

  • KeenWrite — My Java-based desktop text editor with live preview, variable interpolation, TeX expressions, spell checking, R integration, detachable tabs courtesy of TiwulFX Dock, and many more features.
  • Typesetting Markdown — My blog series that describes a toolchain to produce beautiful PDF documents from Markdown source files by using Pandoc and ConTeXt.

Together, these resources can help write documents using plain text while keeping the content separate from its final appearance.

Download

You may download the project’s final version.

Summary

Using a profiler provided us with insights into the slow parts of our application. After understanding where and why the code had sub-par performance, we optimized parts of the application that yielded the greatest improvement for the least amount of effort. Regardless of what profiler we use, the overall process remains the same: profile running software, analyze the results, trace the algorithm logic, and then think about how to mitigate the most severe bottlenecks.

Thank you for reading!

Author image

Dave Jarvis

Senior Software Developer, Special for BellSoft

BellSoft LTD [email protected] BellSoft LTD logo Liberica Committed to Freedom 199 Obvodnogo Kanala Emb. 190020 St. Petersburg RU +7 812-336-35-67 BellSoft LTD 199 Obvodnogo Kanala Emb. 190020 St. Petersburg RU +7 812-336-35-67 BellSoft LTD 111 North Market Street, Suite 300 CA 95113 San Jose US +1 702 213-59-59