TeXnical Writing Part 4: Math


March 26, 2021


Welcome back! In the previous part of this series we developed a plain Markdown editor and preview panel; in this part we’re going to focus on rendering equations and formulas in our application based on Liberica JDK.

Before we jump in, let’s review a brief history of mathematical typesetting followed by some important terminology that we’ll encounter later on in this article.

Introduction

Before computer-based typesetting, much of mathematics was put to page by hand. Professional typesetters, who were often expensive and usually not mathematicians, would inadvertently introduce typographic errors into equations. Phototypesetting technology improved upon hand-typesetting, but well-known computer scientist Donald Knuth, who saw the second edition of the second volume of The Art of Computer Programming phototypeset in 1976, expressed dissatisfaction with its typographic quality. He set himself two goals: let anyone create high-quality books without much effort and provide software that typesets consistently on all capable computers. Two years later, he released a typesetting system and a font description language: TeX and METAFONT, respectively.

In TeX, a control sequence is a backslash followed by some text; TeX defines about 900 control sequences. Example control sequences are: \frac, \", \hskip, and \input. These control sequences are further categorized into control words, control symbols, primitives, registers, macros, and more.
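To make the terminology concrete, here is a minimal sketch that pulls control sequences out of a TeX string. It assumes TeX's default category codes, where a control word is a backslash followed by one or more ASCII letters and a control symbol is a backslash followed by a single non-letter. The class name ControlSequenceScanner is hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ControlSequenceScanner {
  // Assumption: default category codes, so only A-Z and a-z are letters.
  // Control word: backslash plus letters; control symbol: backslash plus
  // exactly one non-letter character.
  private static final Pattern CONTROL_SEQUENCE =
    Pattern.compile( "\\\\(?:[A-Za-z]+|[^A-Za-z])" );

  /** Returns every control sequence in the order it appears. */
  static List<String> scan( final String tex ) {
    final var found = new ArrayList<String>();
    final var matcher = CONTROL_SEQUENCE.matcher( tex );

    while( matcher.find() ) {
      found.add( matcher.group() );
    }

    return found;
  }

  public static void main( final String[] args ) {
    // The Java literal below is the TeX text: \frac{1}{2} + e^{i\pi} \"o
    System.out.println( scan( "\\frac{1}{2} + e^{i\\pi} \\\"o" ) );
  }
}
```

For the sample input, the scanner reports two control words (\frac and \pi) and one control symbol (\").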

A macro, short for macro instruction, is a control sequence that invokes one or more control sequences to perform a task. TeX is powered by a sophisticated macro language that forms the foundation for larger TeX engines—such as LuaTeX, pdfTeX, and XeTeX—as well as for document formats like ConTeXt and LaTeX.

We’ve seen how structured plain text formats—AsciiDoc, Markdown, and similar—separate content from presentation. In practice, documents written in LaTeX typically inextricably intertwine prose with formatting; ConTeXt makes keeping the two separate much easier.

In this spirit, we’re going to integrate a subset of TeX control sequences that are not tied to a specific format (e.g., ConTeXt or LaTeX), to keep open the possibility of using any engine to typeset and stylize a document written in a structured plain text format. We only need a subset because displaying a simple equation covers the most common use case; integrating a complete TeX typesetting engine is well outside of our project’s scope.

Java TeX Engines

Having opted to develop a Java-based Markdown editor with JavaFX, using a free, open-source, pure Java solution to convert TeX macro equations into SVG images will simplify application development and distribution. At first blush, there are a number of candidate implementations. Let’s take a closer look at the high-level technical details of each to assess their suitability for inclusion.

New Typesetting System

The New Typesetting System (NTS) is a rewrite of TeX intended to be functionally compatible while offering a modular framework for experimentation and extensions. This implementation would have to be modified to export equations in a vector format. NTS produces page-oriented output for DeVice Independent (DVI) files, so we’d need a second pass to convert from DVI to SVG. Even though libraries exist for such a task, a single pass is preferred because we’d like to render thousands of equations in real time—two passes may be too slow, architecturally.

εχTeX

εχTeX is an incredible implementation that builds upon the experiences of developers involved with NTS. When asked about SVG output, the project maintainer suggested creating a new back-end:

Maybe [start with] the rudimentary [PostScript] back-end. […] I assume that the level of abstraction is too low for a clean and readable SVG. Thus I would define or redefine TeX/LaTeX/ConTeXt macros — something along the lines of the Unit modules. Instead of passing control too [deeply] into the (unfinished) typesetter, I would generate the output earlier. Or write a new typesetter…

Although εχTeX is thoroughly documented and coded in idiomatic Java, writing our own typesetter from scratch would be a tremendous undertaking. As we’ll soon see, there are alternatives that require much less effort.

javaTeX

The name javaTeX was given to two different implementations:

  • javaTeX by Carsten Hammer, which provides a viewer for the output from NTS.
  • javaTeX by Timothy Murphy, which converts Knuth’s TeX implementation to Java.

Neither of these implementations will produce SVG output without modifications. Murphy states that his implementation is too slow, but that was 1998. Both Java virtual machines and computer hardware have sped up significantly since then, but let’s keep reviewing.

JMathTeX

JMathTeX, developed at Ghent University, can typeset a subset of TeX control sequences using Java’s Graphics2D API. This is promising. TeX engines, in essence, convert control sequences into symbols that are ultimately mapped to glyphs from specific fonts. Those glyphs must be drawn onto a graphics device using primitive line-drawing instructions—regardless of whether the output is a DVI file, a PostScript printer, a Graphics2D object, or an SVG document.

Intercepting calls to the Graphics2D API with an object that generates an SVG document is a one-pass solution.
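To illustrate why drawing primitives map so directly onto SVG, the following sketch walks the outline of any java.awt.Shape and emits the equivalent SVG path data. It is not how JMathTeX or any SVG library is implemented; the class name ShapeToSvgPath and the one-decimal number formatting are assumptions made for the example.

```java
import java.awt.Shape;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;
import java.util.Locale;

final class ShapeToSvgPath {
  /** Converts a shape's outline into an SVG path data string. */
  static String toPathData( final Shape shape ) {
    final var sb = new StringBuilder();
    final var coords = new double[ 6 ];

    // Each segment type has a direct SVG path command counterpart.
    for( var it = shape.getPathIterator( null ); !it.isDone(); it.next() ) {
      switch( it.currentSegment( coords ) ) {
        case PathIterator.SEG_MOVETO:
          append( sb, 'M', coords, 2 );
          break;
        case PathIterator.SEG_LINETO:
          append( sb, 'L', coords, 2 );
          break;
        case PathIterator.SEG_QUADTO:
          append( sb, 'Q', coords, 4 );
          break;
        case PathIterator.SEG_CUBICTO:
          append( sb, 'C', coords, 6 );
          break;
        case PathIterator.SEG_CLOSE:
          sb.append( 'Z' );
          break;
        default:
          break;
      }
    }

    return sb.toString();
  }

  // Writes the command letter followed by its space-separated coordinates.
  private static void append(
    final StringBuilder sb, final char cmd, final double[] c, final int n ) {
    sb.append( cmd );

    for( int i = 0; i < n; i++ ) {
      sb.append( i == 0 ? "" : " " )
        .append( String.format( Locale.ROOT, "%.1f", c[ i ] ) );
    }
  }

  public static void main( final String[] args ) {
    final var triangle = new Path2D.Double();
    triangle.moveTo( 0, 0 );
    triangle.lineTo( 10, 0 );
    triangle.lineTo( 0, 10 );
    triangle.closePath();

    System.out.println( "<path d=\"" + toPathData( triangle ) + "\"/>" );
  }
}
```

A glyph outline is a Shape like any other, so an object that stands in for Graphics2D can record each draw or fill call in this manner and produce the SVG in a single pass.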

JLaTeXMath

JLaTeXMath is a fork of JMathTeX that typesets equations beautifully. Unfortunately, it is bound to LaTeX, leaving no easy path to extract a TeX-only implementation. Because it’s based on JMathTeX, however, some of its improvements (such as font updates and bug fixes) can be back-ported.

Assessment

There are many ways to integrate TeX: run an external executable, leverage a JavaScript-based solution such as KaTeX, call a C-based implementation via the JNI, make a web service request, and so on. Let’s run with JMathTeX because it’s a pure Java solution and restricts control sequences to those that are compatible with ConTeXt, LaTeX, and other TeX formats.

TeX to SVG

As a first step towards embedding TeX, let’s write a test that proves we can convert a simple TeX string into a vector graphic string. Begin as follows:

  1. Create a libs directory under mdtexfx to store JMathTeX.
  2. Download JMathTeX-0.7pre.jar into mdtexfx/libs.
  3. Update the dependencies section inside the build.gradle file to include:
    • an older version of the JDOM library;
    • the JFreeSVG library; and
    • all .jar files in the libs directory (i.e., JMathTeX).
       implementation 'org.jdom:jdom:1.1.3'
       implementation 'org.jfree:jfreesvg:3.4'
       implementation fileTree(include: ['**/*.jar'], dir: 'libs')
      
  4. Reload the Gradle file (see the official documentation for detailed instructions).
  5. Create a new Java class named com.mdtexfx.processors.tex.TeXProcessor.
  6. Right-click the TeXProcessor class name within the editor.
  7. Select Generate → Test.
  8. Click OK to confirm.

We have everything we need now to write a unit test method that generates scalable vector graphic output from a TeX formula. Change the test class to have the following contents:

package com.mdtexfx.processors.tex;

import be.ugent.caagt.jmathtex.TeXFormula;
import org.jfree.graphics2d.svg.SVGGraphics2D;
import org.junit.jupiter.api.Test;

import java.awt.*;

import static be.ugent.caagt.jmathtex.TeXConstants.ALIGN_CENTER;
import static org.junit.jupiter.api.Assertions.assertTrue;

class TeXProcessorTest {

  @Test
  void test_Formula_TeXInput_SvgOutput() {
    final var formula = new TeXFormula( "e^{i\\pi} + 1 = 0" );
    final var icon = formula.createTeXIcon( ALIGN_CENTER, 20f );
    final var component = new Component() {};
    final var graphics =
      new SVGGraphics2D( icon.getIconWidth(), icon.getIconHeight() );
    icon.paintIcon( component, graphics, 0, icon.getIconHeight() );

    assertTrue( graphics.getSVGDocument().indexOf( "<svg" ) > 0 );
  }
}

Points of interest include:

  • new TeXFormula( "e^{i\\pi} + 1 = 0" ) — Instantiates a new TeX formula object using a class from the JMathTeX library. This is the entry point for rendering TeX.
  • createTeXIcon( ALIGN_CENTER, 20f ) — Creates a graphical icon in a font size of 20 points.
  • new Component() {} — Provides the TeX icon object with a foreground colour. An improvement to the library would be to pass in an instance of Java’s Color class.
  • SVGGraphics2D — Intercepts graphics primitive calls (e.g., draw straight lines, curved lines, polygons, etc.) to build an SVG document object model (DOM). The interception is handled by the JFreeSVG library, which we’ll revisit later for performance reasons.
  • paintIcon — Draws the formula onto the provided SVGGraphics2D context. Eventually, we’ll want to replace this interface with code that’s more flexible in how TeX is rendered.
  • assertTrue — Verifies that the document contains an SVG element. As long as the TeX formula doesn’t contain the literal character sequence <svg, this test method is sufficient to prove that the integration works.

(The unit test method naming scheme is based on Roy Osherove’s Naming standards for unit tests.)

The next problem we face is weaving the SVG element into the HTML document so that it appears in the preview pane.

TeX to HTML

The de facto standard for embedding TeX within Markdown documents is to bookend the expression with dollar symbols, permitting neither leading nor trailing spaces, such as $e^{i\pi} + 1 = 0$. Enforcing a lack of spaces prevents matching dollar figures in prose (e.g., the text “$1.00 and $5.00” is not a TeX expression). Optimally, we’d create a flexmark-java extension that identifies TeX expressions then converts them to SVG format while it parses the document. Instead, we’re going to leverage the processor infrastructure already in place to pre-process TeX code. Although this means multiple passes to parse the entire document, the code will be much simpler and, therefore, both easier to maintain and understand.

Conceptually, we’re going to replace all parts of the document that match $...$ with the vectorized version of the TeX formula. For this, we’re going to borrow much of the code from the unit test and introduce a regular expression. For every matching TeX formula, we’ll change it to a vector graphic. When finished, we’ll return the supplied document having all TeX expressions replaced with <svg> elements.

Go back to the TeXProcessor class and change the code to the following:

public class TeXProcessor extends ExecutorProcessor<String> {
  private static final String REGEX = "\\$([^\\s][^$]*[^\\s])?\\$";
  private static final Pattern sPattern = Pattern.compile( REGEX );
  private static final Component sComponent = new Component() {};

  public TeXProcessor( final Processor<String> successor ) {
    super( successor );
  }

  private String toSvg( final String tex ) {
    final var len = tex.length();
    assert len > 3;

    final var formula = new TeXFormula( tex.substring( 1, len - 1 ) );
    final var icon = formula.createTeXIcon( ALIGN_CENTER, 20f );
    icon.setInsets( new Insets( 6, 0, 0, 4 ) );
    final var width = icon.getIconWidth();
    final var height = icon.getIconHeight();
    final var graphics = new SVGGraphics2D( width, height );
    icon.paintIcon( sComponent, graphics, 0, 0 );
    return graphics.getSVGElement();
  }

  @Override
  public String apply( String markdown ) {
    try {
      return sPattern.matcher( markdown ).replaceAll(
        ( result ) -> toSvg( result.group() ) );
    } catch( final Exception ex ) {
      return markdown;
    }
  }
}

Note the following parts of the class:

  • extends ExecutorProcessor<String> — Reuses the processing framework already in place.
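  • "\\$([^\\s][^$]*[^\\s])?\\$" — Defines a regular expression that matches a dollar symbol (\\$) followed by a non-whitespace character ([^\\s]), followed by zero or more non-dollar symbols ([^$]*), and ending with a non-whitespace character ([^\\s]) that’s followed by another dollar symbol (\\$). The backslash in \\s must be doubled: inside a Java string literal, a lone \s is either rejected by the compiler or, since Java 15, read as a plain space. Excluding dollar symbols from the middle of the expression ([^$]*) is what keeps multiple TeX expressions from being treated as a single equation that spans all text between the first and last TeX expressions in the document. The parentheses group the body of the expression and the question mark makes that group optional (it does not make the match non-greedy), so an empty pair of dollar symbols also matches. An exercise left for the reader is to allow escaped dollar symbols (\$) inside the TeX expression.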
  • Pattern.compile( REGEX ) — Pre-compiles the regular expression, which is a minor performance optimization.
  • sComponent = new Component() {} — Defines the “foreground colour” once so it need not be recreated each time. Prefixing the variable with lowercase s is a naming scheme to denote that the variable is static and has possible side-effects; in contrast, stateless and side-effect-free constants use all uppercase.
  • new TeXFormula( tex.substring( 1, len - 1 ) ) — Removes the leading and trailing $, which we’ve asserted must be present. Also, the regular expression will ensure that toSvg is only called for valid TeX expressions.
  • graphics.getSVGElement() — Returns only the <svg> element as a string, rather than a complete SVG document, which avoids including the XML prolog and SVG doctype declaration. Neither is needed when the SVG element is embedded in an HTML document.
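The behaviour of the regular expression can be checked in isolation. The sketch below uses only the JDK; the class name InlineTeXMatcher is hypothetical, and the pattern is written with the doubled backslash (\\s) so that it matches any whitespace.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class InlineTeXMatcher {
  // A dollar, an optional body that neither starts nor ends with
  // whitespace and has no dollar in its middle, then a closing dollar.
  private static final Pattern INLINE_TEX =
    Pattern.compile( "\\$([^\\s][^$]*[^\\s])?\\$" );

  /** Returns each matched expression, delimiters included. */
  static List<String> find( final String text ) {
    return INLINE_TEX.matcher( text )
                     .results()
                     .map( MatchResult::group )
                     .collect( Collectors.toList() );
  }

  public static void main( final String[] args ) {
    // Two separate expressions, not one long span.
    System.out.println( find( "Euler: $e^{i\\pi} + 1 = 0$ and $a^2$." ) );

    // Dollar figures in prose are left alone.
    System.out.println( find( "$1.00 and $5.00" ) );

    // The optional group means an empty pair still matches.
    System.out.println( find( "$$" ) );
  }
}
```

Because the body needs at least two characters, a single-character expression such as $x$ does not match this pattern, and the empty pair $$ does; both cases may be worth handling before passing text to toSvg.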

The last lines of note replace all TeX expressions in the document with the result of calling toSvg on each one, effectively converting all TeX expressions to SVG elements:

return sPattern.matcher( markdown )
               .replaceAll( ( result ) -> toSvg( result.group() ) );

When the user types an incomplete (or invalid) TeX expression, the above lines will throw an exception. In that situation, the catch clause will return the original document. This means that if one TeX expression has an issue, then none of the TeX expressions will be rendered. A solution would be to iterate over all the matching TeX patterns individually, building up the output document using a series of string append calls. Another solution is to avoid using regular expressions altogether by registering a TeX processor with flexmark-java.
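The first suggestion, converting each match on its own, might look like the following sketch. It relies only on the JDK's Matcher.appendReplacement and Matcher.appendTail; the class name TolerantReplacer and the converter parameter (a stand-in for toSvg) are hypothetical.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TolerantReplacer {
  private static final Pattern INLINE_TEX =
    Pattern.compile( "\\$([^\\s][^$]*[^\\s])?\\$" );

  /**
   * Replaces each TeX expression using the given converter. When the
   * converter fails for one expression, that expression alone is left
   * as typed and the remaining expressions are still converted.
   */
  static String replaceEach(
    final String document, final Function<String, String> converter ) {
    final var matcher = INLINE_TEX.matcher( document );
    final var result = new StringBuilder( document.length() );

    while( matcher.find() ) {
      final var tex = matcher.group();
      String replacement;

      try {
        replacement = converter.apply( tex );
      } catch( final RuntimeException ex ) {
        // Keep the original text for the expression that failed.
        replacement = tex;
      }

      // Quote the replacement so that $ and \ are taken literally.
      matcher.appendReplacement(
        result, Matcher.quoteReplacement( replacement ) );
    }

    matcher.appendTail( result );
    return result.toString();
  }

  public static void main( final String[] args ) {
    final Function<String, String> converter = tex -> {
      if( tex.contains( "\\oops" ) ) {
        throw new IllegalArgumentException( "unknown control sequence" );
      }
      return "<svg/>";
    };

    System.out.println(
      replaceEach( "Good $a^2$ and bad $\\oops{1}$ here.", converter ) );
  }
}
```

Quoting the replacement matters here because an unconverted expression contains dollar symbols and backslashes, both of which have special meaning in a regular expression replacement string.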

Most of the import statements should be determined by the IDE, automatically. For references that cannot be determined, refer to the following snippet:

package com.mdtexfx.processors.tex;

import be.ugent.caagt.jmathtex.TeXFormula;
import com.mdtexfx.processors.ExecutorProcessor;
import com.mdtexfx.processors.Processor;
import org.jfree.graphics2d.svg.SVGGraphics2D;

import java.awt.*;
import java.util.regex.Pattern;

import static be.ugent.caagt.jmathtex.TeXConstants.ALIGN_CENTER;

Let’s return to the TeXProcessorTest class and clean it up. The unit test test_Formula_TeXInput_SvgOutput had code that we pretty much copied into the TeXProcessor. We can now rewrite the test to verify that the higher-level, fully integrated functionality works fine.

Replace the test method with the following code:

@Test
void test_Formula_MarkdownInput_SvgOutput() {
  final var renderer = new HtmlCapture();
  final var context = new ProcessorContext( TEXT_MARKDOWN, renderer );
  final var processor = ProcessorFactory.create( context );
  final var html = processor.apply( "$e^{i\\pi} + 1 = 0$" );

  assertTrue( html.contains( "<svg" ) );
}

When run, the unit test fails because the ProcessorFactory class does not create a TeX processor. Fortunately, the fix entails changing a single line of code. Open ProcessorFactory and change the line that creates an HtmlProcessor to the following:

final var successor = new TeXProcessor( new HtmlProcessor( context ) );

It’s important that the TeX processing is applied after the Markdown document has been converted to HTML. To do otherwise would cause the syntax highlighter to fail when it adds style classes.

Re-run the unit test to confirm that it passes.

TeX Fonts

Even though the unit test passes, we’re not quite finished because running the application—using the full version of Liberica JDK that bundles JavaFX—then typing in an equation shows:

[Figure: Application Screenshot]

Clearly, more work is needed because the fonts to render the formula are not being used. JMathTeX references the Computer Modern fonts; when the library generates an SVG document, the font names are embedded verbatim. Upon rendering the SVG element, WebView cannot resolve the font file name references. As such, we need to instruct the application to load the Computer Modern (cm) font file families when rendering the HTML document using WebView. We’ll load the fonts once because reading files into memory is a computationally expensive operation. First, though, we have to determine the font paths.

The TrueType font files (e.g., cmex10.ttf) are inside the JMathTeX Java archive file. Recall from the first article that our Gradle build script will create an überjar: a combination of all the class files and resource files from every library required by the application to run. We can find the path to the font files from the command-line by invoking the jar command after building, as follows:

./gradlew clean build
jar -tvf build/libs/mdtexfx.jar | grep '\.ttf$'

The output provides us with the font file paths, which we’ll load as resources into the application:

 15396 Tue May 01 22:47:54 PDT 2007 be/ugent/caagt/jmathtex/cmex10.ttf
 27120 Tue May 01 22:47:54 PDT 2007 be/ugent/caagt/jmathtex/cmmi10.ttf
 21804 Tue May 01 22:47:54 PDT 2007 be/ugent/caagt/jmathtex/cmr10.ttf
 22132 Tue May 01 22:47:54 PDT 2007 be/ugent/caagt/jmathtex/cmsy10.ttf
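The same listing can be produced from Java, which may be handy where the jar and grep commands are unavailable. This is a sketch using the JDK's ZipFile class (a Java archive is a ZIP file); the class name FontEntryLister is hypothetical, and the archive path is whatever is passed on the command line.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class FontEntryLister {
  /** Returns the sorted entry names within the archive that end in .ttf. */
  static List<String> listFonts( final Path archive ) throws IOException {
    try( var zip = new ZipFile( archive.toFile() ) ) {
      return zip.stream()
                .map( ZipEntry::getName )
                .filter( name -> name.endsWith( ".ttf" ) )
                .sorted()
                .collect( Collectors.toList() );
    }
  }

  public static void main( final String[] args ) throws IOException {
    if( args.length != 1 ) {
      System.err.println( "usage: FontEntryLister <path-to-jar>" );
      return;
    }

    // Prints one font path per line, relative to the archive root.
    listFonts( Path.of( args[ 0 ] ) ).forEach( System.out::println );
  }
}
```

Prefixing an entry name with a forward slash gives the resource path expected by getClass().getResource.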

Open HtmlPreview, then add the following methods:

  private String toFontFace( final String name ) {
    return format(
      "@font-face{font-family:'%s';src:url('%s') format('truetype');}",
      name, getResource( format( "/be/ugent/caagt/jmathtex/%s.ttf", name ) ) );
  }

  private String getResource( final String path ) {
    return getClass().getResource( path ).toExternalForm();
  }

In effect, the above code generates font-face definitions that adhere to the W3C CSS specification, namely:

@font-face{
  font-family:'cmex10';
  src:url('jar:file:.../mdtexfx.jar!/be/ugent/caagt/jmathtex/cmex10.ttf') format('truetype');
}

After the line of code that binds the height property in the constructor, insert the following snippet to register the fonts:

    mView.getEngine()
         .setUserStyleSheetLocation(
           format( "data:,%s%s%s%s",
                   toFontFace( "cmex10" ),
                   toFontFace( "cmmi10" ),
                   toFontFace( "cmr10" ),
                   toFontFace( "cmsy10" )
           ) );

The key step is using a data URL to inject references to font files embedded inside a Java archive file. The Java archive file itself is referenced using the jar: URI scheme.
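The string that ends up in the user style sheet can be inspected without starting JavaFX. The sketch below rebuilds it with plain string formatting; the class name FontFaceStyleSheet is hypothetical, and the resolver parameter stands in for the getResource call so that the example runs without the JMathTeX archive on the class path.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FontFaceStyleSheet {
  /** Builds one @font-face rule for the named font at the given URL. */
  static String toFontFace( final String name, final String url ) {
    return String.format(
      "@font-face{font-family:'%s';src:url('%s') format('truetype');}",
      name, url );
  }

  /**
   * Concatenates a rule per font name into a single data URL. The
   * resolver maps a font name to the URL of its font file.
   */
  static String toDataUrl(
    final List<String> names, final Function<String, String> resolver ) {
    return names.stream()
                .map( name -> toFontFace( name, resolver.apply( name ) ) )
                .collect( Collectors.joining( "", "data:,", "" ) );
  }

  public static void main( final String[] args ) {
    // Assumption: an illustrative archive location, not the real one.
    final Function<String, String> resolver = name ->
      "jar:file:/tmp/app.jar!/be/ugent/caagt/jmathtex/" + name + ".ttf";

    System.out.println( toDataUrl(
      List.of( "cmex10", "cmmi10", "cmr10", "cmsy10" ), resolver ) );
  }
}
```

Driving the rules from a list of names means that supporting another font is a one-word change rather than another format placeholder and argument.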

Rebuild, then restart the application. Here are a few different TeX expressions showing some of the TeX functionality that the JMathTeX library supports:

[Figure: TeX Examples]

We can now write TeX expressions in our editor.

Download

You may download the project; download and install the JMathTeX library separately.

Summary

This article introduced TeX and showed one way to integrate a format-agnostic TeX engine into our editor. We reviewed multiple Java-based TeX engines with specific requirements in mind, finally settling on the JMathTeX library. With help from the framework we developed previously, integrating the library meant creating a new processor to transform inline TeX expressions into SVG elements.

The next article profiles the application to improve its performance.

Dave Jarvis

Senior Software Developer, Special for BellSoft
