
TeXnical Writing Part 2: Markdown

Dec 10, 2020
Dave Jarvis

Welcome back to developing a Liberica JDK-based app for real-time conversion of mathematical formulas from Markdown to HTML. The first part of this series was all about building an executable binary; this second part focuses on building a text editor that supports Markdown syntax, along with a preview pane to show the result.

Introduction

In The Pragmatic Programmer, Hunt and Thomas offer a fundamental software development principle:

      "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."

The principle is often touted as DRY (don’t repeat yourself); I liken it to the notion that computers abhor exceptions. That is, introducing differences adds complexity, which compounds over time, leading to systems where the burden of maintenance exceeds the value returned from development efforts. Systems that are easy to maintain implement functionality in general terms rather than hard-code specific behaviour. Well-crafted software minimizes complexity by identifying generalizations of major components before development begins. Many general-purpose solutions to common design problems can be found in Design Patterns by Gamma et al.

Markdown is a specific instance of a plain text markup syntax that provides formatting hints for a computer to use when presenting a document. Software editors for drafting and previewing plain text documents typically support a single markup format, rather than the general case. Other plain text markup formats include AsciiDoc, MediaWiki, reStructuredText, and Textile. Although supporting multiple formats is beyond the scope of this series, we’ll apply the chain-of-responsibility design pattern to keep open the possibility of doing so.

The following figure shows a high-level diagram for typical Markdown editors:

A more general form is shown in the following figure:

Notice how the high-level components are similar to the previous diagram: a text editor, a text processor, and an output format. For our proposed editor, the text editor contains a text string that is fed through some machinery that produces another string.

By leveraging the aforementioned design pattern, our text editor can edit more than merely Markdown. We’ll see how using design patterns results in software that is easily extended to add more source document formats. Consider the following diagram:

In the first processor chain, a Markdown document is converted directly to HTML. That will be our focus. In the second processor chain, a MediaWiki document references variables that are first interpolated to produce a document containing the resolved interpolated values, which is subsequently converted to HTML. As we’ll see, the effort to develop a general solution is on par with a Markdown-specific solution.
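To make the two chains concrete, here is a minimal sketch built from plain UnaryOperator composition with andThen. The transformations are toy stand-ins: interpolate, MARKDOWN_TO_HTML, and MEDIAWIKI_TO_HTML are hypothetical helpers that handle only a single heading and ${name} placeholders, not real converters.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;

class ChainSketch {
  // Hypothetical stand-in: replaces each ${key} placeholder with its value.
  static UnaryOperator<String> interpolate(final Map<String, String> vars) {
    return text -> {
      String result = text;
      for (final var entry : vars.entrySet()) {
        result = result.replace("${" + entry.getKey() + "}", entry.getValue());
      }
      return result;
    };
  }

  // Hypothetical stand-in: handles only a single "# " Markdown heading.
  static final UnaryOperator<String> MARKDOWN_TO_HTML = text ->
      text.startsWith("# ") ? "<h1>" + text.substring(2) + "</h1>" : "<p>" + text + "</p>";

  // Hypothetical stand-in: handles only a single "= Title =" MediaWiki heading.
  static final UnaryOperator<String> MEDIAWIKI_TO_HTML = text ->
      text.startsWith("= ") && text.endsWith(" =")
          ? "<h1>" + text.substring(2, text.length() - 2) + "</h1>"
          : "<p>" + text + "</p>";

  public static void main(final String[] args) {
    // Chain 1: Markdown -> HTML (a single link).
    final Function<String, String> markdownChain = MARKDOWN_TO_HTML;

    // Chain 2: MediaWiki with variables -> resolved MediaWiki -> HTML.
    final Function<String, String> mediaWikiChain =
        interpolate(Map.of("title", "Chapter 1")).andThen(MEDIAWIKI_TO_HTML);

    System.out.println(markdownChain.apply("# Hello"));       // <h1>Hello</h1>
    System.out.println(mediaWikiChain.apply("= ${title} =")); // <h1>Chapter 1</h1>
  }
}
```

Both chains accept a string and return a string. Only the number and kind of links differ, which is why a general solution costs about the same as a Markdown-specific one.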

You may wish to download the finished files, found at the end of the article.

Processors

The application is fueled by text processing, so we’ll develop the core software components first, as follows:

  • Processor — Defines how objects chain together to transform documents.
  • ExecutorProcessor — Provides a base processor chain implementation.
  • ProcessorFactory — Creates processor chains capable of transforming a given document type into its final format.
  • MarkdownProcessor — Transforms Markdown documents into HTML.

Let’s build these out.

Create Package

Begin where we left off by creating a new package as follows:

  1. Start IntelliJ IDEA.
  2. Open the mdtexfx project.
  3. Expand mdtexfx → src → main → java → com.mdtexfx in the Project panel.
  4. Right-click com.mdtexfx.
  5. Click New → Package.
  6. Set the value to: com.mdtexfx.processors.
  7. Press Enter to create the package.

The package, where we’ll organize our processor-related classes, is created.

Processor

Create a new interface in the processors package as follows:

  1. Right-click processors in the Project panel.
  2. Click New → Java Class.
  3. Set Name to: Processor
  4. Double-click Interface to accept its creation.

Change the interface definition to the following:

package com.mdtexfx.processors;
import java.util.Optional;
import java.util.function.UnaryOperator;

public interface Processor<T> extends UnaryOperator<T> {
  default Optional<Processor<T>> next() {
    return Optional.empty();
  }
}

The key points are that:

  • the generic type T will be a String in our editor implementation;
  • the UnaryOperator communicates that we want to perform some operation—a string transformation—on an input value to compute some output value; and
  • returning the sentinel value of Optional.empty() denotes that, by default, each processor is a terminal link in the chain. Using an empty sentinel avoids introducing a null reference.
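Processor declares a single abstract method (apply, inherited from UnaryOperator), so it is a functional interface, and a lambda can serve as a terminal link. The sketch below repeats the interface so that it compiles on its own. It pairs such a lambda with a hypothetical TrimProcessor whose next() names a successor.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.function.UnaryOperator;

interface Processor<T> extends UnaryOperator<T> {
  default Optional<Processor<T>> next() {
    return Optional.empty();
  }
}

// Hypothetical non-terminal link: strips whitespace, then names its successor.
class TrimProcessor implements Processor<String> {
  private final Processor<String> mNext;

  TrimProcessor(final Processor<String> successor) {
    mNext = Objects.requireNonNull(successor);
  }

  @Override
  public String apply(final String text) {
    return text.strip();
  }

  @Override
  public Optional<Processor<String>> next() {
    return Optional.of(mNext);
  }
}

class ProcessorDemo {
  public static void main(final String[] args) {
    // A lambda implements apply; next() falls back to the default (terminal link).
    final Processor<String> upper = text -> text.toUpperCase();
    final Processor<String> trim = new TrimProcessor(upper);

    System.out.println(upper.next().isPresent());           // false
    System.out.println(trim.next().orElseThrow() == upper); // true
    System.out.println(trim.apply("  md  "));               // md
  }
}
```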

ExecutorProcessor

The application’s engine is the ExecutorProcessor class. Create it in the processors package having the following naïve definition:

package com.mdtexfx.processors;
import java.util.Optional;

public class ExecutorProcessor<T> implements Processor<T> {
  private final Processor<T> mNext;
  
  public ExecutorProcessor( final Processor<T> successor ) {
    mNext = successor;
  }
  
  @Override
  public T apply( final T data ) {
    Optional<Processor<T>> handler = next();
    T result = data;
    while( handler.isPresent() ) {
      final Processor<T> processor = handler.get();
      result = processor.apply( result );
      handler = processor.next();
    }
    return result;
  }
  
  @Override
  public Optional<Processor<T>> next() {
    return Optional.ofNullable( mNext );
  }
}

Although this works, the loop that transforms the data does not significantly improve upon its equivalent pre-Java 8 implementation:

while( handler != null ) {
  result = handler.apply( result );
  handler = handler.next();
}

Replacing the null sentinel with an Optional one was a good first step. We can go further by eliminating the call to handler.get(), which has been proposed for deprecation. (To understand the reasoning, read the mailing list post by the method’s author, Brian Goetz; for alternatives, see JDK-8140281.) Change the method and add a new inner class as follows:

  @Override
  public T apply( final T data ) {
    // Wraps the intermediate result so that the lambda can update it.
    final var result = new MutableReference( data );
    Optional<Processor<T>> handler = next();
    while( handler.isPresent() ) {
      handler = handler.flatMap( p -> {
        result.set( p.apply( result.get() ) );
        return p.next();
      } );
    }
    return result.get();
  }
  
  // Inner (non-static) class, so it can use the enclosing type parameter T;
  // because it is not itself generic, it is created without a diamond (<>).
  private final class MutableReference {
    private T mObject;
    MutableReference( final T object ) {
      set( object );
    }
    void set( final T object ) {
      mObject = object;
    }
    T get() {
      return mObject;
    }
  }

Note these effective changes:

  • using a final MutableReference instance satisfies the lambda expression, p -> { ... }, which can only capture final or effectively final local variables;
  • calling handler.get() is now handled implicitly: the lambda expression receives the target Processor instance as the variable p; and
  • invoking flatMap executes the lambda expression, which transforms the input using the current processor, then returns the next Processor link in the chain.
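The following self-contained sketch assembles the revised ExecutorProcessor with two hypothetical links, TrimProcessor and ParagraphProcessor, which are not part of the project. It shows the data passing through each link in turn until a link’s next() returns Optional.empty().

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

interface Processor<T> extends UnaryOperator<T> {
  default Optional<Processor<T>> next() {
    return Optional.empty();
  }
}

class ExecutorProcessor<T> implements Processor<T> {
  private final Processor<T> mNext;

  ExecutorProcessor(final Processor<T> successor) {
    mNext = successor;
  }

  @Override
  public T apply(final T data) {
    final var result = new MutableReference(data);
    Optional<Processor<T>> handler = next();
    while (handler.isPresent()) {
      handler = handler.flatMap(p -> {
        result.set(p.apply(result.get()));
        return p.next();
      });
    }
    return result.get();
  }

  @Override
  public Optional<Processor<T>> next() {
    return Optional.ofNullable(mNext);
  }

  private final class MutableReference {
    private T mObject;
    MutableReference(final T object) { mObject = object; }
    void set(final T object) { mObject = object; }
    T get() { return mObject; }
  }
}

// Hypothetical link: trims whitespace, then hands off to its successor.
class TrimProcessor extends ExecutorProcessor<String> {
  TrimProcessor(final Processor<String> successor) { super(successor); }

  @Override
  public String apply(final String text) { return text.strip(); }
}

// Hypothetical terminal link: wraps the text in a paragraph element.
class ParagraphProcessor implements Processor<String> {
  @Override
  public String apply(final String text) { return "<p>" + text + "</p>"; }
}

class ExecutorDemo {
  public static void main(final String[] args) {
    final Processor<String> chain =
        new ExecutorProcessor<>(new TrimProcessor(new ParagraphProcessor()));
    System.out.println(chain.apply("  Hello  ")); // <p>Hello</p>
  }
}
```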

Implementing the loop using the Stream class is another possibility, but be sure to measure the performance. This particular chain-of-responsibility pattern doesn’t lend itself to parallelization, because each link consumes the previous link’s output, so any anticipated performance gains are nebulous at best. Similarly, using an AtomicReference would require less code at the expense of poorer performance and possible confusion over the use of a concurrent class.

To verify this, we tested a few different scenarios, named as follows:

  • Mutable reference — The baseline apply method presented above.
  • Atomic reference — Replaces the MutableReference with an AtomicReference.
  • Optional get() — Swaps the flatMap call for handler.get().
  • Recursion — Applies the processors’ transformations recursively.
  • Stream iterator — Invokes Stream.iterate with a flatMap and reduce call.
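For readers who want to reproduce the comparison, the following sketch shows how three of the alternative loops might look. It is our own reconstruction of the approaches named above, not the benchmarked code. The Link class is a hypothetical helper. The Optional get() variant matches the naive loop shown earlier and is omitted.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

interface Processor<T> extends UnaryOperator<T> {
  default Optional<Processor<T>> next() {
    return Optional.empty();
  }
}

// Hypothetical link: applies an operation, then defers to an optional successor.
class Link<T> implements Processor<T> {
  private final UnaryOperator<T> mOperation;
  private final Processor<T> mNext;

  Link(final UnaryOperator<T> operation, final Processor<T> successor) {
    mOperation = operation;
    mNext = successor;
  }

  @Override
  public T apply(final T data) {
    return mOperation.apply(data);
  }

  @Override
  public Optional<Processor<T>> next() {
    return Optional.ofNullable(mNext);
  }
}

final class ChainStrategies {
  // Atomic reference: the MutableReference loop with a concurrent wrapper swapped in.
  static <T> T atomic(final Optional<Processor<T>> first, final T data) {
    final var result = new AtomicReference<>(data);
    Optional<Processor<T>> handler = first;
    while (handler.isPresent()) {
      handler = handler.flatMap(p -> {
        result.set(p.apply(result.get()));
        return p.next();
      });
    }
    return result.get();
  }

  // Recursion: transform with the current link, then recurse on its successor.
  static <T> T recursive(final Optional<Processor<T>> handler, final T data) {
    return handler.map(p -> recursive(p.next(), p.apply(data))).orElse(data);
  }

  // Stream iterator: lazily walk the links, then fold the data through them.
  static <T> T streamed(final Optional<Processor<T>> first, final T data) {
    return Stream.iterate(first, Optional::isPresent, h -> h.flatMap(p -> p.next()))
        .flatMap(Optional::stream)
        // The combiner is never invoked on a sequential stream.
        .reduce(data, (acc, p) -> p.apply(acc), (a, b) -> b);
  }

  public static void main(final String[] args) {
    final Processor<String> last = s -> s + "c";
    final Processor<String> chain =
        new Link<String>(s -> s + "a", new Link<String>(s -> s + "b", last));
    final Optional<Processor<String>> first = Optional.of(chain);
    System.out.println(atomic(first, "x"));    // xabc
    System.out.println(recursive(first, "x")); // xabc
    System.out.println(streamed(first, "x"));  // xabc
  }
}
```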

The following table lists the highest scoring trial run (of three runs) for all implementations, against different numbers of processors in the chain:

Benchmark name      Processors   Operations per μs
Mutable reference   2            49.447
Mutable reference   4            24.493
Mutable reference   8            8.981
Mutable reference   16           6.018
Mutable reference   32           2.686
Mutable reference   64           1.539
Atomic reference    2            24.409
Atomic reference    4            14.065
Atomic reference    8            7.527
Atomic reference    16           3.752
Atomic reference    32           2.112
Atomic reference    64           1.054
Optional get()      2            37.256
Optional get()      4            21.24
Optional get()      8            10.851
Optional get()      16           6.299
Optional get()      32           2.645
Optional get()      64           1.511
Recursion           2            27.009
Recursion           4            15.298
Recursion           8            7.469
Recursion           16           3.332
Recursion           32           2.251
Recursion           64           0.952
Stream iterator     2            6.672
Stream iterator     4            7.964
Stream iterator     8            4.405
Stream iterator     16           1.817
Stream iterator     32           1.138
Stream iterator     64           0.701

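We measured performance using the Java Microbenchmark Harness (JMH). All runs produced the same overall result. The stream-based loop performs worst, completing the fewest operations per microsecond at every chain length. The mutable object wrapper is at or near the top for every chain length and keeps pace with the Optional get() approach without calling get(), making it the best fit for our generic processor chain.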

ProcessorFactory

Different source documents have different processing chains. The responsibility of mapping document types to specific chains falls to the ProcessorFactory class. In the same processors package, create a new class using the following content:

 
package com.mdtexfx.processors;

public class ProcessorFactory {
  public static Processor<String> create( final ProcessorContext context ) {
    final var successor = new HtmlProcessor( context );

    final var processor = switch( context.getMediaType() ) {
      case UNDEFINED -> new IdentityProcessor( successor, context );
      case TEXT_MARKDOWN -> new MarkdownProcessor( successor, context );
    };

    return new ExecutorProcessor<>( processor );
  }
}
 

There’s a lot going on here and work to do before the code will compile. Broadly, the ProcessorFactory class introduces a few new concepts, including:

  • ProcessorContext — Guards against parameter explosion when creating Processor instances. Different processors have distinct requirements; when requirements change for individual processors, we want to avoid changing ProcessorFactory, in accordance with the single-responsibility principle.
  • IdentityProcessor — Avoids creating a special case when the input format is unknown. The processor’s name borrows from the identity function because its transformation returns the input document without modification. See the download section for the implementation.
  • MediaType — Uses codes defined by the Internet Assigned Numbers Authority (IANA) that describe file formats and format contents. We know we’ll need to associate files with processor chains; leveraging standard definitions permits decoupling knowledge about file name extensions from the ProcessorFactory. Further, it enables writing ProcessorFactory unit tests without having to provide java.io.File instances.

This leaves us with implementing the MarkdownProcessor and HtmlProcessor.
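The download contains the real IdentityProcessor. The following is only a hypothetical sketch of the idea, with stand-in types so that it compiles alone. It omits the ProcessorContext parameter, and UpperCaseProcessor stands in for MarkdownProcessor. Because every media type maps to some chain, unknown input flows through the same code path rather than a special case.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

interface Processor<T> extends UnaryOperator<T> {
  default Optional<Processor<T>> next() {
    return Optional.empty();
  }
}

enum MediaType { UNDEFINED, TEXT_MARKDOWN }

// Compact stand-in for ExecutorProcessor: runs every link that follows it.
class ExecutorProcessor<T> implements Processor<T> {
  private final Processor<T> mNext;

  ExecutorProcessor(final Processor<T> successor) {
    mNext = successor;
  }

  @Override
  public T apply(final T data) {
    T result = data;
    for (var handler = next(); handler.isPresent(); ) {
      final Processor<T> processor = handler.orElseThrow();
      result = processor.apply(result);
      handler = processor.next();
    }
    return result;
  }

  @Override
  public Optional<Processor<T>> next() {
    return Optional.ofNullable(mNext);
  }
}

// Hypothetical IdentityProcessor: returns the document unchanged.
class IdentityProcessor extends ExecutorProcessor<String> {
  IdentityProcessor(final Processor<String> successor) { super(successor); }

  @Override
  public String apply(final String document) { return document; }
}

// Hypothetical stand-in for MarkdownProcessor.
class UpperCaseProcessor extends ExecutorProcessor<String> {
  UpperCaseProcessor(final Processor<String> successor) { super(successor); }

  @Override
  public String apply(final String document) { return document.toUpperCase(); }
}

final class FactorySketch {
  // Every media type maps to a chain; unknown input needs no special case.
  static Processor<String> create(final MediaType type, final Processor<String> successor) {
    final Processor<String> processor = switch (type) {
      case UNDEFINED -> new IdentityProcessor(successor);
      case TEXT_MARKDOWN -> new UpperCaseProcessor(successor);
    };
    return new ExecutorProcessor<>(processor);
  }

  public static void main(final String[] args) {
    final Processor<String> html = text -> "<pre>" + text + "</pre>";
    System.out.println(create(MediaType.UNDEFINED, html).apply("raw"));    // <pre>raw</pre>
    System.out.println(create(MediaType.TEXT_MARKDOWN, html).apply("md")); // <pre>MD</pre>
  }
}
```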

Gradle

Before we can implement the MarkdownProcessor, we need to tell the build where to find its libraries. Update build.gradle by changing the dependencies section to reference Vladimir Schneider’s efficient and highly configurable flexmark-java library. Since we know we’re going to need the WebView class, also update the javafx section to include the javafx.web module. Together, both sections should resemble the following:

javafx {
  version = '15'
  modules = ['javafx.controls', 'javafx.web']
  configuration = 'compileOnly'
}

dependencies {
  def v_commons_io = '2.8.0'
  def v_flexmark = '0.62.2'
  def v_junit = '5.6.2'
  runtimeOnly "org.openjfx:javafx-controls:${javafx.version}:linux"
  implementation "commons-io:commons-io:${v_commons_io}"
  implementation "com.vladsch.flexmark:flexmark:${v_flexmark}"
  testImplementation "org.junit.jupiter:junit-jupiter-api:${v_junit}"
  testRuntimeOnly "org.junit.jupiter:junit-jupiter-engine:${v_junit}"
}

The above snippet applies the DRY principle in that the version numbers for third-party libraries are defined but once.

MarkdownProcessor

If the ExecutorProcessor is the application’s engine, then the MarkdownProcessor is a wheel. With the infrastructure complete, implementing a processor that converts Markdown documents to HTML is straightforward. Now that flexmark-java is a dependency, all that remains is to implement the MarkdownProcessor class.

Create the MarkdownProcessor class using the following code:

package com.mdtexfx.processors;
import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.*;

public class MarkdownProcessor extends ExecutorProcessor<String> {
  private final IParse mParser = Parser.builder().build();
  private final IRender mRenderer = HtmlRenderer.builder().build();
  
  public MarkdownProcessor(
      final Processor<String> successor, final ProcessorContext context ) {
    super( successor );
  }
  
  @Override
  public String apply( final String markdown ) {
    return mRenderer.render( mParser.parse( markdown ) );
  }
}

The up-front work has paid off: the class is short and simple. Developing an AsciiDoc processor or introducing our own custom processors now has an easy-to-follow template. Integrating flexmark-java introduces the following concepts:

  • IParse — Parses a string containing Markdown into an abstract syntax tree (AST) represented by a node.
  • IRender — Renders a node into an HTML document, represented by a string.

We may be concerned about the performance impact of transforming a Markdown document into an AST (a type of document object model) and serializing it to an HTML string, only for the WebView to parse that string back into its own document object model. As we’ll see, this is not a bottleneck.

Next, we’ll implement the processors’ context and HTML processor, then wire everything up in the main application.

ProcessorContext

Recall how the factory design pattern helped us create a processor capable of transforming a media type (e.g., MediaType.TEXT_MARKDOWN) into the desired output format. Another common creational design pattern is the builder design pattern. Builder classes are useful when we want to create a new instance of a class that has many possible configurations. We’ll forgo the builder pattern for the ProcessorContext class because, at present, it takes only two parameters, implemented as follows:

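package com.mdtexfx.processors;
import com.mdtexfx.html.HtmlRenderer;
import com.mdtexfx.io.MediaType;

public class ProcessorContext {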
  private final MediaType mMediaType;
  private final HtmlRenderer mHtmlRenderer;
  
  public ProcessorContext(
      final MediaType mediaType, final HtmlRenderer htmlRenderer ) {
    assert mediaType != null;
    assert htmlRenderer != null;
    mMediaType = mediaType;
    mHtmlRenderer = htmlRenderer;
  }
  
  MediaType getMediaType() {
    return mMediaType;
  }
  
  HtmlRenderer getHtmlRenderer() {
    return mHtmlRenderer;
  }
}
 

Accessor methods can be considered poor design because they violate information hiding by exposing internal data to calling classes. If a class changes its accessor method signatures, then its calling classes will likely have to change as well, which is contrary to the single-responsibility principle. As a compromise, the accessor methods have been declared package-private (no access modifier). This restricts their use to classes within the same package and limits the ripple effects that refactoring the class may cause.

Once records (previewed in JDK 14 and 15) become a standard language feature, the ProcessorContext class would be a suitable candidate for rewriting as a record, which is a restricted form of class that acts as a transparent carrier for immutable data.
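Here is a possible record version, sketched under the assumption of JDK 16 or later, where records are standard. MediaType and HtmlRenderer are reduced to stand-ins so that the sketch compiles alone. We have also swapped the assert statements for Objects.requireNonNull, which runs even when assertions are disabled.

```java
import java.util.Objects;

// Minimal stand-ins so the sketch compiles on its own.
enum MediaType { UNDEFINED, TEXT_MARKDOWN }

interface HtmlRenderer {
  String render(String html);
}

// The compiler generates the private final fields, the canonical constructor,
// and the accessors mediaType() and htmlRenderer().
record ProcessorContext(MediaType mediaType, HtmlRenderer htmlRenderer) {
  // Compact constructor: validation runs before the fields are assigned.
  ProcessorContext {
    Objects.requireNonNull(mediaType, "mediaType");
    Objects.requireNonNull(htmlRenderer, "htmlRenderer");
  }
}

class RecordDemo {
  public static void main(final String[] args) {
    final HtmlRenderer echo = html -> html;
    final var context = new ProcessorContext(MediaType.TEXT_MARKDOWN, echo);
    System.out.println(context.mediaType()); // TEXT_MARKDOWN
  }
}
```

One trade-off: a record’s accessors are always public. The package-private restriction discussed above would therefore have to move to the record type itself, by declaring it without the public modifier.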

MediaType

For the ProcessorContext class to compile, it needs MediaType to be defined. We could import a library that defines a Markdown media type constant, but for the two constants we need, importing an entire library is excessive. Instead, add the following enumeration to a new package named com.mdtexfx.io:

package com.mdtexfx.io;

public enum MediaType {
  UNDEFINED( "" ),
  TEXT_MARKDOWN( "text/markdown" );

  private final String mMediaType;
  
  MediaType( final String mediaType ) {
    mMediaType = mediaType;
  }
  
  @Override
  public String toString() {
    return mMediaType;
  }
}

The enumeration belongs in the io (input/output) package because we’ll eventually want to associate file name extensions with media types.
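Here is one way that future association might look. The fromFileName method and its extension table are hypothetical: the article does not define them yet. Returning UNDEFINED instead of null means an unrecognized file still gets a processor chain through the identity fallback.

```java
import java.util.Locale;
import java.util.Map;

enum MediaType {
  UNDEFINED(""),
  TEXT_MARKDOWN("text/markdown");

  // Hypothetical extension table; not part of the article's code.
  private static final Map<String, MediaType> EXTENSIONS =
      Map.of("md", TEXT_MARKDOWN, "markdown", TEXT_MARKDOWN);

  private final String mMediaType;

  MediaType(final String mediaType) {
    mMediaType = mediaType;
  }

  // Hypothetical lookup: unknown or missing extensions map to UNDEFINED, never null.
  static MediaType fromFileName(final String fileName) {
    final int dot = fileName.lastIndexOf('.');
    final String extension =
        dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    return EXTENSIONS.getOrDefault(extension, UNDEFINED);
  }

  @Override
  public String toString() {
    return mMediaType;
  }
}

class MediaTypeDemo {
  public static void main(final String[] args) {
    System.out.println(MediaType.fromFileName("notes.md"));         // text/markdown
    System.out.println(MediaType.fromFileName("photo.png").name()); // UNDEFINED
  }
}
```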

HtmlProcessor

Like the MarkdownProcessor before it, the processor infrastructure investment makes writing the HtmlProcessor class a quick endeavour:

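package com.mdtexfx.processors;
import com.mdtexfx.html.HtmlRenderer;

// Terminal link: implementing Processor directly inherits the default next(),
// which returns Optional.empty(). (ExecutorProcessor has no no-argument
// constructor, so extending it would require passing a successor.)
public class HtmlProcessor implements Processor<String> {
  private final HtmlRenderer mHtmlRenderer;

  public HtmlProcessor( final ProcessorContext context ) {
    mHtmlRenderer = context.getHtmlRenderer();
  }
  
  @Override
  public String apply( final String html ) {
    return mHtmlRenderer.render( html );
  }
}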

Here we’ve opted to use an as yet undefined HtmlRenderer interface that exposes a String render( String ) method signature. This approach allows the implementation details regarding how the HTML document is exported to be controlled by the class that calls into the ProcessorFactory, rather than the ProcessorFactory itself. One disadvantage is that it means an additional class and interface. A big advantage is that our unit tests won’t need to provide a graphical user interface to verify the HTML output.
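Here is a GUI-free check that illustrates the testing advantage. CapturingRenderer is a hypothetical test double. For brevity, this stand-in HtmlProcessor implements Processor directly and receives the HtmlRenderer itself rather than a ProcessorContext.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.UnaryOperator;

interface Processor<T> extends UnaryOperator<T> {
  default Optional<Processor<T>> next() {
    return Optional.empty();
  }
}

// Same single-method shape as the HtmlRenderer interface described here.
interface HtmlRenderer {
  String render(String html);
}

// Simplified HtmlProcessor: takes the renderer directly instead of a ProcessorContext.
class HtmlProcessor implements Processor<String> {
  private final HtmlRenderer mHtmlRenderer;

  HtmlProcessor(final HtmlRenderer renderer) {
    mHtmlRenderer = renderer;
  }

  @Override
  public String apply(final String html) {
    return mHtmlRenderer.render(html);
  }
}

// Hypothetical test double: records each document instead of displaying it.
class CapturingRenderer implements HtmlRenderer {
  final List<String> mRendered = new ArrayList<>();

  @Override
  public String render(final String html) {
    mRendered.add(html);
    return html;
  }
}

class RendererDemo {
  public static void main(final String[] args) {
    final var renderer = new CapturingRenderer();
    final var processor = new HtmlProcessor(renderer);
    processor.apply("<p>Hello</p>");
    System.out.println(renderer.mRendered); // [<p>Hello</p>]
  }
}
```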

HtmlRenderer & HtmlPreview

For the HtmlProcessor to function, it needs an implementation-independent mechanism to display HTML. By using an interface, we abide by the guideline of “program to an interface, not an implementation,” a design principle set out and described at length by Erich Gamma, one of the Design Patterns authors.

The HtmlRenderer interface defines a single method that hardly deserves mention:

 
package com.mdtexfx.html;

public interface HtmlRenderer {
  String render( final String html );
}
 

The HtmlPreview class is where we finally cut some code of interest:

 
package com.mdtexfx.html;
import javafx.scene.Parent;
import javafx.scene.web.WebView;

public class HtmlPreview extends Parent implements HtmlRenderer {
  private final WebView mView;
  private static final String HTML_PREFIX = """
      <!DOCTYPE html>
      <html lang='en'><head><title></title><meta charset='utf-8'/></head>
      <body>
      """;
  
  private static final String HTML_SUFFIX = "</body></html>";
  private static final int HTML_PREFIX_LENGTH = HTML_PREFIX.length();
  private final StringBuilder mHtmlDocument = new StringBuilder(65536);
  
  public HtmlPreview() {
    mHtmlDocument.append(HTML_PREFIX);
    mView = new WebView();
    getChildren().add(mView);
  }
  
  @Override
  public String render(final String html) {
    mHtmlDocument.setLength(HTML_PREFIX_LENGTH);
    mHtmlDocument.append(html);
    mHtmlDocument.append(HTML_SUFFIX);
    mView.getEngine().loadContent(mHtmlDocument.toString());
    return html;
  }
}
 

Here are the main items to notice:

  • Text blocks. We leverage Java 15 syntax for multi-line strings, which are delimited by triple double-quotes (""").
  • HTML header. As a minor optimization, the HTML header is isolated and reused: each call to render truncates the buffer back to the header rather than rebuilding it.
  • GUI widget. The HtmlPreview class extends from Parent so that the App class can treat its HtmlPreview instance like any other JavaFX widget. This avoids having to use delegation while also completely encapsulating the WebView class.
  • Law of Demeter violation. Ideally, our code would be oblivious to WebEngine, letting us call something like mView.setContent( html ) directly (WebView offers no such method). Calling loadContent via WebEngine means that our HtmlPreview class is coupled to WebView internals. This coupling, forced by the WebView API, violates the principle of least knowledge.
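The buffer-reuse optimization does not depend on JavaFX, so it can be exercised on its own. HtmlDocumentBuffer is a hypothetical extraction of the HtmlPreview logic, not a class from the project. The prefix is appended once, and each call truncates back to it with setLength before appending the new body and suffix. It assumes JDK 15 or later for the text block.

```java
class HtmlDocumentBuffer {
  // Text block: common indentation is stripped, and the line break before
  // the closing delimiter is kept.
  private static final String HTML_PREFIX = """
      <!DOCTYPE html>
      <html lang='en'><head><title></title><meta charset='utf-8'/></head>
      <body>
      """;
  private static final String HTML_SUFFIX = "</body></html>";
  private static final int HTML_PREFIX_LENGTH = HTML_PREFIX.length();

  private final StringBuilder mHtmlDocument = new StringBuilder(65536);

  HtmlDocumentBuffer() {
    mHtmlDocument.append(HTML_PREFIX);
  }

  // Truncate to the prefix, then append the new body and the suffix.
  String wrap(final String html) {
    mHtmlDocument.setLength(HTML_PREFIX_LENGTH);
    mHtmlDocument.append(html).append(HTML_SUFFIX);
    return mHtmlDocument.toString();
  }

  public static void main(final String[] args) {
    final var buffer = new HtmlDocumentBuffer();
    buffer.wrap("<p>first</p>");
    System.out.println(buffer.wrap("<p>second</p>"));
  }
}
```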

App

We can finally wire up the main application to provide a real-time preview of a Markdown document while being edited. Revise the start method of the App class to the following:

@Override
public void start( final Stage stage ) {
  // Requires a static import: com.mdtexfx.io.MediaType.TEXT_MARKDOWN
  final var editor = new TextArea();
  final var preview = new HtmlPreview();
  final var context = new ProcessorContext( TEXT_MARKDOWN, preview );
  final var processor = ProcessorFactory.create( context );
  final var border = new BorderPane();
  border.setLeft( editor );
  border.setRight( preview );
  // Re-run the processor chain whenever the editor's text changes.
  editor.textProperty().addListener( ( c, o, n ) -> processor.apply( n ) );
  final var scene = new Scene( border );
  stage.setScene( scene );
  stage.show();
}

The key changes are:

  • Introduce a BorderPane having a left-hand editor and right-hand preview.
  • Create a Processor based on a hard-coded Markdown media type.
  • Add a lambda expression that is called every time the editor’s text changes, where c, o, and n represent the observed value (the text property), the old editor text, and the new editor text, respectively.
  • Invoke the processor chain to update the HTML preview node.

Using the final modifier liberally communicates to readers that the variables’ values are not intended to change. Languages such as Kotlin and Scala introduced keywords to make immutable variables first-class citizens. Scala’s documentation notes that variables should be final (“val”) by default to make code more like algebra and lean towards immutable systems. More immutability implies fewer issues that can arise from otherwise complex state interactions. Some developers suggest that final adds clutter, while others have expressed that it reduces cognitive load.

Regardless of coding style, when the program is run it produces a plain HTML version of the Markdown document being edited. Here’s a screenshot that shows the editor previewing the introductory text:

Download

You may download the complete project.

Summary

We’ve described a number of software development techniques, including:

  • programming to interfaces;
  • the factory method design pattern;
  • the chain-of-responsibility design pattern;
  • the Law of Demeter;
  • the single-responsibility principle; and
  • the DRY principle.

Additionally, we developed a maintainable code base, accomplished in part by:

  • avoiding null assignment statements;
  • keeping conditional expressions to a minimum;
  • restricting accessor method scopes; and
  • using lots of immutable (final) variables.

The next article will add syntax highlighting to the text editor.
