Welcome back to developing a Liberica JDK-based app for real-time conversion of mathematical formulas from Markdown to HTML. If the first part of this series was all about building an executable binary, the second one focuses on building a text editor that supports Markdown syntax and a preview pane to show the result.
Introduction
In The Pragmatic Programmer, Hunt and Thomas offer a fundamental software development principle:
"Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."
Often touted as the DRY principle (don’t repeat yourself), I liken the principle to the maxim that computers abhor exceptions. That is, introducing differences adds complexity, which compounds over time, leading to systems where the burden of maintenance exceeds the value returned from development efforts. Systems that are easy to maintain implement functionality in general terms rather than hard-code specific behaviour. Well-crafted software minimizes complexity by identifying generalizations of major components before development begins. Many general-purpose solutions to common design problems can be found in Design Patterns by Gamma et al.
Markdown is a specific instance of a plain text markup syntax that provides formatting hints for a computer to use when presenting a document. Software editors for drafting and previewing plain text documents typically support a single markup format, rather than the general case. Other plain text markup formats include AsciiDoc, MediaWiki, reStructuredText, and Textile. Although supporting multiple formats is beyond the scope of this series, we’ll apply the chain-of-responsibility design pattern to keep open the possibility of doing so.
The following figure shows a high-level diagram for typical Markdown editors:
A more general form is shown in the following figure:
Notice how the high-level components are similar to the previous diagram: a text editor, a text processor, and an output format. For our proposed editor, the text editor contains a text string that is fed through some machinery that produces another string.
By leveraging the aforementioned design pattern, our text editor can edit more than merely Markdown. We’ll see how using design patterns results in software that is easily extended to add more source document formats. Consider the following diagram:
In the first processor chain, a Markdown document is converted directly to HTML. That will be our focus. In the second processor chain, a MediaWiki document references variables that are first interpolated to produce a document containing the resolved interpolated values, which is subsequently converted to HTML. As we’ll see, the effort to develop a general solution is on par with a Markdown-specific solution.
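To make the second chain concrete, here is a minimal sketch of two string transformations composed into one. The method names interpolate, toHtml, and process, the {{name}} placeholder syntax, and the paragraph-wrapping "converter" are all assumptions made for illustration; they are not part of the finished project.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

class ProcessorChainSketch {
    // Hypothetical first link: replaces {{key}} placeholders with their values.
    static UnaryOperator<String> interpolate(final Map<String, String> variables) {
        return text -> {
            String result = text;
            for (final Map.Entry<String, String> entry : variables.entrySet()) {
                result = result.replace("{{" + entry.getKey() + "}}", entry.getValue());
            }
            return result;
        };
    }

    // Hypothetical second link: a stand-in for a real markup-to-HTML converter.
    static UnaryOperator<String> toHtml() {
        return text -> "<p>" + text + "</p>";
    }

    // Feeds the document through both links, in order.
    static String process(final String document) {
        return interpolate(Map.of("name", "world"))
            .andThen(toHtml())
            .apply(document);
    }

    public static void main(final String[] args) {
        System.out.println(process("Hello, {{name}}!"));
    }
}
```

Each link takes a string and returns a string, which is why adding or removing a step does not disturb its neighbours.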
You may wish to download the finished files, found at the end of the article.
Processors
The application is fueled by text processing, so we’ll develop the core software components first, as follows:
- Processor — Defines how objects chain together to transform documents.
- ExecutorProcessor — Provides a base processor chain implementation.
- ProcessorFactory — Creates processor chains capable of transforming a given document type into its final format.
- MarkdownProcessor — Transforms Markdown documents into HTML.
Let’s build these out.
Create Package
Begin where we left off by creating a new package as follows:
- Start IntelliJ IDEA.
- Open the mdtexfx project.
- Expand mdtexfx → src → main → java → com.mdtexfx in the Project panel.
- Right-click com.mdtexfx.
- Click New → Package.
- Set the value to: com.mdtexfx.processors
- Press Enter to create the package.
The package, where we’ll organize our processor-related classes, is created.
Processor
Create a new interface in the processors package as follows:
- Right-click processors in the Project panel.
- Click New → Java Class.
- Set Name to: Processor
- Double-click Interface to accept its creation.
Change the interface definition to the following:
package com.mdtexfx.processors;

import java.util.Optional;
import java.util.function.UnaryOperator;

public interface Processor<T> extends UnaryOperator<T> {
  default Optional<Processor<T>> next() {
    return Optional.empty();
  }
}
The key points are that:
- the generic type T will be a String in our editor implementation;
- the UnaryOperator communicates that we want to perform some operation (a string transformation) on an input value to compute some output value; and
- returning the sentinel value of Optional.empty() denotes that, by default, each processor is a terminal link in the chain. Using an empty sentinel avoids introducing a null reference.
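Because apply is the interface's only abstract method, a lambda or method reference can stand in for a terminal processor. The sketch below repeats the interface so that it compiles on its own; the class name TerminalProcessorSketch and the TRIM constant are illustrative assumptions.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

class TerminalProcessorSketch {
    // Same shape as the interface above, repeated so the sketch is self-contained.
    interface Processor<T> extends UnaryOperator<T> {
        default Optional<Processor<T>> next() {
            return Optional.empty();
        }
    }

    // Only apply() is abstract, so a method reference can supply it.
    static final Processor<String> TRIM = String::trim;

    public static void main(final String[] args) {
        // The transformation runs as usual.
        System.out.println("[" + TRIM.apply("  text  ") + "]");
        // The default next() reports that no successor exists.
        System.out.println(TRIM.next().isPresent());
    }
}
```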
ExecutorProcessor
The application’s engine is the ExecutorProcessor class. Create it in the processors package having the following naïve definition:
package com.mdtexfx.processors;

import java.util.Optional;

public class ExecutorProcessor<T> implements Processor<T> {
  // The next link in the chain; null marks the end of the chain.
  private final Processor<T> mNext;

  public ExecutorProcessor( final Processor<T> successor ) {
    mNext = successor;
  }

  @Override
  public T apply( final T data ) {
    Optional<Processor<T>> handler = next();
    T result = data;

    // Feed each link's output into the following link.
    while( handler.isPresent() ) {
      final Processor<T> processor = handler.get();
      result = processor.apply( result );
      handler = processor.next();
    }

    return result;
  }

  @Override
  public Optional<Processor<T>> next() {
    return Optional.ofNullable( mNext );
  }
}
Although this works, the loop that transforms the data does not significantly improve upon its equivalent pre-Java 8 implementation:
while( handler != null ) {
  result = handler.apply( result );
  handler = handler.next();
}
Replacing the null sentinel with an Optional one was a good first step. We can go further by eliminating the call to handler.get(), which has been proposed for deprecation. (To understand the reasoning, read the mailing list post by the method’s author, Brian Goetz; for alternatives, see JDK-8140281, which added the no-argument orElseThrow() as the preferred replacement.) Change the method and add a new nested class as follows:
@Override
public T apply( final T data ) {
  final var result = new MutableReference<>( data );
  Optional<Processor<T>> handler = next();

  while( handler.isPresent() ) {
    handler = handler.flatMap( p -> {
      result.set( p.apply( result.get() ) );
      return p.next();
    } );
  }

  return result.get();
}

// Declared generic so that the diamond operator above can infer the type.
private static final class MutableReference<T> {
  private T mObject;

  MutableReference( final T object ) {
    set( object );
  }

  void set( final T object ) {
    mObject = object;
  }

  T get() {
    return mObject;
  }
}
Note these effective changes:
- using a final MutableReference instance satisfies the lambda expression, p -> { ... }, which requires final or effectively final variables;
- calling handler.get() is now handled implicitly whereby the lambda expression receives the target Processor instance using the variable p; and
- invoking flatMap executes the lambda expression, which transforms the input using the current processor then returns the next Processor link in the chain.
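The following standalone sketch exercises the same flatMap loop on a two-link chain. The link helper, the Box holder (playing the role of MutableReference), and the demo method are hypothetical names introduced only for this illustration.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

class FlatMapChainSketch {
    interface Processor<T> extends UnaryOperator<T> {
        default Optional<Processor<T>> next() {
            return Optional.empty();
        }
    }

    // Hypothetical helper: pairs a transformation with a (possibly null) successor.
    static <T> Processor<T> link(final UnaryOperator<T> op, final Processor<T> successor) {
        return new Processor<>() {
            @Override
            public T apply(final T data) {
                return op.apply(data);
            }

            @Override
            public Optional<Processor<T>> next() {
                return Optional.ofNullable(successor);
            }
        };
    }

    // Mutable holder so the lambda can update the running result.
    static final class Box<T> {
        T value;

        Box(final T value) {
            this.value = value;
        }
    }

    static <T> T run(final Processor<T> head, final T data) {
        final Box<T> result = new Box<>(data);
        Optional<Processor<T>> handler = Optional.ofNullable(head);

        // flatMap runs the lambda only when a processor is present; an empty
        // Optional falls straight through and ends the loop.
        while (handler.isPresent()) {
            handler = handler.flatMap(p -> {
                result.value = p.apply(result.value);
                return p.next();
            });
        }

        return result.value;
    }

    static String demo(final String text) {
        final Processor<String> upper = link(s -> s.toUpperCase(), null);
        final Processor<String> chain = link(s -> s.trim(), upper);
        return run(chain, text);
    }

    public static void main(final String[] args) {
        System.out.println(demo("  hello  "));
    }
}
```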
Implementing the loop using the Stream class is another possibility, but be sure to measure the performance. This particular chain-of-responsibility pattern doesn’t lend itself to parallelization, so any such anticipated performance gains are nebulous at best. Similarly, using an AtomicReference class would be less code at the expense of poorer performance and possible misunderstanding over use of a concurrent class.
To prove this, we tested a few different scenarios, which we named as follows:
- Mutable reference: The baseline apply method presented above.
- Atomic reference: Replaces the MutableReference with an AtomicReference.
- Optional get(): Swaps the flatMap call for handler.get().
- Recursion: Applies the processors’ transformations recursively.
- Stream iterator: Invokes Stream.iterate with a flatMap and reduce call.
The following table lists the highest scoring trial run (of three runs) for all implementations, against different numbers of processors in the chain:
Benchmark Name | Processors | Operations per μs |
---|---|---|
Mutable reference | 2 | 49.447 |
Mutable reference | 4 | 24.493 |
Mutable reference | 8 | 8.981 |
Mutable reference | 16 | 6.018 |
Mutable reference | 32 | 2.686 |
Mutable reference | 64 | 1.539 |
Atomic reference | 2 | 24.409 |
Atomic reference | 4 | 14.065 |
Atomic reference | 8 | 7.527 |
Atomic reference | 16 | 3.752 |
Atomic reference | 32 | 2.112 |
Atomic reference | 64 | 1.054 |
Optional get() | 2 | 37.256 |
Optional get() | 4 | 21.24 |
Optional get() | 8 | 10.851 |
Optional get() | 16 | 6.299 |
Optional get() | 32 | 2.645 |
Optional get() | 64 | 1.511 |
Recursion | 2 | 27.009 |
Recursion | 4 | 15.298 |
Recursion | 8 | 7.469 |
Recursion | 16 | 3.332 |
Recursion | 32 | 2.251 |
Recursion | 64 | 0.952 |
Stream iterator | 2 | 6.672 |
Stream iterator | 4 | 7.964 |
Stream iterator | 8 | 4.405 |
Stream iterator | 16 | 1.817 |
Stream iterator | 32 | 1.138 |
Stream iterator | 64 | 0.701 |
We tested the performance using the Java Microbenchmark Harness. All runs produced the same overall result: streams perform poorly, having the lowest number of operations per microsecond; and using an object wrapper yields an optimal solution for our generic processor chain.
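The benchmark sources are not reproduced here. As a rough idea of what a stream-based walk over the chain might look like, consider the following sketch, which assumes Java 9 or later for the three-argument Stream.iterate and Optional.stream. It is a guess at the shape of such a variant, not the code that was measured, and the link and demo helpers are hypothetical.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

class StreamChainSketch {
    interface Processor<T> extends UnaryOperator<T> {
        default Optional<Processor<T>> next() {
            return Optional.empty();
        }
    }

    // Hypothetical helper: pairs a transformation with a (possibly null) successor.
    static <T> Processor<T> link(final UnaryOperator<T> op, final Processor<T> successor) {
        return new Processor<>() {
            @Override
            public T apply(final T data) {
                return op.apply(data);
            }

            @Override
            public Optional<Processor<T>> next() {
                return Optional.ofNullable(successor);
            }
        };
    }

    static <T> T run(final Processor<T> head, final T data) {
        final Optional<Processor<T>> first = Optional.ofNullable(head);

        // Walk the links until one has no successor, then fold the document
        // through each link in encounter order.
        return Stream.iterate(first, Optional::isPresent, h -> h.flatMap(p -> p.next()))
            .flatMap(Optional::stream)
            // The combiner is never called for a sequential stream.
            .reduce(data, (result, p) -> p.apply(result), (a, b) -> b);
    }

    static String demo(final String text) {
        final Processor<String> upper = link(s -> s.toUpperCase(), null);
        final Processor<String> chain = link(s -> s.trim(), upper);
        return run(chain, text);
    }

    public static void main(final String[] args) {
        System.out.println(demo("  hello  "));
    }
}
```

The stream version allocates a pipeline on every call, which is one plausible reason it trails the plain loop in the table.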
ProcessorFactory
Different source documents have different processing chains. The responsibility of mapping document types to specific chains falls to the ProcessorFactory class. In the same processors package, create a new class using the following content:
package com.mdtexfx.processors;

public class ProcessorFactory {
  public static Processor<String> create( final ProcessorContext context ) {
    // Every chain ends by handing the HTML to the renderer.
    final var successor = new HtmlProcessor( context );

    final var processor = switch( context.getMediaType() ) {
      case UNDEFINED -> new IdentityProcessor( successor, context );
      case TEXT_MARKDOWN -> new MarkdownProcessor( successor, context );
    };

    return new ExecutorProcessor<>( processor );
  }
}
There’s a lot going on here and work to do before the code will compile. Broadly, the ProcessorFactory class introduces a few new concepts, including:
- ProcessorContext: Guards against parameter explosion when creating Processor instances. Different processors have distinct requirements; when requirements change for individual processors, we want to avoid changing ProcessorFactory, in accordance with the single-responsibility principle.
- IdentityProcessor: Avoids creating a special case when the input format is unknown. The processor’s name borrows from the identity function because its transformation returns the input document without modification. See the download section for the implementation.
- MediaType: Uses codes defined by the Internet Assigned Numbers Authority (IANA) that describe file formats and format contents. We know we’ll need to associate files with processor chains; leveraging standard definitions permits decoupling knowledge about file name extensions from the ProcessorFactory. Further, it enables writing ProcessorFactory unit tests without having to provide java.io.File instances.
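The actual IdentityProcessor ships with the download. Purely as an illustration of the idea, an identity link could be as small as the sketch below, which is an assumed shape rather than the project's implementation: it implements the interface directly and omits the ProcessorContext parameter that the factory passes in.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

class IdentityProcessorSketch {
    interface Processor<T> extends UnaryOperator<T> {
        default Optional<Processor<T>> next() {
            return Optional.empty();
        }
    }

    // Assumed shape of an identity link: return the document untouched and
    // expose the successor so that the chain can continue.
    static final class IdentityProcessor implements Processor<String> {
        private final Processor<String> mNext;

        IdentityProcessor(final Processor<String> successor) {
            mNext = successor;
        }

        @Override
        public String apply(final String document) {
            return document;
        }

        @Override
        public Optional<Processor<String>> next() {
            return Optional.ofNullable(mNext);
        }
    }

    // Applies only the identity link; the successor is left for the caller.
    static String passThrough(final String document) {
        final Processor<String> tail = s -> s + "!";
        return new IdentityProcessor(tail).apply(document);
    }

    public static void main(final String[] args) {
        System.out.println(passThrough("plain text"));
    }
}
```

Because the unknown format flows through an ordinary link, the factory needs no branch that bypasses the chain.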
This leaves us with implementing the MarkdownProcessor and HtmlProcessor.
Gradle
Before we can implement the MarkdownProcessor, we need to instruct the compiler where it can find various libraries. Update build.gradle by changing the dependencies section to reference Vladimir Schneider’s efficient and highly configurable flexmark-java library; since we know we’re going to need the WebView class, update the javafx section to include the javafx.web module. Together, both sections should resemble the following:
javafx {
  version = '15'
  modules = ['javafx.controls', 'javafx.web']
  configuration = 'compileOnly'
}

dependencies {
  def v_commons_io = '2.8.0'
  def v_flexmark = '0.62.2'
  def v_junit = '5.6.2'

  runtimeOnly "org.openjfx:javafx-controls:${javafx.version}:linux"
  implementation "commons-io:commons-io:${v_commons_io}"
  implementation "com.vladsch.flexmark:flexmark:${v_flexmark}"
  testImplementation "org.junit.jupiter:junit-jupiter-api:${v_junit}"
  testRuntimeOnly "org.junit.jupiter:junit-jupiter-engine:${v_junit}"
}
The above snippet applies the DRY principle in that the version numbers for third-party libraries are defined but once.
MarkdownProcessor
If the ExecutorProcessor is the application’s engine then the MarkdownProcessor is a wheel. With the infrastructure complete, implementing a processor that converts Markdown documents to HTML is straightforward. We need to include a Markdown parser and then implement the MarkdownProcessor class.
Create the MarkdownProcessor class using the following code:
package com.mdtexfx.processors;

import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.*;

public class MarkdownProcessor extends ExecutorProcessor<String> {
  private final IParse mParser = Parser.builder().build();
  private final IRender mRenderer = HtmlRenderer.builder().build();

  public MarkdownProcessor(
    final Processor<String> successor, final ProcessorContext context ) {
    super( successor );
  }

  @Override
  public String apply( final String markdown ) {
    // Parse the Markdown into an AST, then render the AST as HTML.
    return mRenderer.render( mParser.parse( markdown ) );
  }
}
The work we did up-front has paid off with the simplicity so afforded. Developing an AsciiDoc processor or introducing our own custom processors now has an easy-to-follow template. Integrating flexmark-java introduces the following concepts:
- IParse — Parses a string containing Markdown into an abstract syntax tree (AST) represented by a node.
- IRender — Renders a node into an HTML document, represented by a string.
We may be concerned about the performance impact of transforming a Markdown document into an AST—a type of document object model—only to rebuild the HTML from a string into its own document object model. As we’ll see, this is not a bottleneck.
Next, we’ll implement the processors’ context and HTML processor, then wire everything up in the main application.
ProcessorContext
Recall how the factory design pattern helped us create a processor capable of transforming a media type (e.g., MediaType.TEXT_MARKDOWN) into the desired output format. Another common creational design pattern is the builder design pattern. Builder classes are useful when we want to create a new instance of a class that has many possible configurations. We’ll forgo the builder pattern for the ProcessorContext class because, at present, it takes only two parameters, implemented as follows:
package com.mdtexfx.processors;

import com.mdtexfx.html.HtmlRenderer;
import com.mdtexfx.io.MediaType;

public class ProcessorContext {
  private final MediaType mMediaType;
  private final HtmlRenderer mHtmlRenderer;

  public ProcessorContext(
    final MediaType mediaType, final HtmlRenderer htmlRenderer ) {
    assert mediaType != null;
    assert htmlRenderer != null;

    mMediaType = mediaType;
    mHtmlRenderer = htmlRenderer;
  }

  MediaType getMediaType() {
    return mMediaType;
  }

  HtmlRenderer getHtmlRenderer() {
    return mHtmlRenderer;
  }
}
Accessor methods can be considered poor design because they violate information hiding by exposing internal data to calling classes. This means that if a class changes its accessor method signatures, then its calling classes will likely have to change as well, which is contrary to the single-responsibility principle. As a compromise, the accessor methods have been declared package-private to restrict their use to only classes within the same package, limiting the scope of ripple effects that may be caused when refactoring the class.
When data classes and sealed types become available, the ProcessorContext class would be a suitable candidate for rewriting as a record, which is a restricted form of a class.
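As a sketch of that idea, and assuming a Java release where records are a standard feature (16 or later), the context might shrink to the following. The MediaType and HtmlRenderer types here are reduced stand-ins so that the example compiles alone. One trade-off to weigh: record accessors are always public, so the package-private restriction discussed above would be lost.

```java
import java.util.Objects;

class ProcessorContextRecordSketch {
    // Reduced stand-ins for the project's own types.
    enum MediaType { UNDEFINED, TEXT_MARKDOWN }

    interface HtmlRenderer {
        String render(String html);
    }

    // The compact constructor validates arguments before the fields are assigned.
    record ProcessorContext(MediaType mediaType, HtmlRenderer htmlRenderer) {
        ProcessorContext {
            Objects.requireNonNull(mediaType, "mediaType");
            Objects.requireNonNull(htmlRenderer, "htmlRenderer");
        }
    }

    static ProcessorContext markdownContext() {
        // A pass-through renderer is enough for demonstration purposes.
        return new ProcessorContext(MediaType.TEXT_MARKDOWN, html -> html);
    }

    public static void main(final String[] args) {
        System.out.println(markdownContext().mediaType());
    }
}
```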
MediaType
For the ProcessorContext class to compile, it needs MediaType to be defined. We could import a library that defines a Markdown media type constant, but for the two constants we need, importing an entire library is excessive. Instead, add the following enumeration to a new package named com.mdtexfx.io:
package com.mdtexfx.io;

public enum MediaType {
  UNDEFINED( "" ),
  TEXT_MARKDOWN( "text/markdown" );

  private final String mMediaType;

  MediaType( final String mediaType ) {
    mMediaType = mediaType;
  }

  @Override
  public String toString() {
    return mMediaType;
  }
}
It lives in the .io (input/output) package because in the future we’ll want to associate file name extensions with media types.
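One way that future association could look is sketched below. The fromFilename method and the extension sets ("md", "markdown") are hypothetical additions for illustration; the series has not defined them.

```java
import java.util.Locale;
import java.util.Set;

class MediaTypeLookupSketch {
    enum MediaType {
        UNDEFINED("", Set.of()),
        TEXT_MARKDOWN("text/markdown", Set.of("md", "markdown"));

        private final String mMediaType;
        private final Set<String> mExtensions;

        MediaType(final String mediaType, final Set<String> extensions) {
            mMediaType = mediaType;
            mExtensions = extensions;
        }

        // Hypothetical lookup: maps a file name to a media type by extension.
        static MediaType fromFilename(final String filename) {
            final int dot = filename.lastIndexOf('.');
            final String extension = dot < 0
                ? ""
                : filename.substring(dot + 1).toLowerCase(Locale.ROOT);

            for (final MediaType type : values()) {
                if (type.mExtensions.contains(extension)) {
                    return type;
                }
            }

            // Unknown or missing extensions fall back to the undefined type.
            return UNDEFINED;
        }

        @Override
        public String toString() {
            return mMediaType;
        }
    }

    public static void main(final String[] args) {
        System.out.println(MediaType.fromFilename("notes.md"));
    }
}
```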
HtmlProcessor
Like the MarkdownProcessor before it, the processor infrastructure investment makes writing the HtmlProcessor class a quick endeavour:
package com.mdtexfx.processors;

import com.mdtexfx.html.HtmlRenderer;

public class HtmlProcessor extends ExecutorProcessor<String> {
  private final HtmlRenderer mHtmlRenderer;

  public HtmlProcessor( final ProcessorContext context ) {
    // ExecutorProcessor has no default constructor; a null successor marks
    // this processor as the last link in the chain.
    super( null );
    mHtmlRenderer = context.getHtmlRenderer();
  }

  @Override
  public String apply( final String html ) {
    return mHtmlRenderer.render( html );
  }
}
Here we’ve opted to use an as yet undefined HtmlRenderer interface that exposes a String render( String ) method signature. This approach allows the implementation details regarding how the HTML document is exported to be controlled by the class that calls into the ProcessorFactory, rather than the ProcessorFactory itself. One disadvantage is that it means an additional class and interface. A big advantage is that our unit tests won’t need to provide a graphical user interface to verify the HTML output.
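To illustrate the testing advantage, the sketch below substitutes a recording renderer for the GUI. The RecordingRenderer class and the process helper are hypothetical, and the HtmlRenderer interface is repeated as a stand-in so that the example compiles alone.

```java
import java.util.ArrayList;
import java.util.List;

class HeadlessRendererSketch {
    // Stand-in with the same single-method shape described above.
    interface HtmlRenderer {
        String render(String html);
    }

    // Hypothetical test double: records each document instead of displaying it.
    static final class RecordingRenderer implements HtmlRenderer {
        final List<String> rendered = new ArrayList<>();

        @Override
        public String render(final String html) {
            rendered.add(html);
            return html;
        }
    }

    // Reduced stand-in for the final link of the chain.
    static String process(final String html, final HtmlRenderer renderer) {
        return renderer.render(html);
    }

    public static void main(final String[] args) {
        final RecordingRenderer renderer = new RecordingRenderer();
        process("<h1>Title</h1>", renderer);

        // The test inspects what would have been displayed; no window opens.
        System.out.println(renderer.rendered);
    }
}
```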
HtmlRenderer & HtmlPreview
For the HtmlProcessor to function, it needs an implementation-independent mechanism to display HTML. By using an interface, we abide by the guideline of “program to an interface, not an implementation,” a design principle set out and described at length by Erich Gamma, one of the Design Patterns authors.
The HtmlRenderer interface defines a single method that hardly deserves mention:
package com.mdtexfx.html;

// No JavaFX imports are needed here, which keeps the interface usable in
// tests that run without a graphical user interface.
public interface HtmlRenderer {
  String render( final String html );
}
The HtmlPreview class is where we finally cut some code of interest:
package com.mdtexfx.html;

import javafx.scene.Parent;
import javafx.scene.web.WebView;

public class HtmlPreview extends Parent implements HtmlRenderer {
  private final WebView mView;

  private static final String HTML_PREFIX = """
    <!DOCTYPE html>
    <html lang='en'><head><title></title><meta charset='utf-8'/></head>
    <body>
    """;
  private static final String HTML_SUFFIX = "</body></html>";
  private static final int HTML_PREFIX_LENGTH = HTML_PREFIX.length();

  private final StringBuilder mHtmlDocument = new StringBuilder(65536);

  public HtmlPreview() {
    mHtmlDocument.append(HTML_PREFIX);
    mView = new WebView();
    getChildren().add(mView);
  }

  @Override
  public String render(final String html) {
    // Truncate back to the shared header, then append the new body.
    mHtmlDocument.setLength(HTML_PREFIX_LENGTH);
    mHtmlDocument.append(html);
    mHtmlDocument.append(HTML_SUFFIX);
    mView.getEngine().loadContent(mHtmlDocument.toString());
    return html;
  }
}
Here are the main items to notice:
- Text blocks. We leverage syntax that allows for multi-line strings, which are delimited by triple double-quotes (""").
- HTML header. As a minor optimization, the HTML header is isolated and reused.
- GUI widget. The HtmlPreview class extends from Parent so that the App class can treat its HtmlPreview instance like any other JavaFX widget. This avoids having to use delegation while also completely encapsulating the WebView class.
- Law of Demeter violation. Ideally, our code would be oblivious to WebEngine in that we could call mView.setContent( html ) directly. Calling it via WebEngine means that our HtmlPreview class is coupled to WebView internals. This coupling, forced by the WebView API, violates the principle of least knowledge.
App
We can finally wire up the main application to provide a real-time preview of a Markdown document while being edited. Revise the start method of the App class to the following:
@Override
public void start( final Stage stage ) {
  final var editor = new TextArea();
  final var preview = new HtmlPreview();
  final var context = new ProcessorContext( TEXT_MARKDOWN, preview );
  final var processor = ProcessorFactory.create( context );

  final var border = new BorderPane();
  border.setLeft( editor );
  border.setRight( preview );

  // Re-run the processor chain whenever the editor's text changes.
  editor.textProperty().addListener( ( c, o, n ) -> processor.apply( n ) );

  final var scene = new Scene( border );
  stage.setScene( scene );
  stage.show();
}
The key changes are:
- Introduce a BorderPane having a left-hand editor and right-hand preview.
- Create a Processor based on a hard-coded Markdown media type.
- Add a lambda expression that is called every time the text editor changes, where c, o, and n represent the observable value that changed, the old editor text, and the new editor text, respectively.
- Invoke the processor chain to update the HTML preview node.
Using the final modifier liberally communicates to readers that the variables’ values are not intended to change. Languages such as Kotlin and Scala introduced keywords to make immutable variables first-class citizens. Scala’s documentation notes that variables should be final (“val”) by default to make code more like algebra and lean towards immutable systems. More immutability implies fewer issues that can arise from otherwise complex state interactions. Some developers suggest that final adds clutter, while others have expressed that it reduces cognitive load.
Regardless of coding style, when the program is run it produces a plain HTML version of the Markdown document being edited. Here’s a screen shot that shows the editor previewing the introductory text:
Download
You may download the complete project.
Summary
We’ve described a number of software development techniques, including:
- programming to interfaces;
- the factory method design pattern;
- the chain-of-responsibility design pattern;
- the Law of Demeter;
- the single-responsibility principle; and
- the DRY principle.
Additionally, we developed a maintainable code base, accomplished in part by:
- avoiding null assignment statements;
- using few conditional expressions;
- restricting accessor method scopes; and
- declaring many immutable (final) variables.
The next article will add syntax highlighting to the text editor.