Vladimir Schneider edited this page May 1, 2023 · 32 revisions

flexmark-java

A Java library for parsing and rendering Markdown text according to the CommonMark specification, with many extensions. It was originally written as a replacement for the pegdown parser used in the Markdown Navigator plugin for the JetBrains line of IDEs.

It provides classes for parsing input into an abstract syntax tree (AST) of nodes, visiting and manipulating those nodes, and rendering to HTML. It is a rework of commonmark-java whose AST represents every markdown source element and carries full source offset details for every character in that element, which allows recreating the original source or using the AST for syntax highlighting.
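To illustrate why per-element source offsets make the original text recoverable, here is a minimal sketch. It is not flexmark-java's API: the `SourceSpans` and `Node` classes, the node type names, and the hand-built tree are all hypothetical, and a real parser would compute the offsets itself. Each node stores only a start and end offset into the source, so any element, markers included, can be read back with a substring.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of offset-based AST nodes; not a flexmark-java class. */
final class SourceSpans {
    static final class Node {
        final String type;
        final int start; // inclusive offset into the source
        final int end;   // exclusive offset into the source
        final List<Node> children = new ArrayList<>();

        Node(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Exact source text covered by this node, markers included. */
        String chars(String source) {
            return source.substring(start, end);
        }

        /** Appends one "type [start, end) text" line per node, depth-first. */
        void dump(String source, String indent, StringBuilder out) {
            out.append(indent).append(type)
               .append(" [").append(start).append(", ").append(end).append(") ")
               .append(chars(source)).append('\n');
            for (Node child : children) {
                child.dump(source, indent + "  ", out);
            }
        }
    }

    /** Hand-built tree for the source "This is *Sparta*" (offsets assumed, not parsed). */
    static Node sample() {
        Node emphasis = new Node("Emphasis", 8, 16)
            .add(new Node("OpenMarker", 8, 9))
            .add(new Node("Text", 9, 15))
            .add(new Node("CloseMarker", 15, 16));
        return new Node("Paragraph", 0, 16)
            .add(new Node("Text", 0, 8))
            .add(emphasis);
    }

    public static void main(String[] args) {
        String source = "This is *Sparta*";
        StringBuilder out = new StringBuilder();
        sample().dump(source, "", out);
        System.out.print(out);
    }
}
```

Because the markers have their own spans, a syntax highlighter could colour the `*` characters separately from the emphasized text without re-parsing anything.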

Its API is similar to that of commonmark-java, with many added extensions and enhancements to the core, including the following features:

  • Small (minimal dependencies) for core library, extension dependencies vary.
  • Fast, about 30x faster than pegdown, 10x faster than intellij-markdown and only about 35%-50% slower than commonmark-java after adding source tracking and other markdown emulation modes.
  • Flexible: custom block parsers and delimiter parsers, AST manipulation after parsing, customizable HTML rendering, and roll-your-own options for everything.
  • Extensible with tables, strikethrough, auto-links and other extensions included out of the box.
  • Complete source position tracking in the AST, with all markdown source elements represented and a source offset available for every non-space character in the source.
  • Universal options API used to configure parser, renderer and extensions.
  • Practically all behavior of the core parser can be modified, including disabling and replacing core block parsers and default inline processing.
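The idea of one options API shared by the parser, renderer and extensions can be sketched as a store of typed keys that carry their own defaults. This is only an illustration of the concept: the `OptionsSketch`, `Key` and `Options` classes and the three key names below are hypothetical and are not flexmark-java's actual classes or option names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a shared, typed options store; not flexmark-java's API. */
final class OptionsSketch {
    /** A typed key that carries its own default value. */
    static final class Key<T> {
        final String name;
        final T defaultValue;

        Key(String name, T defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    /** One mutable store that every component reads its settings from. */
    static final class Options {
        private final Map<Key<?>, Object> values = new HashMap<>();

        <T> Options set(Key<T> key, T value) {
            values.put(key, value);
            return this;
        }

        @SuppressWarnings("unchecked")
        <T> T get(Key<T> key) {
            // set() only ever stores a T under a Key<T>, so this cast is safe
            return values.containsKey(key) ? (T) values.get(key) : key.defaultValue;
        }
    }

    // Made-up keys, each imagined as owned by a different component
    static final Key<Boolean> PARSER_STRICT = new Key<>("PARSER_STRICT", true);
    static final Key<String> RENDERER_INDENT = new Key<>("RENDERER_INDENT", "  ");
    static final Key<Integer> TABLES_MIN_COLUMNS = new Key<>("TABLES_MIN_COLUMNS", 1);

    public static void main(String[] args) {
        Options options = new Options()
            .set(PARSER_STRICT, false)
            .set(RENDERER_INDENT, "\t");
        System.out.println(options.get(PARSER_STRICT));      // value set above
        System.out.println(options.get(TABLES_MIN_COLUMNS)); // falls back to the key's default
    }
}
```

The design point is that a component defines its own keys and defaults, while callers configure everything through the same `set` call instead of per-component setters or bit flags.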


Requirements

  • Java 7 or above

  • Android compatibility to be added

  • The project is on Maven: com.vladsch.flexmark

  • The core has no dependencies; for extensions, see below

    The API is still evolving to accommodate new extensions and functionality.

Feature Comparison

Feature flexmark-java commonmark-java pegdown
Relative parse time (less is better) ✔️ 1x ✔️ 0.6x to 0.7x ❌ 25x average, 20,000x to ∞ for pathological input (2)
All source elements in the AST ✔️ ✔️ ✔️
AST elements with source position ✔️ ✔️ with some errors and idiosyncrasies
AST can be easily manipulated ✔️ AST post processing is an extension mechanism ✔️ AST post processing is an extension mechanism ❌ not an option. No node's parent information, children as List<>.
AST elements have detailed source position for all parts ✔️ ❌ only node start/end
Can disable core parsing features ✔️
Core parser implemented via the extension API ✔️ instanceOf tests for specific block parser and node classes ❌ core exposes few extension points
Easy to understand and modify parser implementation ✔️ ✔️ ❌ one massive PEG parser with complex interactions (2)
Parsing of block elements is independent from each other ✔️ (1) ✔️ (1) ❌ everything in one PEG grammar
Uniform configuration across: parser, renderer and all extensions ✔️ ❌ none beyond extension list int bit flags for core, none for extensions
Parsing performance optimized for use with extensions ✔️ ❌ parsing performance for core, extensions do what they can ❌ performance is not a feature
Feature rich with many configuration options and extensions out of the box ✔️ ❌ limited extensions, no options ✔️
Dependency definitions for processors to guarantee the right order of processing ✔️ ❌ order specified by extension list ordering, error prone ❌ not applicable, core defines where extension processing is added
(1) pathological input of 10,000 [ parses in 11ms, 10,000 nested [ ] parse in 450ms

(2) pathological input of 17 [ parses in 650ms, 18 [ in 1300ms
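The last table row contrasts declared dependencies with ordering by position in an extension list. One common way to derive a processing order from declared dependencies is a topological sort; the sketch below shows that technique in general terms. The `ProcessorOrder` class, the `resolve` method and the processor names are hypothetical, and this is not a description of how flexmark-java implements its ordering. It assumes Java 8 or later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: order processors by declared dependencies (Kahn's algorithm). */
final class ProcessorOrder {
    /**
     * Returns an order in which every processor runs after all of its dependencies.
     * runsAfter maps a processor name to the names that must run before it.
     */
    static List<String> resolve(Map<String, List<String>> runsAfter) {
        Map<String, Integer> unmet = new TreeMap<>();            // name -> unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>();  // name -> processors waiting on it
        for (Map.Entry<String, List<String>> entry : runsAfter.entrySet()) {
            unmet.putIfAbsent(entry.getKey(), 0);
            for (String dependency : entry.getValue()) {
                unmet.putIfAbsent(dependency, 0);
                unmet.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dependency, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Start with processors that depend on nothing
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : unmet.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.remove();
            order.add(name);
            // Release every processor whose last unmet dependency was just scheduled
            for (String waiting : dependents.getOrDefault(name, Collections.<String>emptyList())) {
                if (unmet.merge(waiting, -1, Integer::sum) == 0) {
                    ready.add(waiting);
                }
            }
        }
        if (order.size() != unmet.size()) {
            throw new IllegalStateException("dependency cycle among processors");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> runsAfter = new LinkedHashMap<>();
        runsAfter.put("toc", Arrays.asList("headings"));
        runsAfter.put("anchors", Arrays.asList("headings", "toc"));
        System.out.println(resolve(runsAfter));
    }
}
```

With declared dependencies, the order in which extensions are registered no longer matters, and a cycle is reported as an error instead of silently producing a wrong order.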

History

The motivation for creating this project was to replace pegdown as the parser for the Markdown Navigator plugin for the JetBrains line of IDEs. pegdown has many performance issues that cannot be resolved because of its implementation. Additionally, in pegdown the parsing of markdown elements is interdependent, and it can only parse the full file, with no ability to mark a point beyond which no rollback should occur. For some sources this causes exponential parse times, and in a few cases infinite parsing loops, which could only be resolved by changing the grammar so that it is no longer markdown compatible.

Since I needed to implement many extensions to make this parser a superset of pegdown, I wanted to improve the ability of extensions to modify the behaviour of the parser, so that any markdown dialect could be implemented through the extension mechanism. I also wanted to remove boilerplate code and have extension tests use the commonmark spec.txt format, extended to include the AST in each test so that the AST can be validated for every construct and every extension.

The end goal is to have a parser that can be easily extended to be compatible with major processor families and individual processor variations.

Despite its name, commonmark is neither a superset nor a subset of other markdown flavors. Rather, it proposes a standard, unambiguous syntax specification for the original, "core" Markdown, thus effectively introducing yet another flavor. While flexmark is by default commonmark compliant, its parser can be tweaked in various ways. The sets of tweaks required to emulate the most commonly used markdown parsers around are available in flexmark as ParserEmulationProfiles.

As the name ParserEmulationProfile implies, it is only the parser which is adjusted to the specific markdown dialect. Applying the profile does not add features beyond those available in commonmark. If you want flexmark to better emulate another processor's behavior, you have to adjust the parser and configure the extensions to support the features of the parser which you want to emulate.