AshurAxelR/JParsedown

JParsedown

Lightweight Markdown Parser in Java: library or command line tool

JParsedown Library

JParsedown is a lightweight single-file library for converting Markdown to HTML. It is translated from the Parsedown PHP library (version 1.8.0-beta-5) and preserves its features.

The library is compliant with Java 7+.

Additional features of JParsedown that are not (yet) available in the original Parsedown are described in the Header IDs, Page Title Detection, and MD Links Conversion sections below.

Download

Source file: JParsedown.java

JAR file: jparsedown-1.0.4.jar (50.8 KB)

Usage

JParsedown parsedown = new JParsedown();
System.out.println(parsedown.text("Hello _Parsedown_!")); // prints: <p>Hello <em>Parsedown</em>!</p>

You can also parse inline markdown only:

System.out.println(parsedown.line("Hello _Parsedown_!")); // prints: Hello <em>Parsedown</em>!

Security

See the Parsedown Security page.

Header IDs

GitHub automatically generates an anchor ID for each header in a Markdown file, which makes it easier to reference individual sections and to build a table of contents. JParsedown attempts to generate the same IDs, so intra-page links in the rendered HTML page work as they do on GitHub.

For example, ## Header IDs creates the following HTML:

<h2 id="header-ids">Header IDs</h2>

and can be referenced as follows:

[Header IDs](#header-ids)

ID generation in JParsedown follows these rules:

  1. The header text is converted to lower case.
  2. HTML entities such as &ndash; are removed.
  3. All characters other than letters, numbers, underscores, and whitespace are removed.
  4. Whitespace characters are replaced with dashes (-).
  5. ID is URL-encoded to handle Unicode letters.
  6. Duplicate IDs have a dash and a number appended: header-ids, header-ids-1, header-ids-2, etc.
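
The rules above can be sketched in a few lines of Java. This is only one possible reading of the six rules, not the code JParsedown actually uses: the class name HeaderIdSketch, the method nextId, and the exact regular expressions are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of JParsedown: one possible reading of the six rules.
final class HeaderIdSketch {
    // How many times each base ID has been produced so far (needed for rule 6).
    private final Map<String, Integer> seen = new HashMap<String, Integer>();

    String nextId(String headerText) {
        String id = headerText.toLowerCase(Locale.ROOT);   // rule 1: lower case
        id = id.replaceAll("&[a-z0-9#]+;", "");             // rule 2: drop HTML entities (assumed pattern)
        id = id.replaceAll("[^\\p{L}\\p{N}_\\s]", "");      // rule 3: keep letters, numbers, _, whitespace
        id = id.replaceAll("\\s", "-");                     // rule 4: each whitespace becomes a dash
        try {
            id = URLEncoder.encode(id, "UTF-8");            // rule 5: percent-encode Unicode letters
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);             // UTF-8 is always available
        }
        Integer count = seen.get(id);                       // rule 6: number the duplicates
        seen.put(id, count == null ? 1 : count + 1);
        return count == null ? id : id + "-" + count;
    }
}
```

Calling nextId("Header IDs") three times on the same instance returns header-ids, header-ids-1, and header-ids-2, matching the example in rule 6. One instance would be used per page, so that the duplicate counter starts fresh for each document.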

Page Title Detection

JParsedown exposes a title string, which is available after calling the text() method:

JParsedown parsedown = new JParsedown();
parsedown.text("# My Title\n\nMore text...");
System.out.println(parsedown.title); // prints: My Title

The string contains the best candidate for the HTML page title, which is the first header of the highest level present on the page. For example, if the page has no level-1 header but has several level-2 headers, the first of them becomes the title.

If the page does not contain any headers, title will be null.

Note: The Markdown in the title is not stripped or processed.
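
The selection rule can be illustrated with a small standalone sketch. It is not taken from JParsedown: the class TitleSketch and the method findTitle are hypothetical, and the sketch assumes that only ATX-style headers (lines starting with #) count, ignoring Setext headers and code blocks.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of "first header of the highest level"; not JParsedown code.
final class TitleSketch {
    // Assumption: only ATX headers such as "## Title" are recognised.
    private static final Pattern ATX = Pattern.compile("^(#{1,6})\\s+(.*\\S)\\s*$");

    static String findTitle(String markdown) {
        String title = null;                 // stays null when there are no headers
        int bestLevel = Integer.MAX_VALUE;   // smaller number means higher-level header
        for (String line : markdown.split("\\r?\\n")) {
            Matcher m = ATX.matcher(line);
            if (m.matches()) {
                int level = m.group(1).length();
                // Strict comparison keeps the FIRST header of the best level seen.
                if (level < bestLevel) {
                    bestLevel = level;
                    title = m.group(2);      // Markdown inside the title is left as-is
                }
            }
        }
        return title;
    }
}
```

Under these assumptions, findTitle("## A\ntext\n## B") returns A, and an input without headers returns null, which mirrors the behaviour described for the title string.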

MD Links Conversion

GitHub documentation may contain links between MD files, such as [see other file](file.md#anchor). When converting documentation to static HTML pages, it is often desirable to point these links at the corresponding HTML files instead, i.e. [see other file](file.html#anchor).

JParsedown provides a method setMdUrlReplacement(String) that sets the replacement for the .md extension. For example, setMdUrlReplacement(".html") replaces .md in link URLs with .html.

The conversion is applied only to relative URLs, i.e. those that do not contain a colon (:) character.

Use setMdUrlReplacement(null) to disable conversion (default behaviour).
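
The following sketch shows how such a conversion might look for a single URL. It is an illustration only and not JParsedown's implementation: the class MdUrlSketch and the method convert are hypothetical, and the sketch assumes a lower-case .md extension directly before an optional #anchor.

```java
// Hypothetical illustration of the .md link conversion; not JParsedown code.
final class MdUrlSketch {
    static String convert(String url, String replacement) {
        // A null replacement disables conversion; a colon marks a non-relative URL.
        if (replacement == null || url.indexOf(':') >= 0) {
            return url;
        }
        // Split off the anchor so that only the file part is examined.
        int hash = url.indexOf('#');
        String path = hash < 0 ? url : url.substring(0, hash);
        String anchor = hash < 0 ? "" : url.substring(hash);
        // Assumption: only a lower-case ".md" suffix is replaced.
        if (path.endsWith(".md")) {
            path = path.substring(0, path.length() - 3) + replacement;
        }
        return path + anchor;
    }
}
```

With this sketch, convert("file.md#anchor", ".html") yields file.html#anchor, while a URL such as https://example.com/file.md is returned unchanged because it contains a colon.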

Performance

Benchmark results:

| Test file | Repeat | JParsedown | Parsedown (PHP) | flexmark-java |
|---|---|---|---|---|
| cheatsheet.md | ×100 | 4.4 ms per item | 5.5 ms per item (×1.25) | 6.2 ms per item (×1.41) |
| cheatsheet.md | ×1000 | 2.4 ms per item | 5.4 ms per item (×2.25) | 2.4 ms per item (×1.00) |

The benchmark does not include file loading and saving times. Only the text() function is measured.

At the moment, JParsedown is not properly optimised for performance. The speedup over the original Parsedown is due to the performance difference between Java and PHP. Also note how much JIT compilation helps Java with large batches of work: the per-item time drops from 4.4 ms at ×100 to 2.4 ms at ×1000.
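
A "ms per item" figure of this kind can be obtained with a simple timing loop. The harness below is a minimal sketch and not the benchmark that produced the table: BenchSketch and msPerItem are hypothetical names, the converter is passed in as a generic function, and the code requires Java 8 or later.

```java
import java.util.function.UnaryOperator;

// Hypothetical timing harness; not the benchmark used for the results above.
final class BenchSketch {
    // Consumes the results so the JIT cannot discard the conversion calls entirely.
    static long sink;

    static double msPerItem(UnaryOperator<String> convert, String input, int repeat) {
        long start = System.nanoTime();
        for (int i = 0; i < repeat; i++) {
            sink += convert.apply(input).length();
        }
        long elapsedNanos = System.nanoTime() - start;
        // Nanoseconds to milliseconds, averaged over the number of repeats.
        return elapsedNanos / 1e6 / repeat;
    }
}
```

To time JParsedown in this way, the converter could be something like s -> new JParsedown().text(s), with the file contents loaded into a string beforehand so that I/O stays outside the measured loop.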

MD Tool

MD tool is a JParsedown-based command line tool for converting Markdown files into HTML pages.

See MD Tool Readme