
Version 0.60.0 Changes

Vladimir Schneider edited this page Feb 8, 2020 · 6 revisions

⚠️ Release 0.60.0 has breaking changes due to the reorganization, renaming, cleanup and optimization of implementation classes.

In many cases, the extent of the changes and optimizations made it impossible to follow the gradual path of marking code @Deprecated and removing it later. Code will have to be migrated manually.

ℹ️ If you encounter difficulty in migrating some code constructs please open an issue with details of the construct to be migrated.

Major improvements include:

  • New implementation of SegmentedSequence using binary offset tree with efficient access, storage and instantiation.

  • New SequenceBuilder class, used to create segmented sequences of arbitrary content without concern for segment ordering or whether they share a common base sequence.

    A segment which cannot be converted to an offset range of the base sequence is converted to out-of-base characters, preserving the expected character sequence result.

    The builder optimizes literal characters when they match the corresponding base sequence characters, with special handling of spaces and EOL characters. This means that literal spaces and EOL characters appended instead of a subsequence are efficiently replaced by segments from the original base sequence.

    For convenience, an instance of SequenceBuilder can be obtained from any based sequence through the BasedSequence.getBuilder() method.

  • New LineAppendable implementation used for rendering text. Internally, the class builds a list of lines and keeps track of each line's prefix portion, allowing efficient access and manipulation of lines and prefixes in the rendered result.

    The generated BasedSequence result is a SegmentedSequence that preserves offsets into the source sequence, allowing offsets in the result to be mapped back to the original sequence.

    The result lines are stored as separate BasedSequences to maximize preservation of original sequence offset information when the rendering rearranges the lines of the source, as in the case of formatting with reference definition sorting or MarkdownTable sorting.

  • Formatting module is now part of the core library with additional features:

    • Paragraph text wrapping to fit within the set right margin.
    • Source offset tracking from formatted to original markdown
    • Table formatting includes sorting by columns and transpose table methods
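The segment-or-literal idea behind SequenceBuilder can be illustrated with a small, self-contained model. This is a hypothetical sketch, not flexmark-java code: the class SegmentBuilderSketch and its methods appendRange and append are invented for illustration. It only matches a literal character against the base character immediately following the previous base range, which is a much simpler rule than the real builder's handling of spaces and EOLs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the SequenceBuilder idea; these names are not flexmark-java API.
public class SegmentBuilderSketch {
    // A segment is either a [start, end) range of the base (text == null) or out-of-base text.
    record Segment(int start, int end, String text) {
        boolean isBased() { return text == null; }
    }

    private final CharSequence base;
    private final List<Segment> segments = new ArrayList<>();

    SegmentBuilderSketch(CharSequence base) { this.base = base; }

    private Segment last() { return segments.isEmpty() ? null : segments.get(segments.size() - 1); }

    // Append base[start, end), merging with the previous range when they are contiguous.
    SegmentBuilderSketch appendRange(int start, int end) {
        Segment last = last();
        if (last != null && last.isBased() && last.end() == start) {
            segments.set(segments.size() - 1, new Segment(last.start(), end, null));
        } else {
            segments.add(new Segment(start, end, null));
        }
        return this;
    }

    // Append literal text: a character equal to the base character right after the
    // previous base range extends that range instead of becoming out-of-base text.
    SegmentBuilderSketch append(CharSequence literal) {
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            Segment last = last();
            if (last != null && last.isBased() && last.end() < base.length() && base.charAt(last.end()) == c) {
                segments.set(segments.size() - 1, new Segment(last.start(), last.end() + 1, null));
            } else if (last != null && !last.isBased()) {
                segments.set(segments.size() - 1, new Segment(-1, -1, last.text() + c));
            } else {
                segments.add(new Segment(-1, -1, String.valueOf(c)));
            }
        }
        return this;
    }

    int segmentCount() { return segments.size(); }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Segment s : segments) {
            sb.append(s.isBased() ? base.subSequence(s.start(), s.end()) : s.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SegmentBuilderSketch b = new SegmentBuilderSketch("Hello world\n");
        b.appendRange(0, 5).append(" ").appendRange(6, 11).append("\n").append("!");
        System.out.println(b.segmentCount() + " segments: " + b.segments);
    }
}
```

In this sketch the literal " " and "\n" collapse into the single base range [0, 12), while "!" has no base counterpart and is kept as out-of-base text.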

Summary

  • Major reorganization and code cleanup of implementation
    • Formatter implementation is now part of core implementation in flexmark module

    • Formatter improved with more options including wrapping text to margins.

      • Added the ability to track and map source offsets to their index in the formatted sequence. This allows the editor caret position to be preserved across formatting operations.
      • Offset tracking is unified using TrackedOffset. It is used by MarkdownParagraph for text wrapping and by MarkdownTable for table formatting, and it handles the caret position during typing and backspace editing operations that are immediately followed by formatting of the edited source.
    • Tests cleaned up to eliminate duplication and hacks

    • flexmark-test-util made reusable for other projects. Having markdown as the source code for tests is too convenient for use only in flexmark-java tests.

    • Optimized SegmentedSequence implementation using binary trees for searching segments and byte-efficient segment packing. Parser performance is slightly improved or unaffected, and the lower overhead allows SegmentedSequences to be used for collecting Formatter and HtmlRenderer output, tracking the source location of all text with minimal overhead and at double the performance of the old implementation.

    • new implementation of LineAppendable replaces LineFormattingAppendable used for text generation in rendering:

      • uses SequenceBuilder to generate BasedSequence result with original source offsets for those character segments which come from the source. This allows round trip source tracking from Source -> AST -> Formatted Source -> Source throughout the library.

        As an added bonus, formatting to the new appendable is 40% faster than with the previous implementation and 160 times more efficient in memory use. For the tests below, the old implementation allocated 6GB worth of segmented sequences, the new implementation 37MB. The overhead percentage of the new implementation is about four times higher than before, but that comes after a 43-fold reduction in total overhead bytes: the old implementation needed 342MB of overhead, the new implementation 8MB.

        As a result of the increased efficiency, two additional files of about 600kB each can be included in the test run while adding only 0.6 sec to the formatter run time.

      Tests were run on 1141 markdown files from GitHub projects and some other user samples. The largest was 256k bytes.

      | Description | Old SegmentedSequence | New SegmentedSequence | New LineAppendable |
      |---|---|---|---|
      | Total wall clock time | 13.896 sec | 9.672 sec | 8.344 sec |
      | Parse time | 2.402 sec | 2.335 sec | 2.297 sec |
      | Formatter appendable | 0.603 sec | 0.602 sec | 0.831 sec |
      | Formatter sequence builder | 7.264 sec | 3.109 sec | 1.772 sec |

      The overhead difference is significant. The totals are for all segmented sequences created during the test run of 1141 files. Parser statistics show requirements during parsing and formatting.

      | Description | Old Parser | Old Formatter | New Parser | New Formatter | New LineAppendable |
      |---|---|---|---|---|---|
      | Bytes for characters of all segmented sequences | 917,016 | 6,029,774,526 | 917,016 | 6,029,774,526 | 37,663,196 |
      | Bytes for overhead of all segmented sequences | 1,845,048 | 12,060,276,408 | 93,628 | 342,351,155 | 8,204,796 |
      | Overhead % | 201.2% | 200.0% | 10.2% | 5.7% | 21.8% |
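Keeping each rendered line as a separate unit with its own prefix is what lets lines be reordered without losing their source offsets. The following is a hypothetical model only: LineListSketch, Line and sourceOffsetOf are invented names, not the LineAppendable API. It stores one source offset per line and sorts lines the way reference definition sorting might.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of line-based output; not the LineAppendable API.
public class LineListSketch {
    // One rendered line: its prefix, its text, and where the text starts in the source.
    record Line(String prefix, String text, int sourceOffset) {
        String render() { return prefix + text; }
    }

    private final List<Line> lines = new ArrayList<>();

    void appendLine(String prefix, String text, int sourceOffset) {
        lines.add(new Line(prefix, text, sourceOffset));
    }

    // Reorder lines by text; each line's source offset moves with it.
    void sortByText() {
        lines.sort(Comparator.comparing(Line::text));
    }

    // Replace every line's prefix without touching the line text.
    void setPrefix(String prefix) {
        lines.replaceAll(line -> new Line(prefix, line.text(), line.sourceOffset()));
    }

    int sourceOffsetOf(int lineIndex) {
        return lines.get(lineIndex).sourceOffset();
    }

    String render() {
        StringBuilder sb = new StringBuilder();
        for (Line line : lines) sb.append(line.render()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Source "[b]: /b\n[a]: /a\n": the second definition starts at offset 8.
        LineListSketch out = new LineListSketch();
        out.appendLine("", "[b]: /b", 0);
        out.appendLine("", "[a]: /a", 8);
        out.sortByText();
        out.setPrefix("> ");
        System.out.print(out.render());
        System.out.println("line 0 came from source offset " + out.sourceOffsetOf(0));
    }
}
```

Because prefixes are stored apart from the line text, changing the prefix (for example, when nesting inside a block quote) never disturbs the offsets recorded for the text itself.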

Module Reorganization

  • Break: split out generic AST utilities from the flexmark-util module into separate smaller modules. com.vladsch.flexmark.util no longer contains any files, only separate utility modules, with the flexmark-util module being an aggregate of all utility modules, similar to flexmark-all.

    • ast/ classes to flexmark-util-ast
    • builder/ classes to flexmark-util-builder
    • collection/ classes to flexmark-util-collection
    • data/ classes to flexmark-util-data
    • dependency/ classes to flexmark-util-dependency
    • format/ classes to flexmark-util-format
    • html/ classes to flexmark-util-html
    • mappers/ classes to flexmark-util-sequence
    • options/ classes to flexmark-util-options
    • sequence/ classes to flexmark-util-sequence
    • visitor/ classes to flexmark-util-visitor
  • Break: delete deprecated properties, methods and classes

  • Add: org.jetbrains:annotations:15.0 dependency so that @Nullable/@NotNull annotations can be added to all parameters. When developing with IntelliJ IDEA, these annotations help with analysis of potential problems, and they make the library easier to use from Kotlin.

  • Break: refactor and cleanup tests to eliminate duplicated code and allow easier reuse of test cases with spec example data.

  • Break: move formatter tests to flexmark-core-test module to allow sharing of formatter base classes in extensions without causing dependency cycles in formatter module.

  • Break: move formatter module into flexmark core. This module is almost always included anyway because most extensions depend on the formatter for their custom formatting implementations. Having it as part of the core allows all modules to rely on its functionality.

  • Break: move com.vladsch.flexmark.spec and com.vladsch.flexmark.util in flexmark-test-util to com.vladsch.flexmark.test.spec and com.vladsch.flexmark.test.util respectively to respect the naming convention between modules and their packages.

  • Break: NodeVisitor implementation details have changed. If you were overriding NodeVisitor.visit(Node) in the previous version, note that it is now final so that a compile-time error is generated. You will need to change your implementation; see the javadoc comment in the NodeVisitor class for instructions.

    ℹ️ com.vladsch.flexmark.util.ast.Visitor is only needed for implementation of NodeVisitor and VisitHandler. If all anonymous implementations of VisitHandler are converted to lambdas, then imports for Visitor can be eliminated.

    • Fix: remove old visitor-like adapters and implement new ones based on generic classes not linked to flexmark AST nodes.
    • remove old base classes:
      • com.vladsch.flexmark.util.ast.NodeAdaptedVisitor see javadoc for class
      • com.vladsch.flexmark.util.ast.NodeAdaptingVisitHandler
      • com.vladsch.flexmark.util.ast.NodeAdaptingVisitor
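The lambda-based handler style mentioned above can be shown with a self-contained analog. Everything below (VisitorSketch, Node, Text, Emphasis, Handler, Visitor) is hypothetical and only mimics the shape of pairing a node class with a lambda, as VisitHandler does. This simplified dispatcher always descends into child nodes after calling a handler, which may differ from the real NodeVisitor behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical analog of class-keyed visit handlers; not the flexmark-java API.
public class VisitorSketch {
    static class Node {
        final List<Node> children = new ArrayList<>();
        Node add(Node child) { children.add(child); return this; }
    }

    static class Text extends Node {
        final String text;
        Text(String text) { this.text = text; }
    }

    static class Emphasis extends Node {}

    // Pairs a node class with a lambda, similar in shape to a VisitHandler.
    record Handler<N extends Node>(Class<N> type, Consumer<N> action) {
        void invoke(Node node) { action.accept(type.cast(node)); }
    }

    static class Visitor {
        private final Map<Class<?>, Handler<?>> handlers = new HashMap<>();

        Visitor(Handler<?>... list) {
            for (Handler<?> h : list) handlers.put(h.type(), h);
        }

        // final: subclasses cannot override dispatch, customization goes through handlers.
        final void visit(Node node) {
            Handler<?> h = handlers.get(node.getClass());
            if (h != null) h.invoke(node);
            for (Node child : node.children) visit(child);
        }
    }

    public static void main(String[] args) {
        Node doc = new Node().add(new Text("a")).add(new Emphasis().add(new Text("b")));
        StringBuilder text = new StringBuilder();
        int[] emphasis = {0};
        Visitor v = new Visitor(
                new Handler<>(Text.class, t -> text.append(t.text)),
                new Handler<>(Emphasis.class, e -> emphasis[0]++));
        v.visit(doc);
        System.out.println(text + ", emphasis nodes: " + emphasis[0]);
    }
}
```

Since each handler is a lambda, no anonymous class implementing a visitor interface is needed at the call site, which is the same reason the Visitor import can be dropped once VisitHandler instances use lambdas.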

Migrating to 0.60

The IntelliJ IDEA migration file migrate flexmark-java 0_50_x to 0_60_0.xml can be used to assist in migrating from version 0.50.40 to 0.60 of the library. It migrates class name and package changes only.

Changes to method arguments and other method changes have to be addressed manually.

LineFormattingAppendable

This class is renamed to LineAppendable. Implementation and subclasses are similarly renamed to remove Formatting in the class name.

All formatting flags are now prefixed with F_ and, when set, enable the corresponding modification of appended text. Previously, ALLOW_LEADING_WHITESPACE and ALLOW_LEADING_EOL had inverted meaning: setting them disabled the text modification.

  • ALLOW_LEADING_WHITESPACE is now F_TRIM_LEADING_WHITESPACE and has inverted meaning.
  • ALLOW_LEADING_EOL is now F_TRIM_LEADING_EOL and has inverted meaning.
  • CONVERT_TABS is now F_CONVERT_TABS
  • COLLAPSE_WHITESPACE is now F_COLLAPSE_WHITESPACE
  • TRIM_TRAILING_WHITESPACE is now F_TRIM_TRAILING_WHITESPACE
  • PASS_THROUGH is now F_PASS_THROUGH
  • TRIM_LEADING_WHITESPACE is now F_TRIM_LEADING_WHITESPACE
  • PREFIX_PRE_FORMATTED is now F_PREFIX_PRE_FORMATTED
  • FORMAT_ALL is now F_FORMAT_ALL
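Because two of the flags flipped polarity, a plain rename is not enough for them. The sketch below shows the logic of converting an old flag word to the new convention. The bit values are invented for illustration and are not the real LineAppendable constants; in practice you would rewrite the constant names at each call site rather than convert numeric values.

```java
// Hypothetical flag values for illustration only; they are not the real LineAppendable constants.
public class FlagMigrationSketch {
    // Old style: setting an ALLOW_* flag disabled the corresponding trimming.
    static final int ALLOW_LEADING_WHITESPACE = 1;
    static final int ALLOW_LEADING_EOL = 1 << 1;
    static final int CONVERT_TABS = 1 << 2;

    // New style: setting an F_* flag enables the modification.
    static final int F_TRIM_LEADING_WHITESPACE = 1 << 8;
    static final int F_TRIM_LEADING_EOL = 1 << 9;
    static final int F_CONVERT_TABS = 1 << 10;

    static int migrate(int oldFlags) {
        int newFlags = 0;
        // Inverted meaning: an absent ALLOW_* flag means trimming was active.
        if ((oldFlags & ALLOW_LEADING_WHITESPACE) == 0) newFlags |= F_TRIM_LEADING_WHITESPACE;
        if ((oldFlags & ALLOW_LEADING_EOL) == 0) newFlags |= F_TRIM_LEADING_EOL;
        // Same meaning, only renamed.
        if ((oldFlags & CONVERT_TABS) != 0) newFlags |= F_CONVERT_TABS;
        return newFlags;
    }

    public static void main(String[] args) {
        System.out.println(migrate(0) == (F_TRIM_LEADING_WHITESPACE | F_TRIM_LEADING_EOL));
    }
}
```

The key point is that an old call site passing no ALLOW_* flags must now pass both F_TRIM_LEADING_WHITESPACE and F_TRIM_LEADING_EOL to keep the same behavior.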

BasedSequence

This interface and its implementation classes were refactored and reworked for efficient use with SequenceBuilder.

  • The CharPredicate class is now used to provide character sets instead of CharSequence, giving consistent and efficient character tests. Method arguments of type CharSequence that were used to select character sets are now CharPredicate.

    The simplest way to change the method call is to use CharPredicate.anyOf(CharSequence) to convert a character sequence to predicate.

  • Some methods were renamed to better reflect their operation. In these cases, the methods with the old names are deprecated and their default implementations invoke the new methods.
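The CharSequence-to-CharPredicate change can be pictured with a minimal stand-in. The CharPredicate interface below is a hypothetical local copy, not flexmark-java's class; only the idea of CharPredicate.anyOf(CharSequence) comes from the text above. countLeading is an invented example of a method that would previously have taken a CharSequence character set.

```java
import java.util.BitSet;

// Hypothetical stand-in for CharPredicate; not the flexmark-java class.
public class CharPredicateSketch {
    @FunctionalInterface
    interface CharPredicate {
        boolean test(int c);

        // Precompute the character set into a BitSet so each test is a single lookup.
        static CharPredicate anyOf(CharSequence chars) {
            BitSet set = new BitSet();
            for (int i = 0; i < chars.length(); i++) set.set(chars.charAt(i));
            return set::get;
        }
    }

    // Invented example method: count leading characters that belong to the set.
    static int countLeading(CharSequence s, CharPredicate p) {
        int i = 0;
        while (i < s.length() && p.test(s.charAt(i))) i++;
        return i;
    }

    public static void main(String[] args) {
        CharPredicate blank = CharPredicate.anyOf(" \t");
        System.out.println(countLeading(" \t  text", blank));
    }
}
```

In this sketch an old-style call such as countLeading(text, " \t") becomes countLeading(text, CharPredicate.anyOf(" \t")); building the predicate once and reusing it avoids scanning the set string on every character test.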

SegmentedSequence

This class was renamed to SegmentedSequenceFull, which contains the old, inefficient implementation. It is not recommended that the old class be used due to its inefficient and in some cases buggy implementation.

The new SegmentedSequence is an abstract class with two concrete implementations, SegmentedSequenceFull and SegmentedSequenceTree. The latter is an efficient implementation using a binary search tree.

The recommended way to create an instance of SegmentedSequence is to build the sequence with a SequenceBuilder and then call SequenceBuilder.toSequence(). This returns an instance of SegmentedSequenceTree if the result requires a segmented sequence, or a subsequence of the underlying BasedSequence if the result consists of a single segment.
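The lookup that makes a segmented sequence efficient to index can be illustrated in simplified form: if segment start positions in the result are kept sorted, the segment holding a given index is found by binary search, and its base offset follows by adding the distance into that segment. SegmentLookupSketch is an invented name, and the sketch uses plain sorted arrays with Arrays.binarySearch rather than the tree structure SegmentedSequenceTree uses.

```java
import java.util.Arrays;

// Simplified illustration using sorted arrays and Arrays.binarySearch;
// this is not the data layout of SegmentedSequenceTree.
public class SegmentLookupSketch {
    private final int[] resultStarts; // start index of each segment in the result, ascending, first is 0
    private final int[] baseStarts;   // start offset of each segment in the base, or -1 for out-of-base text
    private final int length;         // total length of the result

    SegmentLookupSketch(int[] resultStarts, int[] baseStarts, int length) {
        this.resultStarts = resultStarts;
        this.baseStarts = baseStarts;
        this.length = length;
    }

    // Index of the segment containing result index i; assumes no empty segments.
    int segmentIndex(int i) {
        if (i < 0 || i >= length) throw new IndexOutOfBoundsException(i);
        int pos = Arrays.binarySearch(resultStarts, i);
        // Exact hit: a segment starts at i. Otherwise pos == -(insertionPoint) - 1,
        // and the containing segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Base offset of result index i, or -1 when the character is out-of-base.
    int baseOffset(int i) {
        int seg = segmentIndex(i);
        int start = baseStarts[seg];
        return start < 0 ? -1 : start + (i - resultStarts[seg]);
    }

    public static void main(String[] args) {
        // Base "Hello world", result "Hello, world": "Hello" from base, "," out-of-base, " world" from base.
        SegmentLookupSketch s = new SegmentLookupSketch(new int[] {0, 5, 6}, new int[] {0, -1, 5}, 12);
        for (int i = 0; i < 12; i++) System.out.print(s.baseOffset(i) + " ");
        System.out.println();
    }
}
```

Each lookup costs O(log n) in the number of segments, which is the property that keeps random access cheap even when a result is stitched together from many pieces of the source.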