Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems and future plans on token system #75

Open
otakustay opened this issue Jan 13, 2020 · 0 comments
Open

Problems and future plans on token system #75

otakustay opened this issue Jan 13, 2020 · 0 comments
Labels

Comments

@otakustay
Copy link
Owner

Token system is a key feature of react-diff-view and is the source of its powerfulness.

I looked into current implementation of token system recently and discovered some vulnerabilities about it, mostly about performance.

Too many clones

To ensure token system works safely by default, we introduce a clone function to copy an entire path, that is clone every node inside this path. This function is used in almost every key functions like normalizeToLines and other manipulation utilities.

Actually, while we guarantee that token node is immutable, we can get rid of all these clones, boosting performance considerably.

In current situation we don't have a statement saying "you cannot mutate token node" and removing clone operations can introduce some potential breaking changes, but I think it should be safe in most cases.

Too many slices

Previously we represent a path as an array of token nodes, the last node in this array is always a text node:

[
  {type: 'element'},
  {type: 'element'},
  {type: 'mark'},
  {type: 'text', value: 'Hello World'}
]

In nearly all cases when we need to manipulate this path, we were not changing the text, we frequently insert a node before text node, the implement could be:

const parents = path.slice(0, -1);
const text = path[path.length - 1];
const newPath = [...parents, customNode, text];

We notice a .slice(0, -1) inside this code which can cause some dropdown on performance.

To optimize this, we can separate node path and its containing text into a tuple like:

[
  [
    {type: 'element'},
    {type: 'element'},
    {type: 'mark'},
  ],
  'Hello World'
]

We can insert a node before text more efficiently:

const [parents, text] = path;
const newPath = [[...parents, customNode], text];

In this solution we also unwrap text node into a single string to reduce the creation and garbage collection of node objects.

This is a breaking change since any custom enhancers written previously will fail to work on the new data structure.

Too dynamic nodes

For the lifetime of react-diff-view, we didn't have a strictly defined type for token node, that is, users can put any properties into a node and use them in custom renderToken function.

This is easy to use, but a dynamic node causes more than expected node clones and a poor performance node merge algorithm.

We decide to later define the node more strictly, leaving only type and a dynamic properties for it, the clone and compare can be much simpler:

const cloneNode = node => {
  return {
    type: node.type,
    properties: node.properties,
  }
};

const areNodesEquals = (x, y) => (x.type === y.type && shallowEquals(x.properties, y.properties));

This is also a breaking change since a more strict structure is forced.

Reuse token system

As a result of performance inspection and reusability consideration, I developed a tiny library source-tokenizer to handle these, on the initial implement we get about 25% performance boost:

highlighting a json file with 11029 lines, 10 times
legacy: 22.384s
current: 16.038s

I'm preparing to develop a react-source-view component based on this library to render source codes, react-diff-view which focused on render diff contents will have a major version to use this library in 2020, an upgrade guide will be published then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant