s0/remark-tree-sitter

remark-tree-sitter

Highlight code in Markdown files using tree-sitter and remark. Powered by tree-sitter-hast.

Installation

npm install remark-tree-sitter

or

yarn add remark-tree-sitter

Usage

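This plugin uses the same mechanism and data as Atom for syntax highlighting, so to highlight a particular language you need to either load an Atom language package that provides it (options.grammarPackages) or supply its grammar and scope mappings yourself (options.grammars).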

For more information on how this mechanism works, check out the documentation for tree-sitter-hast.

Any code blocks that are encountered for which there is not a matching language will be ignored.
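As a rough illustration of that rule, the sketch below walks a hand-written mdast-like tree and keeps only the code nodes whose language has been loaded. The helper selectHighlightable and the sample tree are hypothetical and are not part of remark-tree-sitter; the plugin's real traversal may differ.

```javascript
// Hypothetical helper, not part of remark-tree-sitter: walks an mdast-like
// tree and returns the code nodes whose `lang` has a loaded grammar.
function selectHighlightable(tree, loadedLanguages) {
  const selected = []
  const visit = node => {
    if (node.type === 'code' && loadedLanguages.has(node.lang)) {
      selected.push(node)
    }
    for (const child of node.children || []) visit(child)
  }
  visit(tree)
  return selected
}

// Minimal hand-written tree standing in for parsed Markdown.
const tree = {
  type: 'root',
  children: [
    { type: 'code', lang: 'typescript', value: 'let foo = 1;' },
    { type: 'code', lang: 'cobol', value: 'DISPLAY "HI".' },
    { type: 'code', lang: null, value: 'plain text' }
  ]
}

// Assumed set of loaded languages, matching the TypeScript package example.
const loaded = new Set(['flow', 'tsx', 'typescript'])
const highlightable = selectHighlightable(tree, loaded)
console.log(highlightable.map(node => node.lang)) // logs [ 'typescript' ]
```

The blocks tagged cobol and the untagged block are simply skipped, so they would pass through to later plugins unchanged.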

Example

The following example is also in the examples directory and can be run directly from there. It uses @atom-languages/language-typescript to provide the TypeScript grammar and scope mappings.

npm install to-vfile vfile-reporter remark remark-tree-sitter remark-html @atom-languages/language-typescript

examples/example.js

const vfile = require('to-vfile')
const report = require('vfile-reporter')
const remark = require('remark')
const treeSitter = require('remark-tree-sitter')
const html = require('remark-html')

remark()
  .use(treeSitter, {
    grammarPackages: ['@atom-languages/language-typescript']
  })
  .use(html)
  .process(vfile.readSync('example.md'), (err, file) => {
    console.error(report(err || file))
    console.log(String(file))
  })

Output:

example.md: no issues found
<pre><code class="tree-sitter language-typescript"><span class="source ts"><span class="storage type function">function</span> <span class="entity name function">foo</span><span class="punctuation definition parameters begin bracket round">(</span><span class="punctuation definition parameters end bracket round">)</span> <span class="punctuation definition function body begin bracket curly">{</span>
  <span class="keyword control">return</span> <span class="constant numeric">1</span><span class="punctuation terminator statement semicolon">;</span>
<span class="punctuation definition function body end bracket curly">}</span></span></code></pre>

Atom language packages

To use an Atom language package, first install it with npm install or yarn add, as you would any other package. Unfortunately, most APM packages are not published to NPM, so I've started to make some of them available under the NPM organization @atom-languages. That organization lists the available packages and the languages each one provides highlighting for.

API

remark.use(treeSitter, options)

Note that options is required, and either grammarPackages or grammars needs to be provided. (Both can be provided, in which case entries in grammars override languages with the same name loaded via grammarPackages.)
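The precedence rule can be pictured as a plain object merge in which later entries win. This is only a sketch of the documented behaviour: resolveGrammars and the placeholder entries are hypothetical, and the plugin's internals may be organised differently.

```javascript
// Hypothetical sketch: languages loaded from grammarPackages come first,
// then entries from options.grammars replace any language with the same key.
function resolveGrammars(fromPackages, fromOptions) {
  return { ...fromPackages, ...fromOptions }
}

// Placeholder entries; real ones would hold a grammar and scopeMappings.
const fromPackages = {
  typescript: { source: 'grammarPackages' },
  tsx: { source: 'grammarPackages' }
}
const fromOptions = {
  typescript: { source: 'grammars' }
}

const resolved = resolveGrammars(fromPackages, fromOptions)
console.log(resolved.typescript.source) // logs grammars
console.log(resolved.tsx.source) // logs grammarPackages
```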

options.grammarPackages

An array of all Atom language packages that should be loaded.

Example:

remark().use(treeSitter, {
  grammarPackages: ['@atom-languages/language-typescript']
})

The language names that code blocks must use are based on the filenames in the Atom package. For example, the package above contains the files tree-sitter-flow.cson, tree-sitter-tsx.cson, tree-sitter-typescript.cson and so on, which makes the languages flow, tsx and typescript available for use within code blocks.
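The naming convention described above can be sketched as a small string transformation. The helper languageNameFromFile is hypothetical and only mirrors the filenames mentioned here; how the plugin actually discovers and names grammar files is not shown in this README.

```javascript
// Hypothetical helper: derives a language name from a grammar filename of
// the form tree-sitter-<name>.cson, returning null for anything else.
function languageNameFromFile(filename) {
  const match = /^tree-sitter-(.+)\.cson$/.exec(filename)
  return match ? match[1] : null
}

const files = [
  'tree-sitter-flow.cson',
  'tree-sitter-tsx.cson',
  'tree-sitter-typescript.cson',
  'notes.txt' // assumed non-grammar file, included to show it is skipped
]

const names = files.map(languageNameFromFile).filter(name => name !== null)
console.log(names) // logs [ 'flow', 'tsx', 'typescript' ]
```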

If you want to make loaded languages available to use via different names, you can use options.languageAliases.

options.grammars

An object mapping language keys to objects containing grammar and scopeMappings.

Anything specified here will overwrite the languages loaded by options.grammarPackages.

For more information on scopeMappings, check out the documentation for tree-sitter-hast.

Example:

See a working example at examples/example-grammars.js.

remark().use(treeSitter, {
  grammars: {
    typescript: {
      grammar: typescriptGrammar,
      scopeMappings: typescriptScopeMappings
    },
    'custom-language': {
      grammar: customLanguageGrammar,
      scopeMappings: customLanguageScopeMappings
    }
  }
})

You can then use both the typescript and custom-language languages in code blocks:

```custom-language
some code
```

```typescript
let foo = 'bar';
```

If you want to make loaded languages available to use via different names, you can use options.languageAliases.

options.classWhitelist

The scope mappings can apply many classes to each token, often more than you have styles for.

To emit only the classes your stylesheets actually use, pass in a whitelist of class names.

Example: The following configuration...

remark().use(treeSitter, {
  grammarPackages: ['@atom-languages/language-typescript'],
  classWhitelist: ['storage', 'numeric']
})

...will convert the following markdown...

```typescript
function foo() {
  return 1;
}
```

...to this:

<pre><code class="tree-sitter language-typescript"><span><span class="storage">function</span> foo() {
  return <span class="numeric">1</span>;
}</span></code></pre>
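The effect on each token's class list can be sketched as a simple filter. The function filterClasses is hypothetical and is not the plugin's actual code; the class lists are taken from the unfiltered output shown in the Example section.

```javascript
// Hypothetical sketch: keep only the class names present in the whitelist.
function filterClasses(classNames, whitelist) {
  const allowed = new Set(whitelist)
  return classNames.filter(name => allowed.has(name))
}

const whitelist = ['storage', 'numeric']

console.log(filterClasses(['storage', 'type', 'function'], whitelist)) // logs [ 'storage' ]
console.log(filterClasses(['constant', 'numeric'], whitelist)) // logs [ 'numeric' ]
console.log(filterClasses(['keyword', 'control'], whitelist)) // logs []
```

Judging by the output above, tokens left with no classes, such as the return keyword, are emitted as plain text rather than wrapped in an empty span.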

options.languageAliases

TODO: options.languageAliases is not implemented yet
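Until this option exists, one possible workaround is to register the same grammar entry under more than one key in options.grammars. This is an untested assumption: nothing in this README confirms that the plugin accepts a shared entry under several keys. The helper withAliases and the empty placeholder objects below are hypothetical.

```javascript
// Hypothetical helper: returns a copy of a grammars object with extra keys
// that point at existing entries, e.g. { ts: 'typescript' }.
function withAliases(grammars, aliases) {
  const result = { ...grammars }
  for (const [alias, target] of Object.entries(aliases)) {
    if (!(target in grammars)) {
      throw new Error(`Cannot alias "${alias}": unknown language "${target}"`)
    }
    result[alias] = grammars[target]
  }
  return result
}

// Placeholders; real values would come from a tree-sitter grammar package
// and its scope mappings.
const typescriptGrammar = {}
const typescriptScopeMappings = {}

const grammars = withAliases(
  {
    typescript: {
      grammar: typescriptGrammar,
      scopeMappings: typescriptScopeMappings
    }
  },
  { ts: 'typescript' }
)

console.log(Object.keys(grammars)) // logs [ 'typescript', 'ts' ]
```

The resulting object would then be passed as the grammars option, so that code blocks tagged ts and typescript resolve to the same entry.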

TODO:

  • Add unit tests for grammars option

Related