enh(parser) multi-class support in a single mode (#3081)
Co-authored-by: Vladimir Jimenez <allejo@me.com>
joshgoebel and allejo committed Apr 4, 2021
1 parent 3ef0f1b commit 05a2770
Showing 10 changed files with 187 additions and 16 deletions.
8 changes: 7 additions & 1 deletion CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,14 @@
## Version next

Parser:

- enh(parser) support multi-class matchers (#3081) [Josh Goebel][]
- enh(parser) Detect comments based on english like text, rather than keyword list [Josh Goebel][]

Grammars:

- chore(properties) disable auto-detection #3102 [Josh Goebel][]
- fix(properties) fix incorrect handling of non-alphanumeric keys #3102 [Egor Rogov][]
- enh(parser) Detect comments based on english like text, rather than keyword list [Josh Goebel][]
- enh(shell) add alias ShellSession [Ryan Mulligan][]
- enh(shell) consider one space after prompt as part of prompt [Ryan Mulligan][]
- fix(nginx) fix bug with $ and @ variables [Josh Goebel][]
35 changes: 32 additions & 3 deletions docs/mode-reference.rst
@@ -154,16 +154,46 @@ for one thing like string in single or double quotes.
begin
^^^^^

- **type**: regexp
- **type**: regexp or array of regexp

Regular expression starting a mode. For example, a single quote for strings or two forward slashes for C-style comments.
If absent, ``begin`` defaults to a regexp that matches anything, so the mode starts immediately.


You can also pass an array when you need to individually highlight portions of the match with different classes:

::

{
begin: [
/function!/,
/\s+/,
hljs.IDENT_RE
],
className: {
1: "keyword",
3: "title"
},
}

This would highlight ``function!`` as a ``keyword`` while highlighting the name
of the function as ``title``. The space(s) between would be matched, but not
highlighted.

Note: Internally, each regular expression in the array becomes a capture group
inside a larger concatenated regex. *These regular expressions may NOT include
capture groups of their own yet.* If your regexes use groups at all, they
**must** be non-capturing, i.e. ``(?:regex)``.

For more info see issue `#3095 <https://github.com/highlightjs/highlight.js/issues/3095>`_.
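
The capture-group restriction can be illustrated with a small standalone sketch. It does not use highlight.js itself: ``source`` and ``combine`` are hypothetical helpers that only mimic the wrapping described in the note, and ``IDENT_RE`` is a local stand-in for ``hljs.IDENT_RE``.

```javascript
// Hypothetical helper: return the pattern source of a string or RegExp.
function source(re) {
  return typeof re === "string" ? re : re.source;
}

// Sketch of the compile step: wrap each part in a capture group, then join.
function combine(parts) {
  return new RegExp(parts.map((p) => "(" + source(p) + ")").join(""));
}

const IDENT_RE = "[a-zA-Z]\\w*"; // assumed stand-in for hljs.IDENT_RE

// Non-capturing inner group: group numbers line up with array positions.
const good = combine([/\b(?:function!|function)/, /\s+/, IDENT_RE]);
const goodMatch = good.exec("function! UnComment(fl, ll)");
// goodMatch[1] is "function!", goodMatch[3] is "UnComment"

// Capturing inner group: the extra group shifts every later index by one,
// so a className map of { 1: "keyword", 3: "title" } would miss the title.
const bad = combine([/\b(function!|function)/, /\s+/, IDENT_RE]);
const badMatch = bad.exec("function! UnComment(fl, ll)");
// badMatch[3] is " " and the identifier has moved to badMatch[4]
```

With the capturing variant, position 3 holds the whitespace rather than the function name, which is why nested groups must be written as ``(?:...)``.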




match
^^^^^

- **type**: regexp
- **type**: regexp or array of regexp

This is simply syntactic sugar for a ``begin`` when no ``end`` expression is
necessary. It may not be used with ``begin`` or ``end`` keys (that would make
@@ -532,4 +562,3 @@ handle pairs of ``/* .. */`` to correctly find the ending ``?>``::

Without ``skip: true`` every comment would cause the parser to drop out back
into the HTML mode.

31 changes: 25 additions & 6 deletions src/highlight.js
@@ -247,10 +247,29 @@ const HLJS = function(hljs) {
}

/**
* @param {Mode} mode - new mode to start
* @param {CompiledMode} mode
* @param {RegExpMatchArray} match
*/
function startNewMode(mode) {
if (mode.className) {
function emitMultiClass(mode, match) {
let i = 1;
while (match[i]) {
const klass = language.classNameAliases[mode.className[i]] || mode.className[i];
const text = match[i];
if (klass) { emitter.addKeyword(text, klass); } else { emitter.addText(text); }
i++;
}
}

/**
* @param {CompiledMode} mode - new mode to start
* @param {RegExpMatchArray} match
*/
function startNewMode(mode, match) {
if (mode.isMultiClass) {
// at this point modeBuffer should just be the match
modeBuffer = "";
emitMultiClass(mode, match);
} else if (mode.className) {
emitter.openNode(language.classNameAliases[mode.className] || mode.className);
}
top = Object.create(mode, { parent: { value: top } });
@@ -340,7 +359,7 @@ const HLJS = function(hljs) {
modeBuffer = lexeme;
}
}
startNewMode(newMode);
startNewMode(newMode, match);
// if (mode["after:begin"]) {
// let resp = new Response(mode);
// mode["after:begin"](match, resp);
@@ -373,7 +392,7 @@ const HLJS = function(hljs) {
}
}
do {
if (top.className) {
if (top.className && !top.isMultiClass) {
emitter.closeNode();
}
if (!top.skip && !top.subLanguage) {
@@ -385,7 +404,7 @@ const HLJS = function(hljs) {
if (endMode.endSameAsBegin) {
endMode.starts.endRe = endMode.endRe;
}
startNewMode(endMode.starts);
startNewMode(endMode.starts, match);
}
return origin.returnEnd ? 0 : lexeme.length;
}
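
The emit step added in this file can be approximated outside the parser. The following is a minimal sketch, assuming a plain HTML string as output instead of the real emitter; `renderMultiClass`, `escapeHTML` and the `prefix` parameter are hypothetical names, not part of highlight.js.

```javascript
// Minimal HTML escaping for the sketch (the real emitter has its own).
function escapeHTML(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Walk capture groups 1..n: wrap classified groups in a span and
// emit unclassified groups as plain text.
function renderMultiClass(match, classNames, prefix = "hljs-") {
  let html = "";
  // Like the loop in the diff, this stops at the first empty or
  // undefined group.
  for (let i = 1; match[i]; i++) {
    const klass = classNames[i];
    const text = escapeHTML(match[i]);
    html += klass ? `<span class="${prefix}${klass}">${text}</span>` : text;
  }
  return html;
}

const match = /(a)(b)(c)/.exec("abcdef");
const html = renderMultiClass(match, { 1: "a", 3: "c" });
// html is '<span class="hljs-a">a</span>b<span class="hljs-c">c</span>'
```

Group 2 has no entry in the class map, so its text is emitted unwrapped, which mirrors the `addText` branch of `emitMultiClass`.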
12 changes: 9 additions & 3 deletions src/languages/vim.js
@@ -99,12 +99,18 @@ export default function(hljs) {
begin: /[bwtglsav]:[\w\d_]+/
},
{
className: 'function',
beginKeywords: 'function function!',
begin: [
/\b(?:function|function!)/,
/\s+/,
hljs.IDENT_RE
],
className: {
1: "keyword",
3: "title"
},
end: '$',
relevance: 0,
contains: [
hljs.TITLE_MODE,
{
className: 'params',
begin: '\\(',
27 changes: 27 additions & 0 deletions src/lib/ext/multi_class.js
@@ -0,0 +1,27 @@
/* eslint-disable no-throw-literal */
import * as logger from "../../lib/logger.js";
import * as regex from "../regex.js";

const MultiClassError = new Error();

/**
*
* @param {CompiledMode} mode
*/
export function MultiClass(mode) {
if (!Array.isArray(mode.begin)) return;

if (mode.skip || mode.excludeBegin || mode.returnBegin) {
logger.error("skip, excludeBegin, returnBegin not compatible with multi-class");
throw MultiClassError;
}

if (typeof mode.className !== "object") {
logger.error("className must be object");
throw MultiClassError;
}

const items = mode.begin.map(x => regex.concat("(", x, ")"));
mode.begin = regex.concat(...items);
mode.isMultiClass = true;
}
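
To see what this extension does to a mode, here is a rough standalone sketch. It is an approximation under stated assumptions: `compileMultiClass` is a hypothetical name, it returns a copy rather than mutating the mode as the code above does, and it throws plain `Error` objects with messages instead of logging through `logger`.

```javascript
// Sketch of the MultiClass compiler extension applied to a plain object.
function compileMultiClass(mode) {
  // Single-regex modes are left untouched.
  if (!Array.isArray(mode.begin)) return mode;

  if (mode.skip || mode.excludeBegin || mode.returnBegin) {
    throw new Error("skip, excludeBegin, returnBegin not compatible with multi-class");
  }
  if (typeof mode.className !== "object") {
    throw new Error("className must be object");
  }

  // Wrap every part in its own capture group and concatenate the sources.
  const src = mode.begin
    .map((p) => "(" + (typeof p === "string" ? p : p.source) + ")")
    .join("");
  return { ...mode, begin: new RegExp(src), isMultiClass: true };
}

const compiled = compileMultiClass({
  begin: [/func/, /\(\)/, /{.*}/],
  className: { 1: "keyword", 2: "params", 3: "body" }
});
const parts = compiled.begin.exec("func(){ test }");
// parts[1] is "func", parts[2] is "()", parts[3] is "{ test }"

// A string className combined with an array begin is rejected.
let rejected = false;
try {
  compileMultiClass({ begin: ["a", "b"], className: "x" });
} catch (err) {
  rejected = true;
}
```

The group numbers in `parts` correspond one-to-one with the keys of `className`, which is the contract the parser relies on when emitting.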
4 changes: 3 additions & 1 deletion src/lib/mode_compiler.js
@@ -2,6 +2,7 @@ import * as regex from './regex.js';
import { inherit } from './utils.js';
import * as EXT from "./compiler_extensions.js";
import { compileKeywords } from "./compile_keywords.js";
import { MultiClass } from "./ext/multi_class.js";

// compilation

@@ -284,7 +285,8 @@ export function compileLanguage(language, { plugins }) {
[
// do this early so compiler extensions generally don't have to worry about
// the distinction between match/begin
EXT.compileMatch
EXT.compileMatch,
MultiClass
].forEach(ext => ext(mode, parent));

language.compilerExtensions.forEach(ext => ext(mode, parent));
1 change: 1 addition & 0 deletions test/api/index.js
@@ -15,4 +15,5 @@ describe('hljs', function() {
require('./unregisterLanguage');
require('./starters');
require('./underscoreIdent');
require('./multiClassMatch');
});
80 changes: 80 additions & 0 deletions test/api/multiClassMatch.js
@@ -0,0 +1,80 @@
'use strict';

const hljs = require('../../build');

describe('multi-class matchers', () => {
before(() => {
const grammar = function() {
return {
contains: [
{
begin: ["a", "b", "c"],
className: {
1: "a",
3: "c"
},
contains: [
{
match: "def",
className: "def"
}
]
},
{
className: "carrot",
begin: /\^\^\^/,
end: /\^\^\^/,
contains: [
{
begin: ["a", "b", "c"],
className: {
1: "a",
3: "c"
}
}
]
},
{
match: [
/func/,
/\(\)/,
/{.*}/
],
className: {
1: "keyword",
2: "params",
3: "body"
}
}
]
};
};
hljs.registerLanguage("test", grammar);
});
after(() => {
hljs.unregisterLanguage("test");
});
it('should support begin', () => {
const code = "abcdef";
const result = hljs.highlight(code, { language: 'test' });

result.value.should.equal(`<span class="hljs-a">a</span>b<span class="hljs-c">c</span><span class="hljs-def">def</span>`);
result.relevance.should.equal(2);
});
it('basic functionality', () => {
const code = "func(){ test }";
const result = hljs.highlight(code, { language: 'test' });

result.value.should.equal(`<span class="hljs-keyword">func</span><span class="hljs-params">()</span><span class="hljs-body">{ test }</span>`);
result.relevance.should.equal(1);
});
it('works inside a classified parent mode', () => {
const code = "^^^what abc now^^^";
const result = hljs.highlight(code, { language: 'test' });

result.value.should.equal(
`<span class="hljs-carrot">^^^what ` +
`<span class="hljs-a">a</span>b<span class="hljs-c">c</span>` +
` now^^^</span>`);
})
});
2 changes: 1 addition & 1 deletion test/markup/vim/default.expect.txt
@@ -6,7 +6,7 @@
<span class="hljs-keyword">set</span> autoindent

<span class="hljs-comment">&quot; switch on highlighting</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">UnComment</span><span class="hljs-params">(fl, ll)</span></span>
<span class="hljs-keyword">function</span> <span class="hljs-title">UnComment</span><span class="hljs-params">(fl, ll)</span>
<span class="hljs-keyword">while</span> idx &gt;= <span class="hljs-variable">a:ll</span>
<span class="hljs-keyword">let</span> srclines=<span class="hljs-built_in">getline</span>(idx)
<span class="hljs-keyword">let</span> dstlines=<span class="hljs-keyword">substitute</span>(srclines, <span class="hljs-variable">b:comment</span>, <span class="hljs-string">&quot;&quot;</span>, <span class="hljs-string">&quot;&quot;</span>)
3 changes: 2 additions & 1 deletion types/index.d.ts
@@ -206,7 +206,8 @@ type CompiledMode = Omit<Mode, 'contains'> &
endRe: RegExp
illegalRe: RegExp
matcher: any
isCompiled: true
isCompiled: true,
isMultiClass?: boolean,
starts?: CompiledMode
parent?: CompiledMode
}