fix(parser) complete fix for resuming matches from same index #2678

joshgoebel · 2020-09-09T06:59:09Z

Resolves #2649.

Both of our prior definitions of resuming were "off the mark"... Resuming properly actually means you briefly need two parallel lines of regex search.

1. Resume the prior multi-match at the SAME [index] position ("resume" meaning only matching the expressions that haven't been tried yet). One of those remaining expressions could still match at index.
2. Begin a full multi-match at the next [index+1] position. This gives ALL the potential regex expressions a chance to match starting at index+1...
3. "Merge" the results... meaning whichever matched first (smaller index) is the next actual match.
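The three steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the highlight.js implementation: each rule is assumed to be a separate global `RegExp`, and `firstMatch` and `resumeScan` are hypothetical helper names.

```javascript
// Hypothetical rule set mirroring the [A, B, C] example; each rule is a
// global RegExp so that lastIndex controls where exec() starts searching.
const A = { name: 'A', re: /a/g };
const B = { name: 'B', re: /b/g };
const C = { name: 'C', re: /c/g };

// Earliest match of any rule at or after `from`; on a tie the rule listed
// first wins, like an alternation in a combined regex.
function firstMatch(rules, text, from) {
  let best = null;
  for (const rule of rules) {
    rule.re.lastIndex = from;
    const m = rule.re.exec(text);
    if (m && (best === null || m.index < best.index)) {
      best = { name: rule.name, index: m.index, text: m[0] };
    }
  }
  return best;
}

// `remaining`: rules not yet tried at `index`; `all`: the full rule set.
function resumeScan(remaining, all, text, index) {
  const resumed = firstMatch(remaining, text, index); // 1. resume at index
  const fresh = firstMatch(all, text, index + 1);     // 2. full scan at index+1
  // 3. merge: the smaller index is the next real match (resumed wins ties)
  if (resumed === null) return fresh;
  if (fresh === null) return resumed;
  return resumed.index <= fresh.index ? resumed : fresh;
}

// "b" at 0 was matched by B and rejected, so only C is left to try there.
console.log(resumeScan([C], [A, B, C], 'bac', 0)); // { name: 'A', index: 1, text: 'a' }
```

Searching only `[C]` would instead return the `c` at index 2 and skip the `a` at index 1.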

The prior solution ignored the need for number 2 above... We resumed matching but ONLY looked for the remaining expressions (however far ahead they matched), so we would miss a lot of other perfectly valid matches.

Another way to think of this is as a "rotating match".

Normally our regex matching looks for, say, [A, B or C]:

match [A,B,C] at offset 0

But after we match B and decide to ignore it, we really want to rotate the matches to [C, A, B (but not the same B)]. So we do this:

// we don't need to worry about A or B at 0, we already ruled them out
resume match [C] at offset +0
full match [A,B,C] at offset +1

Technically, that last C may do a little extra work (in some cases it can match exactly what the first matcher already found), but that shouldn't be too costly, since this is already an edge case.
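One way to picture the rotation is with a single combined regex, one capture group per rule, in the spirit of a multi-rule matcher (this is an assumption for illustration, not the library's actual code). The resume matcher is compiled from only the rules after the rejected one, while the full matcher keeps every rule. `compile` and `execAt` are hypothetical helpers, and rule sources are assumed to contain no capture groups of their own.

```javascript
// Join rule sources into one alternation; group i+1 corresponds to rules[i].
function compile(rules) {
  const source = rules.map((r) => `(${r.re.source})`).join('|');
  return { rules, re: new RegExp(source, 'g') };
}

// Run a compiled matcher from `from` and report which rule matched.
function execAt(matcher, text, from) {
  matcher.re.lastIndex = from;
  const m = matcher.re.exec(text);
  if (!m) return null;
  // The first defined capture group identifies the rule that matched.
  const i = m.findIndex((group, k) => k > 0 && group !== undefined);
  return { name: matcher.rules[i - 1].name, index: m.index };
}

const rules = [
  { name: 'A', re: /a/ },
  { name: 'B', re: /b/ },
  { name: 'C', re: /c/ },
];

// B matched at offset 0 and was rejected (rule index 1).
const rejected = 1;
const resume = compile(rules.slice(rejected + 1)); // [C] at offset +0
const full = compile(rules);                       // [A, B, C] at offset +1

console.log(execAt(resume, 'bac', 0)); // { name: 'C', index: 2 }
console.log(execAt(full, 'bac', 1));   // { name: 'A', index: 1 }
```

The smaller index still wins the merge. With the text `'bc'`, both matchers would find the same `c` at index 1, which is the small amount of duplicated work mentioned above.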

A real-life Java case:

ImmutablePair.of(Stuff.class, "bar");
23

For simplicity, our Java rules are: [string, "class" (begin keyword), number]. "class" has the magic 'not preceded by "."' internal rule (which means the "class" match here will be ignored)... At that point we technically still need to "complete" the existing matcher and see if it might still match a number... but if we do only that (without considering the full ruleset), we end up skipping the string "bar" and only highlighting 23 (the first number we find)...
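The Java case can be reproduced with a toy scanner. This is a sketch under stated assumptions, not highlight.js itself: the three rules are simplified regexes, the 'not preceded by "."' check is modeled as a hypothetical `reject` callback, and `scan` compares the old resume-only behavior with the merged one.

```javascript
// Simplified Java rules (hypothetical): a `reject` callback stands in for
// the 'not preceded by "."' check on the `class` keyword.
const RULES = [
  { name: 'string', re: /"[^"]*"/g },
  { name: 'keyword', re: /\bclass\b/g, reject: (text, i) => text[i - 1] === '.' },
  { name: 'number', re: /\d+/g },
];

// Earliest match among `rules` at or after `from` (earlier rule wins ties).
function firstMatch(rules, text, from) {
  let best = null;
  for (const rule of rules) {
    rule.re.lastIndex = from;
    const m = rule.re.exec(text);
    if (m && (best === null || m.index < best.index)) {
      best = { rule, index: m.index, text: m[0] };
    }
  }
  return best;
}

// Collect accepted tokens. On a rejected match we resume with only the rules
// after the rejected one; with `merge` set we also run a full scan from
// index + 1 and keep whichever candidate starts first.
function scan(text, merge) {
  const tokens = [];
  let match = firstMatch(RULES, text, 0);
  while (match) {
    const { rule, index } = match;
    if (rule.reject && rule.reject(text, index)) {
      const remaining = RULES.slice(RULES.indexOf(rule) + 1);
      const resumed = firstMatch(remaining, text, index);
      const fresh = merge ? firstMatch(RULES, text, index + 1) : null;
      const candidates = [resumed, fresh].filter(Boolean);
      // reduce keeps the resumed match on a tie because it is listed first
      match = candidates.length
        ? candidates.reduce((a, b) => (b.index < a.index ? b : a))
        : null;
      continue;
    }
    tokens.push(`${rule.name}:${match.text}`);
    match = firstMatch(RULES, text, index + match.text.length);
  }
  return tokens;
}

const text = 'ImmutablePair.of(Stuff.class, "bar");\n23';
console.log(scan(text, false)); // old behavior: [ 'number:23' ]
console.log(scan(text, true));  // merged:       [ 'string:"bar"', 'number:23' ]
```

Without the full scan from index + 1, the resumed `number` rule jumps straight to `23`, and everything between the rejected `class` and that number is never considered.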

joshgoebel · 2020-09-09T09:21:32Z

I'm thinking we could simplify further: I'm pretty sure the only RESUME match we care about would HAVE to be at position 0... so if the first matcher comes back with a match that isn't at index 0, I think we could just ignore it completely in favor of the second.
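One way to express this check in JavaScript is the sticky (`y`) flag, which makes `exec` match only at exactly `lastIndex`; comparing `m.index` against the resume position after an ordinary global search would be equivalent. The reasoning: any match the resume matcher finds further ahead is also reachable by the full scan from `index + 1`, since the full rule set contains the remaining rules. The `nextMatch` helper and the sample patterns below are hypothetical and only illustrate the idea.

```javascript
// Resume matcher: only rules not yet tried at `index`, sticky so it either
// matches exactly at `index` or not at all.
function nextMatch(resumeRe, fullRe, text, index) {
  resumeRe.lastIndex = index;
  const resumed = resumeRe.exec(text);
  if (resumed) return { index: resumed.index, text: resumed[0] };
  // No match at `index`: fall back to a full scan starting one position later.
  fullRe.lastIndex = index + 1;
  const fresh = fullRe.exec(text);
  return fresh ? { index: fresh.index, text: fresh[0] } : null;
}

const resumeRe = /\d+/y;              // e.g. only the `number` rule is left to try
const fullRe = /"[^"]*"|class|\d+/g;  // every rule, ordinary global search

const text = 'x.class "bar" 23';
console.log(nextMatch(resumeRe, fullRe, text, 2)); // { index: 8, text: '"bar"' }
```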

egor-rogov · 2020-09-09T19:06:33Z

src/highlight.js

-        // need to advance one position and revert to full scanning before we
-        // decide there are truly no more matches at all to be had
-        if (!match && top.matcher.resumingScanAtSamePosition()) {
-          advanceOne();


advanceOne() is no longer needed.
Otherwise, it all looks good!


fix(parser) complete fix for resuming matches from same index

6a37f3c

joshgoebel marked this pull request as draft September 9, 2020 08:15

joshgoebel added 3 commits September 9, 2020 04:29

wip

50ed057

clean up nesting

3836d7e

remove unnecesary code

be54ced

joshgoebel requested review from allejo and egor-rogov September 9, 2020 09:06

joshgoebel marked this pull request as ready for review September 9, 2020 09:06

joshgoebel mentioned this pull request Sep 9, 2020

(parser) continueScanAtSamePosition can break highlighting if no other matches are found #2649

Closed

joshgoebel added 2 commits September 9, 2020 05:27

simplify check to only position 0

992984e

avoid extra work

5bb88db

egor-rogov approved these changes Sep 9, 2020

allejo approved these changes Sep 17, 2020

joshgoebel added 2 commits September 17, 2020 17:24

Merge branch 'master' into resume-is-so-complex

4db8f6d

Merge branch 'master' into resume-is-so-complex

c91fb35

joshgoebel commented Sep 17, 2020

package.json (outdated, resolved)

Update package.json

a30a039

joshgoebel merged commit b45e211 into highlightjs:master Sep 17, 2020

joshgoebel added a commit that referenced this pull request Sep 21, 2020

fix(parser) complete fix for resuming matches from same index (#2678)

3c87587
