Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(parser) complete fix for resuming matches from same index #2678

Merged
merged 9 commits into from
Sep 17, 2020

Conversation

joshgoebel
Copy link
Member

@joshgoebel joshgoebel commented Sep 9, 2020

Resolves #2649.

Both our prior definitions of resuming were "off the mark"... resuming properly actually means you have to briefly must have two parallel lines of regex search.

  1. You have to resume the prior multi-match at the SAME [index] position ("resume" meaning only matching expressions that haven't been tried yet). It's possible one of those remaining expressions could still match at index.
  2. You also have to begin a full multi-match at the next [index+1] position. This is to allow ALL the potential regex expressions a chance to match starting at index+1...
  3. Then you "merge" the results... meaning whichever matched first (smaller index) is the next actual match.

The prior solution ignored the need for number 2 above... so we resumed matching but ONLY looking for the remaining expressions (however far in the future) so we would miss a lot of other quite valid expressions.

Or another way to think of this might be a "rotating match".

Normally our regex matching is looking for say [A,B or C]:

match [A,B,C] at offset 0

But after we match B and decide to ignore it we really want to rotate the matches:
[C,A,B(but not the same B)]. So we do this:

// we don't need to worry about A or B at 0, we already ruled them out
resume match [C] at offset +0
full match [A,B,C] at offset +1

Technically that last C is doing a little extra work potentially (in some cases it might match the exact same thing the first matcher does), but it shouldn't be too bad, as this is already an edge case.


An actual Java real-life case:

ImmutablePair.of(Stuff.class, "bar");
23

For simplicity our rules in Java are: [string, "class" (begin keyword), number]. Class having the magic "not proceeded by "." internal rule (which means the "class" match here will be ignored)... At that point technically we still need to "complete" the existing matcher and see if it might still match number... but if we do alone (without considering the full ruleset) then we end up skipping the string "bar" and only highlighting 23 (the first number we find)...

@joshgoebel joshgoebel marked this pull request as draft September 9, 2020 08:15
@joshgoebel
Copy link
Member Author

I'm thinking perhaps we could simplify further in that I'm pretty sure the only RESUME match we care about would HAVE to be at position 0... so if the first matcher came back with non index === 0 match I think we could perhaps just ignore it completely in favor of the second.

// need to advance one position and revert to full scanning before we
// decide there are truly no more matches at all to be had
if (!match && top.matcher.resumingScanAtSamePosition()) {
advanceOne();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

advanceOne() is no longer needed.
Otherwise it's all looks good!

package.json Outdated Show resolved Hide resolved
@joshgoebel joshgoebel merged commit b45e211 into highlightjs:master Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(parser) continueScanAtSamePosition can break highlighting if no other matches are found
3 participants