Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

impl: fix prefix literal matching bug #984

Merged
merged 1 commit into from Apr 21, 2023
Merged

impl: fix prefix literal matching bug #984

merged 1 commit into from Apr 21, 2023

Conversation

BurntSushi
Copy link
Member

This commit fixes a bug where it was possible to report a match where none existed. Basically, in the current regex crate, it just cannot deal with a mixture of look-around assertions in the prefix of a pattern and prefix literal optimizations. Before 1.8, this was handled by simply refusing to extract literals in that case. But in 1.8, with a rewrite of the literal extractor, literals are now extracted for patterns like this:

(?i:(?:\b|_)win(?:32|64|dows)?(?:\b|_))

So in 1.8, since it was still using the old engines that can't deal with this, I added some extra logic to throw away any extracted prefix literals if a look-around assertion occurred in the prefix of the pattern. The problem is that the logic I used was "always occurs in the prefix of the pattern" instead of "may occur in the prefix of the pattern." In the pattern above, it's the latter case. So it slipped by and the regex engine tried to use the prefix literals to accelerat the search. This in turn caused mishandling of the \b and led to a false positive match.

The specific reason why the current regex engines can't deal with this is because they weren't designed to handle searches that took the surrounding context into account when resolving look-around assertions. It was a pretty big oversight on my part many years ago.

The new engines we'll be migrating to Real Soon Now don't have this problem and can deal with the prefix literal optimizations while correctly handling look-around assertions in the prefix.

Fixes #981

@BurntSushi BurntSushi force-pushed the ag/fix-981 branch 2 times, most recently from 4a75247 to cad47d7 Compare April 21, 2023 11:39
This commit fixes a bug where it was possible to report a match where
none existed. Basically, in the current regex crate, it just cannot deal
with a mixture of look-around assertions in the prefix of a pattern and
prefix literal optimizations. Before 1.8, this was handled by simply
refusing to extract literals in that case. But in 1.8, with a rewrite of
the literal extractor, literals are now extracted for patterns like
this:

    (?i:(?:\b|_)win(?:32|64|dows)?(?:\b|_))

So in 1.8, since it was still using the old engines that can't deal with
this, I added some extra logic to throw away any extracted prefix
literals if a look-around assertion occurred in the prefix of the
pattern. The problem is that the logic I used was "always occurs in the
prefix of the pattern" instead of "may occur in the prefix of the
pattern." In the pattern above, it's the latter case. So it slipped by
and the regex engine tried to use the prefix literals to accelerat the
search. This in turn caused mishandling of the `\b` and led to a false
positive match.

The specific reason why the current regex engines can't deal with this
is because they weren't designed to handle searches that took the
surrounding context into account when resolving look-around assertions.
It was a pretty big oversight on my part many years ago.

The new engines we'll be migrating to Real Soon Now don't have this
problem and can deal with the prefix literal optimizations while
correctly handling look-around assertions in the prefix.

Fixes #981
@BurntSushi BurntSushi merged commit f003d72 into master Apr 21, 2023
10 checks passed
@BurntSushi BurntSushi deleted the ag/fix-981 branch April 21, 2023 19:55
@BurntSushi BurntSushi restored the ag/fix-981 branch July 5, 2023 12:01
@BurntSushi BurntSushi deleted the ag/fix-981 branch July 5, 2023 12:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regex matching changed in 1.8.0
1 participant