Request support for reporting partial matches. #1014

Closed
hagbard opened this issue Jun 19, 2023 · 2 comments

hagbard commented Jun 19, 2023

Some regex engines can determine, in cases where the input wasn't matched, that the input was nevertheless a valid prefix of something that could match the expression.

In Java this is implemented via the "hitEnd()" method, which reports whether the last match operation hit the end of the input. A failed match for which hitEnd() returns true failed only due to a lack of input.

This is useful when using regular expressions for things like incrementally validating user input, since it lets you differentiate between:

  • This is invalid because it could never be matched (issue a warning in the UI)
  • This isn't valid yet, but more input might make it valid.

This could be implemented in one of several ways but I think the easiest would be a new is_partial_match() function alongside is_match().

It would probably be best if this returned true for both complete and partial matches (exact name tbd).

In fact your docs show a good example of a case where partial matching could be useful:

let re = Regex::new("[0-9]{3}-[0-9]{3}-[0-9]{4}").unwrap();
let mat = re.find("phone: 111-222-3333").unwrap();

Wouldn't it be nice to be able to report to the user that 111-222-33 was an incomplete number rather than just failing to match it at all?
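
To make the request concrete, here is a minimal sketch of the three-way answer being asked for, hand-written for this one pattern and assuming it is anchored at both ends (^[0-9]{3}-[0-9]{3}-[0-9]{4}$). It does not use the regex crate; Validity, classify_phone and the SHAPE template are hypothetical names for illustration only, and is_partial_match merely mirrors the name proposed above.

```rust
/// Hypothetical three-way answer for incremental validation.
#[derive(Debug, PartialEq, Eq)]
pub enum Validity {
    /// The whole input matches the pattern.
    Complete,
    /// The input is a proper prefix of some matching string.
    Partial,
    /// No continuation of the input can ever match.
    Invalid,
}

/// Shape of `^[0-9]{3}-[0-9]{3}-[0-9]{4}$`: 'd' is any ASCII digit, '-' is itself.
const SHAPE: &[u8] = b"ddd-ddd-dddd";

/// Classify `input` against the fixed phone-number shape.
pub fn classify_phone(input: &str) -> Validity {
    let bytes = input.as_bytes();
    if bytes.len() > SHAPE.len() {
        return Validity::Invalid;
    }
    // Every byte typed so far must agree with the shape at the same position.
    let fits = bytes.iter().zip(SHAPE).all(|(&b, &s)| match s {
        b'd' => b.is_ascii_digit(),
        _ => b == s,
    });
    if !fits {
        Validity::Invalid
    } else if bytes.len() == SHAPE.len() {
        Validity::Complete
    } else {
        Validity::Partial
    }
}

/// The behaviour proposed in this issue: true for complete and partial matches.
pub fn is_partial_match(input: &str) -> bool {
    classify_phone(input) != Validity::Invalid
}
```

With this sketch, classify_phone("111-222-33") yields Validity::Partial, so a UI could hold back its warning until the input becomes Validity::Invalid.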

Member

BurntSushi commented Jun 19, 2023

> Wouldn't it be nice to be able to report to the user that 111-222-33 was an incomplete number rather than just failing to match it at all?

For a pattern like [0-9]{3}-[0-9]{3}-[0-9]{4}, your is_partial_match routine would, I imagine, always return true. The example you quoted doesn't benefit at all from the partial matching you've conceived of here, because it's looking for a phone number in mixed data. Presumably what you'd actually want is ^[0-9]{3}-[0-9]{3}-[0-9]{4}$. That is, a partial match seemingly only makes sense when the pattern is anchored. An unanchored pattern behaves as if it starts with a (?s-u:.)*?, which means that any partial match routine is always going to say, "yeah, it's possible there is a match somewhere else."

There's also likely some API design that would need to be worked out to do this.

Overall, I'd like to see someone prototype this out-of-crate once regex-automata 0.3 is released. See #656 for more details there. One problem in particular that is on my mind is that partial match support will require changing search signatures from Result<Option<Match>, MatchError> to something else, like Result<Result<Match, NoMatchError>, MatchError>. Which is pretty annoying to deal with and is a very large change.
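
As a rough illustration of the signature change described above, the sketch below shows what callers of a nested Result might have to write. Everything here is an assumption for illustration: Match and MatchError are simplified stand-ins rather than the crate's real types, the NoMatchError variants are invented, and search_literal is a toy that only checks whether a literal occurs at the start of the haystack.

```rust
/// Span of a successful match (simplified stand-in).
#[derive(Debug, PartialEq, Eq)]
pub struct Match {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical: why a search that ran to completion found nothing.
#[derive(Debug, PartialEq, Eq)]
pub enum NoMatchError {
    /// No amount of extra input can produce a match.
    Impossible,
    /// The haystack ended while a match was still possible.
    NeedsMoreInput,
}

/// Simplified stand-in for an engine failure. The toy search never returns it.
#[derive(Debug, PartialEq, Eq)]
pub struct MatchError;

/// Toy search: does `needle` occur at the very start of `haystack`?
pub fn search_literal(
    needle: &str,
    haystack: &str,
) -> Result<Result<Match, NoMatchError>, MatchError> {
    if haystack.starts_with(needle) {
        Ok(Ok(Match { start: 0, end: needle.len() }))
    } else if needle.starts_with(haystack) {
        // Everything seen so far is a prefix of the needle.
        Ok(Err(NoMatchError::NeedsMoreInput))
    } else {
        Ok(Err(NoMatchError::Impossible))
    }
}

/// Callers now have four cases to handle instead of three.
pub fn describe(needle: &str, haystack: &str) -> &'static str {
    match search_literal(needle, haystack) {
        Ok(Ok(_)) => "match",
        Ok(Err(NoMatchError::NeedsMoreInput)) => "incomplete",
        Ok(Err(NoMatchError::Impossible)) => "no match",
        Err(MatchError) => "search failed",
    }
}
```

The extra nesting in describe is the ergonomic cost mentioned above: every existing caller that matches on Option<Match> would need to be rewritten.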

BurntSushi closed this as not planned Jun 19, 2023

riking commented Jul 14, 2023

It's fairly obvious how to implement this with a regex-automata 0.3 DFA: drive the DFA for all the input you have, and if you haven't reached a halt state yet, more input is acceptable. Feed the End-Of-Input token once the stream is exhausted and do the final check for match.
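
The approach in this comment can be sketched with a tiny hand-built DFA. The code below does not use regex-automata; it hard-codes a DFA for the anchored pattern ^[0-9]+\.[0-9]+$, and the names Stream, feed and finish are hypothetical. In regex-automata itself, the Automaton trait methods next_state, next_eoi_state, is_dead_state and is_match_state would presumably play the corresponding roles.

```rust
/// States of a hand-built DFA for `^[0-9]+\.[0-9]+$`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State {
    Start, // nothing consumed yet
    Int,   // one or more integer digits
    Dot,   // just consumed the '.', a digit must follow
    Frac,  // one or more fractional digits (accepting)
    Dead,  // no continuation can match
}

/// One DFA transition.
fn step(state: State, byte: u8) -> State {
    match (state, byte) {
        (State::Start | State::Int, b'0'..=b'9') => State::Int,
        (State::Int, b'.') => State::Dot,
        (State::Dot | State::Frac, b'0'..=b'9') => State::Frac,
        _ => State::Dead,
    }
}

/// Incremental matcher that is fed input chunk by chunk.
pub struct Stream {
    state: State,
}

impl Stream {
    pub fn new() -> Self {
        Stream { state: State::Start }
    }

    /// Drive the DFA over `chunk`. Returns true while a match is still
    /// possible, and false once the dead state has been reached.
    pub fn feed(&mut self, chunk: &[u8]) -> bool {
        for &b in chunk {
            self.state = step(self.state, b);
            if self.state == State::Dead {
                break;
            }
        }
        self.state != State::Dead
    }

    /// Final check once the stream is exhausted (the "end of input" step).
    pub fn finish(&self) -> bool {
        self.state == State::Frac
    }
}
```

A caller would keep calling feed as input arrives, warn the user as soon as it returns false, and call finish only when no more input is expected.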
