Improve XRegExp.matchRecursive handling of unbalanced delimiters #96

farfromrefug · 2015-08-26T14:14:15Z

Hi,

First thanks for your great library.
I am using matchRecursive to find JSON messages on a stream.
Every time i receive a message, i push it a string stack.
Then i use matchRecursive to try and find messages. If i find some, i slice those messages from the stack.
Now sometimes some messages get lost and i end up with "unbalanced" stack, like this:
{"toto":"test","bd":{"toto":"test","bd":{"toto":"test","bd":{"toto":"test","bd":1,"asdfg":true}{"toto":"test","bd":{"toto":"test","bd":1,"asdfg":true}{"toto":"test","bd":1,"asdfg":{"toto":2}}

If i use your library like so i cant get the message in this because it's unbalanced.
And this is correct. there is no outputs if you look at this message.

Though there is a simple way to find the messages. Reverse the string and inverse the left and right delimiters.
It works and matchRecursive actually found the message. However it still reports unbalanced and throw at the end.
I easily fixed this locally by only throwing if ouptuts.length===0 like this:

if (output.length === 0) {
                    throw new Error("string contains unbalanced delimiters");
                } else {
                    return output;
                }

That way i can still get the matched strings even if at the end it is unbalanced.
This is very useful.

I was wondering if you would be willing to commit that in the master branch?
I think it would be a great addition (could become an option).

The text was updated successfully, but these errors were encountered:

slevithan · 2015-08-26T23:26:43Z

Yes, I'd be totally willing to include an option that allows XRegExp.matchRecursive to skip past unbalanced delimiters. In fact, I'd prefer to make that the default. Perhaps the new option could be called requireBalance, and when set to true it would revert to the current handling where scanning past an unbalanced delimiter throws an error.

Unfortunately, I don't expect to get to this for XRegExp 3.0 final, unless it's provided as a pull request.

The challenge is supporting all of matchRecursive's existing features and providing robust tests proving the change is safe. Right now matchRecursive is under-tested. See https://github.com/slevithan/xregexp/blob/master/tests/spec/s-addons-matchrecursive.js which currently just tests the examples from the readme, rather than testing individual features, combinations, and assumptions.

Features that would need to continue working correctly after such a change is introduced include:

Sticky and non-sticky mode, via flag y.
Global and non-global mode, via flag g.
The escapeChar option.
Extended information mode, via the valueNames option. The unbalanced delimiters should appear within the "between" values that are returned. As basic examples, if your delimiters are { and }, and your subject is 'a{b{c}d', I think a{b and d should be "between" values, and c should be the match. Similarly, if your subject is 'a{b}c}d', I think a and c}d should be "between" values, and b should be the match.
Maintain behavior (or document reasonable behavior) for when left and right delimiters match some or all of the same strings.
Maintain behavior (or document reasonable behavior) for when left and or right delimiters sometimes or always match empty strings.

If this is something you're able to help with, that would be awesome and much appreciated.

farfromrefug · 2015-08-28T07:31:31Z

Hi,

I might look at this and create a PR. But all your changes requirements wont work.
Especially handling 'a{b{c}d'. This one wont report results and cant do so because the unbalanced happens before any match is found.

I don't want to make any hard change in your code so my PR will mostly only include the code i posted before.
What i can do is make it an option though. I think i can also look at handling 'a{b}c}d' and c}d beetween value.

farfromrefug · 2015-08-28T07:41:16Z

here s the PR #99

slevithan · 2015-08-29T04:41:38Z

I might look at this and create a PR. But all your changes requirements wont work.
Especially handling 'a{b{c}d'. This one wont report results and cant do so because the unbalanced happens before any match is found.

The handling I described is the handling of all regex engines that support recursive matching (Perl, PCRE, and .NET). What happens in the case of something like 'a{b{c}d' is that, when the match attempt starting at the first { reaches the end of the string and discovers that the { is unbalanced, the match attempt fails, and the engine moves on to start a new match attempt at the next character position, and so on. IMO this is also how recursive matching using XRegExp.matchRecursive can and should work, at least by default. Admittedly, this might require significant changes to the function's current implementation.

Aside: It might also make sense to add another option (named something like includeUnterminatedMatches) that would change the handling for unbalanced left delimiters, so that they're not skipped past but instead continue to the end of the subject string. I'm not sure what the use cases for this would be, though. You could get even more flexible with an additional option like unterminatedMatchesEndAtNextLeftDelimiter.

It seems I might have misunderstood your original message, though. Rather than supporting all aspects of recursive matching within unbalanced strings, it seems you're looking to make a narrower change that is fairly specific to your situation (ending a global match early and not throwing on unbalanced delimiters if you've already found at least one balanced match). If that's the case, it might be better to just make the change in your local copy. I'm not sure how common the case you're describing is, and partial handling might cause more confusion and problems than it solves.

More feedback is welcome, and I'll leave this issue open for the general feature of recursive matching within unbalanced strings.

farfromrefug · 2015-08-29T12:19:18Z

@slevithan I want to make it work but i am not sure we can when the unbalance starts before any match found.
And in fact the problem is not that no match is found. But instead that openTokens is already >0 when the first match is found.
The problem then becomes the finding of deep delimiters like {{}}
I wrapped my head around it and i don't see a way to discover them.
Having the openTokens reach 0 seems to be the only way to find them and it will never happen in the case of 'a{b{c}d'

But dont worry if we find a way to handle that i will change my PR to support it :D
And yes i kind of stop to there because in a sense it was enough for me and because i did not see any solution. Well for me there was one is handling the reverse string as explained in my first post

Best

olavim · 2017-04-02T10:09:49Z

+1
I'm trying to find JSON objects from an arbitrary string using matchRecursive. If the string has unmatched curly brackets, I don't get any results since matchRecursive throws and doesn't return anything.

dvargas92495 · 2021-07-22T01:31:56Z

Is there anyone tackling this right now? if not, I'm more than willing to put up a PR for this as I would like to just ignore unbalanced delimeters for a project I'm working on. Am cool with it being default behavior or opt in

josephfrazier · 2021-07-22T15:56:51Z

I'm not aware of anyone working on it, go for it! Happy to try to help out along the way

slevithan · 2021-07-22T17:22:59Z

Yeah, that would be great!

Am cool with it being default behavior or opt in

It should start out as opt-in since changing the default behavior would require a major version change.

slevithan#96)

slevithan · 2021-08-03T18:39:00Z

Thanks to @dvargas92495 for the sweet/clean implementation in #332. It came out even better than expected, and XRegExp.matchRecursive now supports three handling options for unbalanced delimiters via the new unbalanced option: 'error' (default), 'skip', and 'skip-lazy'.

This will go out in the next minor release.

slevithan · 2021-08-03T19:51:03Z

Aside: There's lots of potential for additional future handling modes for unbalanced delimiters if there are strong use cases. Things like:

'stop-matching-at-first-unbalanced'
'skip-inner-unbalanced' (reverse the 'skip' mode's handling of skipping outer unbalanced delimiters)
'include-trailing-unterminated-match' (treat end-of-string as closing any open delimiters)
'unclosed-left-delimiters-end-at-next-left'

Or whatever else is actually useful for reasonably broad use cases.

slevithan · 2021-08-04T22:46:27Z

@dvargas92495 and others: This is now published on npm as v5.1.0.

slevithan added change enhancement labels Aug 26, 2015

slevithan mentioned this issue Aug 29, 2015

return existing results if unbalanced #99

Closed

slevithan changed the title ~~unbalanced: returns output already found~~ Improve XRegExp.matchRecursive handling of unbalanced delimiters May 15, 2016

slevithan mentioned this issue Nov 8, 2019

Recursive doesn't work as expected #272

Closed

3cp mentioned this issue May 15, 2020

Can we have some position information in Unbalanced delimiter error? #293

Closed

slevithan mentioned this issue Jul 14, 2021

Question about using matchRecursive and in an unbalanced string #331

Closed

dvargas92495 added a commit to dvargas92495/xregexp that referenced this issue Jul 29, 2021

Improve XRegExp.matchRecursive handling of unbalanced delimiters (Closes

661f676

slevithan#96)

dvargas92495 mentioned this issue Jul 29, 2021

Improve XRegExp.matchRecursive handling of unbalanced delimiters (Clo… #332

Merged

slevithan closed this as completed in 312a1e6 Aug 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve XRegExp.matchRecursive handling of unbalanced delimiters #96

Improve XRegExp.matchRecursive handling of unbalanced delimiters #96

farfromrefug commented Aug 26, 2015

slevithan commented Aug 26, 2015

farfromrefug commented Aug 28, 2015

farfromrefug commented Aug 28, 2015

slevithan commented Aug 29, 2015

farfromrefug commented Aug 29, 2015

olavim commented Apr 2, 2017 •

edited

Loading

dvargas92495 commented Jul 22, 2021

josephfrazier commented Jul 22, 2021

slevithan commented Jul 22, 2021

slevithan commented Aug 3, 2021

slevithan commented Aug 3, 2021 •

edited

Loading

slevithan commented Aug 4, 2021

Improve XRegExp.matchRecursive handling of unbalanced delimiters #96

Improve XRegExp.matchRecursive handling of unbalanced delimiters #96

Comments

farfromrefug commented Aug 26, 2015

slevithan commented Aug 26, 2015

farfromrefug commented Aug 28, 2015

farfromrefug commented Aug 28, 2015

slevithan commented Aug 29, 2015

farfromrefug commented Aug 29, 2015

olavim commented Apr 2, 2017 • edited Loading

dvargas92495 commented Jul 22, 2021

josephfrazier commented Jul 22, 2021

slevithan commented Jul 22, 2021

slevithan commented Aug 3, 2021

slevithan commented Aug 3, 2021 • edited Loading

slevithan commented Aug 4, 2021

olavim commented Apr 2, 2017 •

edited

Loading

slevithan commented Aug 3, 2021 •

edited

Loading