Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interp: fix multi-line shell pattern match equality #866

Merged
merged 1 commit into from
Jun 7, 2022

Conversation

lollipopman
Copy link
Contributor

@lollipopman lollipopman commented May 25, 2022

commit be03afe added the multi-line
flag to the regexp generated for == and =! testing. However, this
flag is confusing as it allows for a regular expression to match one
line out of a multi-line input.

(m) multi-line mode: ^ and $ match begin/end line in addition to
begin/end text (default false)

This means we are now matching against just one line in a multi-line
string, rather than the entire text. Instead, to match Bash's behavior,
we add the s flag to the regexp:

(s) let . match \n (default false)

With this a * in a shell pattern will match \n.

Glob patterns should always match newline characters, so add the s
flag by default when creating a pattern. Alternately we could add the
s flag only when the . meta-character is needed, but that seemed
more fragile. This behavior worked by accident with file globs because
rather than matching with .* we match with [^/]* which does match
\n.

@lollipopman
Copy link
Contributor Author

@ihar-orca I would love a review, since you submitted the original patch

@lollipopman lollipopman force-pushed the fix-multiline-match branch 6 times, most recently from 3c72dec to 6f30262 Compare May 26, 2022 18:39
@ihar-orca
Copy link
Contributor

@lollipopman looks really good!

Copy link
Owner

@mvdan mvdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thinking about the added API - it seems like we do need the added API, because in some cases we want the partial matching to happen. Or perhaps not - if one needs partial matching, perhaps foobar could be fed in as *foobar*, then it will still work as expected.

@lollipopman
Copy link
Contributor Author

Also thinking about the added API - it seems like we do need the added API, because in some cases we want the partial matching to happen. Or perhaps not - if one needs partial matching, perhaps foobar could be fed in as *foobar*, then it will still work as expected.

I had not thought about inverting the need for whole string matches, by converting each pattern into a whole string match as you suggest. I tried doing some rough benchmarks in Go and whole string matches appear significantly slower on larger texts, as I would expect.

func BenchmarkPartialString(b *testing.B) {
	b.ReportAllocs()
	out, err := exec.Command("man", "bash").Output()
	if err != nil {
		log.Fatal(err)
	}
	manBash := string(out)
	partial := regexp.MustCompile("COPYRIGHT")
	for i := 0; i < b.N; i++ {
		if !partial.MatchString(manBash) {
			b.Fatal(err)
		}
	}
}
func BenchmarkWholeString(b *testing.B) {
	b.ReportAllocs()
	out, err := exec.Command("man", "bash").Output()
	if err != nil {
		log.Fatal(err)
	}
	manBash := string(out)
	wholeString := regexp.MustCompile("^(?s).*COPYRIGHT.*$")
	for i := 0; i < b.N; i++ {
		if !wholeString.MatchString(manBash) {
			b.Fatal(err)
		}
	}
}

@mvdan
Copy link
Owner

mvdan commented Jun 7, 2022

Ah, good point - "whole" matching is slower, so having it opt-in makes sense even if most use cases within shell will want to opt into the option.

Copy link
Owner

@mvdan mvdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice patch :) Thanks for bearing with me as I review this - it took me a while as I've been travelling.

More than happy to review more patches from you if they are of this quality.

commit be03afe added the multi-line
flag to the regexp generated for `==` and `=!` testing. However, this
flag is confusing as it allows for a regular expression to match one
line out of a multi-line input.

    (m) multi-line mode: ^ and $ match begin/end line in addition to
    begin/end text (default false)

This means we are now matching against just one line in a multi-line
string, rather than the entire text. Instead, to match Bash's behavior,
we add the `s` flag to the regexp:

    (s) let . match \n (default false)

With this a `*` in a shell pattern will match `\n`.

Glob patterns should always match newline characters, so add the `s`
flag when the `.` metacharacter is used. This behavior worked by
accident with file globs because rather than matching with `.*` we match
with `[^/]*` which does match `\n`.
@lollipopman lollipopman force-pushed the fix-multiline-match branch from 6f30262 to 5bb30cf Compare June 7, 2022 18:28
@lollipopman
Copy link
Contributor Author

Very nice patch :) Thanks for bearing with me as I review this - it took me a while as I've been travelling.

More than happy to review more patches from you if they are of this quality.

Thanks, hopefully I will find the time to keep them coming!

@mvdan mvdan merged commit 7164cd1 into mvdan:master Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants