Match debug impl makes it look like matches are bigger than they really are #904

Closed
HerrMuellerluedenscheid opened this issue Sep 3, 2022 · 2 comments

@HerrMuellerluedenscheid

What version of regex are you using?

regex = "1.6.0"

This is the text against which I'm matching:

From 65ac1b403b1b2df05bbbe06c4795f9da24da4a07 Mon Sep 17 00:00:00 2001
From: xxx xxx <xxx.yyy@gmail.com>
Date: Sat, 3 Sep 2022 02:33:38 +0200
Subject: [PATCH 1/1] test patch

Signed-off-by: xxx xxx <xxx.yyy@gmail.com>
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e85faf1..698f024 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,4 @@
+PATCHED
 Snitch - Intrusion Notification
 ===============================
 
-- 
2.36.1

This is the regex: From ((.|\n)*)\n-{3}\n

That should match until the three dash separator --- on the 7th line as it does here: https://regex101.com/r/9balQ3/1

This is how I call regex in my module:

static COMMIT_HEADER: &str = r"From ((.|\n)*)\n-{3}\n";
 
...
    let re = Regex::new(COMMIT_HEADER).unwrap();
    println!(" {:?}", re.find(std::str::from_utf8(&buff).unwrap()));

Unfortunately, regex matches the entire string down to the very last byte.

I guess I'm doing something wrong, but it's surprising that other tools match what I would expect.

Best regards

@HerrMuellerluedenscheid
Author

My bad! In case someone else runs into a similar confusion: the debug output of a match (from find, or from get on a capture) includes the entire input string along with the start and end byte offsets of where the expression matched.

@BurntSushi
Member

In the future, it is much more helpful to provide a running program. And since the regex crate is available in the playground, it's even nicer to use that. But here's a program that I believe captures what you said:

use regex::Regex;

static COMMIT_HEADER: &str = r"From ((.|\n)*)\n-{3}\n";

fn main() {
    let re = Regex::new(COMMIT_HEADER).unwrap();
    let haystack = "\
From 65ac1b403b1b2df05bbbe06c4795f9da24da4a07 Mon Sep 17 00:00:00 2001
From: xxx xxx <xxx.yyy@gmail.com>
Date: Sat, 3 Sep 2022 02:33:38 +0200
Subject: [PATCH 1/1] test patch

Signed-off-by: xxx xxx <xxx.yyy@gmail.com>
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index e85faf1..698f024 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,4 @@
+PATCHED
 Snitch - Intrusion Notification
 ===============================

--
2.36.1
";
    println!(" {:?}", re.find(haystack));
}

Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=182138a3fb05ac34d7744c0096dca04e

And its output is:

Some(Match { text: "From 65ac1b403b1b2df05bbbe06c4795f9da24da4a07 Mon Sep 17 00:00:00 2001\nFrom: xxx xxx <xxx.yyy@gmail.com>\nDate: Sat, 3 Sep 2022 02:33:38 +0200\nSubject: [PATCH 1/1] test patch\n\nSigned-off-by: xxx xxx <xxx.yyy@gmail.com>\n---\n README.md | 1 +\n 1 file changed, 1 insertion(+)\n\ndiff --git a/README.md b/README.md\nindex e85faf1..698f024 100644\n--- a/README.md\n+++ b/README.md\n@@ -1,3 +1,4 @@\n+PATCHED\n Snitch - Intrusion Notification\n ===============================\n \n-- \n2.36.1\n", start: 0, end: 222 })

So I'm guessing the text field value here led you astray? The issue here is that the Debug impl for Match is not great (#514 is for improving it). But if we change the println! statement to

println!("{:?}", re.find(haystack).unwrap().as_str());

Then its output is:

"From 65ac1b403b1b2df05bbbe06c4795f9da24da4a07 Mon Sep 17 00:00:00 2001\nFrom: xxx xxx <xxx.yyy@gmail.com>\nDate: Sat, 3 Sep 2022 02:33:38 +0200\nSubject: [PATCH 1/1] test patch\n\nSigned-off-by: xxx xxx <xxx.yyy@gmail.com>\n---\n"

Which I believe lines up with your expectations.
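
To illustrate why the two outputs differ, here is a minimal std-only sketch. `SpanMatch` is a hypothetical stand-in, not the crate's actual type; it assumes (as the `text`, `start` and `end` fields in the debug output above suggest) that a match keeps a reference to the whole haystack plus byte offsets, so a derived Debug prints everything while `as_str` prints only the matched slice:

```rust
// Hypothetical stand-in for a match type: it stores the WHOLE haystack
// plus the byte offsets of the match, mirroring the fields seen in the
// debug output above. This is an assumption, not the regex crate's code.
#[derive(Debug)]
struct SpanMatch<'t> {
    text: &'t str,
    start: usize,
    end: usize,
}

impl<'t> SpanMatch<'t> {
    // Only the bytes in start..end are actually part of the match.
    fn as_str(&self) -> &'t str {
        &self.text[self.start..self.end]
    }
}

fn main() {
    let haystack = "header\n---\nbody\n";
    // Pretend a search matched the first 11 bytes: "header\n---\n".
    let m = SpanMatch { text: haystack, start: 0, end: 11 };

    // Derived Debug dumps the full haystack, which looks like an oversized match:
    // SpanMatch { text: "header\n---\nbody\n", start: 0, end: 11 }
    println!("{:?}", m);

    // The matched text itself stops at the separator:
    // "header\n---\n"
    println!("{:?}", m.as_str());
}
```

When reading such debug output, the `start` and `end` offsets are the part to trust; the `text` field is just the input that was searched.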

@BurntSushi changed the title from "Match is not in line with other tools" to "Match debug impl makes it look like matches are bigger than they really are" on Sep 3, 2022