Fuzzy Matching Not Working For Me #517

Open
Commando-Brando opened this issue Dec 22, 2023 · 1 comment
Commando-Brando commented Dec 22, 2023

Hey everyone, I am trying to test a fuzzy regex pattern using the regex package, and the test currently fails for this set of parameters:

("<ACCESSION-NUM>0000915913-23-000148", None),  # three typos (should fail)

Full Test:

import pytest
import regex

# Define the regex pattern with fuzzy matching
accession_fuzzy_regex_pattern = r"<(?:ACCESSION-NUMBER){e<=2}>\s*(\d+-\d+-\d+)"


# Test cases
@pytest.mark.parametrize(
    ("test_input", "expected"),
    [
        (
            "<ACCESSION-NUMBER>0000915913-23-000148",
            "0000915913-23-000148",
        ),  # exact match
        ("<ACCESSION-NUMBE>0000915913-23-000148", "0000915913-23-000148"),  # one typo
        ("<ACCESSION-NUMB>0000915913-23-000148", "0000915913-23-000148"),  # two typos
        ("<ACCESSION-NUM>0000915913-23-000148", None),  # three typos (should fail)
        (
            "<ACCSSION-NUMBER>0000915913-23-000148",
            None,
        ),  # missing one character (should fail)
        ("<ACCESSION-NUMBER>0000915913-23-", None),  # incomplete number (should fail)
        ("<ACCESSION-NUMBER>", None),  # no number (should fail)
        ("0000915913-23-000148", None),  # no tag (should fail)
    ],
)
def test_accession_number_regex(test_input, expected):
    """Test the accession number regex pattern."""
    match = regex.search(accession_fuzzy_regex_pattern, test_input)
    if match:
        group_one = match.group(1)
        assert group_one == expected
    else:
        assert expected is None

The failing test case (#4) has three typos, so it should not match, but for some reason it does match, and it still matches when even more characters are deleted.

@mrabarnett
Owner

With regex 2023.10.3 (the current version) I'm getting:

"<ACCESSION-NUM>0000915913-23-000148"   # no match
"<ACCSSION-NUMBER>0000915913-23-000148" # match

as expected.
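One way to sanity-check these two results without relying on any particular regex version is to count the edits by hand. The sketch below is not how the regex module works internally; it is an independent check that assumes `{e<=2}` counts each insertion, deletion, or substitution as one error, which is the Levenshtein distance between the tag text and `ACCESSION-NUMBER`. The names `edit_distance`, `TARGET`, and `MAX_ERRORS` are made up for this illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # keep or substitute
                )
            )
        prev = curr
    return prev[-1]


TARGET = "ACCESSION-NUMBER"  # the text inside the fuzzy group
MAX_ERRORS = 2               # mirrors {e<=2} in the pattern

# Tag texts taken from the test cases in the report above
tags = [
    "ACCESSION-NUMBER",
    "ACCESSION-NUMBE",
    "ACCESSION-NUMB",
    "ACCESSION-NUM",
    "ACCSSION-NUMBER",
]

for tag in tags:
    errors = edit_distance(tag, TARGET)
    print(f"{tag:<17} errors={errors} within_limit={errors <= MAX_ERRORS}")
```

Under that assumption, `ACCESSION-NUM` needs three edits and falls outside the limit, while `ACCSSION-NUMBER` needs only one, so a match on the second input is consistent with `{e<=2}` even though the test marks it as "should fail".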
