Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug - Lexer matches incorrect token #279

Open
jeertmans opened this issue Jan 18, 2023 · 2 comments
Open

Bug - Lexer matches incorrect token #279

jeertmans opened this issue Jan 18, 2023 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@jeertmans
Copy link
Collaborator

jeertmans commented Jan 18, 2023

Hello! For the context, I am writing a TeX file parser, and I use Logos==0.12.1 to create a lexer.

You can find a (reduced) version of my Token implementation:

use logos::Logos;

#[derive(Logos, Debug)]
pub enum Token {
    #[token(r"\")]
    Backslash,
    #[token(r"\\")]
    DoubleBackslash,
    #[token(r"\begin")]
    EnvironmentBegin,
    #[token(r"\end")]
    EnvironmentEnd,
    //#[token(r"\begin{document}")] // <- the part that creates problems
    DocumentBegin,                  // <-
    #[regex(r"\\[a-zA-Z]+")]
    MacroName,
    #[error]
    Error,
}

When I run my test suite, see reduced version below, it works fine. However, when I add the enum variant DocumentBegin, which should not be matches by anything in within the test suite, I get an error saying that \begin now matches token Backslash. This does not make sense to me, since Blackslash should only match \\, and nothing else.

If I comment out the Backslash token, then \begin matches MacroName, but not EnvironmentBegin.

After reading the documentation, testing on different versions of Logos, I still cannot understand why this does not work as expected.

Can someone help me about this?

Test suite

#[cfg(test)]
mod tests {
    use super::*;
    use logos::Logos;

    macro_rules! assert_token_positions {
        ($source:expr, $token:pat, $($pos:expr),+ $(,)?) => {
            let source = $source;

            let positions: Vec<std::ops::Range<usize>> = vec![$($pos),*];
            let spanned_token: Vec<_> = Token::lexer(source)
                .spanned()
                .filter(|(token, _)| matches!(token, $token))
                .collect();


            let strs: Vec<_> = Token::lexer(source)
                .spanned()
                .map(|(token, span)| (token, source[span].to_string()))
                .collect();

            assert_eq!(
                spanned_token.len(), positions.len(),
                "The number of tokens found did not match the expected number of positions {strs:?}"
            );

            for (pos, (token, span)) in positions.into_iter().zip(spanned_token) {
                assert_eq!(
                    pos,
                    span,
                    "Token {token:#?} was found, but expected at {pos:?}"
                );
            }
        };
    }

    #[test]
    fn token_backslash() {
        assert_token_positions!(r"Should match \+, but not \\+", Token::Backslash, 13..14,);
    }
    #[test]
    fn token_double_backslash() {
        assert_token_positions!(
            r"Should match \\, but not \",
            Token::DoubleBackslash,
            13..15,
        );
    }
    #[test]
    fn token_environment_begin() {
        assert_token_positions!(r"\begin{equation}", Token::EnvironmentBegin, 0..6,);
    }
    #[test]
    fn token_environment_end() {
        assert_token_positions!(r"\end{equation}", Token::EnvironmentEnd, 0..4,);
    }
    #[test]
    fn token_macro_name() {
        assert_token_positions!(
            r"\sin\cos\text{some text}\alpha1234",
            Token::MacroName,
            0..4,
            4..8,
            8..13,
            24..30,
        );
    }
}

Test outputs

Without DocumentBegin variant

running 5 tests
test tests::token_environment_begin ... ok
test tests::token_backslash ... ok
test tests::token_environment_end ... ok
test tests::token_double_backslash ... ok
test tests::token_macro_name ... ok

test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

With DocumentBegin variant

running 5 tests
test tests::token_double_backslash ... ok
test tests::token_backslash ... ok
test tests::token_environment_end ... ok
test tests::token_macro_name ... ok
test tests::token_environment_begin ... FAILED

failures:

---- tests::token_environment_begin stdout ----
thread 'tests::token_environment_begin' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`: The number of tokens found did not match the expected number of positions [(Backslash, "\\begin"), (Error, "{"), (Error, "e"), (Error, "q"), (Error, "u"), (Error, "a"), (Error, "t"), (Error, "i"), (Error, "o"), (Error, "n"), (Error, "}")]', src/main.rs:80:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::token_environment_begin

test result: FAILED. 4 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

With DocumentBegin variant, without Backslash variant

NOTE: token_backslash test is now ignored (since we removed the variant)

running 5 tests
test tests::token_backslash ... ignored
test tests::token_environment_end ... ok
test tests::token_macro_name ... ok
test tests::token_environment_begin ... FAILED
test tests::token_double_backslash ... ok

failures:

---- tests::token_environment_begin stdout ----
thread 'tests::token_environment_begin' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`: The number of tokens found did not match the expected number of positions [(MacroName, "\\begin"), (Error, "{"), (Error, "e"), (Error, "q"), (Error, "u"), (Error, "a"), (Error, "t"), (Error, "i"), (Error, "o"), (Error, "n")]', src/main.rs:81:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::token_environment_begin

test result: FAILED. 3 passed; 1 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s
@maciejhirsz
Copy link
Owner

Thanks for the very detailed report. I think I've some idea of what's happening here, namely:

  1. Lexer reads \ which is a potential match.
  2. Lexer reads b and continues to match whole \begin which is a potential match.
  3. Lexer checks for {, doesn't find it, so it settles on the last correctly matched pattern, which it incorrectly assumes to be 1 instead of 2.

I'll see what I can do about it today.

@maciejhirsz maciejhirsz added the bug Something isn't working label Feb 26, 2023
@maciejhirsz maciejhirsz self-assigned this Feb 26, 2023
@jeertmans
Copy link
Collaborator Author

Happy to see you back @maciejhirsz :D

I hope you will find how to solve this! Do not hesitate if I can help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants