
[BUG] Logos is unable to recover because of whitespace #358

Open
Rafaeltheraven opened this issue Jan 8, 2024 · 3 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@Rafaeltheraven

I'm not exactly sure how to title this issue or how to describe this bug. I'll present the following enum:

use logos::Logos;

#[derive(Logos, Debug)]
#[logos(skip r"/\*([^*]|\*[^/])+\*/|[ \t\n\f]+")]
pub enum Metadata {
    #[regex(r#"[a-zA-Z][a-zA-Z0-9]*[\s]*:[\s]*[^:;][^;]*"#, |lex| split_to_twople(lex.slice(), ':'))]
    Map((String, String)),

    #[token("include")]
    Include,

    #[token(";")]
    Semi,

    #[regex(r"[a-zA-Z]*(::[a-zA-Z]*)+", |lex| lex.slice().to_string())]
    Path(String),
}

Given the input string "include some::path", the lexer returns an error.

Some explanation of the enum: the regex for Map describes a key: value pair, where the key must be alphanumeric, the value can be anything, and each pair is terminated with a semicolon. The paths are Rust-style import paths. You can ignore the split_to_twople method; it returns a specific error in case of failure, as opposed to the generic error being returned here, so we can be sure that's not where the problem lies.
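
For completeness, this is roughly how I'm driving the lexer. The split_to_twople below is only a stand-in for my real helper (the real one returns a custom error; this one just returns an Option), and I'm assuming logos 0.13+, where iterating the lexer yields Result values:

// Stand-in for my real split_to_twople: split on the first `sep` and trim both halves.
fn split_to_twople(s: &str, sep: char) -> Option<(String, String)> {
    let (key, value) = s.split_once(sep)?;
    Some((key.trim().to_string(), value.trim().to_string()))
}

fn main() {
    // Expected: Ok(Include) then Ok(Path("some::path")); what I actually get is an error.
    for token in Metadata::lexer("include some::path") {
        println!("{:?}", token);
    }
}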

Through experimenting, I have found that removing the [\s]* to the left of the : in the Map regex fixes the issue, but as a result you can no longer write key : value, which I would still like to lex.
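
Concretely, the variant of Map that works (but no longer accepts a space before the colon) is:

#[regex(r#"[a-zA-Z][a-zA-Z0-9]*:[\s]*[^:;][^;]*"#, |lex| split_to_twople(lex.slice(), ':'))]
Map((String, String)),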

It seems that somewhere, after the lexer has matched the whitespace and then failed on the :, it is unable to recover and try other valid lexes (in this case, include).

@jeertmans (Collaborator)

Hello!

Did you try without the whitespace? Your skip regex is quite complicated, but I guess it should match whitespace. Maybe you can try with a simple whitespace-only skip regex (i.e., remove the other alternatives) and see if it works.

In general, your regexes seem overly complicated for what they should achieve. Also, I am not sure you should put \s inside [] for whitespace matching. Maybe simplifying your regexes would help as a first step.
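
Something like this for the skip attribute, keeping only the whitespace part, is what I mean:

#[logos(skip r"[ \t\n\f]+")]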

@Rafaeltheraven (Author)

The skip regex matches white space, as well as comments (// and /* */ style). IIRC I got it from one of the issues on this repo. Simplifying it down to just [ \t\n\f]+ does not resolve the problem.

I agree that my regexes are much too complicated, and I've resorted to doing simpler lexing and solving this problem at the parser level. However, the issue still stands: it seems to me that this should be possible, and there is some bug somewhere that causes it to fail.

Finally, yeah, you don't have to put \s inside [], but I also don't think it changes anything (instead of matching \s directly, it matches anything in the set [\s], which is just \s). I think I started doing that because of the alternative [:space:] notation, which does require the brackets.
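
For reference, and assuming Logos follows the regex crate's syntax here, the three spellings compare like this:

r"\s"           // Unicode whitespace
r"[\s]"         // a one-element set containing \s, so the same thing
r"[[:space:]]"  // POSIX-style ASCII whitespace class; needs the outer brackets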

@Rafaeltheraven (Author)

Update: simplifying the Map regex to [a-zA-Z][a-zA-Z0-9]*\s*: gives the exact same issue.
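
For what it's worth, here is a stripped-down enum with no callbacks at all that, as far as I can tell, shows the same behaviour:

use logos::Logos;

#[derive(Logos, Debug)]
#[logos(skip r"[ \t\n\f]+")]
enum Minimal {
    // A match attempt for this pattern can consume "include" plus the
    // following space (via \s*) before failing on the missing ':'.
    #[regex(r"[a-zA-Z][a-zA-Z0-9]*\s*:")]
    Key,

    #[token("include")]
    Include,

    #[regex(r"[a-zA-Z]*(::[a-zA-Z]*)+")]
    Path,
}

fn main() {
    // Expected: Ok(Include) then Ok(Path); what I actually see is an error.
    for token in Minimal::lexer("include some::path") {
        println!("{:?}", token);
    }
}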

jeertmans added the bug and help wanted labels on Feb 13, 2024