
[BUG] Logos is unable to recover because of whitespace #358

Open
Rafaeltheraven opened this issue Jan 8, 2024 · 3 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@Rafaeltheraven

I'm not exactly sure how to title this issue or how to describe this bug. I'll present the following enum:

use logos::Logos;

#[derive(Logos, Debug)]
#[logos(skip r"/\*([^*]|\*[^/])+\*/|[ \t\n\f]+")]
pub enum Metadata {
    #[regex(r#"[a-zA-Z][a-zA-Z0-9]*[\s]*:[\s]*[^:;][^;]*"#, |lex| split_to_twople(lex.slice(), ':'))]
    Map((String, String)),

    #[token("include")]
    Include,

    #[token(";")]
    Semi,

    #[regex(r"[a-zA-Z]*(::[a-zA-Z]*)+", |lex| lex.slice().to_string())]
    Path(String),
}

Given the input string "include some::path", the lexer returns an error.

Some explanation of the enum: the regex for Map describes a key: value pair, where the key must be alphanumeric, the value can be anything, and each pair is terminated with a semicolon. The paths are Rust-style import paths. You can ignore the split_to_twople method; it returns a specific error in case of failure, as opposed to the generic error being returned here, so we can be sure that's not where the problem lies.
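
For completeness, this is roughly how I'm driving the lexer. The split_to_twople below is only a stand-in for my real helper (the real one returns a custom error; this one just returns an Option), and I'm assuming logos 0.13+, where iterating the lexer yields Result values:

// Stand-in for my real split_to_twople: split on the first `sep` and trim both halves.
fn split_to_twople(s: &str, sep: char) -> Option<(String, String)> {
    let (key, value) = s.split_once(sep)?;
    Some((key.trim().to_string(), value.trim().to_string()))
}

fn main() {
    // Expected: Ok(Include) then Ok(Path("some::path")); what I actually get is an error.
    for token in Metadata::lexer("include some::path") {
        println!("{:?}", token);
    }
}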

Through experimenting, I have found that removing the [\s]* to the left of the : in the Map regex fixes the issue, but as a result you can no longer write key : value, which I would still like to lex.
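
Concretely, the variant of Map that works (but no longer accepts a space before the colon) is:

#[regex(r#"[a-zA-Z][a-zA-Z0-9]*:[\s]*[^:;][^;]*"#, |lex| split_to_twople(lex.slice(), ':'))]
Map((String, String)),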

It seems that somewhere, after the lexer has matched the whitespace and then failed on the :, it is unable to recover and try other valid lexes (in this case, include).

@jeertmans (Collaborator)

Hello!

Did you try without the whitespace? Your skip regex is quite complicated, but I guess it should match whitespace. Maybe you can try with a simple whitespace-only skip regex (i.e., remove the other alternatives) and see if it works.

In general, your regexes seem overly complicated for what they should achieve. Also, I am not sure you should put \s inside [] for whitespace matching. Maybe simplifying your regexes would help as a first step.
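
Something like this for the skip attribute, keeping only the whitespace part, is what I mean:

#[logos(skip r"[ \t\n\f]+")]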

@Rafaeltheraven (Author)

The skip regex matches white space, as well as comments (// and /* */ style). IIRC I got it from one of the issues on this repo. Simplifying it down to just [ \t\n\f]+ does not resolve the problem.

I agree that my regexes are much too complicated, and I've resorted to doing simpler lexing and solving this problem at the parser level. However, the issue still stands: it seems to me that this should be possible, and there is some bug somewhere that causes it to fail.

Finally, yeah, you don't have to put \s inside [], but I also don't think it changes anything (instead of matching \s directly, it matches anything in the set [\s], which is just \s). I think I started doing that because of the alternative [:space:] notation, which does require the brackets.
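
For reference, and assuming Logos follows the regex crate's syntax here, the three spellings compare like this:

r"\s"           // Unicode whitespace
r"[\s]"         // a one-element set containing \s, so the same thing
r"[[:space:]]"  // POSIX-style ASCII whitespace class; needs the outer brackets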

@Rafaeltheraven (Author)

Update: simplifying the Map regex to [a-zA-Z][a-zA-Z0-9]*\s*: gives the exact same issue.
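
For what it's worth, here is a stripped-down enum with no callbacks at all that, as far as I can tell, shows the same behaviour:

use logos::Logos;

#[derive(Logos, Debug)]
#[logos(skip r"[ \t\n\f]+")]
enum Minimal {
    // A match attempt for this pattern can consume "include" plus the
    // following space (via \s*) before failing on the missing ':'.
    #[regex(r"[a-zA-Z][a-zA-Z0-9]*\s*:")]
    Key,

    #[token("include")]
    Include,

    #[regex(r"[a-zA-Z]*(::[a-zA-Z]*)+")]
    Path,
}

fn main() {
    // Expected: Ok(Include) then Ok(Path); what I actually see is an error.
    for token in Minimal::lexer("include some::path") {
        println!("{:?}", token);
    }
}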

jeertmans added the bug and help wanted labels on Feb 13, 2024