Assign any other text to a "default" token. #325

Open
talwat opened this issue Jun 29, 2023 · 10 comments

@talwat

talwat commented Jun 29, 2023

Basically, I was wondering if it's possible to have it so that when a piece of text doesn't match any other tokens, it is assigned a default token.

For example:

 #[derive(Logos, Debug, PartialEq)]
 #[logos(skip r"[ \t\n\f]+")]
 enum Token {
     #[token("fast")]
     Fast,

     #[token(".")]
     Period,

     #[default()]
     Text,
 }

Any text that isn't fast or . would then be assigned to Text. I have tried doing this with a regex, but it didn't work. Perhaps I'm missing something? I'm new to using logos, so maybe I'm just thinking about this incorrectly, but this library has a shocking lack of examples (e.g. I wasn't sure how to use a for loop with the lexer).

@jeertmans
Collaborator

Hello! This used to be the case with prior versions of Logos, but now the lexer returns an error, Err(T::default()), instead of Ok(Token...). More about that here: https://logos.maciej.codes/attributes/logos.html#custom-error-type.

With the current Logos version, you could do something like this (not tested):

use logos::Logos;

#[derive(Default, Debug, Clone, PartialEq)]
enum LexingError {
    /* Add any other variant */
    #[default]
    InvalidToken,
}

#[derive(Debug, Logos, PartialEq)]
#[logos(error = LexingError)]
#[logos(skip r"[ \t\n\f]+")]
enum Token {
    #[token("fast")]
    Fast,

    #[token(".")]
    Period,

    Text,
}

fn main() {
    let lex = Token::lexer("Some random fast sentence.");

    // Turn every InvalidToken error into the fallback Text token.
    lex.map(|res| {
        if Err(LexingError::InvalidToken) == res {
            return Ok(Token::Text);
        }
        res
    }).for_each(|res| { /* do stuff */ });
}

@talwat
Author

talwat commented Jun 29, 2023

Could there be an option in the future where you can set a default/fallback?

@jeertmans
Collaborator

Yes I think so :)

If you know how to create derive macros, I think you could easily implement this

@talwat
Author

talwat commented Jun 29, 2023

If you know how to create derive macros, I think you could easily implement this

I'm actually still a bit of a beginner to Rust, and I'm using logos for a project of mine (it was recommended to me by someone on a Discord server).

But other than this bit it's very nice, just a little confusing at times since there aren't many examples to follow for certain things.

@jeertmans
Collaborator

Did you check the handbook? I tried to make the documentation a bit more comprehensive :)

@talwat
Author

talwat commented Jun 30, 2023

Did you check the handbook? I tried to make the documentation a bit more comprehensive :)

It's much better, but the problem is with the slice method on the lexer: it works with the old example of calling next repeatedly, but not inside a for loop, since Rust implicitly calls into_iter and the loop takes ownership of the lexer. That's the problem I had.

@talwat
Author

talwat commented Jun 30, 2023

What you did in the json example however is something I hadn't thought of:

while let Some(token) = lexer.next()

and while it isn't as readable as a simple for in which is the ideal scenario, it's not bad and does work for me.

Adding these examples really helped, thank you. :)
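
To illustrate the difference discussed here, below is a minimal sketch that does not use Logos at all. `WordLexer` and `collect_slices` are hypothetical stand-ins: the struct only mimics the shape of a lexer that is an iterator and also exposes a `slice` method. It shows why `while let` keeps `slice` available where a plain `for` loop would not.

```rust
/// Hypothetical stand-in for a lexer (NOT the Logos type): iterates over
/// whitespace-separated words and remembers the last matched slice.
struct WordLexer<'a> {
    source: &'a str,
    pos: usize,
    current: &'a str,
}

impl<'a> WordLexer<'a> {
    fn new(source: &'a str) -> Self {
        Self { source, pos: 0, current: "" }
    }

    /// Text of the most recently returned item.
    fn slice(&self) -> &'a str {
        self.current
    }
}

impl<'a> Iterator for WordLexer<'a> {
    // The word length stands in for a real token type.
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        let source = self.source;
        let trimmed = source[self.pos..].trim_start();
        if trimmed.is_empty() {
            return None;
        }
        let start = source.len() - trimmed.len();
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.current = &source[start..start + len];
        self.pos = start + len;
        Some(len)
    }
}

fn collect_slices(source: &str) -> Vec<(usize, &str)> {
    let mut lexer = WordLexer::new(source);
    let mut out = Vec::new();
    // `for token in lexer` would move `lexer` into the loop, so calling
    // `lexer.slice()` in the body would not compile. `while let` only
    // borrows the lexer for the duration of each `next()` call.
    while let Some(token) = lexer.next() {
        out.push((token, lexer.slice()));
    }
    out
}
```

Calling `collect_slices("Some fast  words")` pairs each item with its text. With Logos itself, the same `while let` shape is what the JSON example mentioned above uses.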

@talwat
Author

talwat commented Jul 2, 2023

@jeertmans Would it be possible in the future to create a new macro where you have an option to set a default/fallback?

@talwat
Author

talwat commented Jul 7, 2023

I tried the map workaround suggested above out of curiosity, and it doesn't work either, because it marks every single character as its own token. So instead of something like [Text, Fast, Text, Period] it's [Text, Text, Text, Text, Text, ..., Fast, Text, Text, Text, Text, Text, ..., Period]

@jeertmans
Collaborator

Yes, this is somewhat to be expected. Maybe you should implement your own iterator that takes a Lexer and merges consecutive InvalidToken errors into one token. You could do that fairly easily with the Peekable iterator adapter (Iterator::peekable).
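
The merging idea above could be sketched roughly as follows. This is an untested-against-Logos illustration, not the library's API: `Token` and `LexingError` are redeclared here without the derive so the block stands alone, and `MergeText` and `fake_lex` are hypothetical names. `fake_lex` only imitates the per-character errors reported in this thread; in real code the input would be something like `Token::lexer(input).spanned()`.

```rust
use std::iter::Peekable;
use std::ops::Range;

// Stand-ins for the types in this thread (assumption: no Logos derive here).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Fast,
    Period,
    Text,
}

#[derive(Debug, Clone, PartialEq, Default)]
enum LexingError {
    #[default]
    InvalidToken,
}

type Spanned = (Result<Token, LexingError>, Range<usize>);

/// Folds each run of consecutive `InvalidToken` errors into one `Token::Text`
/// whose span runs from the first error's start to the last error's end.
struct MergeText<I: Iterator<Item = Spanned>> {
    inner: Peekable<I>,
}

impl<I: Iterator<Item = Spanned>> MergeText<I> {
    fn new(inner: I) -> Self {
        Self { inner: inner.peekable() }
    }
}

impl<I: Iterator<Item = Spanned>> Iterator for MergeText<I> {
    type Item = (Token, Range<usize>);

    fn next(&mut self) -> Option<Self::Item> {
        let (result, mut span) = self.inner.next()?;
        match result {
            Ok(token) => Some((token, span)),
            Err(LexingError::InvalidToken) => {
                // Peek ahead and absorb errors until a real token appears.
                while let Some((Err(LexingError::InvalidToken), next)) = self.inner.peek() {
                    span.end = next.end;
                    self.inner.next();
                }
                Some((Token::Text, span))
            }
        }
    }
}

/// Hypothetical imitation of the lexer output described in the thread:
/// whitespace is skipped and every unmatched character is its own error.
fn fake_lex(input: &str) -> Vec<Spanned> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let rest = &input[i..];
        let c = rest.chars().next().unwrap();
        let len = c.len_utf8();
        if c.is_whitespace() {
            i += len;
        } else if rest.starts_with("fast") {
            out.push((Ok(Token::Fast), i..i + 4));
            i += 4;
        } else if c == '.' {
            out.push((Ok(Token::Period), i..i + 1));
            i += 1;
        } else {
            out.push((Err(LexingError::InvalidToken), i..i + len));
            i += len;
        }
    }
    out
}
```

Wrapping the stream as `MergeText::new(fake_lex(input).into_iter())` yields one `Text` per run of errors. Because skipped whitespace produces no item, words separated only by whitespace end up in the same `Text` span; comparing `next.start` with `span.end` before merging would keep them apart instead.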
