[Feature request] optional end_of_input token just before Lexer starts returning None #328

Open
legeana opened this issue Jul 18, 2023 · 3 comments

Comments

@legeana

legeana commented Jul 18, 2023

It would be really convenient to be able to inject a custom EndOfInput token just before the lexer starts returning None.

  #[derive(Logos)]
  #[logos(error = LexerError)]
  #[logos(extras = LineTracker)]
  #[logos(skip r"#.*")] // comments
  pub enum Token {
      #[end_of_input] // proposed attribute, not part of Logos today
      EndOfInput,
      #[token("\n")]
      Newline,
  }

For some shell-like grammars where statements are terminated by a newline, having EndOfInput (or even injecting Newline itself at the end) can make parsing unterminated trailing statements much easier, because you can define Statement = Command+ (Newline | EndOfInput).
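To illustrate why the terminator helps, here is a minimal sketch of a parser for that rule. The `Token` variants and the `parse_statements` function are made up for this example (they are not Logos APIs), and skipping empty lines is an assumption of mine rather than part of the rule.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Command(String),
    Newline,
    EndOfInput,
}

/// Hypothetical parser for `Statement = Command+ (Newline | EndOfInput)`.
/// Each statement is returned as the list of its command words.
pub fn parse_statements(tokens: &[Token]) -> Result<Vec<Vec<String>>, String> {
    let mut statements = Vec::new();
    let mut current = Vec::new();
    for token in tokens {
        match token {
            Token::Command(word) => current.push(word.clone()),
            // Both terminators are handled by the same arm, so a trailing
            // statement without a newline needs no special case.
            Token::Newline | Token::EndOfInput => {
                // Assumption: empty lines are skipped, not reported.
                if !current.is_empty() {
                    statements.push(std::mem::take(&mut current));
                }
            }
        }
    }
    if current.is_empty() {
        Ok(statements)
    } else {
        // Only reachable when the token stream has no final terminator.
        Err("unterminated statement".to_string())
    }
}
```

If the token stream always ends with EndOfInput, the error branch at the bottom is never taken; without it, the parser needs that extra check after the loop.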

Without this feature I just made a wrapper that returns one additional token after Logos returned None.
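A wrapper along those lines could look roughly like this. It is a sketch over any iterator, not my exact code: `WithEndOfInput`, `Token` and `demo` are names invented for the example, and with a real Logos lexer the `end` value would have to match whatever item type the lexer yields.

```rust
/// Hypothetical wrapper: yields every item of `inner`, then one extra
/// `end` item, then `None` forever.
pub struct WithEndOfInput<I: Iterator> {
    inner: I,
    end: Option<I::Item>,
}

impl<I: Iterator> WithEndOfInput<I> {
    pub fn new(inner: I, end: I::Item) -> Self {
        Self { inner, end: Some(end) }
    }
}

impl<I: Iterator> Iterator for WithEndOfInput<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Once the end item has been handed out, stop polling `inner`.
        if self.end.is_none() {
            return None;
        }
        match self.inner.next() {
            Some(item) => Some(item),
            // `take` leaves `None` behind, so this happens exactly once.
            None => self.end.take(),
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum Token {
    Command,
    Newline,
    EndOfInput,
}

/// Small usage example with a plain Vec standing in for the lexer.
pub fn demo() -> Vec<Token> {
    let tokens = vec![Token::Command, Token::Newline, Token::Command];
    WithEndOfInput::new(tokens.into_iter(), Token::EndOfInput).collect()
}
```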

@jeertmans
Collaborator

Hello, thanks for your suggestion!

Performance-wise, I don't see any advantage over using the Iterator::chain method:

#[derive(Debug)]
enum Token {
    A,
    B,
    C,
    EOF,
}

fn main() {
    use Token::*;

    // Append a single EOF token after the last regular token.
    let lexer = vec![A, B, C, A, B, C]
        .into_iter()
        .chain(Some(EOF));

    for token in lexer {
        println!("{:?}", token);
    }
}

I understand that this requires manually adding the last token with chain, but I don't think Logos can actually do anything better than that :-/

@maciejhirsz
Owner

I think we could handle that, I'll keep that in mind when I get to coding!

@legeana
Author

legeana commented Jul 19, 2023

My feeling is that if you use chain you lose the logos::Lexer type, so you can't easily access lexer.span(), lexer.slice() and lexer.extras anymore, because the fields of Chain are private: pub struct Chain<A, B> { /* private fields */ }. Having this functionality as part of Logos makes a difference.
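To make the difference concrete, here is a sketch of a wrapper that keeps the lexer reachable, which Chain does not allow. `WordLexer` is a toy stand-in I wrote for this example and is not the Logos API; it only mimics the shape of an iterator with a `span()` method. `WithEoi` and `tokens_with_spans` are likewise hypothetical names.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word,
    EndOfInput,
}

/// Toy stand-in for a lexer (NOT Logos): splits on whitespace and
/// remembers the byte range of the most recent match.
pub struct WordLexer<'s> {
    source: &'s str,
    pos: usize,
    span: Range<usize>,
}

impl<'s> WordLexer<'s> {
    pub fn new(source: &'s str) -> Self {
        Self { source, pos: 0, span: 0..0 }
    }

    pub fn span(&self) -> Range<usize> {
        self.span.clone()
    }
}

impl<'s> Iterator for WordLexer<'s> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        let rest = &self.source[self.pos..];
        // Skip leading whitespace, then measure the next word.
        let start = self.pos + (rest.len() - rest.trim_start().len());
        let word_len = self.source[start..]
            .find(char::is_whitespace)
            .unwrap_or(self.source.len() - start);
        self.pos = start + word_len;
        // At the end of input this is an empty span at the last position.
        self.span = start..self.pos;
        if word_len == 0 { None } else { Some(Token::Word) }
    }
}

/// Hypothetical wrapper that appends one EndOfInput token. Unlike
/// `Chain`, the wrapped lexer stays accessible through a public field.
pub struct WithEoi<L> {
    pub lexer: L,
    done: bool,
}

impl<'s> Iterator for WithEoi<WordLexer<'s>> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        if self.done {
            return None;
        }
        match self.lexer.next() {
            Some(token) => Some(token),
            None => {
                self.done = true;
                Some(Token::EndOfInput)
            }
        }
    }
}

/// Collects each token together with the span reported by the lexer.
pub fn tokens_with_spans(source: &str) -> Vec<(Token, Range<usize>)> {
    let mut wrapped = WithEoi { lexer: WordLexer::new(source), done: false };
    let mut out = Vec::new();
    while let Some(token) = wrapped.next() {
        // This call is what `Chain` would make impossible.
        out.push((token, wrapped.lexer.span()));
    }
    out
}
```

This works, but everyone who needs it has to write and maintain such a wrapper (and forward extras and slice too), which is why built-in support would be nicer.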
