Merging 2 reserved nodes: This is a bug: Report it #336

Open
snowfoxsh opened this issue Aug 29, 2023 · 4 comments

@snowfoxsh

The code below triggers the bug. System: macOS Ventura 13.0 on an M1 Mac.
Context: I was getting some stack overflow exceptions before this error was thrown.

neofetch

                    'c.          storm@|redacted|
                 ,xNMM.          ----------------------------------------------
               .OMMMMo           OS: macOS 13.0 22A380 arm64
               OMMM0,            Host: MacBookPro18,2
     .;loddo:' loolloddol;.      Kernel: 22.1.0
   cKMMMMMMMMMMNWMMMMMMMMMM0:    Uptime: 4 days, 22 hours, 15 mins
 .KMMMMMMMMMMMMMMMMMMMMMMMWd.    Packages: 57 (brew)
 XMMMMMMMMMMMMMMMMMMMMMMMX.      Shell: fish 3.6.1
;MMMMMMMMMMMMMMMMMMMMMMMM:       Resolution: 1728x1117
:MMMMMMMMMMMMMMMMMMMMMMMM:       DE: Aqua
.MMMMMMMMMMMMMMMMMMMMMMMMX.      WM: Quartz Compositor
 kMMMMMMMMMMMMMMMMMMMMMMMMWd.    WM Theme: Blue (Light)
 .XMMMMMMMMMMMMMMMMMMMMMMMMMMk   Terminal: iTerm2
  .XMMMMMMMMMMMMMMMMMMMMMMMMK.   Terminal Font: MesloLGS-NF-Regular 13
    kMMMMMMMMMMMMMMMMMMMMMMd     CPU: Apple M1 Max
     ;KMMMMMMMWXXWMMMMMMMk.      GPU: Apple M1 Max
       .cooc,.    .,coo:.        Memory: 6325MiB / 65536MiB

Error:

error: proc-macro derive panicked
 --> src/lexer.rs:5:10
  |
5 | #[derive(Logos, Debug, PartialEq)]
  |          ^^^^^
  |
  = help: message: Merging two reserved nodes! This is a bug, please report it:
          
          https://github.com/maciejhirsz/logos/issues

Code:

use std::num::{ParseFloatError, ParseIntError};
use std::str::FromStr;
use logos::{Lexer, Logos};

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\f]+")]
pub enum Token {
    // ident
    #[regex("[A-Za-z][A-Za-z0-9]*", as_string)]
    Ident(String),

    // string literal
    // #[regex("'([^']*)'", as_literal)]
    // StringLiteral(String),

    #[regex("(?:[0-9]+)*[0-9]?.[0-9]+", |lex| lex.slice().parse().ok(), priority = 2)]
    Float(f64),

    #[regex("[0-9]+", |lex| lex.slice().parse().ok())]
    Int(i32),

    // #[regex("[0-9]+", |lex| lex.slice().parse().ok(), priority = 2)]
    // A(i32),

    #[token("\n")]
    NewLine,

    #[token("<-")]
    Assign,

    // blocks
    #[token("(")]
    LeftParen,

    #[token(")")]
    RightParen,

    #[token("{")]
    LeftBrace,

    #[token("}")]
    RightBrace,

    #[token("[")]
    LeftBracket,

    #[token("]")]
    RightBracket,

    // math operators
    #[token("+")]
    Plus,

    #[token("-")]
    Minus,

    #[token("*")]
    Star,

    #[token("/")]
    Slash,

    #[token("%")]
    #[token("mod")]
    #[token("MOD")]
    Mod,

    // logical operators
    #[token("=")]
    #[token("==")]
    Equals,

    #[token("!=")]
    NotEquals,

    #[token(">")]
    Greater,

    #[token(">=")]
    GreaterEquals,

    #[token("<")]
    Less,

    #[token("<=")]
    LessEquals,

    // selection keywords
    #[token("if")]
    #[token("IF")]
    If,
    #[token("else")]
    #[token("ELSE")]
    Else,

    #[token("repeat")]
    #[token("REPEAT")]
    Repeat,

    #[token("times")]
    #[token("TIMES")]
    Times,

    #[token("until")]
    #[token("UNTIL")]
    Until,

    #[token("for")]
    #[token("FOR")]
    For,

    #[token("each")]
    #[token("EACH")]
    Each,

    #[token("in")]
    #[token("IN")]
    In,

    // procedure keywords
    #[token("procedure")]
    #[token("PROCEDURE")]
    Procedure,

    #[token("return")]
    #[token("RETURN")]
    Return,

    // cmp keywords
    #[token("not")]
    #[token("NOT")]
    Not,

    #[token("and")]
    #[token("AND")]
    And,

    #[token("or")]
    #[token("OR")]
    Or,
}
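
A side note on the `Float` pattern above: its `.` is unescaped, so it matches any character except a newline rather than a literal dot. Assuming a literal decimal point was intended (the thread does not confirm this), the std-only sketch below contrasts what the pattern accepts as written with what an escaped variant such as `[0-9]*\.[0-9]+` would accept. The function names are hypothetical and nothing here uses Logos itself; it says nothing about whether escaping the dot avoids the panic.

```rust
/// Shared helper: optional digits, one separator char, then one or more digits.
fn matches_with_separator(s: &str, is_sep: impl Fn(char) -> bool) -> bool {
    let chars: Vec<char> = s.chars().collect();
    // Try every position as the separator.
    (0..chars.len()).any(|i| {
        let (head, tail) = (&chars[..i], &chars[i + 1..]);
        head.iter().all(|c| c.is_ascii_digit())
            && is_sep(chars[i])
            && !tail.is_empty()
            && tail.iter().all(|c| c.is_ascii_digit())
    })
}

/// Full-match language of `(?:[0-9]+)*[0-9]?.[0-9]+` as posted:
/// the separator is ANY char except '\n'.
fn matches_as_written(s: &str) -> bool {
    matches_with_separator(s, |c| c != '\n')
}

/// Full-match language of the assumed intent `[0-9]*\.[0-9]+`:
/// the separator must be a literal dot.
fn matches_with_escaped_dot(s: &str) -> bool {
    matches_with_separator(s, |c| c == '.')
}

fn main() {
    for input in ["3.14", "1x5", "15"] {
        println!(
            "{input:?}: as written = {}, escaped dot = {}",
            matches_as_written(input),
            matches_with_escaped_dot(input)
        );
    }
}
```

Inputs such as `1x5` and `15` are accepted by the pattern as written but not by the escaped variant.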
@jeertmans
Collaborator

Hello! Thanks for reporting this bug!

Could you try reducing the number of enum variants until you identify the one that causes this issue, and then post an MWE (minimal working example) here?

@snowfoxsh
Author

I will do that tonight :)

@5225225

5225225 commented Feb 14, 2024

use logos::Logos;

#[derive(Logos)]
pub enum Token {
    #[regex("(0+)*x?.0+", |_| ())]
    Float,
}

fn main() {}

Minimized this. Couldn't get the regex to be any smaller than it is now, but trimmed down the enum variants.

@jeertmans
Collaborator

Interesting, thanks for your help @5225225!

I managed to reduce it even further:

#[derive(Logos)]
pub enum Token {
    #[regex("(0+)*.0+")]
    Float,
}

Quite interestingly, this pattern fails to compile, but its equivalent (because Logos does not capture groups) succeeds:

#[derive(Logos)]
pub enum Token {
    #[regex("0*.0+")]
    Float,
}
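
The equivalence claim can be sanity-checked without Logos. The sketch below is a minimal std-only matcher over a hypothetical mini-AST (the `Re` type and every function name are made up for illustration; they are not Logos or regex-syntax types). It builds both patterns and compares their full-match results on every string over a small alphabet up to a fixed length.

```rust
use std::collections::BTreeSet;

// Hypothetical mini-AST; not the real Hir.
enum Re {
    Lit(char),
    AnyExceptNewline,                    // default meaning of `.`
    Concat(Vec<Re>),
    Repeat { min: usize, sub: Box<Re> }, // unbounded max: `*` (min 0) or `+` (min 1)
    Capture(Box<Re>),                    // group; ignored when matching
}

/// All positions where `re` can stop when it starts matching at `start`.
fn ends(re: &Re, s: &[char], start: usize) -> BTreeSet<usize> {
    let mut out = BTreeSet::new();
    match re {
        Re::Lit(c) => {
            if s.get(start) == Some(c) {
                out.insert(start + 1);
            }
        }
        Re::AnyExceptNewline => {
            if matches!(s.get(start), Some(&c) if c != '\n') {
                out.insert(start + 1);
            }
        }
        Re::Concat(parts) => {
            out.insert(start);
            for part in parts {
                out = out.iter().flat_map(|&p| ends(part, s, p)).collect();
            }
        }
        Re::Repeat { min, sub } => {
            let mut current = BTreeSet::from([start]);
            let mut count = 0;
            if *min == 0 {
                out.insert(start);
            }
            while !current.is_empty() {
                // Require progress so an empty sub-match cannot loop forever.
                current = current
                    .iter()
                    .flat_map(|&p| ends(sub, s, p).into_iter().filter(move |&e| e > p))
                    .collect();
                count += 1;
                if count >= *min {
                    out.extend(current.iter().copied());
                }
            }
        }
        Re::Capture(sub) => out = ends(sub, s, start),
    }
    out
}

fn zeros(min: usize) -> Re {
    Re::Repeat { min, sub: Box::new(Re::Lit('0')) }
}

/// `(0+)*.0+`
fn with_group() -> Re {
    let group = Re::Capture(Box::new(zeros(1)));
    Re::Concat(vec![Re::Repeat { min: 0, sub: Box::new(group) }, Re::AnyExceptNewline, zeros(1)])
}

/// `0*.0+`
fn without_group() -> Re {
    Re::Concat(vec![zeros(0), Re::AnyExceptNewline, zeros(1)])
}

fn full_match(re: &Re, text: &str) -> bool {
    let chars: Vec<char> = text.chars().collect();
    ends(re, &chars, 0).contains(&chars.len())
}

/// Every string over `alphabet` with length <= `max_len`.
fn all_strings(alphabet: &[char], max_len: usize) -> Vec<String> {
    let mut all = vec![String::new()];
    let mut last = all.clone();
    for _ in 0..max_len {
        last = last
            .iter()
            .flat_map(|s| alphabet.iter().map(move |&c| format!("{s}{c}")))
            .collect();
        all.extend(last.iter().cloned());
    }
    all
}

fn patterns_agree(max_len: usize) -> bool {
    let (a, b) = (with_group(), without_group());
    all_strings(&['0', 'x', '\n'], max_len)
        .iter()
        .all(|s| full_match(&a, s) == full_match(&b, s))
}

fn main() {
    println!("patterns agree up to length 6: {}", patterns_agree(6));
}
```

This only compares which strings are accepted. It does not model greediness, priorities, or anything about how Logos builds its graph.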

Here are their respective Hir (used internally by Logos):

fn main() {
    let hir = regex_syntax::Parser::new().parse("(0+)*.0+").unwrap();

    println!("{:#?}", hir);
}

Concat(
    [
        Repetition(
            Repetition {
                min: 0,
                max: None,
                greedy: true,
                sub: Capture(
                    Capture {
                        index: 1,
                        name: None,
                        sub: Repetition(
                            Repetition {
                                min: 1,
                                max: None,
                                greedy: true,
                                sub: Literal(
                                    "0",
                                ),
                            },
                        ),
                    },
                ),
            },
        ),
        Class(
            {
                '\0'..='\t',
                '\u{b}'..='\u{10ffff}',
            },
        ),
        Repetition(
            Repetition {
                min: 1,
                max: None,
                greedy: true,
                sub: Literal(
                    "0",
                ),
            },
        ),
    ],
)

fn main() {
    let hir = regex_syntax::Parser::new().parse("0*.0+").unwrap();

    println!("{:#?}", hir);
}

Concat(
    [
        Repetition(
            Repetition {
                min: 0,
                max: None,
                greedy: true,
                sub: Literal(
                    "0",
                ),
            },
        ),
        Class(
            {
                '\0'..='\t',
                '\u{b}'..='\u{10ffff}',
            },
        ),
        Repetition(
            Repetition {
                min: 1,
                max: None,
                greedy: true,
                sub: Literal(
                    "0",
                ),
            },
        ),
    ],
)

But to me, it is weird that the capture is not in the first repetition, but in the second...
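
To make the structural difference between the two dumps concrete, here is a small std-only sketch over a hypothetical mini-Hir (the `Node` type and function names are invented for illustration and omit the `greedy` and `name` fields; this is not Logos' internal code). It strips capture groups, then rewrites `(r+)*` as `r*`, which accepts the same strings. Whether the nested repetition left after stripping is what actually triggers the panic is an assumption, not something established in this thread.

```rust
/// Hypothetical mini-Hir with only the node kinds seen in the dumps.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Literal(String),
    Class(Vec<(char, char)>),
    Repetition { min: u32, max: Option<u32>, sub: Box<Node> },
    Capture { index: u32, sub: Box<Node> },
    Concat(Vec<Node>),
}

/// Drop every capture group, keeping only what it wraps.
fn strip_captures(node: Node) -> Node {
    match node {
        Node::Capture { sub, .. } => strip_captures(*sub),
        Node::Repetition { min, max, sub } => {
            Node::Repetition { min, max, sub: Box::new(strip_captures(*sub)) }
        }
        Node::Concat(parts) => Node::Concat(parts.into_iter().map(strip_captures).collect()),
        leaf => leaf,
    }
}

/// Rewrite `(r+)*` as `r*`; both accept the same set of strings.
fn flatten_star_of_plus(node: Node) -> Node {
    match node {
        Node::Repetition { min: 0, max: None, sub } => match *sub {
            // Star of plus: drop the inner `+` and retry on the result.
            Node::Repetition { min: 1, max: None, sub: inner } => {
                flatten_star_of_plus(Node::Repetition { min: 0, max: None, sub: inner })
            }
            other => Node::Repetition {
                min: 0,
                max: None,
                sub: Box::new(flatten_star_of_plus(other)),
            },
        },
        Node::Repetition { min, max, sub } => {
            Node::Repetition { min, max, sub: Box::new(flatten_star_of_plus(*sub)) }
        }
        Node::Capture { index, sub } => {
            Node::Capture { index, sub: Box::new(flatten_star_of_plus(*sub)) }
        }
        Node::Concat(parts) => Node::Concat(parts.into_iter().map(flatten_star_of_plus).collect()),
        leaf => leaf,
    }
}

fn normalize(node: Node) -> Node {
    flatten_star_of_plus(strip_captures(node))
}

// Builders mirroring the two dumps.
fn zero() -> Node {
    Node::Literal("0".to_string())
}
fn rep(min: u32, sub: Node) -> Node {
    Node::Repetition { min, max: None, sub: Box::new(sub) }
}
fn dot() -> Node {
    Node::Class(vec![('\0', '\t'), ('\u{b}', '\u{10ffff}')])
}

/// `(0+)*.0+`
fn grouped() -> Node {
    let capture = Node::Capture { index: 1, sub: Box::new(rep(1, zero())) };
    Node::Concat(vec![rep(0, capture), dot(), rep(1, zero())])
}

/// `0*.0+`
fn plain() -> Node {
    Node::Concat(vec![rep(0, zero()), dot(), rep(1, zero())])
}

fn main() {
    let stripped = strip_captures(grouped());
    println!("equal after stripping captures only: {}", stripped == plain());
    println!("equal after also flattening: {}", normalize(grouped()) == plain());
}
```

Stripping the capture alone still leaves a `+` nested inside a `*`, so the trees differ; only the extra flattening step makes them identical.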
