Thread 'rustc' has overflowed its stack #313

Open
ethindp opened this issue Jun 7, 2023 · 5 comments

@ethindp

ethindp commented Jun 7, 2023

Not really sure what's wrong with my regexes. I tested them in Python and they worked fine, so I'm pretty sure they're valid under the rules of the regex crate too. (I also don't get any info about what rustc was doing when it overflowed.) Here's my full lexer enum:

use logos::Logos;

#[derive(Logos, Debug, PartialEq, PartialOrd, Eq, Ord, Clone, Copy)]
pub enum Token {
    #[regex(r#"[\p{Zl}\p{Zp}\p{Zs}\x0A\x0B\x0C\x0D\x85]"#)]
    Whitespace,
    #[regex(r#"[\p{Cf}]"#)]
    Format,
    #[regex(r#"[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}][\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}]*"#)]
    Identifier,
    #[regex(r#"[0-9]([0-9_]?)*(\.[0-9]([0-9_]?)*)?([eE][+-]?[0-9]([0-9_]?)*)?"#)]
    DecimalLiteral,
    #[regex(r#"([0-9]?)*[0-9_]#([0-9A-Fa-f]?)[0-9A-Fa-f_]*(\.([0-9A-Fa-f]?)[0-9A-Fa-f_]*)?#([eE][+-]?([0-9]?)[0-9_]*)?"#)]
    BasedLiteral,
    #[regex(r#"'[\pL\pM\pN\pP\pS\p{Zs}]'"#)]
    CharacterLiteral,
    #[regex(r#""(("")|[^"])*""#)]
    StringLiteral,
    #[regex(r#"--[^\n]*"#)]
    Comment,
    #[regex(r#"[&-/:->@\[\]|]"#)]
    SimpleDelimiter,
    #[regex(r#"(?:\*\*|\.\.|[/:]=|<[<->]|=>|>[=>])"#)]
    CompoundDelimiter,
    // Keywords
    #[regex(r#"(?i)abort"#)]
    Abort,
    #[regex(r#"(?i)abs"#)]
    Abs,
    #[regex(r#"(?i)abstract"#)]
    Abstract,
    #[regex(r#"(?i)accept"#)]
    Accept,
    #[regex(r#"(?i)access"#)]
    Access,
    #[regex(r#"(?i)aliased"#)]
    Aliased,
    #[regex(r#"(?i)all"#)]
    All,
    #[regex(r#"(?i)and"#)]
    And,
    #[regex(r#"(?i)array"#)]
    Array,
    #[regex(r#"(?i)at"#)]
    At,
    #[regex(r#"(?i)begin"#)]
    Begin,
    #[regex(r#"(?i)body"#)]
    Body,
    #[regex(r#"(?i)case"#)]
    Case,
    #[regex(r#"(?i)constant"#)]
    Constant,
    #[regex(r#"(?i)declare"#)]
    Declare,
    #[regex(r#"(?i)delay"#)]
    Delay,
    #[regex(r#"(?i)delta"#)]
    Delta,
    #[regex(r#"(?i)digits"#)]
    Digits,
    #[regex(r#"(?i)do"#)]
    Do,
    #[regex(r#"(?i)else"#)]
    Else,
    #[regex(r#"(?i)elsif"#)]
    Elsif,
    #[regex(r#"(?i)end"#)]
    End,
    #[regex(r#"(?i)entry"#)]
    Entry,
    #[regex(r#"(?i)exception"#)]
    Exception,
    #[regex(r#"(?i)exit"#)]
    Exit,
    #[regex(r#"(?i)for"#)]
    For,
    #[regex(r#"(?i)function"#)]
    Function,
    #[regex(r#"(?i)generic"#)]
    Generic,
    #[regex(r#"(?i)goto"#)]
    Goto,
    #[regex(r#"(?i)if"#)]
    If,
    #[regex(r#"(?i)in"#)]
    In,
    #[regex(r#"(?i)interface"#)]
    Interface,
    #[regex(r#"(?i)is"#)]
    Is,
    #[regex(r#"(?i)limited"#)]
    Limited,
    #[regex(r#"(?i)loop"#)]
    Loop,
    #[regex(r#"(?i)mod"#)]
    Mod,
    #[regex(r#"(?i)new"#)]
    New,
    #[regex(r#"(?i)not"#)]
    Not,
    #[regex(r#"(?i)null"#)]
    Null,
    #[regex(r#"(?i)of"#)]
    Of,
    #[regex(r#"(?i)or"#)]
    Or,
    #[regex(r#"(?i)others"#)]
    Others,
    #[regex(r#"(?i)out"#)]
    Out,
    #[regex(r#"(?i)overriding"#)]
    Overriding,
    #[regex(r#"(?i)package"#)]
    Package,
    #[regex(r#"(?i)parallel"#)]
    Parallel,
    #[regex(r#"(?i)pragma"#)]
    Pragma,
    #[regex(r#"(?i)private"#)]
    Private,
    #[regex(r#"(?i)procedure"#)]
    Procedure,
    #[regex(r#"(?i)protected"#)]
    Protected,
    #[regex(r#"(?i)raise"#)]
    Raise,
    #[regex(r#"(?i)range"#)]
    Range,
    #[regex(r#"(?i)record"#)]
    Record,
    #[regex(r#"(?i)rem"#)]
    Rem,
    #[regex(r#"(?i)renames"#)]
    Renames,
    #[regex(r#"(?i)requeue"#)]
    Requeue,
    #[regex(r#"(?i)return"#)]
    Return,
    #[regex(r#"(?i)reverse"#)]
    Reverse,
    #[regex(r#"(?i)select"#)]
    Select,
    #[regex(r#"(?i)separate"#)]
    Separate,
    #[regex(r#"(?i)some"#)]
    Some,
    #[regex(r#"(?i)subtype"#)]
    Subtype,
    #[regex(r#"(?i)synchronized"#)]
    Synchronized,
    #[regex(r#"(?i)tagged"#)]
    Tagged,
    #[regex(r#"(?i)task"#)]
    Task,
    #[regex(r#"(?i)terminate"#)]
    Terminate,
    #[regex(r#"(?i)then"#)]
    Then,
    #[regex(r#"(?i)type"#)]
    Type,
    #[regex(r#"(?i)until"#)]
    Until,
    #[regex(r#"(?i)use"#)]
    Use,
    #[regex(r#"(?i)when"#)]
    When,
    #[regex(r#"(?i)while"#)]
    While,
    #[regex(r#"(?i)with"#)]
    With,
    #[regex(r#"(?i)xor"#)]
    Xor,
    // Other Unicode categories (for the CST only); these will trigger errors when parsed
    #[regex(r#"[\pS]"#)]
    Symbol,
}

Any thoughts? (I know that these are really complex regexes, but the EBNF rules they came from are also quite complex.)

@jeertmans
Collaborator

Hello,

Logos does not support the full set of valid regexes, which may explain the overflow. Several open and closed issues cover similar topics, so I suggest checking them.

Logos’ documentation on that topic is quite sparse for the moment, but I hope someday it will be clear what is supported and what isn’t :)

@ethindp
Author

ethindp commented Jun 7, 2023

@jeertmans Is there a way around this for now? I'd like to avoid writing my own lexer if I can help it, particularly if my handwritten one would be suboptimal compared to what this would generate.

@jeertmans
Collaborator

Did you identify the regex(es) that caused the problem?

@Artikae

Artikae commented Jun 21, 2023

For reference, this is the offending pattern: ([0-9]?)*. Logos hangs on some repetitions that contain empty patterns.
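
To illustrate why a repetition over a pattern that can match the empty string is a hazard in general, here is a minimal sketch in plain Rust. It is not Logos's code and says nothing about how Logos is implemented; `match_optional_digit` and `match_star` are hypothetical helpers invented for this example. The point is only that a star loop needs a progress check, otherwise an inner pattern like [0-9]? keeps "succeeding" without consuming input.

```rust
/// Hypothetical matcher for `[0-9]?` at `pos`: returns the new position.
/// It always succeeds, possibly without consuming any input.
fn match_optional_digit(input: &[u8], pos: usize) -> usize {
    if pos < input.len() && input[pos].is_ascii_digit() {
        pos + 1
    } else {
        pos
    }
}

/// Hypothetical Kleene star over `inner`. The loop stops as soon as one
/// iteration makes no progress; without that check, an inner pattern that
/// can match the empty string would loop forever.
fn match_star(input: &[u8], mut pos: usize, inner: fn(&[u8], usize) -> usize) -> usize {
    loop {
        let next = inner(input, pos);
        if next == pos {
            return pos;
        }
        pos = next;
    }
}

fn main() {
    // `([0-9]?)*` applied to "123abc" consumes the three leading digits.
    let end = match_star(b"123abc", 0, match_optional_digit);
    println!("matched {} bytes", end);
}
```

With the progress check in place, ([0-9]?)* consumes exactly what [0-9]* would, which is why the nested optional adds nothing to the language being matched.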

@jeertmans
Collaborator

What is the purpose of ([0-9]?)*? If I remember correctly, Logos does not support subgroups (it ignores them, I think), so it would be equivalent to [0-9]*.
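
If that equivalence holds, each ([0-9_]?)* in the DecimalLiteral pattern could be collapsed to [0-9_]*. The sketch below shows that simplified pattern as a string constant, next to a hand-written, std-only scanner that accepts the same inputs, so the intended behaviour can be checked without Logos. The names `DECIMAL_LITERAL`, `digits`, and `decimal_literal_len` are made up for this example, and whether the simplified pattern avoids the stack overflow in Logos is an assumption that this thread does not confirm.

```rust
/// Simplified DecimalLiteral pattern: every `([0-9_]?)*` from the issue is
/// collapsed to `[0-9_]*` (assumed equivalent, per the comment above).
const DECIMAL_LITERAL: &str = r"[0-9][0-9_]*(\.[0-9][0-9_]*)?([eE][+-]?[0-9][0-9_]*)?";

/// Consumes `[0-9][0-9_]*` starting at `pos`. Returns the end position,
/// or `None` if there is no leading digit at `pos`.
fn digits(s: &[u8], pos: usize) -> Option<usize> {
    if !s.get(pos)?.is_ascii_digit() {
        return None;
    }
    let mut end = pos + 1;
    while end < s.len() && (s[end].is_ascii_digit() || s[end] == b'_') {
        end += 1;
    }
    Some(end)
}

/// Hand-written longest-match equivalent of `DECIMAL_LITERAL`: returns the
/// length of the decimal literal at the start of `s`, if there is one.
fn decimal_literal_len(s: &[u8]) -> Option<usize> {
    let mut end = digits(s, 0)?;
    // Optional fraction: `(\.[0-9][0-9_]*)?`
    if s.get(end) == Some(&b'.') {
        if let Some(e) = digits(s, end + 1) {
            end = e;
        }
    }
    // Optional exponent: `([eE][+-]?[0-9][0-9_]*)?`
    if matches!(s.get(end).copied(), Some(b'e' | b'E')) {
        let mut p = end + 1;
        if matches!(s.get(p).copied(), Some(b'+' | b'-')) {
            p += 1;
        }
        if let Some(e) = digits(s, p) {
            end = e;
        }
    }
    Some(end)
}

fn main() {
    println!("pattern: {}", DECIMAL_LITERAL);
    println!("{:?}", decimal_literal_len(b"1_000.5e+3;"));
}
```

The scanner deliberately leaves a trailing "." or "e" unconsumed when no digits follow, matching how the optional groups in the pattern behave.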
