Panic when stateful lexer's non-Root rule has optional group but captures nothing #324

Open
7sDream opened this issue Mar 15, 2023 · 3 comments


@7sDream

7sDream commented Mar 15, 2023

The example below shows a common design. Assume we have some keyword (x in this example) that must never be parsed as an Ident anywhere in the program. We want a special symbol % to lift this restriction: a keyword prefixed with % may be used as an ident.

When testing it, aa + bb parses successfully, and x + y fails because x is a keyword, not an ident, as expected.

Next, it parses %x + y and panics.

If we change the pattern of Ident to [[:alpha:]][[:alnum:]]* (just removing the inner group, or adding ?: to make it non-capturing), it works fine.

Test code
package main

import (
	"fmt"

	"github.com/alecthomas/participle/v2"
	"github.com/alecthomas/participle/v2/lexer"
)

// Plus matches a binary expression such as `aa + bb`.
type Plus struct {
	Lhs string `parser:"@Ident"`
	Op  string `parser:"@'+'"`
	Rhs string `parser:"@Ident"`
}

func main() {
	parser := participle.MustBuild[Plus](
		participle.Lexer(lexer.MustStateful(lexer.Rules{
			"Root": {
				{"whitespace", ` +`, nil},
				{"Op", `\+`, nil},
				// Keyword is listed before Ident, so a bare x lexes as Keyword.
				{"Keyword", `x`, nil},
				// The inner capturing group is what triggers the panic.
				{"Ident", `[[:alpha:]]([[:alnum:]])*`, nil},
				// % switches to the Percent state for the next token.
				{"percent", `%`, lexer.Push("Percent")},
			},
			"Percent": {
				// After %, even a keyword is lexed as Ident, then we pop back to Root.
				{"Ident", `[[:alpha:]]([[:alnum:]])*`, lexer.Pop()},
			},
		})),
	)

	// Parses successfully.
	ast, err := parser.ParseString("input", "aa + bb")
	fmt.Printf("ast: %#v, err: %#v\n\n", ast, err)

	// Fails as expected: x is a Keyword, not an Ident.
	ast, err = parser.ParseString("input", "x + y")
	fmt.Printf("ast: %#v, err: %#v\n\n", ast, err)

	// Panics.
	ast, err = parser.ParseString("input", "%x + y")
	fmt.Printf("ast: %#v, err: %#v\n\n", ast, err)
}

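This may be caused by the stateful lexer's capture-group handling: calling FindStringSubmatchIndex with the Ident pattern on the data x + y returns [0, 1, -1, -1], and the last two -1 values mean the inner group ([[:alnum:]]) never captured anything. Slicing l.data with those -1 indices is what panics.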

The simplest fix may be to add a guard:

if match[i] >= 0 {
	groups = append(groups, l.data[match[i]:match[i+1]])
}
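As a self-contained sketch of where such a guard would sit: `extractGroups` is a hypothetical stand-in for the lexer's extraction loop, not participle's actual code, and it assumes the loop walks the submatch index pairs starting at index 2 (skipping the whole-match pair).

```go
package main

import "fmt"

// extractGroups is a hypothetical stand-in for the lexer's group-extraction
// loop. match is the result of FindStringSubmatchIndex; pairs from index 2
// onward describe capture groups. Groups that did not participate (index -1)
// are skipped, as in the proposed guard, instead of being sliced.
func extractGroups(data string, match []int) []string {
	var groups []string
	for i := 2; i+1 < len(match); i += 2 {
		if match[i] >= 0 {
			groups = append(groups, data[match[i]:match[i+1]])
		}
	}
	return groups
}

func main() {
	fmt.Printf("%q\n", extractGroups("ab + y", []int{0, 2, 1, 2})) // ["b"]
	fmt.Printf("%q\n", extractGroups("x + y", []int{0, 1, -1, -1})) // []
}
```

One side effect of skipping is that later groups shift down, so a group's position in the result no longer matches its number in the pattern. That matters if anything downstream refers to groups by number.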

But I'm not sure how these groups integrate with the Action interface; maybe it needs some type to indicate an empty capture? So I'm just opening this issue rather than a PR.
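One possible shape for "some type to indicate the empty capture" is sketched below. `Capture` and `captures` are hypothetical names, not part of participle's API. The idea is to keep exactly one slot per capture group so that numbering stays aligned, and to record explicitly whether each group participated.

```go
package main

import "fmt"

// Capture is a hypothetical type (not part of participle's API) holding the
// text of one capture group and whether that group participated.
type Capture struct {
	Text    string
	Matched bool
}

// captures keeps positional alignment: slot k-1 always corresponds to
// group k in the pattern, even when that group did not participate.
func captures(data string, match []int) []Capture {
	var out []Capture
	for i := 2; i+1 < len(match); i += 2 {
		if match[i] < 0 {
			out = append(out, Capture{}) // group did not participate
			continue
		}
		out = append(out, Capture{Text: data[match[i]:match[i+1]], Matched: true})
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", captures("x + y", []int{0, 1, -1, -1}))
	fmt.Printf("%+v\n", captures("ab + y", []int{0, 2, 1, 2}))
}
```

Compared with simply skipping unmatched groups, this distinguishes "group matched the empty string" from "group did not match at all", at the cost of a slightly richer type than a plain []string.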

@alecthomas
Owner

I think this is probably occurring because Participle allows you to use references like \1 in the destination state, to refer to captured groups from the parent state. Because you have a capture group that doesn't capture anything, it's failing. It needs to cater to that case though.
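To make the backreference idea concrete, here is a hypothetical sketch of substituting parent-state captures into a destination-state pattern. `expandBackrefs` is illustrative only and is not participle's implementation; single-digit references and escaping the captured text with `regexp.QuoteMeta` are assumptions of the sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// backrefRe finds \1 ... \9 style references in a destination-state pattern.
var backrefRe = regexp.MustCompile(`\\(\d)`)

// expandBackrefs substitutes captured text from the parent state into a
// pattern. groups[0] holds group 1. Captured text is escaped with QuoteMeta
// so it is matched literally. A reference to a group that is out of range
// is left unchanged.
func expandBackrefs(pattern string, groups []string) string {
	return backrefRe.ReplaceAllStringFunc(pattern, func(ref string) string {
		n, _ := strconv.Atoi(ref[1:]) // ref is e.g. `\1`
		if n < 1 || n > len(groups) {
			return ref
		}
		return regexp.QuoteMeta(groups[n-1])
	})
}

func main() {
	// A heredoc-style terminator where the parent state captured "EOF".
	fmt.Println(expandBackrefs(`^\1$`, []string{"EOF"}))
}
```

Because substitution is by position, dropping unmatched groups from the list (rather than keeping an empty slot) could make `\2` silently resolve to the wrong capture.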

@alecthomas
Copy link
Owner

Mm actually sorry, I misread. That might not be it.

Try using a non-capturing group and see if that fixes it: (?:[[:alnum:]]*)

@7sDream
Author

7sDream commented Mar 16, 2023

"If we change the pattern of Ident to [[:alpha:]][[:alnum:]]* (just removing the inner group, or adding ?: to make it non-capturing), it works fine."

Yes, changing the pattern to a non-capturing group fixes this. In fact, it's the workaround I'm using in my real project.
