Feature Request: Custom Rules #365

Open
meln5674 opened this issue Sep 23, 2023 · 9 comments

Comments

@meln5674

Background: I am working on an experimental language where, for the most part, simple regular expression rules are capable of lexing the source, but after certain tokens, the next token requires complex logic to identify before returning to the regular expression rules.

Problem: As far as I can tell, this is impossible without defining my own lexer.Definition and lexer.Lexer instances from scratch, and it is not possible to re-use the existing functionality from lexer.StatefulDefinition, without simply copy-pasting it.

Proposed Solution: Extend StatefulDefinition's Rule (or make StatefulDefinition a subset of a more comprehensive API) to anything that is sufficiently "regex-like", that is, can accept the parent state's name and captured groups, as well as the input data, then either terminate the lex, report no match, or report the end point of the next token and the action to take.

I have a very rudimentary proof-of-concept here, which is not backwards-compatible, breaks all of the tests, and isn't particularly well-written, but nonetheless works.

Would you be interested in working together to implement this in a way consistent with the current API, or would you prefer I maintain my own fork?

@alecthomas
Owner

I'd like to see an example of some of the syntax you're referring to first.

@meln5674
Author

Unfortunately, I can't give concrete examples, as the project isn't open source (yet). Without giving too much away, consider a heredoc-like syntax where A) the inner language is not expressible as a regular grammar, and B) the heredoc terminator is only accepted if it is located at certain points within the sub-language; otherwise, it is consumed as part of the sub-language, and there must be another terminator located elsewhere. As a result, once the heredoc starts, there has to be custom logic to figure out where it ends, and then to validate that what's in between is even allowable, and if not, lexing (not parsing) terminates. If it weren't for point (B), a .* with a backreference could probably capture it, but without knowing whether that opaque string is valid, the lexer can't correctly decide whether it is a token, and capturing too early may result in an invalid lex.
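As a stand-in for the undisclosed syntax, the sketch below uses balanced parentheses as the "sub-language" and only accepts the terminator at nesting depth zero. This is purely an assumption made to illustrate points (A) and (B); `findHeredocEnd` is a hypothetical helper, and the real language in question is not shown in this thread.

```go
package main

import (
	"errors"
	"strings"
)

// findHeredocEnd returns the byte offset of the first terminator that appears
// outside any parentheses. A terminator inside parentheses is treated as part
// of the body. An error means the body is invalid and lexing should stop.
func findHeredocEnd(input, terminator string) (int, error) {
	if terminator == "" {
		return 0, errors.New("empty heredoc terminator")
	}
	depth := 0
	for i := 0; i < len(input); i++ {
		// The terminator only counts when we are not nested.
		if depth == 0 && strings.HasPrefix(input[i:], terminator) {
			return i, nil
		}
		switch input[i] {
		case '(':
			depth++
		case ')':
			if depth == 0 {
				return 0, errors.New("unbalanced ')' in heredoc body")
			}
			depth--
		}
	}
	return 0, errors.New("unterminated heredoc")
}
```

With the input `a (END) b END rest` and the terminator `END`, the parenthesised `END` is skipped and the function returns offset 10. A plain regular expression would stop at the first `END`, which is the "capturing too early" problem described above.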

@alecthomas
Owner

I'm not necessarily opposed to the stateful lexer being extensible, but I won't accept a backward-incompatible change. From briefly looking at your code, I would suggest looking at extending Action to support your use case.

That said, without any concrete examples/tests showing use-cases, I won't accept it either.

@meln5674
Author

Of course. Like I mentioned, this was a quick "What if?", and any actual PR I would submit would be backward compatible, with documentation, test coverage, and no regressions.

Given that none of the methods of Action are exported, I'm not sure I follow your suggestion, and even looking at the unexported method, I don't see a simple way to have it generate additional tokens, but perhaps I misunderstand. Are you suggesting to export Action's method, and modify it to optionally return tokens as well as modify the state?

@alecthomas
Owner

alecthomas commented Sep 24, 2023

I'm proposing you extend the private Action interface to support your requirements, or add another optional interface similar to how RulesActions works. Then expose that functionality via a public function similar to the existing ones, such as Pop, etc.
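A rough, standalone sketch of the optional-interface pattern described here, written without the real library types. Everything in it (`action`, `tokenProducer`, `Custom`, `runAction`, and this simplified `Pop`) is a hypothetical stand-in and does not reflect the library's actual signatures.

```go
package main

// action stands in for a private action interface whose methods are
// unexported, so only this package can implement it.
type action interface {
	apply(stack []string) []string
}

// tokenProducer is a hypothetical optional interface. Actions that also
// implement it can inspect the input and report where their token ends.
type tokenProducer interface {
	produce(input string) (end int, ok bool)
}

type popAction struct{}

// apply drops the top state from the stack, if there is one.
func (popAction) apply(stack []string) []string {
	if len(stack) == 0 {
		return stack
	}
	return stack[:len(stack)-1]
}

// Pop is a public constructor for a private action type.
func Pop() action { return popAction{} }

type customAction struct {
	fn func(input string) (int, bool)
}

// apply leaves the state stack unchanged.
func (customAction) apply(stack []string) []string { return stack }

// produce delegates to the user-supplied function.
func (c customAction) produce(input string) (int, bool) { return c.fn(input) }

// Custom exposes the new behaviour through a public function, like Pop.
func Custom(fn func(input string) (int, bool)) action { return customAction{fn: fn} }

// runAction applies a and, only if a also implements tokenProducer, asks it
// for a token end. Existing actions are unaffected by the new interface.
func runAction(a action, stack []string, input string) ([]string, int, bool) {
	stack = a.apply(stack)
	if p, ok := a.(tokenProducer); ok {
		end, matched := p.produce(input)
		return stack, end, matched
	}
	return stack, 0, false
}
```

The point of the type assertion in `runAction` is backward compatibility: actions that do not implement the optional interface keep behaving exactly as before.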

@alecthomas
Owner

Ah, and rules must also be serialisable to JSON.

@alecthomas
Owner

alecthomas commented Sep 24, 2023

Before you do anything, you should extract a representative example (obfuscated if necessary) and include it in this issue.

And perhaps an example of how you would use this proposed new functionality to lex it.

@meln5674
Author

On the serialization note, is that just for diagnostic purposes, or does it need to be able to round-trip? My goal is to be able to inject an arbitrary function to execute, like in the linked fork, which obviously wouldn't be able to round-trip without having some sort of global lookup table to register functions to on initialization.

@alecthomas
Owner

It needs to be able to round-trip, but I think for this case it could just return an error.
