Stable v2 release (API changes) #108

alecthomas · 2020-09-07T11:40:57Z

Now that Participle has proven its initial concept, I think it's time to clean up the API. This will be a backwards incompatible change.

Work has started in the v1 branch.

- Consolidate on the Stateful lexer (1444519)
- Optimise performance of the lexer: use a rune lookup to avoid testing regexps that have no chance of matching (#111)
- Make specifying the filename explicit. This removes confusion and ambiguity. (cf6162a)
- Eliminate internal unquoting hacks and single-quote munging from the text/scanner-based lexer. (4f53af9)
- Clean up error functions. (895f942)
- Extend the concept of Pos/EndPos to support capturing the full range of tokens the node matched, including tokens elided with Elide(). (2ace05e)
- Refactor Mapper to eliminate the need for DropToken. (f82f615)
- Capture directly into fields of type lexer.Token and []lexer.Token. (3b1f151)
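
To illustrate the rune-lookup idea from the list above, here is a minimal stdlib-only sketch. It is not participle's actual lexer. Each rule carries a hand-written set of runes its matches can start with (an assumption: a real implementation would have to derive that set from the regexp). The lexer builds a map from first rune to candidate rules, so at each position it only tries those regexps. The `rule`, `lexer`, `newLexer`, `tokenize` and `exampleRules` names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// rule is a hypothetical lexer rule. first lists the runes a match can
// start with; here it is written by hand rather than derived from re.
type rule struct {
	name  string
	re    *regexp.Regexp // anchored with ^ so it only matches at the cursor
	first string
}

// lexer maps each possible first rune to the indices of rules worth trying.
type lexer struct {
	rules  []rule
	lookup map[rune][]int
}

func newLexer(rules []rule) *lexer {
	l := &lexer{rules: rules, lookup: map[rune][]int{}}
	for i, r := range rules {
		for _, c := range r.first {
			l.lookup[c] = append(l.lookup[c], i)
		}
	}
	return l
}

// tokenize splits input into "Name:text" strings, testing only the
// regexps whose first-rune set contains the rune at the cursor.
func (l *lexer) tokenize(input string) ([]string, error) {
	var out []string
	for input != "" {
		c, _ := utf8.DecodeRuneInString(input)
		matched := false
		for _, i := range l.lookup[c] {
			if m := l.rules[i].re.FindString(input); m != "" {
				out = append(out, l.rules[i].name+":"+m)
				input = input[len(m):]
				matched = true
				break
			}
		}
		if !matched {
			return nil, fmt.Errorf("no rule matches at %q", input)
		}
	}
	return out, nil
}

func exampleRules() []rule {
	return []rule{
		{"Ident", regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*`), "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"},
		{"Number", regexp.MustCompile(`^[0-9]+`), "0123456789"},
		{"Whitespace", regexp.MustCompile(`^\s+`), " \t\n\r\f"},
		{"Punct", regexp.MustCompile(`^[=;]`), "=;"},
	}
}

func main() {
	toks, err := newLexer(exampleRules()).tokenize("x = 42;")
	fmt.Println(toks, err)
}
```

A rune with no entry in the map (such as `+` here) fails immediately, without running any regexp.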

Maybe:

- Extend participle.Elide() support so that elided tokens can be captured explicitly by name (but also see next point).
- ~~Support streaming tokens from an io.Reader - currently the full input text is read.~~
  - ~~Refactor PeekingLexer so it doesn't consume all tokens up front.~~

Once the API is stable, some additional changes would be welcome:

- Optimise the parser.
- Code generation for lexing (e2b420f).
- Code generation for parsing.
- Improve error reporting.
- Error tolerant parsing.
- LSP support? Can this be generalised?
- Generate syntax definition files for TextMate etc.?!

Regarding streaming, I'm not convinced it is worth the considerable extra complexity it would add to the implementation. For comparison, pigeon also does not support streaming.

Additionally, to support capturing raw tokens into the AST, participle would potentially need to buffer all tokens anyway, which effectively eliminates the usefulness of streaming. Streaming would also vastly increase the complexity of the lexers, requiring three input paths (io.Reader, string and []byte), changes to PeekingLexer, etc.

This extra complexity comes mainly from lookahead branching: the lexer would need an implementation similar to the rewinder RuneReader code (https://play.golang.org/p/uZQySClYrxR). For each branch the lexer's state has to be saved, and as the branch progresses, any newly buffered tokens must also be preserved so that the parent remains consistent if the branch is not accepted.

There's also a non-trivial overhead for reading each token, compared with the current PeekingLexer, where reading a token is just an array index.
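
A rough sketch of why the array-backed approach is cheap. When all tokens are buffered up front, reading a token is an index increment and the state of a lookahead branch is a single int, so a failed branch is undone by resetting the cursor. This is a simplified model, not participle's PeekingLexer API; `Token`, `peekingLexer`, `checkpoint` and `tryBranch` are hypothetical. A streaming lexer would also have to keep any tokens read during a failed branch buffered for the parent, which is the extra complexity described above.

```go
package main

import "fmt"

// Token is a simplified stand-in for a lexer token (hypothetical).
type Token struct {
	Type  string
	Value string
}

// peekingLexer holds every token up front, so reading is an index increment.
type peekingLexer struct {
	tokens []Token
	cursor int
}

// checkpoint is the entire saved state of a lookahead branch.
type checkpoint int

func (p *peekingLexer) Peek() Token {
	if p.cursor >= len(p.tokens) {
		return Token{Type: "EOF"}
	}
	return p.tokens[p.cursor]
}

func (p *peekingLexer) Next() Token {
	t := p.Peek()
	if p.cursor < len(p.tokens) {
		p.cursor++
	}
	return t
}

func (p *peekingLexer) Checkpoint() checkpoint { return checkpoint(p.cursor) }
func (p *peekingLexer) Rewind(c checkpoint)    { p.cursor = int(c) }

// tryBranch runs branch and rewinds if it fails, leaving the parent's
// position unchanged.
func tryBranch(p *peekingLexer, branch func() bool) bool {
	cp := p.Checkpoint()
	if branch() {
		return true
	}
	p.Rewind(cp)
	return false
}

func main() {
	p := &peekingLexer{tokens: []Token{{"Ident", "a"}, {"Punct", "="}, {"Int", "1"}}}
	// Try `a (` first: it fails and the cursor is rewound.
	call := tryBranch(p, func() bool { return p.Next().Type == "Ident" && p.Next().Value == "(" })
	// Then try `a =`: it succeeds and the cursor stays advanced.
	assign := tryBranch(p, func() bool { return p.Next().Type == "Ident" && p.Next().Value == "=" })
	fmt.Println(call, assign, p.cursor)
}
```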

ceymard · 2020-09-07T11:43:23Z

Alright, feature request time:

- Find a way to get the original text of a match
- Allow Tokens to be requested even though they're marked as elided
- Have Lexer work over a Reader (with buffering) to allow for parsing huge files

hinshun · 2020-09-08T03:15:10Z

@ceymard This might be better done in participle, but currently we wrap the input in an io.TeeReader before passing it to the participle parser, so that we keep the original text. We use it for error reporting and source mapping.

ceymard · 2020-09-08T07:42:20Z

@hinshun I'm doing something similar at the moment; I just wish there were an easy way to get the text of a match without having to resort to that kind of trick.

This speeds up parsing by 5-10%:

| benchmark | old ns/op | new ns/op | delta |
|---|---|---|---|
| BenchmarkEBNFParser-12 | 143589 | 129605 | -9.74% |
| BenchmarkParser-12 | 395397 | 375403 | -5.06% |
| BenchmarkParticipleThrift-12 | 202280 | 191766 | -5.20% |
| BenchmarkParser-12 | 7724639 | 7114586 | -7.90% |

See #108.
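
For reference, the delta figures in this benchmark comparison are the relative change in ns/op, (new - old) / old × 100. A trivial helper (hypothetical, not part of the benchmark tooling) reproduces them:

```go
package main

import "fmt"

// delta returns the percentage change from oldNs to newNs (ns/op).
// Negative values mean the new code is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

func main() {
	fmt.Printf("%.2f%%\n", delta(143589, 129605)) // BenchmarkEBNFParser-12
}
```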

This includes tokens elided by Elide(), but not tokens elided by the Lexer. See #108.

alecthomas · 2020-09-20T05:54:35Z

This functionality is now included natively. Any node with a field Tokens []lexer.Token will now be populated with the full set of tokens used to parse that node. There's an example in the tests here.
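
The following is a toy model of the behaviour described here and in the replies below, not participle's implementation. Every token that survives the lexer, including Elide()d ones, lives in one slice. Each node's Tokens is the sub-slice between the parser's position when the node starts and when it ends. A nested node's range lies inside its parent's, so the parent's Tokens includes the child's tokens. Tokens dropped by the lexer never enter the slice at all. `Token`, `Node`, `parser` and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a simplified stand-in for lexer.Token. Elided marks tokens the
// parser steps over (as with participle.Elide()) but which remain in the
// stream. Tokens dropped by the lexer never appear in the slice.
type Token struct {
	Value  string
	Elided bool
}

// Node is a hypothetical AST node with a Tokens field.
type Node struct {
	Name     string
	Tokens   []Token
	Children []*Node
}

type parser struct {
	toks   []Token
	cursor int
}

// skip advances past elided tokens so a node's range starts at real input.
func (p *parser) skip() {
	for p.cursor < len(p.toks) && p.toks[p.cursor].Elided {
		p.cursor++
	}
}

// next consumes the next non-elided token (no error handling in this toy).
func (p *parser) next() Token {
	p.skip()
	t := p.toks[p.cursor]
	p.cursor++
	return t
}

// parseValue handles: Value = Number.
func (p *parser) parseValue() *Node {
	p.skip()
	start := p.cursor
	p.next()
	return &Node{Name: "Value", Tokens: p.toks[start:p.cursor]}
}

// parseAssign handles: Assign = Ident "=" Value.
func (p *parser) parseAssign() *Node {
	p.skip()
	start := p.cursor
	p.next() // identifier
	p.next() // "="
	value := p.parseValue()
	return &Node{Name: "Assign", Tokens: p.toks[start:p.cursor], Children: []*Node{value}}
}

// text concatenates token values to recover the matched source text.
func text(ts []Token) string {
	var b strings.Builder
	for _, t := range ts {
		b.WriteString(t.Value)
	}
	return b.String()
}

func main() {
	// Input "x = 42 // c": the lexer has already dropped the comment,
	// while the spaces were kept as elided tokens.
	p := &parser{toks: []Token{{"x", false}, {" ", true}, {"=", false}, {" ", true}, {"42", false}}}
	n := p.parseAssign()
	fmt.Printf("%q %q\n", text(n.Tokens), text(n.Children[0].Tokens))
}
```

Joining the values of a node's Tokens is one way to get the original text of a match, as requested earlier in the thread.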

alecthomas · 2020-09-20T22:55:23Z

You can also now capture directly into a field of type lexer.Token rather than string (for example).

hinshun · 2020-09-21T04:16:18Z

Do the Tokens include the ones from nested structs if the nested structs also have Tokens []lexer.Token?

alecthomas · 2020-09-21T06:20:48Z

Yes they do.

ceymard · 2020-09-21T06:55:50Z

Do they include the elided ones as well?

alecthomas · 2020-09-21T07:03:46Z

Yep!

alecthomas changed the title ~~Stable v1 API changes~~ Stable v1 release (API changes) Sep 7, 2020

alecthomas modified the milestone: v1 Sep 7, 2020

alecthomas added a commit that referenced this issue Sep 20, 2020

Support capturing all tokens into the AST.

2ace05e

This includes tokens elided by Elide(), but not tokens elided by the Lexer. See #108.

alecthomas added a commit that referenced this issue Nov 26, 2020

Support capturing all tokens into the AST.

f9c3ae4

This includes tokens elided by Elide(), but not tokens elided by the Lexer. See #108.

alecthomas added a commit that referenced this issue Nov 26, 2020

Support capturing all tokens into the AST.

08fcbdd

This includes tokens elided by Elide(), but not tokens elided by the Lexer. See #108.

alecthomas changed the title ~~Stable v1 release (API changes)~~ Stable v2 release (API changes) Nov 26, 2020