Documentation and a couple questions #62

Open
ethindp opened this issue Feb 26, 2023 · 1 comment

ethindp commented Feb 26, 2023

So this library looks really cool and I'd love to use it. However, I'm unsure about the syntax (i.e., I know the general PEG syntax, but what extensions, if any, does this library have? How does the syntax it uses differ from normal PEG (which is really just an extension of ABNF/EBNF), etc.). Also, how does it cope with Unicode? A language I'm struggling to write a compiler for requires Unicode for identifiers, so I need some way of handling that without destroying the world in the process. :D

The examples look pretty neat, but they aren't really enough in terms of describing how the parser works or its limitations. For example, how is whitespace handled? Is it an implicit thing? (The examples would have me believe that this is the case, but it's worth asking here.)

@TheLartians
Owner

Hey, thanks for raising the issue (somehow I'm only seeing this now). I definitely agree that this project (which was one of my first) needs way better documentation. Unfortunately I'm currently very short on time for open-source projects, so I need to prioritise projects with more active users.

As for Unicode support, I'm pretty sure this library only parses one byte at a time, so if you need Unicode you would have to add your own support as a specialised character parser. For UTF-8 encoded strings I think Unicode will implicitly work for many use cases anyway, but there are probably a bunch of edge cases that I'm not considering at the moment.
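To make the "specialised character parser" idea concrete, here is a minimal, self-contained sketch of the piece such a parser would need: decoding one UTF-8 code point from a byte string. The names `DecodedCodePoint` and `decodeUtf8` are hypothetical and not part of this library; how the result would be hooked into a grammar rule is left open. It also hints at why byte-wise parsing often works implicitly: in UTF-8, bytes below 0x80 never occur inside a multi-byte sequence, so ASCII operators and keywords cannot be mistaken for part of a non-ASCII character.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper type (not part of the library): one decoded code point.
struct DecodedCodePoint {
  char32_t value;      // the decoded Unicode scalar value
  std::size_t length;  // number of bytes consumed (1 to 4)
};

// Decodes the UTF-8 sequence starting at byte offset `pos` of `input`.
// Returns std::nullopt for truncated, malformed, overlong, surrogate
// or out-of-range sequences.
inline std::optional<DecodedCodePoint> decodeUtf8(std::string_view input, std::size_t pos) {
  if (pos >= input.size()) return std::nullopt;
  const auto byteAt = [&](std::size_t i) { return static_cast<unsigned char>(input[i]); };

  // The lead byte determines the sequence length and the initial payload bits.
  const unsigned char lead = byteAt(pos);
  std::size_t length = 0;
  char32_t value = 0;
  if (lead < 0x80) {
    length = 1;
    value = lead;
  } else if ((lead & 0xE0) == 0xC0) {
    length = 2;
    value = lead & 0x1F;
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3;
    value = lead & 0x0F;
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4;
    value = lead & 0x07;
  } else {
    return std::nullopt;  // stray continuation byte or invalid lead byte
  }
  if (pos + length > input.size()) return std::nullopt;  // truncated input

  // Each continuation byte must look like 10xxxxxx and adds six payload bits.
  for (std::size_t i = 1; i < length; ++i) {
    const unsigned char c = byteAt(pos + i);
    if ((c & 0xC0) != 0x80) return std::nullopt;
    value = (value << 6) | (c & 0x3F);
  }

  // Smallest value that legitimately needs `length` bytes (rejects overlong forms).
  static constexpr char32_t minValue[] = {0, 0, 0x80, 0x800, 0x10000};
  if (value < minValue[length]) return std::nullopt;
  if (value >= 0xD800 && value <= 0xDFFF) return std::nullopt;  // UTF-16 surrogates
  if (value > 0x10FFFF) return std::nullopt;                     // beyond Unicode range
  return DecodedCodePoint{value, length};
}
```

A caller could decode at the current position, check the resulting `value` against whatever identifier classes the language allows, and then advance by `length` bytes. Which code points count as identifier characters is a language design decision that this sketch does not address.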

Whitespace (or any other separator symbol) can be set as a valid token that may be parsed between any two rules (essentially transforming the grammar). This still needs to be set explicitly by calling g.setSeparator(<rule>) on the grammar object, where <rule> defines a parser rule for whitespace characters, e.g. g["Whitespace"] << "[\t ]".
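To illustrate what "parsed between any two rules" means in practice, here is a small hand-written sketch that does not use the library at all. It parses `Sum <- Number ('+' Number)*` and skips a separator matching `[\t ]` before each element. The function names `skipSeparator` and `parseSum` are hypothetical, and treating a number as one atomic token (no separators between its digits) is an assumption of this sketch, not a statement about how the library applies separators internally.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Skips a run of separator characters (tab or space, mirroring "[\t ]")
// and returns the position of the first non-separator character.
inline std::size_t skipSeparator(std::string_view in, std::size_t pos) {
  while (pos < in.size() && (in[pos] == '\t' || in[pos] == ' ')) ++pos;
  return pos;
}

// Parses Sum <- Number ('+' Number)* with optional separators between
// elements. Returns the sum, or std::nullopt if the input does not match.
inline std::optional<long> parseSum(std::string_view in) {
  std::size_t pos = 0;

  // Number <- [0-9]+, preceded by an optional separator.
  const auto number = [&]() -> std::optional<long> {
    pos = skipSeparator(in, pos);
    const std::size_t start = pos;
    long value = 0;
    while (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) {
      value = value * 10 + (in[pos] - '0');
      ++pos;
    }
    if (pos == start) return std::nullopt;  // no digits found
    return value;
  };

  auto total = number();
  if (!total) return std::nullopt;
  for (;;) {
    pos = skipSeparator(in, pos);
    if (pos == in.size()) return total;         // consumed the whole input
    if (in[pos] != '+') return std::nullopt;    // unexpected character
    ++pos;
    const auto next = number();
    if (!next) return std::nullopt;             // '+' without a right operand
    *total += *next;
  }
}
```

With this sketch, `parseSum("1 +\t2+ 3")` yields 6, while `parseSum("1 2")` fails because a separator alone does not join two numbers. In the library the same effect is configured once on the grammar through the separator rule instead of being written into every rule by hand.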

Hope this is still somehow relevant to you or whoever stumbles upon it. Good luck with your compiler!
