An example showing significant whitespace? #643

Open
Geordi7 opened this issue Dec 22, 2023 · 3 comments
Comments

@Geordi7
Geordi7 commented Dec 22, 2023

I'm having difficulty creating a parser for a language like Pug. I haven't tried using an external lexer, but I have a sneaking suspicion it is necessary.

Can you provide an example which shows how to do it?

@TekuConcept

significant whitespace

As in multiple contiguous whitespace characters?

OMS -> [\s]:* # optional multi-line whitespace
RMS -> [\s]:+ # required multi-line whitespace

@Geordi7

Geordi7 commented Apr 2, 2024

No, as in scopes delimited by indented sections of text (à la Pug, Python, Haskell, etc.)

@TekuConcept

as in scopes delimited by indented sections of text (à la Pug, Python, Haskell, etc.)

Ah, so indent / dedent... that will need a context-aware parsing solution.

Use local state

You could get away with creating and updating a local context in the grammar's post-processing step, e.g.:

LINES
    -> LINES RBS LINE {% d => {
        // d[0] is the state object returned by the nested LINES match
        return updateState(d)
    } %}
    |  LINE {% d => createState(d) %}
    
RBS -> OWS LF OMS # required break space
OMS -> [\s]:*     # optional multi-line space
OWS -> [ \t\r]:*  # optional white space
LF -> "\n"

This technique has some challenges and limitations, but it is one way to handle indentation without writing your own lexer.
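As a rough illustration of what the state handling could look like: the thread never defines `createState` or `updateState`, so the helper names below (`startState`, `appendLine`, `buildTree`) and the `{ indent, text }` line shape are all assumptions made for this sketch. The postprocessors only collect lines without mutating anything, and the nesting is computed once at the end with a stack of open parents.

```javascript
// Hypothetical helpers, not part of nearley. Each line is assumed to have
// been reduced to { indent: <leading column count>, text: <string> }.

// State for the first line: a one-element list.
function startState(line) {
  return [line];
}

// State after one more line: a new list, leaving the old one untouched.
function appendLine(state, line) {
  return [...state, line];
}

// Turn the flat list into a tree by comparing indentation widths.
function buildTree(lines) {
  const root = { text: null, indent: -1, children: [] };
  const stack = [root]; // chain of currently open parents
  for (const line of lines) {
    const node = { text: line.text, indent: line.indent, children: [] };
    // Close every open node that is not shallower than the new line.
    while (stack[stack.length - 1].indent >= node.indent) {
      stack.pop();
    }
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root;
}
```

In the grammar, `createState(d)` might then delegate to `startState(d[0])` and `updateState(d)` to `appendLine(d[0], d[2])`, assuming the `LINE` rule's own postprocessor returns an object of that shape. Keeping the postprocessors free of side effects is a deliberate choice here, since a state object that is mutated in place could be shared between alternative parses.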

Use a custom lexer

This is probably the simpler way to parse indent / dedent, as you suspected. (I haven't tried it myself yet.) I found the following on moo's issue tracker for context-aware indent / dedent parsing: no-context/moo#55, with the last link (moo-indentation-lexer) being the one you probably want.

Then according to the nearley docs:

@{%
    const moo = require("moo")
    const IndentationLexer = require('moo-indentation-lexer')

    // Create a lexer from rules
    const mooLexer = moo.compile({ ... })
    // Create an indentation-aware lexer using the lexer
    const lexer = new IndentationLexer({ lexer: mooLexer })
%}

# Pass your lexer object using the @lexer option:
@lexer lexer

BLOCK -> HEADING %indent STATEMENTS %dedent
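To make the `%indent` / `%dedent` tokens less abstract, here is a simplified, standalone sketch of the general idea behind an indentation-aware lexer: keep a stack of open indentation widths and emit a token whenever the width grows or shrinks. It is an illustration only, not how moo-indentation-lexer is implemented; the function name `tokenizeIndentation` and the `line` token type are made up for this example.

```javascript
// Illustrative only: emits indent / dedent tokens from leading whitespace.
// Assumptions: every whitespace character counts as one column, blank lines
// are ignored, and a dedent to a width that was never opened is not reported.
function tokenizeIndentation(source) {
  const tokens = [];
  const levels = [0]; // stack of currently open indentation widths
  for (const raw of source.split("\n")) {
    if (raw.trim() === "") continue; // blank lines do not change nesting
    const width = raw.length - raw.trimStart().length;
    if (width > levels[levels.length - 1]) {
      levels.push(width);
      tokens.push({ type: "indent" });
    }
    while (width < levels[levels.length - 1]) {
      levels.pop();
      tokens.push({ type: "dedent" });
    }
    tokens.push({ type: "line", value: raw.trim() });
  }
  // Close any blocks that are still open at the end of the input.
  while (levels.length > 1) {
    levels.pop();
    tokens.push({ type: "dedent" });
  }
  return tokens;
}
```

Because every `indent` is eventually matched by a `dedent`, a grammar rule such as `BLOCK` above can treat the pair like ordinary opening and closing brackets.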
