An example showing significant whitespace? #643

Geordi7 · 2023-12-22T07:04:03Z

I'm having difficulty creating a parser for a language like Pug. I haven't tried using an external lexer, but I have a sneaking suspicion that one is necessary.

Can you provide an example which shows how to do it?

TekuConcept · 2024-03-30T19:05:26Z

significant whitespace

As in multiple contiguous whitespace characters?

OMS -> [\s]:* # optional multi-line whitespace
RMS -> [\s]:+ # required multi-line whitespace

Geordi7 · 2024-04-02T10:18:53Z

No, as in scopes delimited by indented sections of text (à la Pug, Python, Haskell, etc.)

TekuConcept · 2024-04-02T23:57:40Z

as in scopes delimited by indented sections of text (à la Pug, Python, Haskell, etc.)

Ah, so indent / dedent... that calls for a context-aware parsing solution.

Use local state

You could get away with creating and updating a local context in the grammar's post-processing step, e.g.:

LINES
    -> LINES RBS LINE {% d => {
        // d[0] is the state object built from the preceding LINES match
        return updateState(d)
    } %}
    |  LINE {% d => createState(d) %}
    
RBS -> OWS LF OMS # required break space
OMS -> [\s]:*     # optional multi-line space
OWS -> [ \t\r]:*  # optional white space
LF -> "\n"

This technique poses a few challenges and limitations, but it's one way to go about this without creating your own lexer.
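The grammar above names createState and updateState without defining them, so here is one rough sketch of what they might look like. Everything in it is an assumption for illustration: the shape of the state object, and in particular that LINE's own post-processor yields an object like { indent, text } (the grammar would need to expose each line's indentation width for that to hold). The helpers return a fresh state on every call rather than mutating the previous one.

```javascript
// Hypothetical helpers for the LINES rule; not part of nearley itself.
// Assumption: LINE's post-processor returns { indent: number, text: string }.

// Append one line to the state and return a NEW state (no mutation).
function addLine(state, line) {
  const stack = state.stack.slice(); // indentation widths of the open scopes
  // Dedent: close every scope indented at least as far as this line.
  while (stack.length > 0 && stack[stack.length - 1] >= line.indent) {
    stack.pop();
  }
  stack.push(line.indent);
  const entry = { depth: stack.length - 1, text: line.text };
  return { stack, lines: [...state.lines, entry] };
}

// Called for `LINES -> LINE`, where d is [line].
function createState(d) {
  return addLine({ stack: [], lines: [] }, d[0]);
}

// Called for `LINES -> LINES RBS LINE`, where d is [state, separator, line].
function updateState(d) {
  return addLine(d[0], d[2]);
}

// Simulate the calls nearley would make for five lines of Pug-like input.
let state = createState([{ indent: 0, text: "ul" }]);
state = updateState([state, null, { indent: 2, text: "li a" }]);
state = updateState([state, null, { indent: 4, text: "span" }]);
state = updateState([state, null, { indent: 2, text: "li b" }]);
state = updateState([state, null, { indent: 0, text: "p" }]);
const depths = state.lines.map((l) => l.depth); // [0, 1, 2, 1, 0]
```

The result is a flat list of lines annotated with their nesting depth, which a later pass could fold into a tree. Note that the scopes are worked out in the post-processors, not by the grammar rules themselves, which is probably one of the limitations mentioned above.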

Use a custom lexer

This is perhaps the simpler way of parsing indent / dedent, as your sneaking suspicion hinted. (I haven't tried it myself yet.) I found the following on moo's issue tracker for context-aware indent / dedent parsing: no-context/moo#55, with the last link (moo-indentation-lexer) being the one you probably want.

Then, following the nearley docs, pass the lexer to the grammar:

@{%
    const moo = require("moo")
    const IndentationLexer = require('moo-indentation-lexer')

    // Create a lexer from rules
    const mooLexer = moo.compile({ ... })
    // Create an indentation-aware lexer using the lexer
    const lexer = new IndentationLexer({ lexer: mooLexer })
%}

# Pass your lexer object using the @lexer option:
@lexer lexer

BLOCK -> HEADING %indent STATEMENTS %dedent
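To make it clearer what the BLOCK rule relies on, here is a dependency-free sketch of how a stream of indent / dedent tokens could be derived from raw text. It is not the moo-indentation-lexer implementation, and tokenizeIndentation, the "line" token type, and the one-column-per-tab rule are all assumptions made for illustration; only the indent and dedent token names are taken from the grammar above.

```javascript
// Standalone illustration of indent/dedent tokenization.
// NOT the moo-indentation-lexer implementation; all names are hypothetical.
function tokenizeIndentation(source) {
  const tokens = [];
  const stack = [0]; // indentation widths of the currently open blocks

  for (const rawLine of source.split("\n")) {
    if (rawLine.trim() === "") continue; // blank lines do not open or close blocks

    // Assumption: every leading space or tab counts as one column.
    const width = rawLine.match(/^[ \t]*/)[0].length;

    if (width > stack[stack.length - 1]) {
      stack.push(width);
      tokens.push({ type: "indent" });
    }
    while (width < stack[stack.length - 1]) {
      stack.pop();
      tokens.push({ type: "dedent" });
    }
    if (width !== stack[stack.length - 1]) {
      throw new Error("Dedent does not match any outer indentation level");
    }
    tokens.push({ type: "line", value: rawLine.trim() });
  }

  // Close any blocks that are still open at the end of the input.
  while (stack.length > 1) {
    stack.pop();
    tokens.push({ type: "dedent" });
  }
  return tokens;
}

const tokenTypes = tokenizeIndentation("ul\n  li a\n    span\n  li b\np")
  .map((t) => t.type);
// line, indent, line, indent, line, dedent, line, dedent, line
```

Each nested region ends up bracketed by a matching indent / dedent pair, so the grammar can treat indentation like ordinary opening and closing delimiters.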
