This Raku package provides a (monadic) system of Functional Parsers (FPs).
The package design and implementation follow closely the article "Functional parsers" by Jeroen Fokker, [JF1]. That article can be used as both a theoretical- and a practical guide to FPs.
The package provides both FPs and Extended Backus-Naur Form (EBNF) parsers and interpreters. The reasons for including the EBNF functionalities are:
- EBNF parsing is discussed in [JF1]
- EBNF parsers and interpreters are very good examples of FPs application
- FPs packages implementations in Lua, Mathematica, and R. See these blog posts and [AAp1, AAp2].
Remark: In this document Mathematica and Wolfram Language (WL) are used as synonyms.
- "Functional parsers" article using Haskell, [JF1].
-
Interesting and insightful blog post "List-based parser combinators in Haskell and Raku".
- The corresponding Raku code repository [WVp1] is not (fully) productized.
-
Perl package "Parser::Combinators", [WVp2].
From Zef ecosystem:
zef install FunctionalParsers;
From GitHub:
zef install https://github.com/antononcube/Raku-FunctionalParsers.git
Here is a list of motivations for implementing this package:
- Word-based backtracking
- Elevate the "tyranny" of Raku grammars
- Easier transfer of Raku grammars into other languages
- Monadic parser construction
- Quick, elegant implementation
I had certain assumptions about certain slow parsing with Raku using regexes. For example, is not that easy to specify backtracking over sequences of words (instead of characters) in grammars. To check my assumptions I wrote the basic eight FPs (which is quick to do.) After my experiments, I could not help myself making a complete package.
The "first class citizen" treatment of grammars is one of the most important and unique features of Raku. It is one of the reason why I treat Raku as a "secret weapon."
But that uniqueness does not necessarily facilitate easy utilization or transfer in large software systems. FPs, on the other hand, are implemented in almost every programming language. Hence, making or translating grammars with- or to FPs would provide greater knowledge transfer and integration of Raku-derived solutions.
Having a monadic way of building parsers or grammars is very appealing. (To some people.) Raku's operator making abilities can be nicely utilized.
Remark: The monad of FPs produces Abstract Syntax Trees (ASTs) that are simple lists. I prefer that instead of using specially defined types (as, say, in [WV1, WVp1].) That probably, comes from too much usage of LISP-family programming languages. (Like Mathematica and R.)
The Raku code implementing FPs was quick to write and looks concise and elegant.
(To me at least. I would not be surprised if that code can be simplified further.)
I considered names like "Parser::Combinator", "Parser::Functional", etc. Of course, looked up names of similar packages.
Ultimately, I decided to use "FunctionalParsers" because:
- Descriptive name that corresponds to the title of the article by Jeroen Fokker, [JF1].
- The package has not only parser combinators, but also parser transformers and modifiers.
- The connections with corresponding packages in other languages are going to be more obvious.
- For example, I have used the name "FunctionalParsers" for similar packages in other programming languages (Lua, R, WL.)
I considered to name the directory with EBNF interpreters "Context" or "Contexts", but since "Actions" is used a lot I chose that name.
Remark: In [JF1] the term "contexts" is used.
Make a parser for a family of (two) simple sentences:
use FunctionalParsers :ALL;
my &p1 = (symbol('numerical') «|» symbol('symbolic')) «&» symbol('integration');
# -> @x { #`(Block|6352212853176) ... }
Here we parse sentences adhering to the grammar of the defined parser:
.say for ("numerical integration", "symbolic integration")>>.words.map({ $_ => &p1($_)});
# (numerical integration) => ((() (numerical integration)))
# (symbolic integration) => ((() (symbolic integration)))
These sentences are not parsed:
("numeric integration", "symbolic summation")>>.words.map({ $_ => &p1($_)});
# ((numeric integration) => () (symbolic summation) => ())
Several notation alternatives are considered for the infix operations corresponding to the different combinators and transformers. Here is a table with different notation styles:
Description | set | double | n-ary |
---|---|---|---|
sequential combination | (&) | «&» | ⨂ |
left sequential pick | (<&) | «& | ◁ |
right sequential pick | (&>) | &» | ▷ |
alternatives combination | (⎸) | «⎸» | ⨁ |
function application | (^) | «o | ⨀ |
Consider the parsers:
my &p1 = apply( {1}, symbol('one'));
my &p2 = apply( {2}, symbol('two'));
my &p3 = apply( {3}, symbol('three'));
my &p4 = apply( {4}, symbol('four'));
my &pM = symbol('million');
my &pTh = symbol('things');
# -> @x { #`(Block|6352245759584) ... }
Here are spec examples for each style of infix operators:
# set
my &p = (&p1 (|) &p2 (|) &p3 (|) &p4) (&) (&pM (^) {10**6}) (&) &pTh;
&p('three million things'.words.List).head.tail;
# (3 (1000000 things))
# double
(&p1 «|» &p2 «|» &p3 «|» &p4) «&» &pM «o {10**6} «&» &pTh;
# n-ary
(&p1 ⨁ &p2 ⨁ &p3 ⨁ &p4) ⨂ {10**6} ⨀ &pM ⨂ &pTh
Remark: The arguments of the apply operator ⨀
are "reversed" when compared to the arguments of the operators (^)
and «o
.
For ⨀
the function to be applied is the first argument.
Here is an EBNF grammar:
my $ebnfCode = q:to/END/;
<digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
<integer> = <digit> , { <digit> } ;
<top> = <integer> ;
END
# <digit> = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
# <integer> = <digit> , { <digit> } ;
# <top> = <integer> ;
Here generation is the corresponding functional parsers code:
use FunctionalParsers::EBNF;
.say for fp-ebnf-parse($ebnfCode, actions => 'Raku::Code').head.tail;
# my &pDIGIT = alternatives(symbol('0'), symbol('1'), symbol('2'), symbol('3'), symbol('4'), symbol('5'), symbol('6'), symbol('7'), symbol('8'), symbol('9'));
# my &pINTEGER = sequence(&pDIGIT, many(&pDIGIT));
# my &pTOP = &pINTEGER;
For more detailed examples see "Parser-code-generation.md".
Here is an EBNF grammar:
my $ebnfCode2 = q:to/END/;
<top> = <who> , <verb> , <lang> ;
<who> = 'I' | 'We' ;
<verb> = 'love' | 'hate' | { '♥️' } | '🤮';
<lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;
END
# <top> = <who> , <verb> , <lang> ;
# <who> = 'I' | 'We' ;
# <verb> = 'love' | 'hate' | { '♥️' } | '🤮';
# <lang> = 'Julia' | 'Perl' | 'Python' | 'R' | 'WL' ;
Here is generation of random sentences with the grammar above:
.say for fp-random-sentence($ebnfCode2, 12);
# We love R
# We hate WL
# I love Perl
# I hate Julia
# We 🤮 R
# We R
# We Julia
# I hate R
# We ♥️ WL
# I ♥️ ♥️ WL
# We 🤮 WL
# We love Julia
The function fp-ebnf-parse
can produce
Mermaid-JS diagrams
corresponding to grammars with the target "MermaidJS::Graph".
Here is an example:
my $ebnfCode3 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ];
<b> = 'b' , ( 'B' | '2' );
END
fp-ebnf-parse($ebnfCode3, target=>"MermaidJS::Graph", dir-spec => 'LR').head.tail
graph LR
alt1((or))
NT:top["top"]
seq12((and))
T:A("A")
NT:b["b"]
T:B("B")
T:2("2")
T:a("a")
NT:a["a"]
T:b("b")
rep7((*))
alt14((or))
opt9((?))
T:1("1")
seq5((and))
alt1 --> NT:a
alt1 --> NT:b
NT:top --> alt1
rep7 --> T:A
opt9 --> T:1
seq5 --> |1|T:a
seq5 --> |2|rep7
seq5 --> |3|opt9
NT:a --> seq5
alt14 --> T:B
alt14 --> T:2
seq12 --> |1|T:b
seq12 --> |2|alt14
NT:b --> seq12
Here is a legend:
- The non-terminals are shown with rectangles
- The terminals are shown with round rectangles
- The "conjunctions" are shown in disks
Remark: The Markdown cell above has the parameters result=asis, output-lang=mermaid, output-prompt=NONE
which allow for direct diagram rendering of the obtained Mermaid code in various Markdown viewers (GitHub, IntelliJ, etc.)
Compare the following EBNF grammar and corresponding diagram with the ones above:
my $ebnfCode4 = q:to/END/;
<top> = <a> | <b> ;
<a> = 'a' , { 'A' } , [ '1' ] ;
<b> = 'b' , 'B' | '2' ;
END
fp-grammar-graph($ebnfCode4, dir-spec => 'LR')
graph LR
T:A("A")
rep7((*))
NT:a["a"]
T:B("B")
T:2("2")
seq5((and))
alt12((or))
NT:top["top"]
NT:b["b"]
T:a("a")
seq13((and))
alt1((or))
opt9((?))
T:b("b")
T:1("1")
alt1 --> NT:a
alt1 --> NT:b
NT:top --> alt1
rep7 --> T:A
opt9 --> T:1
seq5 --> |1|T:a
seq5 --> |2|rep7
seq5 --> |3|opt9
NT:a --> seq5
seq13 --> |1|T:b
seq13 --> |2|T:B
alt12 --> seq13
alt12 --> T:2
NT:b --> alt12
The package provides a Command Line Interface (CLI) script for parsing EBNF. Here is its usage message:
fp-ebnf-parse --help
# Usage:
# fp-ebnf-parse <ebnf> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar.
# fp-ebnf-parse <file> [-t|--actions=<Str>] [-n|--parser-name=<Str>] [-p|--rule-name-prefix=<Str>] [-m|--rule-name-modifier=<Str>] [-s|--style=<Str>] -- Generates parser code for a given EBNF grammar file.
#
# <ebnf> EBNF text.
# -t|--actions=<Str> Actions ('t' for 'target'.) [default: 'Raku::Class']
# -n|--parser-name=<Str> Parser name. [default: 'MyParser']
# -p|--rule-name-prefix=<Str> Rule names prefix. [default: 'p']
# -m|--rule-name-modifier=<Str> Rule names modifier. [default: 'WhateverCode']
# -s|--style=<Str> EBNF style, one of 'G4', 'Inverted', 'Standard', 'Relaxed', or 'Whatever'. [default: 'Whatever']
# <file> EBNF file name.
If mermaid-cli is installed here is an example UNIX shell pipeline with it:
fp-ebnf-parse ./resources/Arithmetic.ebnf -s=relaxed -t=mermaid > diag.md && mmdc -i diag.md -o diag.png -w 1200 && open diag.png
The infix operators have to be reviewed and probably better sets of symbols would be chosen. The challenge is to select operators that are "respected" by the typical Raku IDEs. (I only experimented with Emacs and Comma IDE.)
All EBNF parser functions in FunctionalParsers::EBNF
have apply-transformers that use the attributes of
a dedicated object:
unit module FunctionalParsers::EBNF;
...
our $ebnfActions = FunctionalParsers::EBNF::Actions::Raku::AST.new;
....
By assigning instances of different classes to $ebnfActions
we get different parsing interpretations.
Looking at the Raku EBNF interpreter classes it can be easily seen that each can inherit from a common abstract class. But since the EBNF parsing methods (or attributes that callables) are approximately a dozen one-liners, it seems more convenient to have all class method- and attribute definitions on “one screen.”
graph TD
FPs[[EBNF<br/>Functional Parsers]]
RakuAST[Raku::AST]
RakuClass[Raku::Class]
RakuCode[Raku::Code]
RakuGrammar[Raku::Grammar]
WLCode[WL::Code]
WLGrammar[WL::Grammar]
JavaFuncJ[Java::FuncJ]
JavaANTLR[Java::ANTLR]
Input[/- EBNF code<br/>- Properties/]
PickTarget[Assign context]
Parse[Parse]
QEVAL{Evaluate?}
EVAL[[EVAL]]
Code>Code]
Context[Context object]
Result{{Result}}
Input --> PickTarget
PickTarget -.- WL
PickTarget -.- Java
PickTarget -..- Raku
PickTarget -.-> Context
PickTarget --> Parse
Parse -.- FPs
Context -.-> FPs
Parse --> QEVAL
Parse -.-> Code
QEVAL --> |yes|EVAL
Code -.-> EVAL
EVAL ---> Result
QEVAL ---> |no|Result
subgraph Raku
RakuAST
RakuClass
RakuCode
RakuGrammar
end
subgraph WL
WLCode
WLGrammar
end
subgraph Java
JavaFuncJ
JavaANTLR
end
- TODO Parsing EBNF refactoring & additional features
- DONE Parse any combination of sequence operators
- Initially, only these were parsed:
'a' <& 'b' <& 'c' | 'a' &> 'd';
'a' , 'b' , 'c' | 'a' &> 'd';
- These are also parsed:
'a' , 'b' &> 'c'
'a' <& 'b' &> 'c'
- Initially, only these were parsed:
- DONE Class-based parsers
- DONE From characters
- DONE From tokens
- DONE Themed parsers
- DONE Inheritance based implementation
- DONE "Simpler"
- DONE ANTLR / G4
- DONE Whatever
- TODO "Named" tokens
-
'_?StringQ'
or'_String'
-
'_WordString'
,'_LetterString'
, and'_IdentifierString'
-
'_?NumberQ'
and'_?NumericQ'
-
'_Integer'
-
'Range[*from*, *to*]'
-
- DONE Parse any combination of sequence operators
- TODO Interpreters of EBNF
- DONE Java
- DONE "funcj.parser"
- TODO GraphViz
- TODO DOT
- DONE MermaidJS
- TODO Scala
- TODO built-in
- TODO parsley
- MAYBE Python
- TODO Raku
- DONE AST
- DONE Class
- DONE Code
- DONE Grammar
- TODO Tokenizer (of character sequences)
- Other EBNF styles
- TODO WL
- DONE FunctionalParsers, [AAp1, AAp2]
- [P] TODO GrammarRules
- Implemented to a point, not tested in WL.
- Graph
- Via MermaidJS classes.
- DONE Java
- TODO Translators
- TODO FPs code into EBNF
- Very cool to have, but seems to be a lot of work.
- DONE Raku grammars to FPs
- See the class "Grammar::TokenProcessing::Actions::EBNF" of the package "Grammar::TokenProcessing".
- DONE Stand-alone grammar-graph translation function.
fp-grammar-graph
- TODO FPs code into EBNF
- TODO Extensions
- DONE First-matched alternation
- The standard
alterations
parser is "longest alternation" (in Raku's terms.)
- The standard
- TODO Extra parsers
- DONE
pInteger
- DONE
pNumber
- DONE
pWord
- DONE
pLetterWord
- DONE
pIdentifier
- TODO
pNumberRange
- Other?
- DONE
- TODO Zero-width assertions implementation
- TODO Lookahead
- TODO Lookbehind
- DONE First-matched alternation
- DONE Random sentence generation
- DONE Basic class code
- DONE Preventing infinite recursion
- DONE "Named" tokens interpretation
-
'_?StringQ'
or'_String'
-
'_WordString'
,'_LetterString'
, and'_IdentifierString'
-
'_?NumberQ'
and'_?NumericQ'
-
'_Integer'
-
'Range[*from*, *to*]'
-
- TODO Documentation
- DONE README
- DONE Parser code generation
- TODO Raku
- DONE Class
- TODO Code
- DONE Grammar
- DONE WL
- DONE FunctionalParsers
- DONE GrammarRules
- TODO Java
- TODO Raku
- TODO Random sentences generation
- TODO Mermaid flowchart
- TODO Mermaid class diagram?
- TODO Videos
- TODO Introduction
- TODO TRC-2023 presentation
[JF1] Jeroen Fokker, "Function Parsers", (1997), Conference: Advanced Functional Programming, First International Spring School on Advanced Functional Programming Techniques-Tutorial Text. 10.1007/3-540-59451-5_1.
[WV1] Wim Vanderbauwhede, "List-based parser combinators in Haskell and Raku", (2020), Musings of an Accidental Computing Scientist at codeberg.page.
[AAp1] Anton Antonov, "FunctionalParsers.m", (2014), MathematicaForPrediction at GitHub.
[AAp2] Anton Antonov, "FunctionalParsers" WL paclet, (2023), Wolfram Language Paclet Repository.
[WVp1] Wim Vanderbauwhede, List-based parser combinator library in Raku, (2020), GitHub/wimvanderbauwhede.
[WVp2] Wim Vanderbauwhede, Parser::Combinators Perl package, (2013-2015), GitHub/wimvanderbauwhede.