syn

A syntax parser based on the LLLR method.

Requirements

  • Rust 1.56.0 or later

Usage

syn <INPUT> -g GRAMMAR [-p lllr] [-o OUTPUT]

The optional -o argument specifies the output file for a graph in the DOT language. This is only available with the LR parser.
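
For example, assuming an input file named input.txt and a grammar file named grammar.toml (both filenames are illustrative), the parser can be invoked with:

syn input.txt -g grammar.toml -p lllr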

Grammar

Grammar files are defined using the TOML format.

Header

The header contains the following entries:

  • name: Name of the grammar.
  • description: An optional description of the grammar. Defaults to the canonical path to the grammar file.
  • start_symbol: Start symbol of the grammar. Defaults to the first rule in [rules].

Example:

name = "grammar"
description = "Example grammar for README"
start_symbol = "S"

Rules

The production rules are described in the [rules] table. A production can be either a single string or an array of strings, each string representing one possible rule for the given grammar symbol. When the grammar file is parsed, a single string is converted to an array with one element.

To represent an ϵ production, use an empty string. The symbols and rules can be in any order.

Example:

[rules]
# S → A B 'c' | 'a' A B 'b'
S = [
    "A B c",
    "a A B b",
]

# A → 'a' | ϵ
A = [
    "a",
    "",
]

# B → 'b'
B = "b"

Tokens

Regular expressions used to match tokens during lexical analysis are described in the [tokens] table. The patterns need to be properly escaped and written in a way that allows partial matching for the incremental lexical analysis. Instead of a regular expression, you can specify a list of strings to be matched as plain text.

Matching precedence is defined by the order of the regular expressions.

Example:

[tokens]
a = [
    "true",
    "false",
]

b = "'[A-Z\\x61-\\x7A_]*('|$)"
c = "[0-9]+"

Ignored tokens

Regular expressions in the [ignore] table define tokens that are ignored during syntax analysis. The patterns need to follow the rules for the [tokens] table.

Example:

[ignore]
whitespace = "[ \t\r\n]*"
comment = "#.*(\n|$)"

Actions

The [actions] table specifies which action to prefer when a shift/reduce conflict occurs. This avoids issues such as the dangling else. Allowed values are shift and reduce.

Example:

[actions]
a = "shift"