Skip to content

loloicci/nimly

Repository files navigation

nimly

github_workflow nimble

Lexer Generator and Parser Generator as a Macro Library in Nim.

With nimly, you can make lexer/parser by writing definition in formats like lex/yacc. nimly generates lexer and parser by using macro in compile-time, so you can use nimly not as external tool of your program but as a library.

niml

niml is a macro to generate a lexer.

macro niml

macro niml makes a lexer. Almost all part of constructing a lexer is done in compile-time. Example is as follows.

## This makes a LexData object named myLexer.
## This lexer returns value with type ``Token`` when a token is found.
niml myLexer[Token]:
  r"if":
    ## this part converted to procbody.
    ## the arg is (token: LToken).
    return TokenIf()
  r"else":
    return TokenElse()
  r"true":
    return TokenTrue()
  r"false":
    return TokenFalse()
  ## you can use ``..`` instead of ``-`` in ``[]``.
  r"[a..zA..Z\-_][a..zA..Z0..9\-_]*":
    return TokenIdentifier(token)
  ## you can define ``setUp`` and ``tearDown`` function.
  ## ``setUp`` is called from ``open``, ``newWithString`` and
  ## ``initWithString``.
  ## ``tearDown`` is called from ``close``.
  ## an example is ``test/lexer_global_var.nim``.
  setUp:
    doSomething()
  tearDown:
    doSomething()

Meta charactors are as following:

  • \: escape character
  • .: match with any charactor
  • [: start of character class
  • |: means or
  • (: start of subpattern
  • ): end of subpattern
  • ?: 0 or 1 times quantifier
  • *: 0 or more times quantifire
  • +: 1 or more times quantifire
  • {: {n,m} is n or more and m or less times quantifire

In [], meta charactors are as following

  • \: escape character
  • ^: negate character (only in first position)
  • ]: end of this class
  • -: specify character range (.. can be used instead of this)

Each of followings is recognized as character set.

  • \d: [0..9]
  • \D: [^0..9]
  • \s: [ \t\n\r\f\v]
  • \S: [^ \t\n\r\f\v]
  • \w: [a..zA..Z0..9_]
  • \w: [^a..zA..Z0..9_]

nimy

nimy is a macro to generate a LALR(1) parser.

macro nimy

macro nimy makes a parser. Almost all part of constructing a parser is done in compile-time. Example is as follows.

## This makes a LexData object named myParser.
## first cloud is the top-level of the BNF.
## This lexer recieve tokens with type ``Token`` and token must have a value
## ``kind`` with type enum ``[TokenTypeName]Kind``.
## This is naturally satisfied when you use ``patty`` to define the token.
nimy myParser[Token]:
  ## the starting non-terminal
  ## the return type of the parser is ``Expr``
  top[Expr]:
    ## a pattern.
    expr:
      ## proc body that is used when parse the pattern with single ``expr``.
      ## $1 means first position of the pattern (expr)
      return $1

  ## non-terminal named ``expr``
  ## with returning type ``Expr``
  expr[Expr]:
    ## first pattern of expr.
    ## ``LPAR`` and ``RPAR`` is TokenKind.
    LPAR expr RPAR:
      return $2

    ## second pattern of expr.
    ## ``PLUS`` is TokenKind.
    expr PLUS expr
      return $2

You can use following EBNF functions:

  • XXX[]: Option (0 or 1 XXX). The type is seq[xxx] where xxx is type of XXX.
  • XXX{}: Repeat (0 or more XXX). The type is seq[xxx] where xxx is type of XXX.

Example of these is in next section.

Example

tests/test_readme_example.nim is an easy example.

import unittest
import patty
import strutils

import nimly

## variant is defined in patty
variant MyToken:
  PLUS
  MULTI
  NUM(val: int)
  DOT
  LPAREN
  RPAREN
  IGNORE

niml testLex[MyToken]:
  r"\(":
    return LPAREN()
  r"\)":
    return RPAREN()
  r"\+":
    return PLUS()
  r"\*":
    return MULTI()
  r"\d":
    return NUM(parseInt(token.token))
  r"\.":
    return DOT()
  r"\s":
    return IGNORE()

nimy testPar[MyToken]:
  top[string]:
    plus:
      return $1

  plus[string]:
    mult PLUS plus:
      return $1 & " + " & $3

    mult:
      return $1

  mult[string]:
    num MULTI mult:
      return "[" & $1 & " * " & $3 & "]"

    num:
      return $1

  num[string]:
    LPAREN plus RPAREN:
      return "(" & $2 & ")"

    ## float (integer part is 0-9) or integer
    NUM DOT[] NUM{}:
      result = ""
      # type of `($1).val` is `int`
      result &= $(($1).val)
      if ($2).len > 0:
        result &= "."
      # type of `$3` is `seq[MyToken]` and each elements are NUM
      for tkn in $3:
        # type of `tkn.val` is `int`
        result &= $(tkn.val)

test "test Lexer":
  var testLexer = testLex.newWithString("1 + 42 * 101010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var
    ret: seq[MyTokenKind] = @[]

  for token in testLexer.lexIter:
    ret.add(token.kind)

  check ret == @[MyTokenKind.NUM, MyTokenKind.PLUS, MyTokenKind.NUM,
                 MyTokenKind.NUM, MyTokenKind.MULTI,
                 MyTokenKind.NUM, MyTokenKind.NUM, MyTokenKind.NUM,
                 MyTokenKind.NUM, MyTokenKind.NUM, MyTokenKind.NUM]

test "test Parser 1":
  var testLexer = testLex.newWithString("1 + 42 * 101010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "1 + [42 * 101010]"

  testLexer.initWithString("1 + 42 * 1010")

  parser.init()
  check parser.parse(testLexer) == "1 + [42 * 1010]"

test "test Parser 2":
  var testLexer = testLex.newWithString("1 + 42 * 1.01010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "1 + [42 * 1.01010]"

  testLexer.initWithString("1. + 4.2 * 101010")

  parser.init()
  check parser.parse(testLexer) == "1. + [4.2 * 101010]"

test "test Parser 3":
  var testLexer = testLex.newWithString("(1 + 42) * 1.01010")
  testLexer.ignoreIf = proc(r: MyToken): bool = r.kind == MyTokenKind.IGNORE

  var parser = testPar.newParser()
  check parser.parse(testLexer) == "[(1 + 42) * 1.01010]"

Install

  1. nimble install nimly

Now, you can use nimly with import nimly.

vmdef.MaxLoopIterations Problem

During compiling lexer/parser, you can encounter errors with interpretation requires too many iterations. You can avoid this error to use the compiler option maxLoopIterationsVM:N which is available since nim v1.0.6.

See #11 to detail.

Contribute

  1. Fork this
  2. Create new branch
  3. Commit your change
  4. Push it to the branch
  5. Create new pull request

Changelog

See changelog.rst.

Developing

You can use nimldebug and nimydebug as a conditional symbol to print debug info.

example: nim c -d:nimldebug -d:nimydebug -r tests/test_readme_example.nim