Skip to content

glebec/functional-math-compiler

Repository files navigation

Functional JS Arithmetic Expression Compiler

Compiler exercise for education. Parsing infix arithmetic expressions via LL(1) recursive descent, adhering to functional approaches.

Features

  • Pure functions only: no side effects (apart from STDOUT), no mutations, all const etc.
  • Uses daggy for tagged unions (aka sum types)
    • Enables type checking e.g. Token.Number.is(x) and Token.is(x)
    • Enables simple pattern matching / catamorphisms e.g. someToken.cata({ Mul: () => true, Div: () => false }) with automatic errors in case of match failure
  • Uses Immutable.js List for functional collections
    • Enables efficient unshift, push, and slice
    • Immutability provides strong reasoning benefits

Use

yarn # installs
yarn test # passes
yarn start '-9 * 2 / -(3 + 7) + ((-4 * 1/2) - -21)' # outputs RPN string
yarn start '-9 * 2 / -(3 + 7) + ((-4 * 1/2) - -21)' --eval # outputs num

Supports

  • Integers
  • Negation
  • Addition
  • Subtraction
  • Multiplication
  • Division
  • Grouping (parens)

Contents

Lexer

Converts raw input string to an Immutable.js List of Daggy types (tokens):

  • Number (string of digits)
  • Lparen
  • Rparen
  • Mul
  • Div
  • Add
  • Sub

Parser

Converts list of tokens to a parse tree, aka concrete syntax tree:

  • Epsilon
  • Factor (sign, child)
  • F2 (op, factor, childF2)
  • T2 (op, term, childT2)
  • Term (factor, childF2)
  • Expression (term, childT2)

Note that a CST / PT preserves all or almost all of the syntax as represented by the language's grammar, whereas an Abstract Syntax Tree (AST) reduces the information to the bare minimum necessary for a given use case. Currently our CST simplifies parenthetical expressions, though we could make it preserve parens.

Generator

Dispatches based on node type to recursively process the parse tree. Two generators included:

  • an infix -> RPN compiler
  • a numerical evaluator

The latter generator can be chosen from the command line by appending the --eval flag.

Grammar

A grammar is the set of production rules for a language; that is, substitutions which eventually generate any valid string in the language. A simple grammar for expressions (starting from E) is:

E -> (E)
E -> E + E
E -> E * E
E -> number

Unfortunately, while the above grammar is valid (producing sentences in the language), it is also ambiguous (could be represented by different parse trees, affecting the semantic meaning of the expression). The following is an unambiguous grammar:

E -> E + T
E -> T
T -> T ∗ F
T -> F
F -> (E)
F -> number

Unfortunately, this grammar is left-recursive, which makes it impossible to parse via LL recursive descent (one of the easiest parsers to implement). The following refactors the grammar to be parsed via an LL(1) algorithm:

E  ->   T T2
T2 -> + T T2
T2 -> ε
T  ->   F F2
F2 -> * F F2
F2 -> ε
F  -> (E)
F  -> number

A formal EBNF notation of a similar grammar is included in the docs folder, with the augmentation that we allow for multiplication, subtraction, and negative factors.

Resources