Skip to content

NYU programming languages homework 1 project: build an EBNF-to-BNF Bison rule translator using Flex and Bison.

Notifications You must be signed in to change notification settings

dizys/pl-homework-1

Repository files navigation

Programming Languages Homework 1

An EBNF-to-BNF Bison rule translator built using Flex and Bison. Actions are completely ignored during translation.

Supported extended expressions:

  • <A> or A*: zero or more occurrences of A
  • A+: one or more occurrences of A
  • [A]: zero or one occurrences of A
  • (A||B): grouping, in which an or operator should appear as || instead of a single |

1. Project structure

project
├─.vscode
│  └─settings.json                  VSCode workspace settings
│
├─dist                          Build artifacts (created after build)
│  ├─bin
│  │  └─trans                           Executable binary of translator
│  │
│  ├─c_src                          Folder to hold generated C codes
│  │  ├─trans.tab.c                     Parser C code generated by Bison
│  │  ├─trans.tab.h                     Parser C header generated by Bison
│  │  └─trans.yy.c                      Tokenizer C code generated by Flex
│  │
│  └─package                        Folder to hold package for submission
│
├─examples                      Bison-with-EBNF Examples used to test the translator
│  └─...
│
├─src                           Translater source folder
│  ├─trans.l                        Tokenizer source in Flex
│  ├─trans.y                        Parser source in Bison
│  ├─treenode.cpp                   AST treenode source in C++
│  └─treenode.h                     AST treenode header file
│
├─load_modules.sh               Shell script for fast loading Flex and Bison
├─make.sh                       Shell script for easily building the translator
├─package.sh                       Shell script for building package for submission
├─run.sh                Shell script for easily running examples
│
...

2. Build

This project requires Bison 3.5+ and Flex 2.5+ to build. It's developed with Bison 3.7.1 and Flex 2.6.4.

To build the translator, simply run the following shell script via CLI:

./make.sh

Then, the executable binary should be available at dist/bin/trans

3. Usage

3.1 dist/bin/trans

When running, the translator binary reads from standard input (stdin) and output its translation to standard output (stdout).

3.1.1 Direct input and output (not recommended)

You can type all the content that needs to be translated and end it with Ctrl+D. But as is shown by the following example, the output would be mixed up with the input, which is messy.

$ ./dist/bin/trans
%%
a: digit+;
%%

%%

a:
    a_quant
  ;

a_quant:
    digit
  |
    digit a_quant
  ;

%%

3.1.2 Input file content through pipe (recommended)

$ cat examples/cal1.y | ./dist/bin/trans

3.1.3 Redirect output to file (recommended)

$ cat example/cal1.y | ./dist/bin/trans > example/cal1.out.y

3.2 Example runner / helper script: run.sh

This project includes 7 example bison source files, 2 with BNF rules (cal1.y, cal2.y), the others with EBNF rules (test1.y, test2.y, test3.y, test4.y, test5.y).

To translate an example file or any file, you can try:

$ ./run.sh ./examples/test1.y

Output translation result to a file:

$ ./run.sh ./examples/test1.y > ./examples/test1.out.y

./examples/test1.y could be replaced with any path to a valid Bison file with BNF/EBNF rules.

4. Limitations

4.1 Actions are ignored

Because new productions could be generated during translation, and it's difficult to match original actions to new productions, actions are completely ignored during translation, and the translation result would contain no action code blocks.

4.2 Nested blocks in actions are not supported

Because it's impossible to represent the recursive structure of nested blocks using a single standard regular expression, action block with only one level of matching curly brackets can be matched by the regular expression. A possible solution would be further defining C/C++ tokens and grammar structure in Flex and Bison. But that would be out of scope for this project.

For example, the following file would throw an error when being translated.

%%

digit:
    '2' {
      if (flag) {
        flag = 0;
        $$ = new node($1);
      } // if block cannot be matched
    }
  |
    '3'
  ;

%%

4.3 Generated rules are not optimal

The generated rules are correct BNF rules, but sometimes can be too verbose and even with meaningless nodes.

For example, a possible translation result could be as follow:

%%
digit:
    digit_expr
  ;

digit_expr:
    'a'
  |
    'b'
  ;
%%

In this example, non-terminal digit_expr is completely unnecessary as digit_expr here is just an alias for digit.

This is due to how the Bison rules are defined in this project to support EBNF . Because digit_expr could also be used inside, for example, a repetition rule like number -> digit_expr+ where digit_expr is a necessary non-terminal for it to be translated into BNF rule number -> digit_expr | digit_expr number.

Optimizations can be done to eliminate redundancies like this. For example, by visiting the AST, find out rules like A -> B where B is only used here. Then just delete this rule and rename B as A. Unfortunately due to time limit, this has not been implemented yet.