Skip to content
/ Lucix Public

A toy programming language compiled to LLVM IR (<200 LoC, dumbed down for educational)

License

Notifications You must be signed in to change notification settings

quangIO/Lucix

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

How to make a (toy) programming language (Part 0.0: meta basic)

How a compiler work?

+--------------------------------+                  +--------------------------------------------------------------------------------------------------------------------------+
|                                |                  |                                                                                                                          |
|                                |                  |                                                                                                                          |
|      Human readable code       |----------------->|  +--------------------------+ +--------------------------+  +--------------------------+  +--------------------------+   |
|                                |                  |  |                          | |                          |  |                          |  |                          |   |
|                                |                  |  |                          | |                          |  |                          |  |                          |   |
+--------------------------------+                  |  |         Lexing           | |         Parsing          |  |       Optimization       |  |  Emitting instructions   |   |
                                                    |  |                          | |                          |  |                          |  |                          |   |
                                                    |  |                          | |                          |  |                          |  |                          |   |
                                                    |  +--------------------------+ +--------------------------+  +--------------------------+  +--------------------------+   |
                                                    |                                                                                                                          |
+--------------------------------+                  |      Break text into token        Tokens --> parse tree         Run different                Turn everything into        |
|                                |                  |                                   representing language         transformations to           (virtual) machine code      |
|                                |                  |                                   constructs                    optimize the code                                        |
|           Profit???            |<-----------------|                                                                                                                          |
|                                |                  |                                                                                                                          |
|                                |                  |                                                                                                                          |
+--------------------------------+                  +--------------------------------------------------------------------------------------------------------------------------+

Write a lexer & parser?

There are many excellent tutorials on the Internet about those things. However, if you want to write one, an easy option is looking into recursive descent parsers. In this tutorial, we’ll make it even easier: just use Antlr to generate our parser. The flow now becomes: define our grammar -> (Antlr generates the parser & the abstract syntax tree[AST]) -> traverse AST -> make optimization -> emit code.

Write an optimizer and assembler?

This seems hard. Thankfully, there is LLVM which will do those hard works for us. So, we only need to do:

define our grammar -> (Antlr generates the parser & the abstract syntax tree[AST]) -> traverse AST, emmit LLVM IR -> (LLVM does the magic) -> profit.

LLVM IR for lazy people

How to use LLVM IR to construct X, where X = functions, for loops, variable assignments, if else statements…

Just write X in C and compile the program into LLVM IR (clang -S -emit-llvm somefile.c). For example:

// somefile.c
int func(int a) {
  return a;
}

int main() {
  int x = func(2);
  if (x > 3) {
    return 1;
  } else {
    return 2;
  }
}
; result LLVM IR
define i32 @func(i32) #0 {
  %2 = alloca i32, align 4
  store i32 %0, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  ret i32 %3
}

; Function Attrs: noinline nounwind optnone
define i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %3 = call i32 @func(i32 2)
  store i32 %3, i32* %2, align 4
  %4 = load i32, i32* %2, align 4
  %5 = icmp sgt i32 %4, 3
  br i1 %5, label %6, label %7

; <label>:6:                                      ; preds = %0
  store i32 1, i32* %1, align 4
  br label %8

; <label>:7:                                      ; preds = %0
  store i32 2, i32* %1, align 4
  br label %8

; <label>:8:                                      ; preds = %7, %6
  %9 = load i32, i32* %1, align 4
  ret i32 %9
}

Having trouble wrap your head around this? This guide is useful (and short) https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/

Sound easy enough right? Let’s do it

Very small code for reference: https://github.com/quangio/Lucix

About

A toy programming language compiled to LLVM IR (<200 LoC, dumbed down for educational)

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published