fast_brainfuck.lua

High performance brainfuck transpiler/interpreter for Lua with FFI support

⚠️ I strongly advise you to use it with LuaJIT ⚠️

Usage

luajit fast_brainfuck.lua program.bf

luajit fast_brainfuck.lua program.bf output.lua

Benchmark on mandelbrot.bf

Tested on Ryzen 3700x | 32gb ram 2400mhz

Lua version\Implementation	fast_brainfuck.lua	brainfuck.lua transpiler	brainfuck.lua transpiler	bf.lua interpreter	bf for ubuntu
LuaJIT 2.1.0	1.870s	38.309s	34.392s	53.81s	N/A
Lua 5.4	15.242s	294.12s	188.39s	very slow	N/A
Clanc C -Ofast	0.7s	N/A	N/A	N/A	5.8s

2^16 BC ins limit on LuaJIT

LuaJIT had an instruction limit of 2^16 on JMP because of size of register D When the main code chunk is greater or equal to 2^16, it scans for chunks of code it can dispatch to a subfunction. It's only enabled on LuaJIT and turns itself on only when ins count is >= 2^16

Optimization passes

Long story short, I convert brainfuck code to Intermediate Representation that can itself be turned into Lua (it's either way turned into lua at the end so it can be JIT compiled by LuaJIT compiler)

Before being turned into Lua, it gets optimized thru multiple optimisation passes, each optimisation pass has it's own dedicated instruction that got added in the instruction (IR) definitions in the code.

1. Folding

Turning for example

data[i] = data[i] + 1 
data[i] = data[i] + 1 
data[i] = data[i] + 1 
data[i] = data[i] + 1 
data[i] = data[i] + 1 
data[i] = data[i] + 1 
data[i] = data[i] + 1 
data[i] = data[i] + 1 
data[i] = data[i] + 1

into

data[i] = data[i] + 9

2. loop inc/dec to assignation

while data[i] ~= 0 do
	data[i] = data[i] [-/+] 1
end

into

data[i] = 0

3. multiple 0 assignation to memset

	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0
	i = i + 1
	data[i] = 0

into

ffi.fill(data + i, intSize * 9, 0)

4. Unrolled assignation based on another variable

pretty much turning

while data[i] ~= 0 do
  i = i - 1 (jmp1)
  data[i] = data[i] + 3 (inc1)
  i = i + 1 (jmp2)
  data[i] = data[i] - 1 (inc2)
end

i = i - 1 (jmp1)

into

data[i+jmp1] = (-(data[i]/inc2))*inc1
data[i] = 0
i = i + jmp1

3rd pass requires second as assignation is created by second.

Comparing optimization passes and speed on hanoi.bf

Benchmarked on Ryzen 3700x on WSL this time

No optimization pass (other than folding) : 2.59s

First optimization pass (assignation) : 0.8125s high effectiveness

Second optimization pass (memset) : 0.84375s ❌INEFFECTIVE HERE

Third optimization pass (dynamic assignation folding) : 0.171875s

Comparing optimization passes and speed on mandel.bf

Benchmarked on Ryzen 3700x still on WSL

No optimization pass (other than folding) : 3.5s

First optimization pass (assignation) : 2.92s low effectiveness

Second optimization pass (memset) : 2.54s ✔️EFFECTIVE HERE

Third optimization pass (dynamic assignation folding) : 2.01s

CMP of the optimization passes.

Left part is 1st pass only, right is all 4.

Name		Name	Last commit message	Last commit date
Latest commit History 62 Commits
torture test		torture test
LICENSE		LICENSE
README.md		README.md
fast_brainfuck.lua		fast_brainfuck.lua

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

torture test

torture test

LICENSE

LICENSE

README.md

README.md