TunaDB: A task-based in-memory research DBMS

TunaDB is a research in-memory database management system (DBMS) that aims to improve query compilation, job processing, and data prefetching. By integrating creative approaches and novel models, the project seeks to deliver a high-performance, efficient database solution.

Note: This is a research project. While TunaDB demonstrates promising capabilities, it is unsuitable for production use.

Key Features and Implemented Concepts

Task-based Architecture

TunaDB adopts our MxTask-based approach to manage query execution efficiently. By breaking down large queries into smaller tasks, it optimizes resource efficiency and task parallelism, enabling fast data processing.

Query Compilation

TunaDB uses FlounderIR as a lightweight intermediate representation to compile queries into executable assembly code. FlounderIR simplifies the query structure, which eases the translation of high-level queries into low-level execution steps. The shipped implementation of FlounderIR adds several optimizations over the original implementation (e.g., better register assignment and branch relocation).

Profiling and Performance Analysis

TunaDB provides powerful profiling features for in-depth performance investigation. It supports inlined perf-counter profiling, as well as perf sampling of memory addresses and instructions, offering useful insights into system behavior and identifying potential bottlenecks.

Furthermore, developers can benefit from the seamless integration of third-party applications like Intel® VTune™ and perf. TunaDB will make compiled query code available for such tools. This allows for a comprehensive inspection of compiled code, enabling detailed performance evaluations and optimizations to unlock the DBMS's full potential.

Build Instructions

Dependencies

Please install the following dependencies:

cmake >= 3.10
clang >= 13 (gcc is not tested)
clang-tidy >= 13
libnuma or libnuma-dev
bison
flex
libgtest-dev for tests in test/ (optional)
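
Before building, it can help to check which of the listed command-line tools are already on the PATH. The following is a minimal sketch, not part of the repository: the helper name check_tools is hypothetical, it does not verify versions, it does not detect versioned binaries such as clang-15, and it cannot check for the libnuma library.

```shell
#!/bin/sh
# Report which of the given tools are missing from PATH.
# Prints one line per tool; the return status is the number of missing tools.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Tool names taken from the dependency list above.
check_tools cmake clang clang-tidy bison flex \
    || echo "install the missing tools before building"
```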

Building

Step 1: Clone the repository

git clone https://github.com/jmuehlig/mxtasking-tunadb.git

Step 2: Generate the `Makefile` using `cmake`

cmake . -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang-15 -DCMAKE_CXX_COMPILER=clang++-15

Step 3: Build `TunaDB`

make tunadb -j4

The tunadb binary will be located in bin/.

Basic Usage

Starting TunaDB

Running the binary ./bin/tunadb starts both the server and a client in a single process. You can then create tables, insert data, and execute queries using the client's console.

Using the web client

When started with the --web-client switch (./bin/tunadb --web-client), TunaDB additionally starts a web client. The web client lets you execute queries, show query plans (both logical and physical), show the generated FlounderIR and assembly code, and profile the execution. After startup, the web console is available at

http://0.0.0.0:9100

Loading initial data

TunaDB can execute one SQL file to initially load data before starting the server. Use the --load <file.sql> option. The given SQL file may

create tables (CREATE TABLE <table> (...)),
copy data from (CSV) files (COPY <table> FROM '<file>'),
execute further SQL files (.LOAD FILE '<file.sql>'),
and/or update statistics (.UPDATE STATISTICS <table>).
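
As an illustration, a small load script built from the statement forms listed above could be written and passed to TunaDB as sketched here. The file name load_example.sql, the table and column names, the column types, the CSV path, and the trailing semicolons are all assumptions made for this example; src/db/README.md describes the syntax and data types that are actually supported.

```shell
# Write a small load script using only the statement forms listed above.
# Hypothetical: table name, columns, types, CSV path, and the trailing
# semicolons are illustrative and may need adjusting.
cat > load_example.sql <<'EOF'
CREATE TABLE customers (id INT, name CHAR(25));
COPY customers FROM 'data/customers.csv';
.UPDATE STATISTICS customers;
EOF

# Start TunaDB with the script, but only if the binary has been built.
if [ -x ./bin/tunadb ]; then
    ./bin/tunadb --load load_example.sql
fi
```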

Loading TPC-H data

To load the data of the TPC-H benchmark into TunaDB:

Create a folder sql/data/tpch
Generate all .tbl files and move them into sql/data/tpch

Load the SQL script sql/load_tpch.sql:

`./bin/tunadb --load sql/load_tpch.sql`
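
The first two steps above can be sketched as a short script that creates the folder, moves the generated files into it, and reports which of the eight TPC-H tables are still missing. The source folder ./tpch-dbgen and the variable name TBL_SOURCE are hypothetical; point TBL_SOURCE at wherever your TPC-H data generator wrote its .tbl files.

```shell
# Hypothetical location of the generated .tbl files; override via TBL_SOURCE.
TBL_SOURCE="${TBL_SOURCE:-./tpch-dbgen}"
TARGET=sql/data/tpch

mkdir -p "$TARGET"

# Move every generated .tbl file that exists into the target folder.
for f in "$TBL_SOURCE"/*.tbl; do
    if [ -e "$f" ]; then
        mv "$f" "$TARGET"/
    fi
done

# Report which of the eight TPC-H tables are still missing.
missing=0
for t in customer lineitem nation orders part partsupp region supplier; do
    if [ ! -f "$TARGET/$t.tbl" ]; then
        echo "missing: $t.tbl"
        missing=$((missing + 1))
    fi
done
echo "$missing of 8 tables missing"
```

Once no table is reported as missing, the --load command shown above can be run.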

Further Commands

See ./bin/tunadb --help for further options and flags. More information about the code structure and the implemented commands, data types, etc. is given in src/db/README.md.

Related Publications

Jan Mühlig, Jens Teubner. Micro Partitioning: Friendly to the Hardware and the Developer. DaMoN 2023: 27-34.
Henning Funke, Jan Mühlig, Jens Teubner. Low-latency query compilation. VLDB J. 31(6): 1171-1184 (2022).
Jan Mühlig, Jens Teubner. MxTasks: How to Make Efficient Synchronization and Prefetching Easy. SIGMOD Conference 2021: 1331-1344.
Henning Funke, Jan Mühlig, Jens Teubner. Efficient generation of machine code for query compilers. DaMoN 2020: 6:1-6:7.

Code Structure

The code is split across five directories:

src/application contains MxTask-based applications (TunaDB is one of them). As a guideline, every application should be kept in a separate folder and produce at least one binary (stored in bin/).
src/db contains database-related implementations, such as indices, types, and the execution engine.
src/mx contains the task-based abstraction MxTasking.
src/flounder contains the low-latency IR used for JIT-compiling operators.
src/perf contains an implementation of in-source perf counters and sampling.

Further Applications

Besides TunaDB, this repository includes further task-based applications used for papers or development.

B-link-Tree Benchmark

The folder src/application/blinktree_benchmark contains the benchmark code used in our paper MxTasks: How to Make Efficient Synchronization and Prefetching Easy.

Radix Join Benchmark

The folder src/application/radix_join_benchmark contains the benchmark code used in our paper Micro Partitioning: Friendly to the Hardware and the Developer.

Task-based "Hello World"

The folder src/application/hello_world contains a task-based example for creating and spawning a simple task.

External Libraries

TunaDB would not be possible without the help of various external libraries. These libraries are downloaded automatically (using git) during the build process. Special thanks to:

argparse under MIT license
nlohmann json under MIT license
linenoise under BSD-2 license
cpp-httplib under MIT license
asmjit under Zlib license
{fmt} under MIT license
spdlog under MIT license
static_vector under MIT license
robin-map under MIT license
libcount under Apache-2.0 license
xxhashct published without license
ittapi under GPLv2 and 3-Clause BSD licenses

Contact

If you have any questions or comments, feel free to contact me via mail: jan.muehlig@tu-dortmund.de.
