
Benchmark for Logs & Traces Search Engines

Overview

This benchmark is designed to measure the performance of various search engines for logs and traces use cases and more generally for append-only semi-structured data.

The benchmark makes use of two datasets:

We plan to add a trace dataset soon.

The supported engines are:

Prerequisites

Dependencies

  • Make to ease the running of the benchmark.
  • Docker to run the benchmarked engines, including the Python API.
  • Python3 to download the dataset and run queries against the benchmarked engines.
  • Rust and openssl-devel to build the ingestion tool qbench.
  • gcloud to download datasets.
  • Various Python packages, installed with pip install -r requirements.txt (see the sketch below).
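
A quick way to sanity-check the prerequisites (a minimal sketch; package names and installation steps vary by platform):

# Verify that the required tools are on PATH.
for tool in make docker python3 cargo gcloud; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
# Install the Python dependencies.
pip install -r requirements.txt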

Build qbench

cd qbench
cargo build --release
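
The release binary is then available at qbench/target/release/qbench (the standard Cargo release path).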

Download datasets

For the generated logs dataset:

mkdir -p datasets
gcloud storage cp "gs://quickwit-datasets-public/benchmarks/generated-logs/generated-logs-v1-????.ndjson.gz" datasets/
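
To confirm the download, you can list the shards the glob matched:

ls -lh datasets/generated-logs-v1-*.ndjson.gz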

Running the benchmark manually

Start engines

Go to the desired engine's subdirectory engines/<engine_name> and run make start.
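
For example, to start Quickwit (assuming an engines/quickwit subdirectory following the layout above):

cd engines/quickwit
make start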

Indexing phase

python3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --indexing-only

By default, this exports results to the benchmark service. The first time this runs, you will be redirected to a web page where you should log in with your Google account and pass a token back to run.py (just follow the instructions the tool prints). Exporting to the benchmark service can be disabled by passing the flag --export-to-endpoint "", as shown below.
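
For example, to run the indexing phase without exporting (note the empty string passed to the flag):

python3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --indexing-only --export-to-endpoint ""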

After indexing (and if exporting to the service was not disabled), the tool will print a URL to access results, e.g.: https://qw-benchmarks.104.155.161.122.nip.io/?run_ids=678

Results will also be saved to a results/{track}.{engine}.{tag}.{instance}/indexing-results.json file.

{
  "doc_per_second": 8752.761519421289,
  "engine": "quickwit",
  "index": "generated-logs",
  "indexing_duration_secs": 1603.68884367,
  "mb_bytes_per_second": 22.77175235654048,
  "num_indexed_bytes": 18840178633,
  "num_indexed_docs": 14036706,
  "num_ingested_bytes": 36518805205,
  "num_ingested_docs": 14036706,
  "num_splits": 12
}
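
Assuming the --tags value fills the {tag} slot in the results path, the headline numbers can then be extracted with jq; the directory name below instantiates the pattern with the example command's values:

jq '{doc_per_second, mb_bytes_per_second, num_indexed_docs}' \
  results/generated-logs.quickwit.my-bench-run.m1/indexing-results.json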

Execute the queries

python3 run.py --engine quickwit --storage SSD --track generated-logs --instance m1 --tags my-bench-run --search-only

The results will also be exported to the service and saved to a results/{track}.{engine}.{tag}.{instance}/search-results.json file.

{
    "engine": "quickwit",
    "index": "generated-logs",
    "queries": [
        {
            "id": 0,
            "query": {
                "query": "payload.description:the",
                "sort_by_field": "-created_at",
                "max_hits": 10
            },
            "tags": [
                "search"
            ],
            "count": 138290,
            "duration": [
                8843,
                9131,
                9614
            ],
            "engine_duration": [
                7040,
                7173,
                7508
            ]
        }
    ]
}
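
Per-query latencies can be extracted the same way (same hypothetical results directory as above):

jq '.queries[] | {id, count, duration, engine_duration}' \
  results/generated-logs.quickwit.my-bench-run.m1/search-results.json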

Exploring results

Use the Benchmark Service web page.

Run comparison

The default page allows selecting and comparing runs.

Runs are identified by a numerical ID and are automatically named <engine>.<storage>.<instance>.<short_commit_hash>.<tag>. For now, names are allowed to collide, i.e. a given name can refer to multiple runs. In that case, selecting a name in the list of runs to compare will show the most recent indexing run with that name, and the most recent search run with that name.
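
For instance, a Quickwit run on SSD storage on instance m1 might be named quickwit.SSD.m1.1a2b3c4.my-bench-run (the commit hash and tag here are illustrative).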

Tips:

  • The URL of the page is a permanent link to the runs shown. This is a convenient way to share results.
  • Clicking on the run name in the comparison shows the raw run results with additional information.
  • It's fine if a run only has indexing or search results.
  • The full list of runs is loaded when the web page is loaded, so you may need to reload it to see your latest runs.

Graphs

The graphs page allows plotting graphs of indexing and search run results over time. Only runs with source continuous_benchmarking or github_workflow are shown there. Runs are identified by a string <engine>.<storage>.<instance>.<tag> (note the absence of a commit hash), which refers to a series of indexing and search runs over time.

Tips:

  • The URL of the page is a permanent link to the series of runs shown. Later visits can contain additional data points.
  • Clicking on a point in any graph opens the comparison page between the run that contributed that point and the run that contributed the previous point.

Running the service

See here for running the benchmark service.

Loki vs. Quickwit (WIP)

Details of the comparison can be found here.
