Bitmap Usage

Goal

For data size <2M entities

Efficient storage data in memory
Efficient search
Prepare infrastructure for performance analyze

Guidance about this repo

This repo is comparing bitmap and map as a cache.

map is provided by Golang built-in library
bitmap's are provided by https://github.com/RoaringBitmap/roaring. Few other bitmap repositories used occasionally, but excluded from results described in this section

Some criterias to consider: decrease memory requirements? improve latency? Increase cost of development and support for that? See below

Read Status, you might get some issues running non-essential benchmarks
Read Design
- Take into consideration: very few search options are supported, so benchmarks are limited
Read Benchmark Aggregation - it may not be up-to-date, but will give you the common understanding of the bitmap benefits
Data sizing: only ~500k prices with ~8 conditions each. The results for your data could be different.
Fair competition: there are different strategies to improve both bitmap and map and some of them were applied.
High throughput vs Low Latency - low-latency is expected here
Usage: this repository is comparing search capabilities, leaving update or inserts behind, which could be important use case. Not only timing and memory usage (gc) is important here, but also you mind end up with x2 memory requirements (e.g. creating two caches at the same time) or will be forced to add READ lock for concurrency update\read. Map as well doesn't allow concurrent writes, so at least search compare oranges with oranges
Requests: with bitmaps client were able to send on 1281% more requests. e.g. 2 244 242 vs 162 471
Latency: at the same time for 99% percentile got better timing on >900%, e.g. 911µs vs 9.534ms with even better results for all requests before 99% percentile
Memory: bitmap cache occupies less memory? This is only half of the story. First, you should beware why. bitmap and map index part has minimum difference (2MB vs 4MB) for such data sizing. In fact, benefits comes from ability not to store all fields of original entities. If storing original entities is MUST - you may not gain any memory benefits. bitmap takes less for used structures - 90MB vs 180MB total. However, proper GC configuration is MUST, and minimum possible memory limit will be higher.
GOMEMLIMIT: You may need to set memory limit as 800MB for 100MB of the actual cache to get decent performance for 99% percentile. See bitmap 2gb vs 800mb. If degradation further is fine (we are still discussing timings <5ms) see bitmap 2gb vs 500mb. No difference for map 2gb vs 500mb
CPU: bitmap is way faster (1 order of magnitude, 1000% more requests), but it .
GoGC: Bitmap comes with the cost of increased memory allocation, which means increased GC pressure. Value 1000 looks good. For some background see bitmap. gogc 1000 vs 100
Cost: bitmap development and support is way harder than regular map. It is about supporting low-level data types. In contrast Map is like business as usual - you will be able to spend more time on you actual business cases, then on dealing with performance. More over, you can check issues of different bitmap libraries and found all sort of issues hard-to-spot, like race conditions. Add your own changes on top. Good testing, even for concurrent READ scenarios is must.
Language: go is used here. java maybe a good alternative, over the years people built different efficient map implementations, and not only that, java's 32-bit support is free (which means less memory, faster).
Http Server: when it comes to bitmaps, you can achieve micro-seconds latency and here every improvement matters. For example, bitmap. http server fiber vs default fiber got 40% better timings (319µs improvement for 97% percentile). Map has fewer requests, and timing from the start way higher, so, it is 1-3% (135µs for 97% percentile) map
Micro Second Tooling: Not many tools support microsecond measurements, wrk is among those few.
Docker: has fewer calls 1 457 261 compare to non-docker option 2 244 242, but similar timing. It is not yet clear why

Disclaimer

Work in progress. There are still a lot todo to get the 'right' results. Unit tests, different data cases, more business cases, challenges with benchmarking and env configuration

See Status for details

Build & Run

build: make build
run
- map: make run-map32-fiber
- bitmap: make run-roaring32-fiber

Usage

Search 1 price. Bitmap

curl -POST http://localhost:8091/v1/search/bitmap/prices \
      -H "Content-Type: application/json" \
      -d @benchmark/500k-large-groups/sample/search-price-request.json

Search 1 price. Map

curl -POST http://localhost:8091/v1/search/map/prices \
      -H "Content-Type: application/json" \
      -d @benchmark/500k-large-groups/sample/search-price-request.json

Search 5 prices. Bitmap

curl -H "Content-Type: application/json" -POST http://localhost:8091/v5/search/bitmap/bulk/prices \
    -d @benchmark/500k-large-groups/sample/search-price-bulk-request-5-nd.json

Search 10000 prices. Bitmap

time curl -H "Content-Type: application/json" -o /dev/null -POST http://localhost:8091/v5/search/bitmap/bulk/prices \
    -d @benchmark/500k-large-groups/sample/search-price-bulk-request-10000-nd.json

Benchmarks

Benchmark Aggregation

Low-level: make bench (go tests)
Memory:
- make bench-memory-bitmap See bitmap memory
- make bench-memory-map. See map memory
High-level
- Wrk new approach: make run-wrk-experiments (in 1 click). See reports
- Wrk: make wrk
- Wrk2: make wrk2
Docker (beta)
- Build: make build docker
  - Run roaring32: make docker-run-roaring32-fiber
  - Run map32: make docker-run-map32-fiber
- Benchmarks
  - Wrk: make wrk -e IN_DOCKER=true
  - Wrk2: make wrk2 -e IN_DOCKER=true

Design & more

Design
Tasks

Name		Name	Last commit message	Last commit date
Latest commit History 171 Commits
.github/workflows		.github/workflows
app		app
benchmark		benchmark
bulk		bulk
cache		cache
cache64		cache64
docs		docs
handlers-kelindar		handlers-kelindar
handlers-map		handlers-map
handlers-map64		handlers-map64
handlers-roaring		handlers-roaring
handlers-roaring64		handlers-roaring64
handlers-sroar		handlers-sroar
handlers		handlers
index-bits-and-blooms		index-bits-and-blooms
index-kelindar-column		index-kelindar-column
index-kelindar		index-kelindar
index-map		index-map
index-map64		index-map64
index-roaring		index-roaring
index-roaring64		index-roaring64
index-sroar		index-sroar
misc		misc
model		model
model64		model64
reports		reports
runner		runner
validator		validator
validator64		validator64
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
coverprofile.out		coverprofile.out
go.mod		go.mod
go.sum		go.sum
json.lua		json.lua
main.go		main.go
wrk		wrk
wrk2		wrk2

marniks7/bitmap-usage

Folders and files

Latest commit

History

Repository files navigation

Bitmap Usage

Goal

Guidance about this repo

Disclaimer

Build & Run

Usage

Benchmarks

Design & more

About

Topics

Resources

Stars

Watchers

Forks

Languages