Skip to content

marniks7/bitmap-usage

Repository files navigation

Bitmap Usage

Test codecov

Goal

For data size <2M entities

  1. Efficient storage data in memory
  2. Efficient search
  3. Prepare infrastructure for performance analyze

Guidance about this repo

This repo is comparing bitmap and map as a cache.

  • map is provided by Golang built-in library
  • bitmap's are provided by https://github.com/RoaringBitmap/roaring. Few other bitmap repositories used occasionally, but excluded from results described in this section

Some criterias to consider: decrease memory requirements? improve latency? Increase cost of development and support for that? See below

  • Read Status, you might get some issues running non-essential benchmarks
  • Read Design
    • Take into consideration: very few search options are supported, so benchmarks are limited
  • Read Benchmark Aggregation - it may not be up-to-date, but will give you the common understanding of the bitmap benefits
  • Data sizing: only ~500k prices with ~8 conditions each. The results for your data could be different.
  • Fair competition: there are different strategies to improve both bitmap and map and some of them were applied.
  • High throughput vs Low Latency - low-latency is expected here
  • Usage: this repository is comparing search capabilities, leaving update or inserts behind, which could be important use case. Not only timing and memory usage (gc) is important here, but also you mind end up with x2 memory requirements (e.g. creating two caches at the same time) or will be forced to add READ lock for concurrency update\read. Map as well doesn't allow concurrent writes, so at least search compare oranges with oranges
  • Requests: with bitmaps client were able to send on 1281% more requests. e.g. 2 244 242 vs 162 471
  • Latency: at the same time for 99% percentile got better timing on >900%, e.g. 911µs vs 9.534ms with even better results for all requests before 99% percentile
  • Memory: bitmap cache occupies less memory? This is only half of the story. First, you should beware why. bitmap and map index part has minimum difference (2MB vs 4MB) for such data sizing. In fact, benefits comes from ability not to store all fields of original entities. If storing original entities is MUST - you may not gain any memory benefits. bitmap takes less for used structures - 90MB vs 180MB total. However, proper GC configuration is MUST, and minimum possible memory limit will be higher.
  • GOMEMLIMIT: You may need to set memory limit as 800MB for 100MB of the actual cache to get decent performance for 99% percentile. See bitmap 2gb vs 800mb. If degradation further is fine (we are still discussing timings <5ms) see bitmap 2gb vs 500mb. No difference for map 2gb vs 500mb
  • CPU: bitmap is way faster (1 order of magnitude, 1000% more requests), but it .
  • GoGC: Bitmap comes with the cost of increased memory allocation, which means increased GC pressure. Value 1000 looks good. For some background see bitmap. gogc 1000 vs 100
  • Cost: bitmap development and support is way harder than regular map. It is about supporting low-level data types. In contrast Map is like business as usual - you will be able to spend more time on you actual business cases, then on dealing with performance. More over, you can check issues of different bitmap libraries and found all sort of issues hard-to-spot, like race conditions. Add your own changes on top. Good testing, even for concurrent READ scenarios is must.
  • Language: go is used here. java maybe a good alternative, over the years people built different efficient map implementations, and not only that, java's 32-bit support is free (which means less memory, faster).
  • Http Server: when it comes to bitmaps, you can achieve micro-seconds latency and here every improvement matters. For example, bitmap. http server fiber vs default fiber got 40% better timings (319µs improvement for 97% percentile). Map has fewer requests, and timing from the start way higher, so, it is 1-3% (135µs for 97% percentile) map
  • Micro Second Tooling: Not many tools support microsecond measurements, wrk is among those few.
  • Docker: has fewer calls 1 457 261 compare to non-docker option 2 244 242, but similar timing. It is not yet clear why

Disclaimer

Work in progress. There are still a lot todo to get the 'right' results. Unit tests, different data cases, more business cases, challenges with benchmarking and env configuration

Build & Run

  • build: make build
  • run
    • map: make run-map32-fiber
    • bitmap: make run-roaring32-fiber

Usage

  • Search 1 price. Bitmap
curl -POST http://localhost:8091/v1/search/bitmap/prices \
      -H "Content-Type: application/json" \
      -d @benchmark/500k-large-groups/sample/search-price-request.json
  • Search 1 price. Map
curl -POST http://localhost:8091/v1/search/map/prices \
      -H "Content-Type: application/json" \
      -d @benchmark/500k-large-groups/sample/search-price-request.json
  • Search 5 prices. Bitmap
curl -H "Content-Type: application/json" -POST http://localhost:8091/v5/search/bitmap/bulk/prices \
    -d @benchmark/500k-large-groups/sample/search-price-bulk-request-5-nd.json
  • Search 10000 prices. Bitmap
time curl -H "Content-Type: application/json" -o /dev/null -POST http://localhost:8091/v5/search/bitmap/bulk/prices \
    -d @benchmark/500k-large-groups/sample/search-price-bulk-request-10000-nd.json

Benchmarks

Benchmark Aggregation

  • Low-level: make bench (go tests)
  • Memory:
  • High-level
    • Wrk new approach: make run-wrk-experiments (in 1 click). See reports
    • Wrk: make wrk
    • Wrk2: make wrk2
  • Docker (beta)
    • Build: make build docker
      • Run roaring32: make docker-run-roaring32-fiber
      • Run map32: make docker-run-map32-fiber
    • Benchmarks
      • Wrk: make wrk -e IN_DOCKER=true
      • Wrk2: make wrk2 -e IN_DOCKER=true

Design & more

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Languages