Spatial Benchmarks

This repo is a testbed for benchmarking basic geospatial queries across a range of data stores.

It is written in Ruby and organized as a series of Rake tasks.

Findings

TLDR

1 second of PostGIS time ≈ 6 seconds of Elasticsearch time ≈ 7 seconds of Mongo time

Well, sort of. On my machine, PostGIS and Mongo display steady throughput, while Elasticsearch is more erratic, possibly due to JVM garbage collection.

Between Mongo and Elastic, it seems that Elastic has better peak throughput, but Mongo has better average throughput.

Postgres is still the hands-down winner: roughly 6 to 7 times faster than the other two, nearly an order of magnitude.
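The difference between peak and average throughput is easy to make concrete. Below is a minimal sketch, assuming the wall time of each pass of queries has been recorded separately. Here "peak" means the best single pass and "average" means total queries divided by total time; both definitions are choices made for this illustration. The method name `throughput_stats` and the timings are made up and are not measurements from this benchmark.

```ruby
# Hypothetical helper: summarize per-pass timings for one data store.
# pass_times: wall-clock seconds for each pass of bounding box queries.
# queries_per_pass: number of queries issued in each pass.
def throughput_stats(pass_times, queries_per_pass)
  raise ArgumentError, "need at least one pass" if pass_times.empty?

  per_pass = pass_times.map { |t| queries_per_pass / t } # queries/sec for each pass
  mean_rate = per_pass.sum / per_pass.size
  variance = per_pass.sum { |r| (r - mean_rate)**2 } / per_pass.size
  {
    peak: per_pass.max,                                              # best single pass
    average: (queries_per_pass * pass_times.size) / pass_times.sum, # over the whole run
    spread: Math.sqrt(variance) / mean_rate # coefficient of variation of per-pass rates
  }
end

# Invented timings: one store is steady, the other has fast passes and
# one slow pass (the kind of stall a GC pause might cause).
steady  = throughput_stats([0.15, 0.15, 0.15, 0.15], 100)
erratic = throughput_stats([0.10, 0.10, 0.10, 0.40], 100)

[["steady", steady], ["erratic", erratic]].each do |label, s|
  printf("%-8s peak %7.1f  average %7.1f  spread %.2f\n",
         label, s[:peak], s[:average], s[:spread])
end
```

With these invented numbers, the erratic store wins on peak throughput but loses on average, which is the shape of the Elasticsearch vs Mongo observation above.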

Methodology

Perform a set of 100 bounding box queries that represent a moving window, as might be fetched from a front-end map client, over a swath of the United States extending from New York to Florida.

Repeat this 100 times for each data store.
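A minimal sketch of this methodology, assuming a fixed-size box whose center moves in a straight line from one point to another. The endpoints (roughly New York City to Miami), box size, and method names `moving_window` and `time_store` are hypothetical, and the block passed to `time_store` is a placeholder for a real data store query; the project's own query set may be built differently.

```ruby
require "benchmark"

# Boxes are [west, south, east, north] in WGS84 degrees.
# Interpolate `count` box centers from `from` to `to` (each [lon, lat])
# and wrap each center in a box `width` x `height` degrees in size.
def moving_window(from:, to:, count:, width:, height:)
  raise ArgumentError, "count must be at least 2" if count < 2

  (0...count).map do |i|
    t = i.to_f / (count - 1) # 0.0 for the first box, 1.0 for the last
    lon = from[0] + (to[0] - from[0]) * t
    lat = from[1] + (to[1] - from[1]) * t
    [lon - width / 2.0, lat - height / 2.0, lon + width / 2.0, lat + height / 2.0]
  end
end

# Run every box through the given block (standing in for one data store's
# bounding box query), repeat the whole set `passes` times, and time it all.
def time_store(boxes, passes:)
  count = 0
  elapsed = Benchmark.realtime do
    passes.times do
      boxes.each do |box|
        yield box
        count += 1
      end
    end
  end
  { queries: count, elapsed: elapsed, per_sec: count / elapsed }
end

# Assumed endpoints and box size, for illustration only.
boxes = moving_window(from: [-74.0, 40.7], to: [-80.2, 25.8],
                      count: 100, width: 2.0, height: 1.5)

# Placeholder "query" that only computes the box area; swap in a real
# PostGIS, MongoDB or Elasticsearch call to measure an actual store.
result = time_store(boxes, passes: 100) { |w, s, e, n| (e - w) * (n - s) }
puts "#{result[:queries]} queries in #{result[:elapsed].round(3)} s"
```

Timing all passes inside one `Benchmark.realtime` call yields a single aggregate figure, comparable in shape to the Elapsed column in the results; timing each pass separately would expose the per-pass variation described in the findings.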

Dataset

The spatial dataset for this benchmark is the Museum Universe Data File (MUDF), a collection of ~33,000 museums and related organizations in the United States, published by the Institute of Museum and Library Services.
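Before loading, each row needs to become a point. Here is a minimal sketch using Ruby's standard csv library, assuming the file has latitude and longitude columns. The header names NAME, LATITUDE and LONGITUDE, the method name `to_point_docs`, and the sample rows are all hypothetical, so check the real file's header row.

```ruby
require "csv"

# Made-up sample rows with assumed header names.
SAMPLE_ROWS = <<~DATA
  NAME,LATITUDE,LONGITUDE
  Example Museum,40.5,-74.25
  Museum Without Coordinates,,
DATA

# Convert CSV rows into documents holding a GeoJSON point, skipping rows whose
# coordinates are missing or unparseable. GeoJSON orders coordinates as
# [longitude, latitude], the same x/y order PostGIS's ST_MakePoint expects.
def to_point_docs(csv_text)
  CSV.parse(csv_text, headers: true).filter_map do |row|
    lat = Float(row["LATITUDE"].to_s, exception: false)
    lon = Float(row["LONGITUDE"].to_s, exception: false)
    next if lat.nil? || lon.nil?

    { "name" => row["NAME"],
      "location" => { "type" => "Point", "coordinates" => [lon, lat] } }
  end
end

docs = to_point_docs(SAMPLE_ROWS)
puts docs.size # 1
```

Getting the longitude/latitude order right at load time matters for every store in this benchmark; a swapped pair silently puts points in the wrong place rather than raising an error.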

See it on a map

Results

For N=100, i.e. 10,000 bounding box queries:

Data store	Version	Index	Elapsed	Normalized (PostGIS = 1.0)	Throughput
Postgres / PostGIS	10.1 / 2.4	GiST	2.38 sec	1.0	4,210 queries/sec
Elasticsearch	2.4	n/a	14.90 sec	6.27	671 queries/sec
MongoDB	3.4	2dsphere	16.77 sec	7.06	596 queries/sec

Hardware note: these results are from my 2014-vintage Mac laptop:

MacBook Pro
Intel Core i7, quad-core, 2.3 GHz
16 GB RAM
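The Normalized and Throughput columns follow from Elapsed: normalized = elapsed / PostGIS elapsed, and throughput = 10,000 / elapsed. A quick check in Ruby using only the rounded times from the table reproduces 671 and 596 queries/sec, but gives about 4,202 queries/sec, 6.26 and 7.05 instead of 4,210, 6.27 and 7.06. That suggests the table was computed from unrounded timings, with a PostGIS time slightly below 2.38 s.

```ruby
# Recompute the derived columns from the Elapsed times in the table.
TOTAL_QUERIES = 100 * 100 # 100 bounding boxes x 100 repetitions

ELAPSED_SECONDS = {
  "Postgres / PostGIS" => 2.38,
  "Elasticsearch" => 14.90,
  "MongoDB" => 16.77
}.freeze

baseline = ELAPSED_SECONDS.fetch("Postgres / PostGIS")

ELAPSED_SECONDS.each do |store, seconds|
  normalized = seconds / baseline       # PostGIS = 1.0
  throughput = TOTAL_QUERIES / seconds  # queries per second
  printf("%-20s %5.2f %6.0f queries/sec\n", store, normalized, throughput)
end
```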

Running the benchmarks

Prerequisites

You will need working installations of:

PostgreSQL with the PostGIS spatial extension
MongoDB
Elasticsearch

With Homebrew, this would be something like:

brew install postgresql postgis
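brew install mongodb
# newer Homebrew releases removed this formula; instead use:
#   brew tap mongodb/brew && brew install mongodb-community
brew install elasticsearch
# follow post-install instructions

brew services start postgresql
brew services start mongodb   # mongodb-community if installed from the tap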
brew services start elasticsearch

This project takes care of creating the necessary databases and indexes when you run rake load.
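To make the three query styles concrete, here is a hedged sketch of the kind of spatial index each store might use and how one bounding box could be expressed against it. The table, collection, index, column and field names (`museums`, `geom`, `location`, `id`) are hypothetical, and the project's actual load and query code may differ. The Elasticsearch fragment shows only the field mapping; the surrounding mapping wrapper depends on the Elasticsearch version.

```ruby
require "json"

# Boxes are [west, south, east, north] in WGS84 degrees.

# One-time index definitions, roughly what a load step might create.
POSTGIS_INDEX_SQL = "CREATE INDEX museums_geom_idx ON museums USING GIST (geom);"
# mongo shell equivalent: db.museums.createIndex({ location: "2dsphere" })
MONGO_INDEX_SPEC = { "location" => "2dsphere" }.freeze
# Elasticsearch: a geo_point field mapping rather than a separate index step.
ES_MAPPING_PROPERTIES = { "location" => { "type" => "geo_point" } }.freeze

# PostGIS: && is the bounding box overlap operator, which a GiST index serves.
# Returns SQL with $n placeholders plus its parameters.
def postgis_query(west, south, east, north)
  sql = "SELECT id FROM museums " \
        "WHERE geom && ST_MakeEnvelope($1, $2, $3, $4, 4326)"
  [sql, [west, south, east, north]]
end

# MongoDB: $geoWithin with a closed GeoJSON polygon ring, [lon, lat] order.
# With a 2dsphere index, polygon edges are treated as geodesics, so this
# "box" is not exactly a latitude/longitude rectangle.
def mongo_filter(west, south, east, north)
  ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
  { "location" => { "$geoWithin" => {
    "$geometry" => { "type" => "Polygon", "coordinates" => [ring] }
  } } }
end

# Elasticsearch: geo_bounding_box inside a bool filter (no scoring needed).
def es_query(west, south, east, north)
  {
    "query" => {
      "bool" => {
        "filter" => {
          "geo_bounding_box" => {
            "location" => {
              "top_left" => { "lat" => north, "lon" => west },
              "bottom_right" => { "lat" => south, "lon" => east }
            }
          }
        }
      }
    }
  }
end

box = [-75.0, 39.95, -73.0, 41.45]
puts postgis_query(*box).inspect
puts JSON.generate(mongo_filter(*box))
puts JSON.pretty_generate(es_query(*box))
```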

You can configure the services and databases in databases.yml.
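The contents of databases.yml are not shown here. As a hedged illustration only, a file along these lines could be parsed with Ruby's standard yaml library. Every key and the database and index names are hypothetical; only the ports (5432, 27017, 9200) are the services' usual defaults.

```ruby
require "yaml"

# Hypothetical databases.yml layout; the real file's keys may differ.
CONFIG_TEXT = <<~YAML
  postgres:
    host: localhost
    port: 5432
    database: spatial_benchmarks
  mongodb:
    host: localhost
    port: 27017
    database: spatial_benchmarks
  elasticsearch:
    url: http://localhost:9200
    index: museums
YAML

# In the project you would read the file instead, e.g.
# YAML.safe_load(File.read(path_to_databases_yml)), where the path is a placeholder.
config = YAML.safe_load(CONFIG_TEXT)

config.each do |service, settings|
  puts "#{service}: #{settings}"
end
```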

Steps

Clone this project and install its dependencies:

$ git clone https://github.com/anandaroop/spatial-benchmarks.git

$ cd spatial-benchmarks

$ bundle install

Obtain the MUDF CSV data file:

$ rake get_csv

Load the data:

$ rake load

Run the benchmarks:

$ rake benchmark

If you run the benchmarks, why not open an issue or PR with your results? 😀
