Performance testing point-in-polygon #7

Open · missinglink opened this issue Jul 11, 2019 · 17 comments
Labels: enhancement (New feature or request)

Comments

@missinglink
Member

Basic benchmarks show that the point-in-polygon API takes between 0 and 1 millisecond to execute.

We don't fully understand what the performance is like:

  • under heavy load
  • on a cold start vs. when the Linux filesystem cache has paged all/most of the DB
  • single core vs. multi-core
  • when it hits the max QPS for a machine
  • with a small DB vs a large DB
  • at various levels of 'max shard complexity' (a tunable config value).

This ticket is to figure out how to generate benchmarks that return more than just vanity metrics.

It would be ideal if we could automate this process to measure performance over time as new features are added.

missinglink added the enhancement label on Jul 11, 2019
@Joxit
Member

Joxit commented Jul 21, 2020

Hi there, I ran some stress tests against spatial so that we can get an idea of its performance.

I used Gatling and the pip-service scenario. The service and the injector run on different machines.

Spec

Service:

OS: Debian
CPU: 4 threads (2.3 GHz)
RAM: 60 GB
Containers: pelias/spatial:master and pelias/pip-service:master
Database: Admin France extract from Geocode Earth

Injector:

OS: Debian
CPU: 8 threads (2.3 GHz)
RAM: 30 GB
Containers: jawg/pelias-server-streess

Scenario

We use a set of regions, pick a random point within the region, and make a PIP request to the endpoint /query/pip/_view/pelias/:lon/:lat for spatial or /:lon/:lat for pip-service (a rough sketch of the point generation is shown after the regions list below).
The seeds were generated only once so that every run uses the same scenario.

In this scenario we have a total of 75,000 users arriving over 60 seconds. Each user makes a single request. The goal is a 95th percentile below 750 ms. Gatling will inject 1,250 req/s.
 
Regions:

AUVERGNE-RHONE-ALPES,44.1154,46.804,2.0629,7.1859
BOURGOGNE-FRANCHE-COMTE,46.1559,48.4001,2.8452,7.1435
BRETAGNE,47.278,48.9008,-5.1413,-1.0158
CENTRE-VAL DE LOIRE,46.3471,48.9411,0.053,3.1286
CORSE,41.3336,43.0277,8.5347,9.56
GRAND EST,47.4202,50.1692,3.3833,8.2333
HAUTS-DE-FRANCE,48.8372,51.089,1.3797,4.2557
ILE-DE-FRANCE,48.1205,49.2413,1.4465,3.5587
NORMANDIE,48.1799,50.0722,-1.9485,1.8027
NOUVELLE-AQUITAINE,42.7775,47.1758,-1.7909,2.6116
OCCITANIE,42.3331,45.0467,-0.3272,4.8456
PAYS DE LA LOIRE,46.2664,48.568,-2.6245,0.9167
PROVENCE-ALPES-COTE D'AZUR,42.9818,45.1268,4.2303,7.7188
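
For illustration only: a minimal JavaScript sketch (not the actual Gatling feeder) of how a seed could be generated, using values from the regions list above. The helper names are hypothetical.

// Hypothetical sketch of the seed generation: pick a region, sample a uniform
// random point inside its bounding box, and build the PIP request path from it.
const regions = [
  { name: 'BRETAGNE', minLat: 47.278, maxLat: 48.9008, minLon: -5.1413, maxLon: -1.0158 },
  // ... the other regions from the list above
];

function randomBetween(min, max) {
  return min + Math.random() * (max - min);
}

function randomPipPath() {
  const r = regions[Math.floor(Math.random() * regions.length)];
  const lat = randomBetween(r.minLat, r.maxLat);
  const lon = randomBetween(r.minLon, r.maxLon);
  return `/query/pip/_view/pelias/${lon}/${lat}`; // or `/${lon}/${lat}` for pip-service
}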

Results

I launched the scenario 3 times.

  1. cold start on spatial service, without Linux file system cache. View online or 75k-spatial-without-cache.pdf
  2. hot start on spatial service, with Linux file system cache. View online or 75k-spatial-with-cache.pdf
  3. on pip-service. View online or 75k-pip-service.pdf

Conclusion

Without the Linux cache, spatial can't handle this scenario: the number of active users keeps growing until the end (815.217 req/s). The CPU chart shows the bottleneck: iowait.
But with the cache, the 95th percentile is 869 ms, just above the 750 ms target (1209.677 req/s).
Unsurprisingly, pip-service is blazing fast, with a 95th percentile of 594 ms (1229.508 req/s).

Since we can't control the Linux cache, I'd say 800 req/s is the first limit for spatial.

The next tests will be single-core and with fewer users.

@missinglink
Member Author

missinglink commented Jul 21, 2020

Nice benchmarks 👍. I had a quick look at the query generation for PIP and there are definitely some 'quick wins' to reduce latency.

I recently added #65, which probably means we can delete a bunch of the query logic for finding the default names.

There are probably other things that can be improved too.

If possible, could you please keep your benchmarking scripts around so we can run a comparison once this feature lands?

@missinglink
Member Author

I did some similar benchmarking in the past and found that reducing the number of users greatly improved performance. I think in a real-world scenario we're going to have <5 'users' connected (i.e. open HTTP streams).

I'd be interested to see what difference reducing the user count makes, assuming that Connection: keep-alive is used.
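
As a hypothetical illustration (not part of the benchmark setup), a small pool of persistent connections from Node.js could look like this, assuming the service listens on localhost:3000; the pool size and helper name are assumptions.

// Minimal sketch: reuse a small pool of sockets via HTTP keep-alive.
const http = require('http');

const agent = new http.Agent({ keepAlive: true, maxSockets: 5 });

function pip(lon, lat) {
  return new Promise((resolve, reject) => {
    http.get({
      host: 'localhost',
      port: 3000,
      path: `/query/pip/_view/pelias/${lon}/${lat}`,
      agent // every call reuses the same 5 persistent connections
    }, (res) => {
      res.resume();
      res.on('end', () => resolve(res.statusCode));
    }).on('error', reject);
  });
}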

@missinglink
Member Author

One of the really nice things about using SQLite is that it's so easy (and cheap!) to scale this compared to something which is memory-bound.

So I'm more interested in throughput than latency, although we should still make it run as efficiently as possible 😄

If we can run several high-CPU instances (or threads) of this service it'll be capable of PIP-ing many thousands of requests per second, and can theoretically scale linearly as more servers are added.

One interesting thing to note is that the mmap filesystem cache is shared between all processes on the same machine, so a 64-core machine would be able to do 64x this benchmark while only requiring one copy of the disk pages in RAM.

And! (and this is the interesting bit) this is also true of Docker, so you can run multiple containers/pods on the same physical machine using mmap and they will also share the same filesystem cache from the host machine 🧙‍♂️
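
A rough sketch of what this looks like in code, assuming better-sqlite3 (which the project uses) and a hypothetical database path: every process or container opens the same file read-only, and with mmap enabled they all read through the host's shared page cache.

// Sketch only: the path and mmap_size are illustrative values.
const Database = require('better-sqlite3');

// Each process/container opens its own read-only handle to the same file.
const db = new Database('/data/spatial.db', { readonly: true });

// Read via mmap so hot pages live once in the host's page cache,
// shared by every process that maps the same file.
db.pragma('mmap_size = 1073741824'); // map up to 1 GiB

// ... serve PIP queries from this handle as usual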

@missinglink
Member Author

missinglink commented Jul 21, 2020

Okay so #67 should hopefully improve these numbers!
~10x? 🤞

@Joxit
Member

Joxit commented Jul 21, 2020

If possible could you please keep your benchmarking scripts around so we can run a comparison once this feature lands?

Okay 👍 I wrote down all the info I need in my comment, in case I have to redo the same benchmark 😄 The PDFs of the results will remain available; I should remove the online versions when we release spatial or close this issue.
 

I did some similar benchmarking in the past and found that reducing the number of users greatly improved performance. I think in a real-world scenario we're going to have <5 'users' connected (i.e. open HTTP streams).

Yes, for me the target should be at least 500 req/s with a 95th percentile of 750 ms without the Linux cache, and I think that's possible. 🌈
IDK if Gatling's .shareConnections uses Keep-Alive or not 🤔

Let's try with #67 now!

@missinglink
Member Author

Any reason you are testing with the Linux cache (mmap mode) disabled?
I was assuming we would always leave that on since it prevents a lot of I/O.

@missinglink
Member Author

FYI pelias/interpolation#243

@Joxit
Member

Joxit commented Jul 21, 2020

The Linux cache is not disabled; I flush the cache before the stress test to simulate a cold start.
In reality this won't happen every day, but it gives me a worst-case scenario (after a machine reboot or a database update). That's why I run the stress test twice 😄

@Joxit
Member

Joxit commented Jul 21, 2020

Guess what?

@Joxit
Member

Joxit commented Jul 21, 2020

A little suspense...

So, a new benchmark with #67, with and without the Linux cache. Same scenario as before.

Results

  1. First run, just after a Linux cache flush. View online or 75k-spatial-without-cache-67.pdf
  2. Second run, right after the first one. View online or 75k-spatial-with-cache-67.pdf

With a cache flush, the 95th percentile is at 43,889 ms with no timeouts, which is better!
And with the Linux cache... the 95th percentile is at 31 ms, which is better than pip-service 😱 The CPU is OK so we could increase the number of requests... but we are already at 1,229.508 req/s, which I think is more than decent!

@missinglink
Member Author

BOOM 💥

@orangejulius
Member

Dang, that's some great performance. I guess we need to get serious about integrating it into Pelias :)

@missinglink
Member Author

Yeah, I'm really happy with that, because I put a lot of faith in this architecture and it's nice to know it's bearing fruit.
One other thing that recently worked in our favour was WiseLibs/better-sqlite3@758665a#diff-6f4c547489674c10529650f5632f129f, which changed the threading mode in better-sqlite3. IIRC, before that change, multi-core was actually making it slightly slower; now it's hopefully working as expected.

@Joxit
Member

Joxit commented Jul 21, 2020

Dang, that's some great performance. I guess we need to get serious about integrating it into Pelias :)

👍! And now we have proof that this project performs better than the current stack, which should please our customers.

(Obama mic drop GIF)

@missinglink
Member Author

missinglink commented Jul 21, 2020

I just ran k6 on my dev server (16 threads @ 3.6 GHz) to get a comparison from another load-testing util, and it flew through it:

This is actually not a great test since it used the same lat/lon for each request.

$ k6 run --vus 20 --iterations 100000 test.js

iteration_duration.........: avg=5.59ms  min=1.41ms  med=4.4ms   max=42.88ms  p(90)=8.61ms  p(95)=10.04ms
iterations.................: 100000 3565.959785/s

$ cat test.js
import http from 'k6/http';
const url = 'http://localhost:3000/query/pip/_view/pelias/174.766843/-41.288788'

export default function() {
  http.get(url);
}
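To avoid hitting the same coordinate on every request, here is a hedged sketch of a variant that samples a random point per iteration; the bounding box (roughly the Wellington area) is an illustrative assumption and should be adjusted to whatever extract is loaded.

import http from 'k6/http';

// Illustrative bounding box around Wellington; adjust to the loaded extract.
const bbox = { minLat: -41.4, maxLat: -41.1, minLon: 174.6, maxLon: 175.0 };

export default function () {
  const lat = bbox.minLat + Math.random() * (bbox.maxLat - bbox.minLat);
  const lon = bbox.minLon + Math.random() * (bbox.maxLon - bbox.minLon);
  http.get(`http://localhost:3000/query/pip/_view/pelias/${lon}/${lat}`);
}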

@missinglink
Member Author

I'm using 174.766843/-41.288788 (a location in New Zealand) since the NZ country polygon is large and complex; it's a good 'worst-case scenario' for PIP 😆
