
Make GeoHash coordinate conversions ~40-100x faster #348

Merged
merged 38 commits into microsoft:main from opt-geohash on May 21, 2024

Conversation

PaulusParssinen
Contributor

@PaulusParssinen PaulusParssinen commented Apr 30, 2024

Note

Credits for the quantization approach to https://mmcloughlin.com/posts/geohash-assembly (by @mmcloughlin)
and the de-quantization method to https://github.com/georust/geohash. A big thank you to everyone contributing to both efforts!

This PR was originally meant to be a simple, tiny optimization to avoid unnecessary heap allocations in the coordinate conversions, but I had a change of plans when I discovered an article about a much faster (de)quantization method by Michael McLoughlin, and this ended up being a GeoHash class rewrite. Please see his great article and explanation.

Many of the commits after 564891a are me discovering new exciting details about IEEE-754 floating-point rounding, confusing reference implementations (it turns out there's no single correct way to reference a location on Earth 😅), and other fun Geohash corner cases.

More notably, however, there's a tiny error in the quantization approach described in McLoughlin's excellent article, which I describe in this comment and fix in f364a96c..b4ec75c. The gist of it is that in the IEEE-754 double-precision format, the exponent assumption of 1023 does not hold for 2.0, which the maximum coordinate values get rounded to.

There should be no behavioral differences other than the last character of the textual representation, which one of the existing tests already ignored, understandably, due to nuances around its precision. The textual representation should simply be the base-32 encoding (using the custom geohash alphabet) of the integer that represents a specific GeoHash, where more characters means more bits of precision. Every encoded character gives 5 bits of precision. Very simple, right? I wish it was...

Garnet (and Redis) chose 52-bit precision. Why? Because we want to store the geohash integer in a sorted set, which allows us to utilize the geohash structure and do really efficient range queries. Now, because we want to reuse the existing sorted set, we need to safely store the geohash integer in an IEEE-754 double-precision score, which we can do safely when our integer is in the range [-2^53, 2^53].
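As a quick illustration of why that range is safe (a tiny standalone check, not code from this PR): every integer with magnitude up to 2^53 is exactly representable as a double, so a 52-bit geohash integer round-trips through a sorted-set score without loss.

```csharp
// Illustrative check only: a 52-bit geohash integer survives a round trip
// through a double-precision sorted-set score, because all integers with
// magnitude <= 2^53 are exactly representable in IEEE-754 double precision.
long hash = (1L << 52) - 1;             // largest 52-bit geohash integer
double score = hash;                    // what would be stored as the score
Console.WriteLine((long)score == hash); // True
```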

Now, we can still encode this 52-bit precision integer in the base-32 alphabet just fine. The remaining 55 - 52 = 3 bits will be zero, and that's what Garnet has done, but Redis seems to zero out the last 2 bits and output the last character of the textual representation as 0.
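For illustration, here's a minimal sketch (not Garnet's actual code) of what that textual encoding looks like: left-align the 52-bit integer into 55 bits so the low 3 bits are zero, then emit 11 characters of 5 bits each from the geohash base-32 alphabet.

```csharp
// Minimal sketch of the textual (base-32) geohash encoding, assuming a 52-bit
// input that gets padded to 55 bits. Helper name and structure are mine.
const string Alphabet = "0123456789bcdefghjkmnpqrstuvwxyz";

static string EncodeGeoHashString(ulong hash52)
{
    Span<char> chars = stackalloc char[11];
    ulong bits = hash52 << 3; // pad 52 bits to 55 bits; the low 3 bits stay zero

    for (int i = 0; i < 11; i++)
    {
        // Take 5 bits at a time, starting from the most significant end.
        int index = (int)((bits >> (55 - 5 * (i + 1))) & 0x1F);
        chars[i] = Alphabet[index];
    }

    return new string(chars);
}
```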

Redis made another choice too: they chose to follow EPSG:3857, where:

  • Valid longitudes are from -180 to 180 degrees.
  • Valid latitudes are from -85.05112878 to 85.05112878 degrees.

This makes sense as the poles cause all sorts of problems in GIS applications.

Confusing history behind Web Mercator


These limits, however, do not apply to the "standard" geohash format, for which valid latitudes are [-90, 90]. This difference also adds nuance to geohash implementations. Garnet currently accepts coordinates per the geohash limitations.

This PR also introduces AsciiUtils in Garnet.common as a place to share common ASCII manipulation logic (it acts more like a polyfill for the Ascii class on .NET 6, and a place where we can optimize further than what the BCL currently does).

This PR is strictly focused on the GeoHash class alone. I'll remove more unnecessary allocations in the actual RESP protocol parsing & encoding in a follow-up. There are some vectorization opportunities here, and the distance math can definitely be improved too.

Benchmarks (Tiger Lake, i5-1135G7)

BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
11th Gen Intel Core i5-1135G7 2.40GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.204
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 6 : .NET 6.0.29 (6.0.2924.17105), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

Server=True  

main

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| GeoToLongValue | .NET 6 | 278.3 ns | 0.51 ns | 0.43 ns | 0.0191 | 80 B |
| GeoToLongValue | .NET 8 | 260.0 ns | 0.85 ns | 0.80 ns | 0.0191 | 80 B |
| GetCoordinatesFromLong | .NET 6 | 429.1 ns | 1.30 ns | 1.22 ns | 0.0801 | 336 B |
| GetCoordinatesFromLong | .NET 8 | 213.8 ns | 0.85 ns | 0.75 ns | 0.0801 | 336 B |
| GetGeoHashCode | .NET 6 | 810.1 ns | 4.79 ns | 4.24 ns | 0.2537 | 1064 B |
| GetGeoHashCode | .NET 8 | 619.1 ns | 1.88 ns | 1.76 ns | 0.2537 | 1064 B |

PR

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| GeoToLongValue | .NET 6 | 5.556 ns | 0.0156 ns | 0.0146 ns | - | - |
| GeoToLongValue | .NET 8 | 1.888 ns | 0.0156 ns | 0.0130 ns | - | - |
| GetCoordinatesFromLong | .NET 6 | 5.074 ns | 0.0038 ns | 0.0031 ns | - | - |
| GetCoordinatesFromLong | .NET 8 | 1.215 ns | 0.0038 ns | 0.0034 ns | - | - |
| GetGeoHashCode | .NET 6 | 15.944 ns | 0.2487 ns | 0.2077 ns | 0.0014 | 48 B |
| GetGeoHashCode | .NET 8 | 15.933 ns | 0.0331 ns | 0.0294 ns | 0.0012 | 48 B |

Benchmarks (Zen 2, Ryzen 7 3700X)

BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update)
AMD Ryzen 7 3700X, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host] : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2
  .NET 6 : .NET 6.0.27 (6.0.2724.6912), X64 RyuJIT AVX2
  .NET 8 : .NET 8.0.4 (8.0.424.16909), X64 RyuJIT AVX2

Server=True

main

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| GeoToLongValue | .NET 6 | 290.1 ns | 3.08 ns | 2.88 ns | 0.0095 | 80 B |
| GeoToLongValue | .NET 8 | 269.4 ns | 2.14 ns | 1.90 ns | 0.0095 | 80 B |
| GetCoordinatesFromLong | .NET 6 | 510.0 ns | 9.71 ns | 9.54 ns | 0.0401 | 336 B |
| GetCoordinatesFromLong | .NET 8 | 195.3 ns | 3.94 ns | 3.49 ns | 0.0401 | 336 B |
| GetGeoHashCode | .NET 6 | 877.3 ns | 6.87 ns | 6.42 ns | 0.1268 | 1064 B |
| GetGeoHashCode | .NET 8 | 525.6 ns | 6.46 ns | 6.04 ns | 0.0982 | 824 B |

PR

| Method | Job | Mean | Error | StdDev | Gen0 | Allocated |
|---|---|---|---|---|---|---|
| GeoToLongValue | .NET 6 | 5.646 ns | 0.0306 ns | 0.0255 ns | - | - |
| GeoToLongValue | .NET 8 | 5.829 ns | 0.0537 ns | 0.0502 ns | - | - |
| GetCoordinatesFromLong | .NET 6 | 5.507 ns | 0.0583 ns | 0.0545 ns | - | - |
| GetCoordinatesFromLong | .NET 8 | 5.928 ns | 0.0423 ns | 0.0396 ns | - | - |
| GetGeoHashCode | .NET 6 | 16.286 ns | 0.0968 ns | 0.0808 ns | 0.0003 | 48 B |
| GetGeoHashCode | .NET 8 | 15.721 ns | 0.1790 ns | 0.1587 ns | 0.0003 | 48 B |

Zen 2 PDEP/PEXT "erratum" in action

| Method | Job | Mean | Error | StdDev | Allocated |
|---|---|---|---|---|---|
| GeoToLongValue | .NET 6 w/ UsePdepPext | 64.98 ns | 0.132 ns | 0.103 ns | - |
| GeoToLongValue | .NET 8 w/ UsePdepPext | 64.95 ns | 0.383 ns | 0.359 ns | - |
| GetCoordinatesFromLong | .NET 6 w/ UsePdepPext | 62.87 ns | 0.137 ns | 0.128 ns | - |
| GetCoordinatesFromLong | .NET 8 w/ UsePdepPext | 62.92 ns | 0.090 ns | 0.084 ns | - |

See that 48 B heap allocation for the final string in GetGeoHashCode? We don't need that :)

@TalZaccai TalZaccai requested a review from yrajas April 30, 2024 22:40
@PaulusParssinen PaulusParssinen marked this pull request as draft May 1, 2024 20:02
@PaulusParssinen
Contributor Author

PaulusParssinen commented May 2, 2024

CanEncodeAndDecodeCoordinates

All computed values are printed with ToString("F16") (except the updated row, which is F14..)

| Label | Latitude | Longitude | Lat Error (vs. input) | Lon Error (vs. input) |
|---|---|---|---|---|
| Test input | 30.5388942218 | 104.0555758833 | - | - |
| Test IEEE-754 repr. | 30.5388942217999997 | 104.0555758832999942 | - | - |
| main | 30.5388928949832916 | 104.0555772185325623 | 0.0000013268167081 | -0.0000013352325681 |
| PR (1b59a0a) | 30.5388915538787842 | 104.0555745363235474 | 0.0000026679212155 | 0.0000013469764468 |
| PR (calculate center) | 30.538892894983292 | 104.05557721853256 | 0.0000013268167081 | -0.0000013352325681 |

edit: and I finally realized, the de-quantized coordinate represents the minimum value, and we get the same coordinate as the main branch, which represented the center of this error bounding box, by:
$(\text{minLatitude} + \frac{\text{latitudeError}}{2.0}, \text{minLongitude} + \frac{\text{longitudeError}}{2.0})$
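In code form, the centering step looks roughly like this (illustrative only; the helper and parameter names are mine, not the PR's):

```csharp
// Hypothetical helper: turn the de-quantized minimum corner of the geohash
// cell into its center by adding half of the cell's error in each dimension.
static (double Latitude, double Longitude) CellCenter(
    double minLatitude, double minLongitude,
    double latitudeError, double longitudeError)
{
    return (minLatitude + latitudeError / 2.0,
            minLongitude + longitudeError / 2.0);
}
```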

Funny in hindsight, as I was earlier wondering why those error values were multiples of each other..

@PaulusParssinen
Contributor Author

PaulusParssinen commented May 2, 2024

Premature optimization is the root of all ~~evil~~ fun!

A few test failures left; need to dive deeper into some rounding corner cases.

The diff looks incomprehensible to review, it's pretty much a rewrite 😅

@PaulusParssinen
Contributor Author

PaulusParssinen commented May 3, 2024

Added USE_PDEP_PEXT to allow those consumers who want to extract every bit of perf from their Geo commands (and who know their CPU has the non-microcode-emulated version). SharpLab (the entire encoding/decoding is inlined into one method 😋)
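For context, the BMI2 fast path is essentially Z-order bit interleaving done with a single instruction per direction. A rough sketch of the idea follows; which coordinate lands on the even vs. odd bit positions, and the helper names, are assumptions of mine, not necessarily what the PR emits.

```csharp
// Sketch of a PDEP/PEXT fast path for the geohash Z-curve. Callers are assumed
// to have checked Bmi2.X64.IsSupported (and that the CPU isn't a microcoded
// PDEP/PEXT part) before taking this path.
using System.Runtime.Intrinsics.X86;

static ulong Interleave(ulong latBits, ulong lonBits)
{
    // Deposit one coordinate's bits into the even positions and the other's
    // into the odd positions of the resulting geohash integer.
    return Bmi2.X64.ParallelBitDeposit(latBits, 0x5555555555555555UL)
         | Bmi2.X64.ParallelBitDeposit(lonBits, 0xAAAAAAAAAAAAAAAAUL);
}

static (ulong LatBits, ulong LonBits) Deinterleave(ulong hash)
{
    // PEXT gathers the even/odd bits back into two packed coordinate values.
    return (Bmi2.X64.ParallelBitExtract(hash, 0x5555555555555555UL),
            Bmi2.X64.ParallelBitExtract(hash, 0xAAAAAAAAAAAAAAAAUL));
}
```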

@PaulusParssinen
Contributor Author

PaulusParssinen commented May 4, 2024

adds "Investigate LightClient implementation" to todo list. It seems run some tests until timeout if it receives the expected amount of bytes

For some reason, I remembered/thought that the hash integer was bit-cast (i.e. BitConverter.DoubleToUInt64Bits) into the sorted set instead of just cast. That explains the 52 bits, and I'll cancel the precision change plans..

I just realized the original implementation did not actually match Redis's "52-bit" precision either 🤔 Redis seems to always "clear" out the last 2 bits for the textual representation output, so it always returns 0 as the last character. Before this PR, there was already a difference in the textual representation between Garnet and Redis.

@PaulusParssinen
Contributor Author

PaulusParssinen commented May 6, 2024

There's a small corner-case error in McLoughlin's quantization trick, on which this PR is based.

The statement

> Since y is in the range [1,2], the largest power of two less than or equal to y is 1.0 and the exponent field will always be 1023.

is wrong for the upper bound of the range. The biased exponent is 1023 for [1.0, 2.0), but for 2.0 it flips over to 1024. This means that 90.0 and 180.0, which are clamped to the range maximum 2.0, have their significand (and subsequently, the geohash integer representation) zeroed, and de-quantization will give output equivalent to -90.0 and -180.0 respectively.
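A minimal sketch of the quantization trick and this corner case (an illustration under my own assumptions, not the exact code merged in this PR): map the coordinate into [1.0, 2.0], read the top mantissa bits directly, and guard the value == max case where the exponent would flip to 1024.

```csharp
// Mantissa-based quantization sketch for a 26-bit result (e.g. one coordinate
// of a 52-bit geohash). Helper name and the clamp choice are mine.
static uint Quantize26(double value, double min, double max)
{
    // Map [min, max] onto [1.0, 2.0]; for y in [1.0, 2.0) the biased exponent
    // is fixed at 1023, so the top mantissa bits are exactly the fraction.
    double y = (value - min) / (max - min) + 1.0;

    // Corner case described above: value == max yields y == 2.0, whose
    // exponent is 1024 and mantissa is all zeros, so without this guard it
    // would quantize the same as `min`. Clamp to the largest 26-bit value.
    if (y >= 2.0)
        return (1u << 26) - 1;

    ulong bits = BitConverter.DoubleToUInt64Bits(y);

    // Take the top 26 of the 52 mantissa bits.
    return (uint)((bits >> (52 - 26)) & ((1u << 26) - 1));
}
```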

There are a lot of problems with GeoHash in general around the corner values, where implementations differ..

Humor


@PaulusParssinen
Contributor Author

PaulusParssinen commented May 7, 2024

And to add to the confusion, the original latitude limits are "wrong". According to EPSG:3857 (also known as EPSG:3785, also known as EPSG:900913.. etc.) the latitude limits are -85.05112878 to 85.05112878.. not -90 to 90. Fun! I'm starting to see why the implementations are all over the place, as the GeoHash standard uses different limits than the most common web projection.

I won't touch the existing latitude limit in this PR.

@badrishc
Contributor

badrishc commented May 8, 2024

Interesting stuff! Glad to see you are making progress here. :)

@PaulusParssinen PaulusParssinen marked this pull request as ready for review May 8, 2024 01:06
@PaulusParssinen
Contributor Author

PaulusParssinen commented May 8, 2024

Added benchmarks from my laptop, which has AVX512F, which is what we use to guard the PEXT/PDEP path. This made both encoding and decoding of geohash integers go from ~5.5 ns to ~1.5 ns, which was more than I expected. It may be worthwhile to enable more platforms with a static readonly Zen 2 check by doing X86Base.CpuId(0, 0) and X86Base.CpuId(1, 0) and checking for AuthenticAMD && family < 0x19 to guard against pre-Zen 3.
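A rough sketch of what such a guard could look like (my own assumption, not code from this PR; HasFastPdepPext is a hypothetical helper name):

```csharp
// Detect whether PDEP/PEXT are likely fast: BMI2 must be present, and on AMD
// parts the family should be Zen 3 (0x19) or newer, since earlier families
// emulate PDEP/PEXT in microcode.
using System.Runtime.Intrinsics.X86;

static bool HasFastPdepPext()
{
    if (!Bmi2.X64.IsSupported || !X86Base.IsSupported)
        return false;

    // Leaf 0: vendor string is spread across EBX, EDX, ECX ("AuthenticAMD").
    var (_, ebx, ecx, edx) = X86Base.CpuId(0, 0);
    bool isAmd = ebx == 0x68747541 /* "Auth" */
              && edx == 0x69746E65 /* "enti" */
              && ecx == 0x444D4163 /* "cAMD" */;

    if (!isAmd)
        return true; // assume non-AMD parts with BMI2 have fast PDEP/PEXT

    // Leaf 1: base family in bits 8..11, extended family in bits 20..27.
    var (eax, _, _, _) = X86Base.CpuId(1, 0);
    int family = (eax >> 8) & 0xF;
    if (family == 0xF)
        family += (eax >> 20) & 0xFF;

    return family >= 0x19; // Zen 3 and newer
}
```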

Also wrote a little bit more about the PR in its description.

@PaulusParssinen
Contributor Author

PaulusParssinen commented May 8, 2024

Added more background to the PR description about the nuances of geohash..

@PaulusParssinen PaulusParssinen changed the title Optimize GeoHash coordinate conversions Make GeoHash coordinate conversions ~40-150x faster May 9, 2024
@PaulusParssinen PaulusParssinen changed the title Make GeoHash coordinate conversions ~40-150x faster Make GeoHash coordinate conversions ~40-100x faster May 9, 2024
* Keep them in registers (or at most spill to the stack)
* Also some other misc. simplification
* I will return to this method in a follow-up
* Abuse IEEE-754 binary representation in the encoding too
* Implement Z-curve decoding more efficiently
* Add GetGeoErrorByPrecision to calculate the error at given bit precision (52 for us).
* Optimize base32 string encoding.
* Test still fail. Will need to investigate more.
* And define it to run the tests with it
…unding

* More accurate and faster, what's not to love
* Little extra encouragement to JIT.
* And add GeoHashBenchmark job with it enabled
* We might not even want cmov/csel, which might stall out-of-order execution. Doesn't matter what is emitted tbh.
* The GeoHash class incorporates material from mmcloughlin/geohash and georust/geohash, both licensed under MIT License. Thank you for sharing!
@PaulusParssinen
Contributor Author

PaulusParssinen commented May 21, 2024

Out of curiosity, what required a force-push? 😅 (This broke the history experience in the GitHub UI. I still have this branch in its pre-force-push sync state if we want to restore that.)

@yrajas
Contributor

yrajas commented May 21, 2024

> Out of curiosity, what required a force-push? 😅 (This broke the history experience in the GitHub UI. I still have this branch in its pre-force-push sync state if we want to restore that.)

No changes, actually. I was updating the branch with the rebase option to bring in the latest changes from main.

@yrajas yrajas merged commit e0027d1 into microsoft:main May 21, 2024
23 checks passed
@PaulusParssinen PaulusParssinen deleted the opt-geohash branch May 22, 2024 09:19