Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] generic over the hash function #1

Open
peter7891 opened this issue Mar 30, 2023 · 5 comments
Open

[feature request] generic over the hash function #1

peter7891 opened this issue Mar 30, 2023 · 5 comments

Comments

@peter7891
Copy link

If you look at the Rust api for std::collections::HashMap it is generic over the hashing function.
This is very useful for people to be able to choose, based on whether they care more about speed than security in the particular functionality they are using the map in.

I think, this would be a good feature to have here.

@andy-wm-arthur
Copy link
Contributor

andy-wm-arthur commented Mar 31, 2023

Thanks for the request @peter7891!

There are a few things to consider when choosing a hash function for a hash table. Hash functions with a poor output distribution can quickly degrade the performance of the hash table. This is especially true with closed-hashing hash tables and tables that use linear probing to resolve collisions (both apply to SwissMap). Secondly, if the hash function is not properly seeded, it can be susceptible to a hash-flooding attack.

There's a tradeoff between giving users more flexibility/choice and giving them enough rope to hang themselves. An argument against exposing custom hash functions is that the default hash is both fast and high quality. It's the same hash function used by the built in map that leverages AES instructions on supported platforms.

That said, I'm happy to hear other opinions and discuss further

@peter7891
Copy link
Author

Thanks for the request @peter7891!

There are a few things to consider when choosing a hash function for a hash table. Hash functions with a poor output distribution can quickly degrade the performance of the hash table. This is especially true with closed-hashing hash table and tables that use linear probing to resolve collisions (both apply in this case). Secondly, if the hash function is not properly seeded, it can be susceptible to a hash-flooding attack.

There's a tradeoff between giving users more flexibility/choice and giving them enough rope to hang themselves. An argument against exposing custom hash functions is that the default hash is both fast and high quality. It's the same hash function used by the built in map that leverages AES instructions on supported platforms.

That said, I'm happy to hear other opinions and discuss further

You are probably right. The Go compiler doesn't do many optimizations so its probably futile to try to optimize it.
I might fork and test to see if it will perform better.

@andy-wm-arthur
Copy link
Contributor

I wrote some microbenchmarks to compare a few potential hash functions. The current hash function is dolthub/maphash.

goos: darwin
goarch: amd64
pkg: github.com/dolthub/maphash
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkCompareStringHasher
BenchmarkCompareStringHasher/dolthub/maphash
BenchmarkCompareStringHasher/dolthub/maphash-12         	273337081	         4.314 ns/op	       0 B/op	       0 allocs/op
BenchmarkCompareStringHasher/hash/maphash
BenchmarkCompareStringHasher/hash/maphash-12            	291626719	         4.068 ns/op	       0 B/op	       0 allocs/op
BenchmarkCompareStringHasher/xxHash3
BenchmarkCompareStringHasher/xxHash3-12                 	287016580	         4.138 ns/op	       0 B/op	       0 allocs/op
BenchmarkCompareStringHasher/hash/fnv
BenchmarkCompareStringHasher/hash/fnv-12                	73133010	        16.78 ns/op	       0 B/op	       0 allocs/op
PASS

@peter7891
Copy link
Author

That looks pretty good.
I wonder, how would aHash perform in Go.
https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md

@icellan
Copy link

icellan commented Dec 19, 2023

@andy-wm-arthur Just adding my 2-cents to this. We are using Swiss Map with keys that are already hashed and very evenly distributed. Our keys are [32]byte arrays. It would be great to be able to use them directly instead of hashing them again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants