Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow hashers other than SipHasher #88

Open
Manishearth opened this issue Jul 8, 2016 · 13 comments
Open

Allow hashers other than SipHasher #88

Manishearth opened this issue Jul 8, 2016 · 13 comments

Comments

@Manishearth
Copy link
Contributor

With a compile-time hashmap, having a cryptographically secure hash is less important since there are no pathological cases to worry about -- everything is done at compile time.

IIRC SipHasher isn't that secure anyway, but there are options like FNV available too.

Perhaps we should allow for these to be selected instead, using defaulted generics?

@sfackler
Copy link
Collaborator

sfackler commented Jul 8, 2016

It's definitely possible, though it'll be a bit interesting to see how to configure it at compile time.

It is worth noting that the current algorithm appears to need a very high quality hash function. xxhash works, and we used to use it, but some lower quality functions prevent the algorithm from finding a solution. I know I tried Java-style hashes at one point that didn't work, and I may have tried FNV as well at some point.

@Manishearth
Copy link
Contributor Author

Oh, I see, the algorithm needs specific hash functions too. I guess a more specific trait (more specific than Hasher) implemented on these would work?

@SimonSapin
Copy link
Contributor

Today on IRC:

  • SimonSapin: does the phf algorithm rely on properties of Sip, or would any parameterized hash function do? We’re considering FNV-based phf. https://github.com/servo/rust-fnv/pull/11/files parameterized FNV, but per http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-param this key is not completely arbitrary: it’s better for it to be non-zero
  • cbreeden: the paper references an assumption of needing a family of fully random hash functions -- it references another paper on how to construct such a hasher, but that paper assumes that you at least have a sequence of fully random hashers
  • sfackler: it does not rely on the security properties of sip, but it does seem like the current algorighm needs a very high quality hash function. I played around with FNV a while ago and found that the table generation logic tended to spin forever. It would definitely be cool to swap it out. A full rewrite to a different underlying algorithm could be interesting as well
  • cbreeden: I think FNV is really picky about its seeds

@joshtriplett
Copy link

Would seahash work for this purpose?

@sfackler
Copy link
Collaborator

I haven't tried, but I think SeaHash is supposed to produce pretty high quality output so I wouldn't be surprised if it did.

@twmb
Copy link

twmb commented Sep 26, 2017

PCG hash is another alternative?

@briansmith
Copy link

In many cases it is possible to find a very simple minimal hash function. For example, when looking up an ECC public key by value in a table of ~200 public keys, you can probably find a hash function that only looks at a single byte or a few bytes of the key and does 0, 1, or a few bitwise operations on those bytes.

IMO it would be ideal if I could pass my own hash function body to the generator, and have the generator verify that it is indeed a (minimal) perfect hash function for the input data, and then have it use my function.

@djudd
Copy link
Contributor

djudd commented Feb 3, 2019

Out of curiosity I tried out swapping in the FxHash and Seahash crates here. FxHash did not finish compiling the tests; Seahash works but is slower on the benchmarks here (takes about 60ns instead of about 30ns on my laptop).

If that's not too disappointing to folks interested in trying another hash--it looks like the main complication to making this generic would be making the key field in HashState generic, including in codegen/macros. (For example, Seahash wants four u64s as its seed, not just one.)

@joshtriplett
Copy link

@djudd Odd; I had the impression that seahash was supposed to be faster.

@abonander
Copy link
Collaborator

With the fixes in #164 I was able to get FNVHasher solving 1M entries, but it was slower than SipHasher.

@SimonSapin
Copy link
Contributor

@abonander do you mean slower at constructing the map, or looking up one key in it?

@abonander
Copy link
Collaborator

@SimonSapin constructing the map; I haven't benchmarked accessing it but i suspect it would be a relatively small difference since the hash function is only executed once per lookup.

@YoniFeng
Copy link

YoniFeng commented Mar 9, 2023

It's definitely possible, though it'll be a bit interesting to see how to configure it at compile time.

Hello from the future.
Any thoughts/guidance on how to make this possible? As far as I can tell, there's no way to do this that preserves the simplicity of simply using phf_map! macro straight up where needed.

The abandoned #236 has scaffolded the internal support for a generic hash function.
The question remains, how can this generic hasher be passed and used by the proc macro?
I'm new to Rust, but after extensive searching it seems there's no support for this. Which makes sense, macros process source code and output/add source code.

To solve this, it seems to me that the user would need to have an additional proc-macro intermediary crate for their project.
In this intermediary crate, they would define their custom hasher and export new phf_map! (etc.) macros that do the compile-time generation with that custom hasher. (rust-phf could supply a macro that takes care of this).
Those macros are then used within the project, exactly where the original phf_map! macros are used.
Proc-macro "inception".

Am I wrong/missing a better way to do this?

Swatinem added a commit to Swatinem/rust-phf that referenced this issue Jul 30, 2023
Solves part of rust-phf#88 by providing a separate function to customize the hashing function during generation.
That way, one can avoid usage of the default SipHash as well as avoid having a `PhfHash` bound on the keys.
This custom hash function can be used in combination with `get_index` directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants