Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic Hashers #236

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

Generic Hashers #236

wants to merge 5 commits into from

Conversation

bhgomes
Copy link
Contributor

@bhgomes bhgomes commented Aug 22, 2021

This is a draft proposal for a generic hasher interface (#88) for this library.

How this Works

  1. Adds PhfHasher trait
  2. Adds generic PhfHasher parameter to maps/sets
  3. Adds FmtConstPath trait to hook PhfHasher into phf_codegen
  4. Uses quote::ToTokens trait to hook PhfHasher into phf_macros

Note on Point 1

Right now, the PhfHasher trait is only responsible for the initial PHF distribution (think SipHasher keys) and the actual hashing (computing Hashes object). The PHF generation algorithm is still the same as before, but if it makes more sense to use a separate generation algorithm for each hashing algorithm, then we should replace PhfHasher with something like the following which is a little simpler than what the current draft proposes:

pub trait PhfHasher {
	/// PHF Generation Error
	type Error;
	
	/// Try to generate a perfect hash function for the list of `entries`.
	fn try_generate<T>(entries: &[T]) -> Result<HashState, Self::Error>
	where
		T: PhfHash;
	
	/// Hashes the data at `x` returning the computed displacement parameters.
	fn hash<T>(&self, x: &T) -> Hashes
	where
		T: PhfHash;
}

where HashState and Hashes are the computed map/displacements and index respectively.

Also, in this PR, phf_generator is removed since it only contains the generation algorithm which should live closer to PhfHasher and eventually could be part of it.

Note on Point 4

Right now, the macro interface itself has not changed so it's not possible with this current draft to use custom hashers with macros. To enable this, we would need to have to extend the macro interface or have some sort of type inference to detect the hasher. It's still not clear to me how we should do this.

@overlookmotel
Copy link

Can I ask, is anything blocking advancement of this feature? Can I help?

@JohnTitor
Copy link
Member

I haven't reviewed this because it's WIP, @bhgomes if you're still interested in this, could you resolve merge conflicts and make it ready for review?

@YoniFeng
Copy link

YoniFeng commented Mar 7, 2023

@JohnTitor
I think I'm interested in making macros that prioritize fast lookups for static data (set/map etc).
I'm not sure if I should look to start a separate project or try to add to phf.
Sharing some thoughts, @JohnTitor I'd appreciate your feedback when you get a chance (no rush!).

It seems like phf prioritizes space over speed, true to its name - not settling for less than perfect hashing.
The main use case seems to be having a compact-as-possible const map/set.
This PR (I've just barely skimmed it) aimed to add the ability to supply any hash function.
I would like to take it even further:

  • The ability to relax the minimal hashing requirement (i.e. be willing to produce a sparse non-minimal array, i.e. not a surjective mapping, not a phf).
    This allows for very fast lookups using arbitrary hash functions that otherwise may never produce a perfect mapping, or take a very long time to do so.
  • The ability to supply a comparison function for the keys.
    This can be used to squeeze that little bit of extra performance by supplying comparison function more optimal than the default for the key type by utilizing known properties of the set of keys.
    For example, suppose the keys are strings that are known to have unique suffixes.

Do think this can/should be a feature of the phf crate?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants