Fix asymptotic hashing performance #164

Merged (4 commits) on Jul 13, 2019

Conversation

abonander
Collaborator

The previous implementation took a single 64-bit hash and split it into three 21-bit components.

This implementation takes two 64-bit hashes and splits them into three 32-bit values, discarding the upper 32 bits of the second hash. For comparison, here are timings before and after the change:

$ time cargo run --bin gen_hash_test --release --features criterion
   Compiling phf_shared v0.7.24 (/home/austin/rust/rust-phf/phf_shared)
   Compiling phf_generator v0.7.24 (/home/austin/rust/rust-phf/phf_generator)
    Finished release [optimized] target(s) in 0.70s
     Running `/home/austin/rust/rust-phf/target/release/gen_hash_test`
112.18user 0.11system 1:51.61elapsed 100%CPU (0avgtext+0avgdata 262776maxresident)k
0inputs+6280outputs (0major+58275minor)pagefaults 0swaps
$ time cargo run --bin gen_hash_test --release --features criterion
   Compiling phf_shared v0.7.24 (/home/austin/rust/rust-phf/phf_shared)
   Compiling phf_generator v0.7.24 (/home/austin/rust/rust-phf/phf_generator)
    Finished release [optimized] target(s) in 0.77s
     Running `/home/austin/rust/rust-phf/target/release/gen_hash_test`
4.83user 0.11system 0:04.14elapsed 119%CPU (0avgtext+0avgdata 265640maxresident)k
0inputs+6168outputs (0major+58545minor)pagefaults 0swaps

I didn't use the test data from #132 because I didn't want to include such a large file in the repo. Still, this seems to demonstrate a significant improvement in the algorithm's asymptotic behavior (about 112 s of user CPU time before versus about 5 s after).
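To make the two splitting schemes concrete, here is a minimal Rust sketch of what the description above amounts to. It is not the code from this PR: the names `Hashes`, `split_old`, and `split_new` are hypothetical, and the choice of which bits feed which field is an assumption. Only the component widths (three 21-bit values from one hash, versus three 32-bit values from two hashes with the upper half of the second discarded) come from the description.

```rust
/// The three values derived from a key's hash.
/// Struct and field names are illustrative assumptions.
#[derive(Debug, PartialEq, Eq)]
pub struct Hashes {
    pub g: u32,
    pub f1: u32,
    pub f2: u32,
}

/// Old scheme: one 64-bit hash cut into three 21-bit components.
/// Only 63 bits are used; the top bit is ignored.
/// The bit order of the components is an assumption.
pub fn split_old(hash: u64) -> Hashes {
    const BITS: u32 = 21;
    const MASK: u64 = (1 << BITS) - 1;
    Hashes {
        g: (hash & MASK) as u32,
        f1: ((hash >> BITS) & MASK) as u32,
        f2: ((hash >> (2 * BITS)) & MASK) as u32,
    }
}

/// New scheme: two 64-bit hashes yield three 32-bit values.
/// The upper 32 bits of `second` are discarded by the truncating cast.
/// Which half of `first` feeds which field is an assumption.
pub fn split_new(first: u64, second: u64) -> Hashes {
    Hashes {
        g: (first >> 32) as u32,
        f1: first as u32,
        f2: second as u32,
    }
}
```

Under the old scheme each component can take at most 2^21 = 2,097,152 distinct values, while under the new scheme each can take 2^32. The much smaller range is presumably what hurt generation on large key sets.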

@abonander
Collaborator Author

cc @sfackler @derekdreery

@sfackler
Collaborator

Also worth trying the 128-bit output in siphasher itself - might be faster than hashing twice: https://docs.rs/siphasher/0.3.0/siphasher/sip128/index.html

@sfackler
Collaborator

Really interesting that this has such a huge impact!

@abonander
Collaborator Author

> Also worth trying the 128-bit output in siphasher itself - might be faster than hashing twice: https://docs.rs/siphasher/0.3.0/siphasher/sip128/index.html

It's only faster by maybe 5%, but it also simplifies the code a bit, so it's an overall positive change.

@abonander
Collaborator Author

Failure on nightly is due to a bug that has been fixed; build should pass if rerun tomorrow. rust-lang/rust#62562

@abonander
Collaborator Author

Passed, merging.

@abonander merged commit 70129c6 into rust-phf:master on Jul 13, 2019