zobristhashing

Zobrist hashing in C

Zobrist hashing is the simplest form of tabulation-based hashing. It can be shown to be 3-wise independent. The Zobrist approach tested here is used in real systems, e.g., Gigablast (https://www.gigablast.com/). Alternatively, one could use a tabulation-based function as a complement to other hash functions: first hash the content down to a few bytes (e.g., 4), then apply a tabulation-based hash to the result.
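A minimal sketch of the idea, assuming one table of random 64-bit words per byte position (256 positions × 256 byte values, i.e., 512 KB). Each input byte selects an entry by (position, byte value), and the hash is the XOR of the selected entries. The names `sketch_zobrist_t`, `sketch_zobrist_new`, and `sketch_zobrist_hash`, the splitmix64 table filler, and the choice to wrap positions modulo 256 are illustrative assumptions, not this library's `zobrist_t` API.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not the library's zobrist_t: one table of random
   64-bit words per byte position, 256 positions x 256 byte values.
   Size: 256 * 256 * 8 bytes = 512 KB, so it is allocated on the heap. */
#define SKETCH_POSITIONS 256

typedef struct {
  uint64_t table[SKETCH_POSITIONS][256];
} sketch_zobrist_t;

/* splitmix64: a simple seeded generator, used here only to fill the table. */
static uint64_t splitmix64(uint64_t *state) {
  uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

/* Allocate and fill the table; returns NULL if allocation fails.
   The caller releases it with free(). */
sketch_zobrist_t *sketch_zobrist_new(uint64_t seed) {
  sketch_zobrist_t *k = malloc(sizeof *k);
  if (k == NULL) return NULL;
  for (size_t pos = 0; pos < SKETCH_POSITIONS; pos++)
    for (size_t c = 0; c < 256; c++)
      k->table[pos][c] = splitmix64(&seed);
  return k;
}

/* XOR together one table entry per input byte, chosen by (position, byte).
   Assumption: positions wrap modulo SKETCH_POSITIONS for longer strings. */
uint64_t sketch_zobrist_hash(const unsigned char *s, size_t len,
                             const sketch_zobrist_t *k) {
  uint64_t h = 0;
  for (size_t i = 0; i < len; i++)
    h ^= k->table[i % SKETCH_POSITIONS][s[i]];
  return h;
}
```

Because positions wrap in this sketch, bytes 256 positions apart share a table: swapping the bytes at positions 0 and 256 leaves the hash unchanged, whatever the random table contents. This is one concrete way in which strings longer than 256 bytes escape the theoretical guarantees.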

Tabulation-based hashing uses a lot of memory and is susceptible to cache misses. For example, hashing 4-byte strings to 64-bit values requires 8 KB of tables (4 positions × 256 byte values × 8 bytes). Moreover, its speed is limited, in part, by the system's ability to issue random-access loads.
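To make the 8 KB figure concrete, here is a sketch of tabulation hashing for a 4-byte key, such as the 32-bit output of another hash function as suggested above. The names `tab32_t`, `tab32_init`, and `tab32_hash` and the splitmix64 table filler are hypothetical; the C11 `_Static_assert` checks the table size at compile time.

```c
#include <stdint.h>

/* Hypothetical sketch: tabulation hash of a 4-byte (32-bit) key to 64 bits.
   One table per byte position: 4 x 256 entries x 8 bytes = 8192 bytes. */
typedef struct {
  uint64_t t[4][256];
} tab32_t;

_Static_assert(sizeof(tab32_t) == 8192, "4 x 256 x 8 bytes = 8 KB");

/* splitmix64, used here only as a simple seeded table filler. */
static uint64_t tab32_next(uint64_t *state) {
  uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

/* Fill all four tables with pseudo-random 64-bit words. */
void tab32_init(tab32_t *k, uint64_t seed) {
  for (int pos = 0; pos < 4; pos++)
    for (int c = 0; c < 256; c++)
      k->t[pos][c] = tab32_next(&seed);
}

/* Four loads whose addresses depend on the key bytes; the hash can go no
   faster than the CPU can issue such data-dependent loads. */
uint64_t tab32_hash(uint32_t key, const tab32_t *k) {
  return k->t[0][key & 0xFF] ^ k->t[1][(key >> 8) & 0xFF] ^
         k->t[2][(key >> 16) & 0xFF] ^ k->t[3][key >> 24];
}
```

Each extra input byte adds another 2 KB table and another random-access load, which is why memory use and load throughput grow with the key length.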

In an exhaustive experimental evaluation of hash-table performance, Richter et al. (VLDB, 2016) found that Zobrist hashing yields low throughput. Consequently, the authors describe it as "less attractive in practice" than its strong randomness properties would suggest.

This C code expects a GCC-like compiler on an x64 system.

The code demonstrates that, on recent Intel x64 processors, it is difficult to hash much more than 0.65 bytes per cycle, even when repeatedly hashing the same short string. In contrast, fast hash families can hash 4 to 10 bytes per cycle (see https://github.com/lemire/StronglyUniversalStringHashing).

C Usage

zobrist_t k;
init_zobrist(&k); // call once

// then you can hash as many strings as you want:
uint64_t hashvalue = zobrist(mystring, mystringsize, &k);

// for null-terminated strings, you can use:
uint64_t hashvalue_nt = zobrist_nt(mystring, &k);

// note: the theoretical bounds do not hold for strings longer than 256 bytes

Usage

make
./benchmark

Relevant software libraries

References

Daniel Lemire, The universality of iterated hashing over variable-length strings, Discrete Applied Mathematics 160(4-5), 2012. http://arxiv.org/abs/1008.1715
Owen Kaser, Daniel Lemire, Strongly universal string hashing is fast, Computer Journal 57(11), 2014. http://arxiv.org/abs/1202.4961
Daniel Lemire, Owen Kaser, Faster 64-bit universal hashing using carry-less multiplications, Journal of Cryptographic Engineering (to appear). http://arxiv.org/abs/1503.03465
