Why is hashing stateless? #31

JeffGreen · 2019-10-22T21:39:35Z

Hello.

I'm trying to understand the design decision behind requiring hashing to be stateless.

Each call to hashByte/hashLong etc. starts with the state equal to the original seed and calls finalize at the end. As far as I can tell, this means that you can't chain these hash calls (like you can with Guava, for example) unless you keep passing finalized() hashes around.

Am I missing something? It seems like we should require the user to call finalize and to have a stateful long hash variable that can be reset() in each implementation. This allows the user to replicate the current behavior by calling reset()-hashByte()-finalize() but also to chain calls to each of the hash functions (reset()-hashByte()-hashLong()...finalize()).
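To make the proposal concrete, here is a minimal sketch of what a chainable, resettable hasher could look like. Everything in it is hypothetical: the class name `ChainedHasher`, the method `finish()`, and the helper `hashByteOnce` are not part of this library, and the mixing step is plain FNV-1a used as a stand-in rather than any of the library's actual algorithms. It uses `finish()` instead of `finalize()` because a `long finalize()` method would clash with `Object.finalize()` in Java.

```java
// Hypothetical sketch of a stateful, chainable hasher. Not the library's API.
final class ChainedHasher {
    // FNV-1a 64-bit constants, used only as a stand-in mixing function.
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private final long seed;
    private long state;

    ChainedHasher(long seed) {
        this.seed = seed;
        reset();
    }

    // Restore the state derived from the original seed.
    ChainedHasher reset() {
        state = FNV_OFFSET ^ seed;
        return this;
    }

    // Mix a single byte into the running state.
    ChainedHasher hashByte(byte b) {
        state = (state ^ (b & 0xffL)) * FNV_PRIME;
        return this;
    }

    // Mix a long into the running state, least significant byte first.
    ChainedHasher hashLong(long v) {
        for (int i = 0; i < 8; i++) {
            hashByte((byte) (v >>> (8 * i)));
        }
        return this;
    }

    // Produce the final hash without modifying the running state.
    long finish() {
        long h = state;
        h ^= h >>> 32;
        return h;
    }

    // The stateless single-call behavior, expressed as reset-hash-finish.
    static long hashByteOnce(long seed, byte b) {
        return new ChainedHasher(seed).hashByte(b).finish();
    }
}
```

With something like this, `hasher.reset().hashByte(b).hashLong(v).finish()` chains several inputs into one hash, while a single-call helper reproduces the current one-shot behavior.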


leventov · 2019-10-23T07:06:02Z

Removing the current API (single method calls) and making users emulate it would be a bad decision: requiring people to call three methods when just one suffices 95% of the time doesn't make sense.

Augmenting the current API with support for chaining is possible.

JeffGreen · 2019-10-23T17:05:13Z

Agreed, removing / breaking the current API would be a mistake. Doing a bit of refactoring to enable chaining seems like a better move as you suggest.

gzm55 · 2020-02-18T05:05:26Z

A streaming (stateful) API can be used to wrap a hash function as an OutputStream. That reduces the overall allocations when hashing an object, which is a common real-world case in many RPC servers.

With the stateless API, hashing an object incurs many allocations in the serialization step:

ByteBuffer/byte[] ser_results = do_serialization(obj);
long h = long_function.hash64(ser_results);

The streaming API, by contrast, allows fewer allocations:

HashOutputStream hash_stream = wrap_streaming_hash(long_function_streaming);
do_serialization_to_a_stream(obj, hash_stream);
long h = hash_stream.finalize();
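As a rough illustration of the streaming idea in Java, the sketch below wraps a running hash state in an `OutputStream` so a serializer can write straight into it. The names `HashOutputStream`, `hash()`, and `hashRecord` are hypothetical, and the mixing step is FNV-1a as a stand-in, not one of the library's hash functions. The result is read with `hash()` rather than `finalize()`, since a `long finalize()` would conflict with `Object.finalize()`.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: an OutputStream that hashes bytes instead of storing them.
final class HashOutputStream extends OutputStream {
    private long state;

    HashOutputStream(long seed) {
        // FNV-1a offset basis mixed with the seed (stand-in algorithm).
        state = 0xcbf29ce484222325L ^ seed;
    }

    @Override
    public void write(int b) {
        // Only the low 8 bits of b are written, per the OutputStream contract.
        state = (state ^ (b & 0xffL)) * 0x100000001b3L;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            write(buf[i]);
        }
    }

    // Current hash of everything written so far.
    long hash() {
        return state;
    }
}

final class HashOutputStreamDemo {
    // Serialize two fields directly into the hash, with no intermediate byte[]
    // holding the whole serialized record.
    static long hashRecord(int id, String name) {
        HashOutputStream hashStream = new HashOutputStream(0L);
        try (DataOutputStream out = new DataOutputStream(hashStream)) {
            out.writeInt(id);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hashStream.hash();
    }
}
```

This avoids materializing the full serialized form before hashing, although the serializer itself may still allocate internally.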
