
Document a way for other languages to get the same output for a given input string #30

Open
sean-abbott opened this issue Jun 21, 2019 · 4 comments

Comments

@sean-abbott

The xxHash implementation doesn't produce the same results as Python for strings (as in the given example).

I believe this is due to Java's handling of character arrays: https://codeahoy.com/2016/05/08/the-char-type-in-java-is-broken/

We get matching results when we convert the string to a byte array instead:
LongHashFunction.xx().hashBytes("test".getBytes()) produces the same output as
xxhash.xxh64('test').intdigest(), which is the result we'd expect.

We can't figure out how to have the python implementation find the same key for LongHashFunction.xx().hashChars("test")

If there's a way for other languages to get the same output for a given input string, it'd be nice to have it documented.

@leventov
Member

We can't figure out how to have the python implementation find the same key for LongHashFunction.xx().hashChars("test")

You probably need to encode the string as UTF-16, because that is essentially what Java's character arrays are. You would also need to pick the right byte order, though.
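As a rough illustration of the suggestion above, the sketch below builds both UTF-16 byte layouts of a string in Python. The helper name `utf16_candidates` is made up for this example. Python's plain "utf-16" codec prepends a byte order mark, so the explicit "utf-16-le" and "utf-16-be" codecs are used instead. Which of the two layouts (if either) matches hashChars is not confirmed anywhere in this thread; each candidate would have to be passed to xxhash.xxh64(...).intdigest() and compared against the Java result.

```python
def utf16_candidates(text):
    """Return the BOM-free UTF-16 encodings of text for both byte orders."""
    return {
        "utf-16-le": text.encode("utf-16-le"),  # little-endian code units
        "utf-16-be": text.encode("utf-16-be"),  # big-endian code units
    }


candidates = utf16_candidates("test")
for name, data in candidates.items():
    # Hex dump so the two layouts can be compared by eye.
    print(name, data.hex())
# utf-16-le 7400650073007400
# utf-16-be 0074006500730074
```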

@leventov changed the title from "xxHash implementation produces inconsistent results compared to python" to "Document a way for other languages to get the same output for a given input string" on Jul 19, 2019
@drummerwolli

Has anyone been able to provide this? We would be interested in it as well.

@anzecesar

We had the same issue, except with Node.js.

In the end what produced the same hash was:

LongHashFunction.xx(123).hashBytes("teststring".getBytes("UTF-8"))

and in Node.js:

const hash = XXHash.hash64(Buffer.from('teststring', 'utf-8'), 123);
const hashValue = hash.readBigUInt64LE();

I hope this helps someone in the future.
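One more thing to watch when comparing the numbers themselves: hashBytes returns a Java long, which is signed, while readBigUInt64LE reads an unsigned 64-bit value. The same 64 bits can therefore print as different numbers on the two sides. A minimal Python sketch of the conversion follows; the helper names `to_signed_64` and `to_unsigned_64` are made up for this example.

```python
def to_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 unsigned bits")
    return value - 2**64 if value >= 2**63 else value


def to_unsigned_64(value):
    """Reinterpret a signed 64-bit integer (like a Java long) as unsigned."""
    if not -(2**63) <= value < 2**63:
        raise ValueError("value does not fit in 64 signed bits")
    return value % 2**64


print(to_signed_64(2**64 - 1))  # -1
print(to_unsigned_64(-1))       # 18446744073709551615
```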

@gzm55
Collaborator

gzm55 commented May 26, 2021

This should not be an issue.

The default encodings in different languages are not the same. To get the same hash result, the binary layout of the input must be exactly the same, so select a well-defined encoding before hashing.
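To make the point about binary layout concrete, here is a small Python sketch that always encodes with an explicitly named codec before hashing. The helper `hash_text` is hypothetical, and `zlib.crc32` is only a stand-in so the example runs without third-party packages; in practice the bytes would go to an xxHash binding instead. It also suggests why a pure-ASCII input like "test" can appear to work: its bytes are identical under UTF-8 and Latin-1, whereas a non-ASCII string is laid out differently.

```python
import zlib


def hash_text(text, hash_bytes, encoding="utf-8"):
    """Encode text with an explicit codec, then hash the resulting bytes."""
    return hash_bytes(text.encode(encoding))


# ASCII-only text has the same bytes in UTF-8 and Latin-1 ...
ascii_same = "test".encode("utf-8") == "test".encode("latin-1")
# ... but non-ASCII text does not, so its hashes would generally differ.
non_ascii_same = "naïve".encode("utf-8") == "naïve".encode("latin-1")
print(ascii_same, non_ascii_same)  # True False

# Same bytes in, same hash out (crc32 is a stand-in, not xxHash).
print(hash_text("test", zlib.crc32) == hash_text("test", zlib.crc32, "latin-1"))  # True
```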
