Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Hash #231

Open
michaelmior opened this issue Aug 3, 2022 · 6 comments
Open

Implement Hash #231

michaelmior opened this issue Aug 3, 2022 · 6 comments

Comments

@michaelmior
Copy link
Contributor

Would it be possible to implement the Hash trait for RoaringBitmap? It looks like this can be derived automatically (via ArrayStore/BitmapStore, Store, and Container). Alternatively, perhaps something like the hash function in pyroaring would be more suitable?

@michaelmior
Copy link
Contributor Author

I tried benchmarking this vs BTreeHash which is the closest thing I'm aware of in the standard library. Hashing roaring bitmaps is orders of magnitude faster when simply using the derived implementation. My branch is here.

@RaduBerinde
Copy link
Contributor

I don't think it can be derived automatically - identical RoaringBitmaps with different representations (array vs bitmap) should have the same hash.

@michaelmior
Copy link
Contributor Author

@RaduBerinde Good point! I somehow hadn't thought of that. It happens to be working as far as I can tell, although that could be just by chance.

@RaduBerinde
Copy link
Contributor

RaduBerinde commented Aug 14, 2022

It's possible there is currently only one possible choice, depending on the cardinality (we call "ensure correct store" after each op). But maybe that will change in the future (I actually found this surprising, as it invites some annoying cornercases, like repeatedly adding and removing an element when cardinality is at the cutoff).

@Kerollmops
Copy link
Member

One simple way for a good enough impl Hash for RoaringBitmap could be to iter on the numbers. The impl Hash for Treemap could simply be a basic derive as the contained types implement Hash (a BTreeMap of RoaringBitmaps).

I understand the performance concern but I just wanted to notice that.
Thank you very much for the time you take to help on this project!

we call "ensure correct store" after each op

Not sure about the performance, also it can only be called, as you said, after an op, as the bitmap must be mutable.

@Oppen
Copy link

Oppen commented Nov 10, 2022

My $0.02: Java, C and Go allow two equivalent bitmaps to have mismatched hashes if one contains run containers and the other doesn't, so from a consistency and speed (now you can operate on bytes plain and simple!) point of view it's still an option here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants