Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce boxing #72

Open
Ralith opened this issue Feb 6, 2020 · 7 comments
Open

Reduce boxing #72

Ralith opened this issue Feb 6, 2020 · 7 comments

Comments

@Ralith
Copy link

Ralith commented Feb 6, 2020

The current implementation boxes every key/value pair to obtain a stable address with which to construct an intrusive linked list. While investigating new designs following #70, I realized that this indirection and the associated reduction in locality can, in principle, be avoided.

The payload of a HashMap is already heap-allocated, and only moves when the container's capacity grows and it's rehashed. An intrusive linked list could therefore be soundly maintained by the HashMap implementation itself by including the links in the underlying raw table and rewriting them when copying them over in the course of a resize.

Unfortunately, I can't think of a way to implement this without this crate containing its own full hash table implementation; even hashbrown's RawTable API does not expose a way to customize the rehashing procedure, and I don't see a way to do it as a second pass. Maybe worth reaching out to @Amanieu or someone to see if there's a way hashbrown could enable this so an effective fork, and the resulting maintenance headache, isn't necessary?

@Amanieu
Copy link

Amanieu commented Feb 6, 2020

Have you considered using a copy of the key instead of a pointer? Generally keys are pretty small, but you would need K: Clone which may be a breaking change.

@Amanieu
Copy link

Amanieu commented Feb 6, 2020

Ah nevermind, this would work poorly if the key is something like a String.

Regarding the hash table, I don't think hashbrown is a good fit, but you may be interested in Amanieu/intrusive-rs#37.

@Ralith
Copy link
Author

Ralith commented Feb 6, 2020

I don't think hashbrown is a good fit

It'd be nice to benefit from hashbrown's optimizations. It seems like the only missing piece here is the ability for the caller to control the rehashing operation. In particular, it would be necessary for this lib to remove elements from the old table and insert them into the new one in LRU list order, constructing the correct chain as it goes.

That could be accomplished by something as simple as impl RawTable<T> { fn resize(&mut self, n: usize, rehash: impl Fn(&mut [MaybeUninit<T>], &mut [MaybeUninit<T>])) }. Is anything like that completely out of the question?

@Ralith
Copy link
Author

Ralith commented May 22, 2020

On review, I think this could be accomplished with the existing interface by constructing the LRU list from bucket indices rather than raw pointers. As a bonus, I think this would allow removing nearly all of the unsafe from this crate, including the dicey uninitialized stuff.

@Ralith
Copy link
Author

Ralith commented Aug 1, 2020

@jeromefroe, would you be interested in a PR to replace the use of pointers with bucket indices, and thereby remove all the boxing?

@jeromefroe
Copy link
Owner

@Ralith definitely, would be more than happy to review it.

@Ralith
Copy link
Author

Ralith commented Aug 2, 2020

Hmm, bucket indexes still need to be rewritten on rehash. This would be really easy if hashbrown exposed a hook to rewrite entries on rehash, and otherwise seems infeasible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants