Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider replacing HashMap with HashSet #119

Open
matklad opened this issue Dec 1, 2021 · 1 comment
Open

Consider replacing HashMap with HashSet #119

matklad opened this issue Dec 1, 2021 · 1 comment

Comments

@matklad
Copy link
Contributor

matklad commented Dec 1, 2021

Today, the core data structure is a hash-map of LruEntries keyed by key refs

struct LruEntry<K, V> {
    key: mem::MaybeUninit<K>,
    val: mem::MaybeUninit<V>,
    prev: *mut LruEntry<K, V>,
    next: *mut LruEntry<K, V>,
}

struct Lru {
  map: HashMap<KeyRef<K>, Box<LruEntry<K, V>>, S>,
}

Essentially, the data strored here has this shape: HashMap<&K, (K, V)>. That is, it's a hash map which stores objects, where the key is the part of the object itself. THe current implementation is less efficient than it could have been, because we effectively store the K twice (once as a part of (K, V), and once as a &K). A more efficient impl would be HashSet<Entry<K, V>>, where entry is defined as follows:

struct Entry<K, V> { key: K, value: V}
impl<K, V> Borrow<K> for Entry<K, V> {}

Here's a playground that shows the pattern: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=bc60814a14a353784d3ba8c89c2cc109

@matklad
Copy link
Contributor Author

matklad commented Sep 5, 2022

Spend some time looking into this. This is harder than it seems for somewhat stupid reasons.

First thing I've run into is that I need something like HashSet:get_mut(k: &K) -> Option<&mut K> to fudge the pointers. Sadly, std only exposes & version (so that it's impossible to break hash map by modifying the value). This can be worked-around with Cell for next/prev pointers, but unfortunately we also need &mut V for the value for the case where we update existing entry.

The &mut API we need is available via unstable raw_entry feature, as well as via hash brown.

So I wanted to just use hashbrown (ie, make it non optional), but, alas, that was not trivial, due to different DefaultHasher business. Basically, I think what we want here is this pattern. We'd also have to fix iterator definitions in this crate, as today they only are defined for the default value of S.

On the positive side, switching to hasbrown would allow us to correct wrong KeyRef<K>: Borrow<Q> bounds into correct K: Borrow<Q> ones.

Aaaand, while working on it, I've noticed that the code I think isn't fully kosher with respect to stacked borrows. Specifically, bits like this:

            let old_key = KeyRef {
                k: unsafe { &(*(*(*self.tail).prev).key.as_ptr()) },
            };
            let mut old_node = self.map.remove(&old_key).unwrap();

Here, map.remove has &mut over the whole map (so, presumably, over all entries as well). But in Eq for KeyRef, we also create & for a single key, which I am not sure is fully correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant