-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Deprecate Weak.get_*_copy and Ephemeron.K*.get_*_copy #9990
base: trunk
Are you sure you want to change the base?
Conversation
@stedolan @damiendoligez @pascal-cuoq you could be interested in the discussion and for testing. I wonder if the weak hashset could arbitrarily reorder the elements in the buckets to mitigate the bad case. |
With this change, do I understand correctly that the caveat "The equal relation must be able to work on a shallow copy of the values and give the same result as with the values themselves." in the documentation of Weak hash sets could be removed? That would be a nice result. I've stubbed my toe into that constraint in the past. |
Most users of OCaml are probably only interested in 64-bit code generation. I know I am. In 64-bit code generation, the full-hash-comparison guard is in principle very good at preventing unnecessary calls to On the one hand, the scenario were a value remains in the heap indefinitely just because its hash fully collides with the hash of another popular value is a memory leak. On the other hand, there are already (before this PR) scenarios where an unreachable chained value is recovered slowly because
It sounds complicated. If the full hash collision scenario happens often in practice, the behavior without any mitigation would be rather bad, as unreachable values at the beginning of the bucket would not only stay alive forever but also waste time at each lookup, so it is worth thinking about a mitigation, but this one seems too complex to me. I would rather simply have a way to detect that full hash collisions are happening more often than they should. This could be done from the outside if the weak hashset type was not abstract. A function scanning the entire weak hashset for such collisions can be implemented inside the |
What about simply moving the element we just looked up at the beginning of the bucket? Then, the unreachable element will percolate to the end of the bucket and will no longer be compared when looking up the other elements. There are two remaining issues:
|
528f38f
to
8512f2f
Compare
Indeed it can be removed, done in the PR for clarity. |
I though doing the reordering during the hashset shriking. We can can just compute at each resizing the number of such collision, and add a field to keep this value.
If we reorder during shrinking, we are already at a time which is modifying the datastructure. Currently shrinking is done by iterating the values from first to last, we can iterate differently for example by looking at the first or last of the cell not looked up. We just have to add |
* They are similar to a hidden Obj.copy * They can't copy everything * They are perhaps not needed in the weak hashset, the only place used in the stdlib, since the original hash is kept limiting the use of the equality function. The drawbacks is if you have a full hash collision, between a value that is unreachable and a value that is often accessed, the unreachable value will possibly stay alive if it append to be before the other value in the bucket.
8512f2f
to
cc26222
Compare
I believe this PR has been superseded by #10737 and can now be closed. All the function it is trying to deprecate have been marked as deprecated in OCaml 4.14 |
ah, good catch. I missed that one, I was only looking at the Ephemeron module. |
We should aim to remove |
The drawbacks is if you have a full hash collision, between a value that is unreachable
and a value that is often accessed, the unreachable value will possibly stay
alive if it append to be before the other value in the bucket.
This MR is at that time mainly to allow people that use hashconsing to see if they have regression.