[RFC] Deprecate Weak.get_*_copy and Ephemeron.K*.get_*_copy #9990

Open · wants to merge 1 commit into base: trunk

Conversation

@bobot (Contributor) commented Oct 25, 2020

   * They are similar to a hidden Obj.copy
   * They can't copy everything
   * They are perhaps not needed in the weak hashset (the only place they are
     used in the stdlib), since the original hash is kept, which limits the use
     of the equality function.

The drawback is that, if you have a full hash collision between a value that is unreachable
and a value that is often accessed, the unreachable value may stay alive if it happens to be
before the other value in the bucket.

This PR is, for now, mainly intended to let people who use hash-consing check whether they see a regression.
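
To make the first two points concrete, here is a minimal sketch (not part of this PR) contrasting the deprecated accessor with its suggested replacement on a plain weak array; the demo function and the list value are only illustrative.

    let demo () =
      (* A one-slot weak array holding an ordinary list value. *)
      let w : string list Weak.t = Weak.create 1 in
      Weak.set w 0 (Some ["a"; "b"]);
      (* Deprecated by this proposal: returns a shallow copy of the stored
         value, much like a hidden Obj.copy, so a custom equality applied to
         the result must also work on such copies. *)
      let _copy = Weak.get_copy w 0 in
      (* Suggested replacement: returns the stored value itself, no copy. *)
      match Weak.get w 0 with
      | Some l -> List.length l
      | None -> 0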

@bobot (Contributor, Author) commented Oct 25, 2020

@stedolan @damiendoligez @pascal-cuoq you may be interested in the discussion and in testing.

I wonder if the weak hashset could arbitrarily reorder the elements in the buckets to mitigate the bad case.

@jberdine (Contributor) commented

With this change, do I understand correctly that the caveat "The equal relation must be able to work on a shallow copy of the values and give the same result as with the values themselves." in the documentation of Weak hash sets could be removed? That would be a nice result. I've stubbed my toe on that constraint in the past.

@pascal-cuoq commented Oct 25, 2020

Most users of OCaml are probably only interested in 64-bit code generation. I know I am. In 64-bit code generation, the full-hash-comparison guard is in principle very good at preventing unnecessary calls to Weak.get (previously Weak.get_copy).
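
That guard can be pictured roughly as follows. This is a hedged sketch with invented names (find_in_bucket, bucket, hashes are not the stdlib's actual identifiers): the hash stored alongside each slot filters out candidates before the weak pointer is dereferenced and before the user's equal is called.

    let find_in_bucket equal bucket hashes h key =
      let n = Weak.length bucket in
      let rec loop i =
        if i >= n then None
        else if hashes.(i) <> h then loop (i + 1)
        else
          (* Only on a full hash match do we dereference the weak pointer
             and call the user-supplied equality. *)
          match Weak.get bucket i with
          | Some v when equal v key -> Some v
          | _ -> loop (i + 1)
      in
      loop 0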

On the one hand, the scenario where a value remains in the heap indefinitely just because its hash fully collides with the hash of another popular value is a memory leak. On the other hand, there are already (before this PR) scenarios where an unreachable chained value is reclaimed slowly, because get_copy marks the children of the argument as reachable for the current cycle. The important thing is that these theoretical concerns do not occur in practice, and that a workaround is ready (run Gc.full_major from time to time) if someone encounters them and cannot avoid them by using a better hash function. The only systematic solution would be to move away from incremental snapshot-at-the-beginning garbage collection, and even that seems unnecessary as long as the few users of Weak manage in practice with the current implementation.

I wonder if the weak hashset could arbitrarily reorder the elements in the buckets to mitigate the bad case.

It sounds complicated. If the full-hash-collision scenario happened often in practice, the behavior without any mitigation would be rather bad: unreachable values at the beginning of the bucket would not only stay alive forever but also waste time at each lookup. So it is worth thinking about a mitigation, but this one seems too complex to me.

I would rather simply have a way to detect that full hash collisions are happening more often than they should. This could be done from the outside if the weak hashset type were not abstract. A function scanning the entire weak hashset for such collisions could be added inside the Weak.Make functor, taking O(n) time without changing the representation. More palatably, and also without changing the representation, an alternative lookup function could report whether a full hash collision happened during the lookup at the same time as returning the result; it would have the same complexity as the existing lookup. I would use such a function to make sure that the proportion of full hash collisions is as low as it should be.
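
A rough sketch of what such a collision-reporting lookup could look like, using the same invented bucket/hashes names as above; find_with_collision_info is hypothetical, not an existing Weak.Make function. It returns the usual lookup result together with a flag saying whether any live entry shared the full hash without being equal to the key.

    let find_with_collision_info equal bucket hashes h key =
      let n = Weak.length bucket in
      let rec loop i collided =
        if i >= n then (None, collided)
        else if hashes.(i) <> h then loop (i + 1) collided
        else
          match Weak.get bucket i with
          | Some v when equal v key -> (Some v, collided)
          | Some _ -> loop (i + 1) true    (* full hash collision observed *)
          | None -> loop (i + 1) collided
      in
      loop 0 false

Counting how often the flag comes back true would give the proportion of full hash collisions without changing the representation.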

@jhjourdan (Contributor) commented

I wonder if the weak hashset could arbitrarily reorder the elements in the buckets to mitigate the bad case.

What about simply moving the element we just looked up to the front of the bucket? Then the unreachable element will percolate to the end of the bucket and will no longer be compared when looking up the other elements. (A rough sketch follows the list below.)

There are two remaining issues:

  • we still need to iterate over the whole bucket (and hence make everything alive) in the case of a failing look-up
  • modifying the data structure during a lookup might be considered bad practice. It requires the use of caml_modify, and it makes it impossible to use lookups in a multi-threaded setting.
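
A rough sketch of the move-to-front idea, approximated here by swapping the found entry with slot 0; bucket and hashes are the same invented names as above, not the actual Weak.Make code. The writes are exactly where the mutation concerns from the second issue above come in.

    let move_to_front bucket hashes i =
      if i > 0 then begin
        (* Swap the entry found at position i with the entry at position 0,
           keeping the parallel array of stored hashes in sync. *)
        let v0 = Weak.get bucket 0 and vi = Weak.get bucket i in
        Weak.set bucket 0 vi;
        Weak.set bucket i v0;
        let h0 = hashes.(0) in
        hashes.(0) <- hashes.(i);
        hashes.(i) <- h0
      end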

@bobot force-pushed the deprecate_get_copy_in_ephemerons branch from 528f38f to 8512f2f on October 26, 2020 09:20
@bobot (Contributor, Author) commented Oct 26, 2020

"The equal relation must be able to work on a shallow copy of the values and give the same result as with the values themselves."

Indeed it can be removed; done in the PR for clarity.

@bobot (Contributor, Author) commented Oct 26, 2020

A function scanning the entire weak hashset for such collisions can be implemented inside the Weak.Make functor as an additional function taking O(n) time without changing the representation.

I thought of doing the reordering during hashset shrinking. We could compute the number of such collisions at each resizing, and add a field to keep this value.

modifying the data-structure when doing a lookup might be considered a bad practice. This requires the use of caml_modify, and it makes impossible to use lookups in a multi-threaded setting.

If we reorder during shrinking, we are already at a point where the data structure is being modified. Currently shrinking is done by iterating over the values from first to last; we could iterate differently, for example by looking at the first or last cell not yet examined. We just have to add swap w i j to swap two values in the weak arrays.
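
A minimal sketch of the swap mentioned here, written against the public Weak API for illustration; the in-tree version would live inside weak.ml and could be more direct. Reading both slots before writing briefly keeps both values reachable, which seems harmless during a resize.

    let swap (w : 'a Weak.t) i j =
      (* Swap the (possibly empty) contents of slots i and j. *)
      let vi = Weak.get w i and vj = Weak.get w j in
      Weak.set w i vj;
      Weak.set w j vi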

@kit-ty-kate (Member) commented

I believe this PR has been superseded by #10737 and can now be closed. All the functions it is trying to deprecate have been marked as deprecated in OCaml 4.14.

@kayceesrk (Contributor) commented

#10737 does not remove Weak.get_copy:

val get_copy : 'a t -> int -> 'a option
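
For illustration only, and not the actual stdlib text: such a removal-by-deprecation is usually expressed with the standard [@@deprecated] attribute so that users get a compile-time alert; the signature and message below are a hypothetical sketch.

    module type EXAMPLE = sig
      type 'a t
      (* Hypothetical deprecation marker, not the wording used upstream. *)
      val get_copy : 'a t -> int -> 'a option
      [@@deprecated "Use get instead; get_copy performs a hidden shallow copy."]
    end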

@kit-ty-kate (Member) commented

Ah, good catch. I missed that one; I was only looking at the Ephemeron module.

@kayceesrk (Contributor) commented

We should aim to remove Weak.get_copy if this is still something we want to do.
