Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Ephemeron interface is incomplete #12536

Open
mlemerre opened this issue Sep 6, 2023 · 5 comments
Open

New Ephemeron interface is incomplete #12536

mlemerre opened this issue Sep 6, 2023 · 5 comments

Comments

@mlemerre
Copy link

mlemerre commented Sep 6, 2023

Hello,

I am developing Map and Set data structures using Patricia trees and I would need to have one version that has weak keys. To do that, it suffices to replace the key-value pair by an Ephemeron.K1.t, and indeed this works well with the 4.xx interface.

However, many functions are removed in the new 5.0 interface, and we only have access to a make function to create the key-value pair, and a query function to retrieve the data from a key. In particular, we no longer have any way to retrieve the key from the Ephemeron.

It seems that the get_key function is still available in the Obj.Ephemeron interface, which is used by the Hashtable implementation based on Ephemerons. Indeed, not having a way to retrieve the key makes the interface very limited: in the case of a Map implementation, folding or merging maps is impossible, only find can be implemented.

My question is thus the following: is there a planned way to get the key-value pair from the official Ephemeron interface in the future, is the Obj.Ephemeron planned to be supported for some time, or what alternative solution can I use? The only portable alternative that I can see, which is to have a redundant weak reference to the key in the leaves of the map together with the Ephemeron, would be a huge waste in memory.

@gasche
Copy link
Member

gasche commented Sep 6, 2023

I am not sure, but I believe that this feature request is actually the same as the one in #12251, which was closed as "not planned".

@mlemerre
Copy link
Author

mlemerre commented Sep 6, 2023

Thanks for the pointer! The "weak_query" function in #12251 would indeed solve my problem. And @Elarnon also observed that we could bypass the API restriction by adding a redundant weak reference.

I really don't understand the point of this new GC-oblivious API:

  • The "surprising behaviour" of the normal API mentionned by @kayceersk in new ephemeron API, implemented on top of old ephemerons #10737 is not eliminated, as it still happens if we have a pair (weak ref to key, ephemeron)
  • This surprising behaviour would be eliminated by the weak_query function proposed by @Elarnon (and ephemerons are low-level anyway, so a bit of surprise is probably OK);
  • It seems hard to have an ephemeron without a reference to the key, so it is quite natural to just be able to read this key;
  • As said by @jberdine in new ephemeron API, implemented on top of old ephemerons #10737, the Bucket API cannot be used to implement Ephemeron.Make (and the current 5.0 implementation still uses the hidden old API). Indeed it seems difficult to implement anything else than a Hashtable where keys are physically compared with this API, without having a redundant storage of the key.

I would be very interested in knowing what the plan is regarding this API: I use very large maps (thus the solution to store a pair of a weak ref and an ephemeron instead of just an ephemeron at every leaf would incur a prohibitive memory consumption), so I don't know yet how I could make my code future-proof.

@stedolan
Copy link
Contributor

stedolan commented Oct 2, 2023

A style that is supported by the new API is to have two separate data structures:

  1. A single weak hashconsing table, which turns logically equal inputs into physically equal (hashconsed) outputs
  2. Maps keyed by physical equality using ephemerons, whose key type is hashconsed.

Does that suit your needs? If not, could you explain exactly what behaviour you're looking for from your weak-keyed maps?

In particular, if I have a local variable k1, what should happen if I do Map.find m k1 if m is a map containing a binding for some k2 which is not physically equal to k1 but satisfies compare k1 k2 = 0? Is the result deterministic?

What is Map.merge supposed to do on maps that contain physically unequal but logically equal keys? Is it allowed to depend on when GC runs?

@mlemerre
Copy link
Author

mlemerre commented Oct 3, 2023

In my current implementation, I do hashcons the keys, so it would already be nice if the new API supported this! Though I fail to understand how this could be done.

I would have to think on the behavior questions that you raise for my use case if I did not hashcons the keys. My use case is that of an abstract interpreter where keys are symbolic expressions, mapped to abstract values in a map (e.g. intervals). Sometimes the key are no longer used, thus we no longer need to attach a value to them. This means that we loose some information that we should no longer need.

Now, I can imagine executions were precision would not be deterministic because of garbage collection, which would be cumbersome for testing purposes when that happens (but should be infrequent in practice), but on the other hand the weak key hash can free a lot of memory with few effort from the programming side, so it is often a good tradeoff.

@zapashcanon
Copy link
Contributor

One thing that I'm missing is a stdlib defined module signature that represents the common output signature of the Hashtbl.Make and Ephemeron.K1.Make functors.

Before, it was possible to write:

module Make (Cache : Hashtbl.S) = struct
  ...
end

module Make_weak (H: Hashtbl.HashedType) = Make (Ephemeron.Make (H))

module Make_strong (H: Hashtbl.HashedType) = Make (Hashtbl.Make (H))

Now, it is not possible anymore. You can not substitute Hashtbl.S for Ephemeron.S because it has extra functions like clean that are not included in Hashtbl.

Would it be possible to add a module signature definition representing the shared part ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants