Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make global state domain-local in Random, Hashtbl and Filename #582

Merged
merged 12 commits into from
Jun 17, 2021

Conversation

kayceesrk
Copy link
Contributor

@kayceesrk kayceesrk commented Jun 11, 2021

Fixes issues #564, #565, #566.

Depends on PR #579.

Copy link
Collaborator

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR fixes the Stdlib issues which are pressing.

Down the line we want to have a uniform efficient way for user level DLS variables vs the special fast locations for Stdlib variables introduced in this PR. We should probably file an issue for that after we get this one done.

I had a one place where we should avoid causing an allocation in DLS.get and minor naming/documentation nits.

(I ignored the diff coming from #579).

stdlib/domain.ml Outdated Show resolved Hide resolved
stdlib/hashtbl.ml Outdated Show resolved Hide resolved
stdlib/random.ml Outdated Show resolved Hide resolved
stdlib/random.ml Outdated Show resolved Hide resolved
This enables constant-time access to any field in the domain-local
state. Hence, there is no longer a need to specialise the domain-local
state for local states in Stdlib. This simplifies the code massively
compared to the previous design.
@kayceesrk
Copy link
Contributor Author

I've significantly simplified the PR thanks to making the domain-local state an array allowing constant-time access.

stdlib/hashtbl.ml Outdated Show resolved Hide resolved
stdlib/hashtbl.ml Outdated Show resolved Hide resolved
stdlib/random.mli Outdated Show resolved Hide resolved
stdlib/domain.ml Show resolved Hide resolved
stdlib/StdlibModules Outdated Show resolved Hide resolved
Copy link
Contributor

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the new runtime code and the use of Random in other parts of the stdlib (Filename.temp_file_name and randomized hashtables), and I believe that those are correct.

However, I am not sure about the behavior we get for the Random module itself.

For users that want self-initialization, Domain.DLS.new_key Random.State.make_self_init is a good approach (if a bit low-level), it gives the expected behavior of an independent self-initialized seed for all domains. They can also use Random.int and other global-state functions to the same effect, as long as they remember to call Random.self_init at the creation of each domain. (Are they going to remember this? I'm a bit worried.)

However, for users that want to initialize their generator with a fixed seed, I don't think that the current state is very satisfying. If they don't do anything, all domains are going to be initialized with the same seed, so all domains are going to draw the same random sequence. This is pretty bad, creating unwanted correlations in the program.

What they probably want is to either:

  • Set the seed of new domains by consuming random state from the initial domain (to spawn 9 new domains, draw 9 random integers in the initial domain and set the seed of each new domain by the corresponding random integer using Random.init). This requires non-trivial logic, I don't think that most Random users would be careful enough to do this.
  • Or call Random.full_init : int array -> unit with a seed of their choice and the domain identifier. This is simple, but I am not sure it is a good idea: if we stop a domain and start another one right after, are we going to get a "fresh" domain identifier, or the same identifier as some previous domain? (The latter would be my guess from skimming the runtime code.)

I think that a multi-domain program and a global-state random generator are fundamentally incompatible and we are never going to get a stdlib design that feels nice anyway. (In particular, we can't hope that a fixed seed will make random-generation deterministic, as long as random-using program fragments are scheduled non-deterministically on the domains.) Trying to do magic things for user convenience is not a good idea, the model must be simple so that users can easily reason about the sort of randomness they get. But I think it's worth giving some thoughts to what the basic needs are, and how to meet them correctly; they should be documented, and, ideally, on the path of least effort.

For me the basic needs are as follows:

  • The user wants self-initialization for a multi-domain program. Currently this requires specific thinking when spawning new domains, but predictable and easy to achieve. This should be documented carefully in the Random documentation. I considered the idea of remembering that self_init has been called on the main domain and doing it implicitly on other domains, but that would be too magic.
  • The user wants fixed-seed initialization for the program, but different domains should draw different values. I don't know what is a simple and correct approach to do this currently.

Remark: if we had a random generator supporting splitting (for example @xavierleroy's pringo), I would expect spawning a new domain to split the random generator state between the spawning domain and the spawned domain. Notice that this does "exactly the right thing" in both the self-initialized and fixed-seed scenarios. But this requires a much tighter coupling between the Random module and the runtime, Random cannot be explained anymore as a user-side library with global state.

@@ -599,6 +599,7 @@ CAMLprim value caml_sys_random_seed (value unit)
data[n++] = getpid();
data[n++] = getppid();
#endif
data[n++] = (intnat)Caml_state->unique_token_root;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is caml_sys_random_seed is not very nice (I think); in the current state you also need to add this logic to caml_win32_random_seed in runtime/win32.c.

Another approach would be to add your write outside the #ifdef WIN32 ... #endif block, written as if (n < 16) data[n++] .... Then it would happen on both systems, which reduces duplication.

Reading this code made we want to refactor it, so maybe I'll send a PR upstream directly -- without your change.

testsuite/tests/gc-roots/globroots.ml Outdated Show resolved Hide resolved
testsuite/tests/weak-ephe-final/weaklifetime_par.ml Outdated Show resolved Hide resolved
@kayceesrk
Copy link
Contributor Author

@gasche Thanks for the detailed reviews. I have a feeling that the question of whether Random module does the right thing in a multi-domain setting is a question that seems wider than what this PR is trying to do, which is to fix issues with unsafe sharing of random state between multiple domains. As you suggest, the changes are technically correct but whether they are doing the right thing is still an open question.

In order to make progress, may I suggest splitting the question of Random with multiple domains into a separate issue, and we work to merge this PR in? I also don't understand randomness and its implications under multiple domains well, and I shall defer to others who understand the topic well. Who do you suggest may help with this?

@gasche
Copy link
Contributor

gasche commented Jun 15, 2021

Yes, I think the split is very reasonable.

Who do you suggest may help with this?

I strategically pinged @xavierleroy above, because I think he is the right person to ask about this.

@kayceesrk
Copy link
Contributor Author

Great. I've updated runtime/win32.c with corresponding change, though we don't yet support . With that it looks like all of your original concerns had been addressed. Is that a correct reading?

@lpw25
Copy link
Contributor

lpw25 commented Jun 15, 2021

I would expect spawning a new domain to split the random generator state between the spawning domain and the spawned domain. Notice that this does "exactly the right thing" in both the self-initialized and fixed-seed scenarios. But this requires a much tighter coupling between the Random module and the runtime, Random cannot be explained anymore as a user-side library with global state.

Does this suggest a missing feature of domain local state? Perhaps when a new domain is created it should be able to create its new domain local states from the values of the same states in its parent domain? You could probably implement this fairly efficiently by remembering the tree structure on the domains and copying the state array of the parents when they create their children.

@xavierleroy
Copy link
Contributor

I'm catching up on the discussion. Generally speaking, the usual approaches to pseudo-random number generation in multithreaded programs are

  • share a single state between all threads, or
  • use a splittable PRNG such as SplitMix and split the state when a thread is created.

The latter gives you reproducibility (from a fixed initial state) if all the threads are created before they start generating random numbers. However, this reproducibility is lost if the state is attached to a domain and lightweight threads are then interleaved in this domain.

The former sounds expensive in terms of synchronization, but with SplitMix for example you only need to update one 64-bit integer, which you could do with a CAS.

@gasche already pointed to my PRINGO library (https://github.com/xavierleroy/pringo). It's a preliminary investigation of splittable PRNGs that could one day replace OCaml's Random module.

@kayceesrk
Copy link
Contributor Author

Thanks @xavierleroy. I've opened a separate issue to keep track of multi-thread compatible PRNGs #585.

@kayceesrk
Copy link
Contributor Author

Does this suggest a missing feature of domain local state? Perhaps when a new domain is created it should be able to create its new domain local states from the values of the same states in its parent domain? You could probably implement this fairly efficiently by remembering the tree structure on the domains and copying the state array of the parents when they create their children.

Sounds reasonable. I've made an issue to track this: #587.

Copy link
Collaborator

@ctk21 ctk21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update to this PR removes my concern about 'stdlib specific slots' and gives us O(1) lookups for user keys, which is much nicer.

I had some minor nits: one place where we weren't guarding exceptions and I think we should retain the existing random behaviour in two of the testsuite changes.

I think we should document what users should expect when using Random with domains. For example, I think the following short comment in Random.mli might be enough:

  • By default the Random seed is static and identical across all domains. This means that calling Random.int N repeatedly in two domains will give the same sequence of numbers in each domain.
  • If you wish each domain to have uncorrelated sequences of random numbers then one way to achieve that is to call Random.self_init () on each domain before requesting random numbers.

We have #587 for parent-child DLS and #585 for splittable PRNG but I'm keen in the meantime we document what is there.

stdlib/domain.ml Outdated Show resolved Hide resolved
testsuite/tests/gc-roots/globroots.ml Outdated Show resolved Hide resolved
testsuite/tests/weak-ephe-final/weaklifetime_par.ml Outdated Show resolved Hide resolved
@kayceesrk
Copy link
Contributor Author

Thanks @ctk21. I believe I have addressed all of your comments.

@ctk21
Copy link
Collaborator

ctk21 commented Jun 17, 2021

Thanks @ctk21. I believe I have addressed all of your comments.

Yes, I think it looks good.

I think we are saying that the CI fail is #588 and unrelated to this PR, so will merge?

@kayceesrk
Copy link
Contributor Author

Yep. CI failure is unrelated and I am working on a fix for it.

@@ -586,7 +586,7 @@ CAMLprim value caml_sys_random_seed (value unit)
/* If the read from /dev/urandom fully succeeded, we now have 96 bits
of good random data and can stop here. Otherwise, complement
whatever we got (probably nothing) with some not-very-random data. */
if (n < 12) {
if (n < 11) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: after thinking harder about this, I think that changing n < 12 to n < 11 is not the best way to do this change. (If we get exactly 11 bytes from /dev/urandom, we do want to add some non-random seeds; currently the after-PR implementation will lack a bit of entropy in this weird corner case that will probably not occur in practice.)

I proposed a refactoring of the upstream function in ocaml/ocaml#10472 that would have made it easier to correctly extend the function.

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Jan 10, 2022
…ticore/kayceesrk/stdlib_dls

Make global state domain-local in Random, Hashtbl and Filename
c-cube pushed a commit to c-cube/ocaml that referenced this pull request Feb 3, 2022
…ticore/kayceesrk/stdlib_dls

Make global state domain-local in Random, Hashtbl and Filename
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants