Reimplementation of Random using an LXM pseudo-random number generator #10742

xavierleroy · 2021-11-01T17:39:33Z

~~[ I am not proposing this PR for inclusion in OCaml 4.14. It is intended for Multicore OCaml and OCaml 5.00. ]~~

[ Now that Multicore OCaml is merged and the domain-local-storage interface is what we need, it's time to change PRNG ! ]

Following extensive experimentation (https://github.com/xavierleroy/pringo/), discussions (ocaml/RFCs#28), and a previous attempt based on a Xoshiro generator (#10701), this PR reimplements the Random standard library module using a pseudo-random generator from the LXM family, as described by Steele and Vigna in their OOPSLA 2021 paper. I used the L64X128 variant, shown in figure 1 of the paper.

The main benefit of the new PRNG is that it supports "splitting", i.e. deterministically drawing a new PRNG state from a given PRNG state, getting two statistically-independent PRNGs that can be splitted again later. (The current Random implementation does not support this.) Splitting will come very handy for Multicore OCaml, enabling each domain to maintain its own PRNG state in domain-local storage, with splitting being used to give a fresh PRNG state when a new domain is created. Splitting has other uses for Quickcheck-style generation of random functions and streams.

The L64X128 generator has a large internal state (192 bits + 64 bits for diversification at splits), making reseeding unnecessary. Steele and Vigna's extensive statistical analysis shows no known weaknesses.

Compared with the current implementation of Random, we get nice performance improvements on 64-bit platforms and not so nice performance degradation on 32-bit platforms.

(Times are in seconds for 50000000 repetitions, unless indicated.)

x86 64 bits

  4.13  LXM

  0.30  0.21  bit
  0.28  0.20  bits
  0.42  0.31  int 0xFFEE
  0.71  0.40  int32 0xFFEEDD
  0.97  0.32  int64 0xFFEEDDCCAA
  0.72  0.31  float 1.0
 11.35  0.02  full_init 3 (/100)

ARMv7 32 bits

  4.13  LXM

  1.75  4.84  bit
  1.53  2.78  bits
  4.63  6.16  int 0xFFEE
  5.12  6.89  int32 0xFFEEDD
 30.92 20.51  int64 0xFFEEDDCCAA
  3.36  8.30  float 1.0
 47.40  0.13  full_init 3 (/100)

xavierleroy · 2021-11-01T17:47:48Z

For the same reasons explained in #10701 (comment), the internal state of L64X128 (four 64-bit integers) is represented as a 1-dimensional big array of int64. This ensures proper alignment of the state on 32-bit platforms and gives us marshaling and other generic operations for free. If this choice is a problem with js-of-ocaml, an opaque representation (fully implemented in C) could be used instead.

gasche

Looks promising, thanks!

I checked that the code is as in the paper.

runtime/prng.c

gasche · 2021-11-01T20:21:46Z

runtime/prng.c

+  struct LXM_state * st = LXM_val(v);
+  st->a = i1 | 1;              /* must be odd */
+  st->x[0] = i2 != 0 ? i2 : 1; /* must be nonzero */
+  st->x[1] = i3 != 0 ? i3 : 2; /* must be nonzero */


The paper writes something different about the x parameter initialization.

The constructor then forces the additive parameter to be odd by setting its low-order
bit to 1, but beyond that no additional vetting of the additive parameter (to reject “weak valuesž
[Steele, Lea, and Flood 2014]) is necessary. In the unlikely circumstance that the state for the XBG
subgenerator is entirely 0, it is necessary to force it to be nonzero; this could be done by making
additional calls to nextLong() or nextInt(), but it is easier (and acceptable in practice, because it
happens so rarely) to derive the new XBG state from the new LCG state.

I'm not sure what they mean by "derive the new XBG state from the new LCG state", but maybe they are thinking of setting one of x[0] or x[1] to the odd st->a value? (Note: the code currently suggests that neither of x0 and x1 can be zero, but in fact the paper says that having only one of them non-zero is enough.)

Instead of "x[0] and x[1] must not be both 0", I used the more conservative "none of x[0] and x[1] should be zero" because that's what the Xoshiro papers recommend, if I remember correctly.

For states that are created by splitting, i2 and i3 were obtained by pseudo-random number generation, so the probability that they are 0 is extremely low, and using 1 and 2 in this unlikely case is unnoticeable.

What they mean by "derive the new XBG state from the new LCG state" is better explained in the Xoshiro papers. To initialize an XBG from a single 64-bit integer seed, Vigna suggests to use... a SplitMix PRNG, initialized with this seed, and run N times to get N good-looking numbers to initialize the N-word state of the XBG.

For states that are created by initialization from an array of integers (function Random.full_init and related functions), the situation is more delicate. Sometimes we are given a single integer (function Random.init), in which case Vigna's suggestion would apply. But sometimes we are given 12 bytes with good entropy (function Random.self_init when /dev/urandom can be read) and we should "spread out" this 96-bit entropy on the 4 components of the L64X128 state.

After some thinking, I implemented a cute solution where the integer array is hashed (as a string of 64-bit "characters") using a hash function that produces a 256-bit result, which is then taken as the initial L64X128 state. It is hoped that the result incorporates all the entropy present in the integer array, up to 256 bits, yet produces four 64-bit numbers that don't have strong correlations and "look random already" (whatever that means).

This solution is commit 83c2830 in this PR.

I understand the solution and I think it should work fine indeed. Two remarks (that don't suggest any change):

We would also have the choice of asking /dev/urandom for more input (but that doesn't work when /dev/urandom is not available, where we generate a fixed amount of bytes with bad entropy)

We already have a hash function implemented in the runtime, Murmurhash 3 (the 32bit-producing version). I understand that you need a 256-large output, so FNV1a fits the bill better.

Hash functions come in all sizes and shapes :-) However, I woke up in the middle of the night with the realization that FNV1a doesn't avalanche much on short inputs (e.g. a single integer).

So, I just pushed yet another proposal that uses good old MD5 as the hashing function. It only produces 128 bits of output, so I call it twice with some diversification of the input. Now, it should avalanche perfectly, while preserving up to 128 bits of entropy present in the seed. It's a bit slower than the previous implementation, but still significantly faster than the one in OCaml today.

As an additional benefit, all the initialization code is now done in OCaml, reducing the C code in runtime/prng.c to the one speed-critical function next.

Further remark 1: yes, we need a 256-bit (or more) crypto hash function to replace MD5 in Digest. It's on my to do list.

Further remark 2: yes, we should revisit the PRNG initialization from /dev/urandom. This will come next.

Regarding next, I found some time again to try a pure OCaml implementation (using bigarrays this time), and in optimal conditions (amd64 backend, native mode) it outperforms the C version by ~5% in microbenchmarks.
It is less readable than the C version though, and of course it will probably be significantly slower in bytecode and 32-bit native backends.

I'm not surprised you're getting good performance from a pure OCaml implementation on x86-64: there is not much to optimize in this code, and the only optimization OCaml misses is the recognition of the "rotate" instruction. Note however that I forgot to declare the C primitive as "noalloc". This would regain a few percents...

stdlib/random.ml

The implementation in stdlib/random.ml is going to change.

xavierleroy · 2022-01-14T10:13:32Z

Now that #10887 is merged, it's time to change PRNG ! Please review...

…M family As described in the paper [LXM: Better Splittable Pseudorandom Number Generators (and Almost as Fast)](https://doi.org/10.1145/3485525) by Guy L. Steele Jr. and Sebastiano Vigna, proceedings of OOPSLA 2021. This provides a fast, state-of-the-art PRNG with full support for splitting. Initialization from a seed (an array of integers) proceeds by serializing the array to a byte sequence (each seed element occupies 8 bytes in little-endian representation) then hashing twice using the MD5 hash function and two different suffixes, obtaining 256 bits of initialization data. More tests were added in testsuite/tests/lib-random, and the lone test testsuite/tests/basic-more/testrandom.ml was moved there.

gasche · 2022-01-14T19:21:02Z

testsuite/tests/lib-random/testvectors.ml

+
+let _ =
+  let a = Array1.of_array Int64 C_layout [| 1L; 2L; 3L; 4L |] in
+  (* Violate abstraction of type Random.State.t to manipulate state directly *)


I wondered if we should provide a function to set the "raw" state in the library.

Pros:

no Obj.magic here

there might be cases where users want to avoid paying the cost of the additional mixing operations

Cons:

this breaks the generator if an all-zero array is passed

we have to leak details about the generator algorithm to explain this (which raw seeds to avoid)

In the end I think the current approach is sensible, and no change is required.

gasche

This looks good, thanks!

I checked that the implementation is as described in the paper.

It's nice that you have test vectors from a reference implementation.
Would you by chance have a pointer to this implementation? I couldn't find a pointer from the paper itself. I wanted to see if they used the same "bad state protection" tricks that you use in set (this is probably not checked by the test vectors you chose, as the trick are put to use with probability almost-0).

My review comments are mostly comments. There is a question about the usage of max_int (and its portability), and possibly a suggestion to simplify full_int.

gasche · 2022-01-15T15:27:20Z

runtime/prng.c

+  z = (z ^ (z >> 32)) * 0xdaba0b6eb09322e3;
+  z = (z ^ (z >> 32));
+  /* LCG update */
+  st->s = st->s * M + st->a;


the paper uses M * s + a, not sure why there is a slight difference here.

gasche · 2022-01-15T15:31:36Z

stdlib/random.mli

+    With multiple domains, each domain has its own generator that evolves
+    independently of the generators of other domains.  When a domain is
+    created, its generator is initialized by splitting the state
+    of the generator associated with the parent domain.


I guess that we should have fixed this comment in the previous PR already (#10887). Thanks for noticing.

gasche · 2022-01-15T15:53:21Z

runtime/prng.c

+#include "caml/bigarray.h"
+#include "caml/mlvalues.h"
+
+/* The L64X128 member of the LXM family.  Taken from figure 1 in


I think that the mixing function (lea64) should be mentioned as well. (The LXM authors describe "better mixing functions" as interesting future work, so it may be that in ten years what people understand as "The L64X128 member" is not that one anymore.)

The reference to figure 1 in the paper makes it clear which variant is being implemented.

gasche · 2022-01-15T20:45:09Z

stdlib/random.ml

-        in
-          (bpos, max_int)
-    in
+    let r = Int64.to_int (next s) land max_int in
    let v = r mod n in
    if r - v > max_int - n + 1 then int63aux s n else v


It's nice that we can simplify these functions so much with a full int64 generator.

But: I'm confused by the use of max_int here (already present before this change), what happens when running OCaml 32bit?

This slightly confusing code predates this PR. The idea is that full_int calls int63aux only when bound > 0x3FFFFFFF, which cannot happen on a 32 bit platform.

Maybe we could add an assert (Sys.int_size >= 63) here to make this precondition explicit.

No, because there's also the js-of-ocaml case (Sys.int_size = 32). This was discussed at length in #9489; I won't have this discussion again.

gasche · 2022-01-15T20:51:17Z

stdlib/random.ml

@@ -126,13 +102,15 @@ module State = struct
    else
      intaux s bound


We could simplify the code here by calling int63aux unconditionally, right?

(This would also reduce the average number of retries, in a noticeable way for bounds slightly above 2^29.)

We wanted full_int to return the same values as int when the bound is small enough.

gasche · 2022-01-15T20:56:21Z

stdlib/random.ml

    let v = Int32.rem r n in
-    if Int32.sub r v > Int32.add (Int32.sub Int32.max_int n) 1l
+    if Int32.(sub r v > add (sub max_int n) 1l)
    then int32aux s n
    else v


If I understand correctly, drawing in the full int64 range would also reduce the number of retries in some cases, but it would perform worse in 32bit mode.

gasche · 2022-01-15T20:57:04Z

stdlib/random.ml


+  (* Return 30 random bits as an integer 0 <= x < 1073741824 *)
+  let bits s =
+    Int64.to_int (next s) land 0x3FFF_FFFF


We could call this bits30 for consistency, as in your Pringo API, and then have bits as a backward-compatibility alias.

gasche · 2022-01-15T21:01:14Z

stdlib/random.ml

+  let rec rawfloat s =
+    let b = next s in
+    let n = Int64.shift_right_logical b 11 in
+    if n <> 0L then Int64.to_float n *. 0x1.p-53 else rawfloat s


I remember we had a discussion about how to draw floats recently, probably with @silene, and there was an explanation for why one approach was more desirable than the other. Could the reasons for using this one be mentioned in a comment?

Since I was invoked in this thread, I will take the chance to point out that the comment is a truism. I guess it was meant to read "Return a float 0 < x < 1 that is a multiple of 2^-53".

The discussion was in your RFC, ocaml/RFCs#28 (comment)

The bottom line for the discussion is that the current approach (multiples of 2^-53) is very simple and pretty good overall. More sophisticated algorithms can produce any representable FP number between 0.0 and 1.0 with the appropriate probability, but they are more complex and I'm not convinced there is a benefit.

Agreed about the tautological nature of the comment. (But at least it's not wrong!) Will reword.

By the way, since the only user of rawfloat is float, there is no point in excluding zero and thus to write a recursive function. Or is this an undocumented feature of float that, for normal values of bound, it returns a non-zero value?

This was also discussed in the RFC: ocaml/RFCs#28 (comment) Eventually, I'd like to export rawfloat (it's called uniform in PRINGO) with explicit guarantees that it returns a random FP between 0 and 1 excluded.

gasche · 2022-01-15T21:06:34Z

stdlib/random.ml

+    Array1.unsafe_set s 0 (Int64.logor i1 1L); (* must be odd *)
+    Array1.unsafe_set s 1 i2;
+    Array1.unsafe_set s 2 (if i3 <> 0L then i3 else 1L); (* must not be 0 *)
+    Array1.unsafe_set s 3 (if i4 <> 0L then i4 else 2L) (* must not be 0 *)


The reason why those unsafe_ are correct is thanks to type abstraction: the module exports an abstract state type t, and the three exported functions that can build a t (make, make_self_init, copy) guarantee that the size is exactly 4.

hhugo · 2022-01-15T23:25:31Z

stdlib/random.ml

+      Bytes.set_int64_le b (i * 8) (Int64.of_int seed.(i))
+    done;
+    Bytes.set b (n * 8) '\x01';
+    let d1 = Bytes.unsafe_of_string (Digest.bytes b) in


No need to use Bytes.unsafe_of_string if we use String.get_int64_le below

Very good point, thanks! I was not aware that get_* functions had been added to String...

hhugo · 2022-01-16T12:19:18Z

If this choice is a problem with js-of-ocaml, an opaque representation (fully implemented in C) could be used instead.

The current implementation should work fine with jsoo once next is implemented.

gasche

I believe that the code is correct, well-tested, and that the feature is useful.

(It's nice in particular that, on 64-bit systems, we are able to get a better generator, with extra features, and a performance improvement.)

xavierleroy · 2022-01-16T15:58:55Z

t's nice that you have test vectors from a reference implementation. Would you by chance have a pointer to this implementation?

It's a cut-and-paste from figure 1 in the paper, plus a main function.

gasche · 2022-01-16T16:26:24Z

Ah, so we don't know what choice they made to sanitize the state-initialization input. No big deal.

(check-typo complains that make alldepend needs to re-run now that you depend on String as well.)

`@since` tags, etc.

gasche · 2022-01-17T19:31:31Z

I went ahead and merged -- I consider that people had ample time to voice any concern since ocaml/RFCs#28 was opened in early September.

Thanks!

This was referenced Nov 1, 2021

Reimplementation of Random using the Xoshiro256++ PRNG #10701

Closed

[RFC] changing the Random PRNG to a splittable algorithm from PRINGO ocaml/RFCs#28

Closed

xavierleroy mentioned this pull request Nov 1, 2021

Allow newly spawned domains to have a copy of the parent domain's local state ocaml-multicore/ocaml-multicore#587

Closed

gasche reviewed Nov 1, 2021

View reviewed changes

lthls reviewed Nov 1, 2021

View reviewed changes

stdlib/random.ml Outdated Show resolved Hide resolved

xavierleroy force-pushed the lxm branch from f5920d9 to 83c2830 Compare November 2, 2021 18:02

xavierleroy marked this pull request as draft November 5, 2021 18:08

kayceesrk mentioned this pull request Nov 24, 2021

Audit stdlib for mutable state ocaml-multicore/ocaml-multicore#757

Open

This was referenced Jan 11, 2022

Move Random to a splitting generator #10877

Closed

generalize the Domain.DLS interface to split PRNG state for child domains #10887

Merged

Remove tests/lib-random/full_int.ml

5ca278b

The implementation in stdlib/random.ml is going to change.

xavierleroy force-pushed the lxm branch from a5e07ad to 69f76c8 Compare January 14, 2022 10:10

xavierleroy marked this pull request as ready for review January 14, 2022 10:10

xavierleroy force-pushed the lxm branch from 69f76c8 to ea4947f Compare January 14, 2022 10:17

xavierleroy added 2 commits January 14, 2022 15:31

Update reference test outputs after change of PRNG

bfb2659

xavierleroy force-pushed the lxm branch from ea4947f to bfb2659 Compare January 14, 2022 14:31

gasche reviewed Jan 14, 2022

View reviewed changes

gasche reviewed Jan 15, 2022

View reviewed changes

hhugo reviewed Jan 15, 2022

View reviewed changes

gasche approved these changes Jan 16, 2022

View reviewed changes

xavierleroy added a commit to xavierleroy/ocaml that referenced this pull request Jan 16, 2022

Changes entry for ocaml#10742

278d8b5

xavierleroy added 3 commits January 16, 2022 17:40

Address review comments

f2ebc04

Update documentation of module Random

82547ea

`@since` tags, etc.

Changes entry for ocaml#10742

b3dc20c

xavierleroy force-pushed the lxm branch from 278d8b5 to b3dc20c Compare January 16, 2022 16:40

gasche merged commit 1d6ff7b into ocaml:trunk Jan 17, 2022

xavierleroy deleted the lxm branch January 18, 2022 18:21

jmid mentioned this pull request Jan 19, 2022

Deriver: derive shrinkers c-cube/qcheck#191

Open

This was referenced Jan 23, 2022

Fix entropy of random state ocaml-multicore/ocaml-multicore#597

Closed

Implement Multi-threaded compatible PRNG ocaml-multicore/ocaml-multicore#585

Closed

abbysmal mentioned this pull request Jan 28, 2022

Audit stdlib for mutable state #10960

Closed

Octachron mentioned this pull request Feb 8, 2022

Tests fail on OCaml 5.00 backtracking/ocamlgraph#120

Closed

This was referenced Nov 2, 2022

Run expect tests on OCaml5 c-cube/qcheck#262

Merged

Add a splittable random number generator c-cube/qcheck#86

Closed

		@@ -126,13 +102,15 @@ module State = struct
		else
		intaux s bound

Reimplementation of Random using an LXM pseudo-random number generator #10742

Reimplementation of Random using an LXM pseudo-random number generator #10742

Conversation

xavierleroy commented Nov 1, 2021 • edited

xavierleroy commented Nov 1, 2021

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xavierleroy commented Jan 14, 2022

Choose a reason for hiding this comment

gasche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xavierleroy Jan 16, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hhugo commented Jan 16, 2022

gasche left a comment

Choose a reason for hiding this comment

xavierleroy commented Jan 16, 2022

gasche commented Jan 16, 2022

gasche commented Jan 17, 2022

xavierleroy commented Nov 1, 2021 •

edited

xavierleroy Jan 16, 2022 •

edited