BUG: Get full precision for 32 bit floating point random values. #20314

WarrenWeckesser · 2021-11-06T07:01:50Z

The formula to convert a 32 bit random integer to a random float32,

(next_uint32(bitgen_state) >> 9) * (1.0f / 8388608.0f)

shifts by one bit too many, resulting in uniform float32 samples always
having a 0 in the least significant bit. The formula is corrected to

(next_uint32(bitgen_state) >> 8) * (1.0f / 16777216.0f)
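Here is a short worked argument for why the old formula always leaves the least significant bit at 0. Write $x$ for the raw 32-bit integer returned by next_uint32; this notation is introduced here and is not part of the original code.

$$
u_{\text{old}} = \left\lfloor x / 2^{9} \right\rfloor \cdot 2^{-23}, \qquad
u_{\text{new}} = \left\lfloor x / 2^{8} \right\rfloor \cdot 2^{-24}, \qquad
x \in \{0, 1, \dots, 2^{32}-1\}
$$

A nonzero float32 $u \in [2^{e}, 2^{e+1})$ with $e \le -1$ has spacing $\operatorname{ulp}(u) = 2^{e-23}$. Every $u_{\text{old}}$ is an integer multiple of $2^{-23} = 2^{-e}\,\operatorname{ulp}(u)$. Because $-e \ge 1$, the factor $2^{-e}$ is even, so bit 0 of the significand is always 0. On $[1/2, 1)$, where about half of the samples fall, $\operatorname{ulp}(u) = 2^{-24}$. The new formula produces $k \cdot 2^{-24}$ for every $k < 2^{24}$, so it reaches every float32 in that interval, including those with bit 0 set.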

Occurrences of the incorrect formula in numpy/random/tests/test_direct.py
were also corrected.

Closes gh-17478.
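You can check the one-bit difference empirically. The sketch below is not NumPy's C code. It re-implements both formulas with NumPy array operations, applies them to a batch of random 32-bit integers, and inspects bit 0 of each float32 result. The helper names `old_float32`, `new_float32` and `mantissa_lsb` are hypothetical.

```python
import numpy as np

def old_float32(u):
    # Pre-fix formula: keep the top 23 bits and scale by 2**-23.
    return (u >> np.uint32(9)).astype(np.float32) * np.float32(1.0 / 8388608.0)

def new_float32(u):
    # Fixed formula: keep the top 24 bits and scale by 2**-24.
    # A 24-bit integer fits exactly in a float32 significand, so nothing is rounded.
    return (u >> np.uint32(8)).astype(np.float32) * np.float32(1.0 / 16777216.0)

def mantissa_lsb(x):
    # Reinterpret the float32 bit pattern and extract bit 0 of the significand.
    return x.view(np.uint32) & np.uint32(1)

rng = np.random.default_rng(12345)
u = rng.integers(0, 2**32, size=100_000, dtype=np.uint32)

old = old_float32(u)
new = new_float32(u)

print("old formula, samples with LSB set:", int(mantissa_lsb(old).sum()))
print("new formula, samples with LSB set:", int(mantissa_lsb(new).sum()))
```

For the same input integer, the two results differ by either 0 or exactly 2**-24. The old formula discards bit 8 of `x`, and that bit is what the new formula keeps.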


seberg · 2021-11-08T18:26:17Z

@bashtage or @rkern, the changes look good. Could you make a quick call on whether it is fine to modify only the new API, and whether this should have a release note?

bashtage · 2021-11-08T18:30:54Z

Change is safe. Whether it requires a release note only matters if you think the least significant bit of a 32 bit float is worth one. Probably best to be safe.

WarrenWeckesser · 2021-11-08T21:38:43Z

I added a release note about the change.

WarrenWeckesser · 2021-11-10T23:59:38Z

Here's a script that demonstrates the changes in the variates that can occur, and shows why a release note is warranted.

Script to print random samples

```python
import numpy as np


print(f"numpy version {np.__version__}")
print()

seed = 98765432109
print(f"seed: {seed}")
print()

# Each section re-seeds, so the outputs can be compared directly across builds.
print("rng.random")
rng = np.random.default_rng(seed)
x = rng.random(12, dtype=np.float32)
print("first 12 samples:")
print(x)
print()

# Note: each array of n float32 values needs about 2 GB of memory.
n = 500_000_000

print("rng.standard_exponential")
rng = np.random.default_rng(seed)
x = rng.standard_exponential(size=n, dtype=np.float32)
x1 = x[-1]
x = rng.standard_exponential(size=n, dtype=np.float32)
x2 = x[-1]
print(f"last sample of {n:10d}:", x1)
print(f"last sample of {2*n:10d}:", x2)

print()

k = 2.5
print(f"rng.standard_gamma, k={k}")
rng = np.random.default_rng(seed)
x = rng.standard_gamma(k, size=n, dtype=np.float32)
print("first 14 samples:")
print(x[:14])
print(f"last sample of {n}:", x[-1])
```

The output for the current main development branch:

```
numpy version 1.22.0.dev0+1733.g8dbd507fb

seed: 98765432109

rng.random
first 12 samples:
[0.754159   0.72002673 0.00234556 0.49236786 0.16807711 0.845093
 0.06266415 0.48290312 0.80823255 0.9720112  0.01573467 0.9534826 ]

rng.standard_exponential
last sample of  500000000: 0.3884329
last sample of 1000000000: 2.664465

rng.standard_gamma, k=2.5
first 14 samples:
[2.9261618 2.168996  2.3661127 2.06449   4.889103  2.145251  2.651206
 2.109355  1.3617952 2.1510322 1.2934842 0.8435856 5.8445168 2.0458326]
last sample of 500000000: 0.46255943
```

The output for this pull request:

```
numpy version 1.22.0.dev0+1688.ge5af24d51

seed: 98765432109

rng.random
first 12 samples:
[0.754159   0.7200268  0.00234556 0.49236786 0.16807711 0.845093
 0.06266421 0.48290312 0.80823255 0.97201127 0.01573473 0.9534826 ]

rng.standard_exponential
last sample of  500000000: 0.3884329
last sample of 1000000000: 0.71304655

rng.standard_gamma, k=2.5
first 14 samples:
[2.9261618 2.168996  2.3661127 2.06449   4.889103  2.145251  2.651206
 2.109355  1.3617952 2.1510322 1.2934842 0.8435856 5.8445168 2.0458326]
last sample of 500000000: 4.6183786
```

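You can see the small variation in the last place (ULP) of the rng.random output. Four of the 12 samples shown change, and each one increases slightly: for example, 0.72002673 becomes 0.7200268.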
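As a quick sanity check, take the two changed rng.random samples that lie in [0.5, 1). In that interval the float32 spacing is 2**-24, and the printed decimals identify a unique float32. This sketch uses only the numbers printed above. It confirms that each new value is exactly the next float32 above the old one, which is the largest change the extra bit can make in that range.

```python
import numpy as np

# (main branch, this PR) pairs copied from the rng.random output above,
# restricted to values in [0.5, 1), where float32 spacing is 2**-24.
pairs = [("0.72002673", "0.7200268"), ("0.9720112", "0.97201127")]

for old_s, new_s in pairs:
    old, new = np.float32(old_s), np.float32(new_s)
    step = np.nextafter(old, np.float32(1.0))  # next float32 above `old`
    print(old_s, "->", new_s,
          "one ULP apart:", bool(new == step),
          "ULP =", float(np.spacing(old)))
```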

The outputs for rng.standard_exponential and rng.standard_gamma show no differences at first. Eventually, though, the one-ULP difference in a value generated by next_float causes a different branch to be taken in their iterative algorithms, resulting in large changes in the stream of variates.
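The mechanism can be illustrated with a deliberately simplified, hypothetical rejection sampler. It is not NumPy's exponential or gamma code. It accepts a uniform only if the value is below a threshold, so the number of uniforms consumed per output depends on each comparison. Raising a single input by one ULP, the same direction as the change in this PR, flips one comparison, and every later output shifts position.

```python
import numpy as np

def toy_rejection_sampler(uniforms, n_out, threshold=np.float32(0.5)):
    """Hypothetical sampler: keep a uniform only if it is below `threshold`.

    Rejected draws are discarded, so how many uniforms are consumed per
    output depends on the outcome of each comparison.
    """
    draws = iter(uniforms)
    out = []
    while len(out) < n_out:
        u = next(draws)
        if u < threshold:
            out.append(float(u))
    return out

# "Main branch" stream: the second value sits one ULP below the threshold.
just_below = np.nextafter(np.float32(0.5), np.float32(0.0))
stream_main = np.array([0.1, just_below, 0.3, 0.7, 0.2, 0.4], dtype=np.float32)

# "PR" stream: identical except that value is one ULP higher (exactly 0.5).
stream_pr = stream_main.copy()
stream_pr[1] = np.nextafter(stream_main[1], np.float32(1.0))

out_main = toy_rejection_sampler(stream_main, 3)
out_pr = toy_rejection_sampler(stream_pr, 3)
print(out_main)  # accepts the value just below 0.5
print(out_pr)    # rejects 0.5, so later outputs move up one position
```

In real samplers the divergence is larger, because each draw may feed a nonlinear transform. Once the streams fall out of step, no later values line up.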

seberg · 2021-11-12T18:48:11Z

Thanks @WarrenWeckesser! As far as I can see, this only affects the new API, since it requires dtype=np.float32, which the old API does not support for any of these functions. So there are no stream-compat concerns.

However, since the streams do change I am removing the backport candidate label. Please just re-add if you disagree!
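The API split in the comment above can be seen directly. In this sketch, `Generator.random` accepts `dtype=np.float32`, and that is the path where the float32 conversion applies. The legacy `RandomState.random_sample` takes no `dtype` argument and always returns float64. The legacy seed differs from the script's seed because `RandomState` only accepts seeds below 2**32.

```python
import numpy as np

# New API: Generator methods accept dtype=np.float32.
rng = np.random.default_rng(98765432109)
x = rng.random(3, dtype=np.float32)
print(x.dtype)  # float32

# Legacy API: RandomState.random_sample has no dtype parameter,
# so it cannot produce float32 samples through this code path.
rs = np.random.RandomState(12345)
try:
    rs.random_sample(3, dtype=np.float32)
except TypeError as exc:
    print("legacy API rejects dtype:", exc)
print(rs.random_sample(3).dtype)  # float64
```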

bashtage · 2021-11-12T20:08:20Z

This seems to be nearly an enhancement, even though it is also a bug. While the intent was clearly to provide the maximum number of random bits, the nature of the bug only resulted in slightly less random values in most plausible scenarios.

- numpy/numpy#20314 - numba/numba#7754

The formula to convert a 32 bit random integer to a random float32, (((rng)->next_uint32((rng)->state) >> 9) * (1.0f / 8388608.0f)) shifts by one bit too many, resulting in uniform float32 samples always having a 0 in the least significant bit. The formula is corrected to (((rng)->next_uint32((rng)->state) >> 8) * (1.0f / 16777216.0f)) See numpy/numpy#20314 for more details.

WarrenWeckesser added 00 - Bug component: numpy.random labels Nov 6, 2021

WarrenWeckesser force-pushed the float32-rand-unused-bit branch from d754442 to 4b9e569 November 6, 2021 07:21

charris added the 09 - Backport-Candidate label Nov 6, 2021

DOC: Add release note about the fix for 32 bit float random variates.

e5af24d

seberg merged commit 1995e2c into numpy:main Nov 12, 2021

seberg removed the 09 - Backport-Candidate label Nov 12, 2021

WarrenWeckesser deleted the float32-rand-unused-bit branch November 12, 2021 19:42

davemfish mentioned this pull request Jan 3, 2022

numpy 1.22 compatibility natcap/invest#796

Closed

ahirner added a commit to MoonVision/moonbox-docker that referenced this pull request Apr 16, 2022

downgrade numpy to 1.21 for downstream compat

b917fbd

- numpy/numpy#20314 - numba/numba#7754

stuartarchibald mentioned this pull request May 17, 2022

Support for Numpy BitGenerators PR#1 - Core Generator Support numba/numba#8031

Merged

zoj613 mentioned this pull request Jun 17, 2022

MAINT: Get full precision for 32 bit floating point random values. zoj613/polyagamma#111

Closed

zoj613 mentioned this pull request Jun 22, 2022

MAINT: Get full precision for 32 bit floating point random values. zoj613/polyagamma#117

Merged

alecandido mentioned this pull request Mar 17, 2023

Evolve n3fit with eko 0.12 NNPDF/nnpdf#1694

Merged

aegkmq mentioned this pull request Jul 27, 2023

Replace the rand() with a portable rng karpathy/llama2.c#138

Merged
