Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Get full precision for 32 bit floating point random values. #20314

Merged
merged 2 commits into from Nov 12, 2021

Conversation

WarrenWeckesser
Copy link
Member

The formula to convert a 32 bit random integer to a random float32,

(next_uint32(bitgen_state) >> 9) * (1.0f / 8388608.0f)

shifts by one bit too many, resulting in uniform float32 samples always
having a 0 in the least significant bit. The formula is corrected to

(next_uint32(bitgen_state) >> 8) * (1.0f / 16777216.0f)

Occurrences of the incorrect formula in numpy/random/tests/test_direct.py
were also corrected.

Closes gh-17478.

The formula to convert a 32 bit random integer to a random float32,

    (next_uint32(bitgen_state) >> 9) * (1.0f / 8388608.0f)

shifts by one bit too many, resulting in uniform float32 samples always
having a 0 in the least significant bit. The formula is corrected to

    (next_uint32(bitgen_state) >> 8) * (1.0f / 16777216.0f)

Occurrences of the incorrect formula in numpy/random/tests/test_direct.py
were also corrected.

Closes numpygh-17478.
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Nov 6, 2021
@seberg
Copy link
Member

seberg commented Nov 8, 2021

@bashtage or @rkern the changes look good. Could you make a quick call with respect to only modifying the new API and whether this should have a release note?

@bashtage
Copy link
Contributor

bashtage commented Nov 8, 2021

Change is safe. Whether it requires a release not only matters if you think the ls bit in a 32 bit float is worth one. Probably best to be safe.

@WarrenWeckesser
Copy link
Member Author

I added a release note about the change.

@WarrenWeckesser
Copy link
Member Author

WarrenWeckesser commented Nov 10, 2021

Here's a script that demonstrates the changes in the variates that can occur, and shows why a release note is warranted.

Script to print random samples
import numpy as np


print(f"numpy version {np.__version__}")
print()

seed = 98765432109
print(f"seed: {seed}")
print()

print("rng.random")
rng = np.random.default_rng(seed)
x = rng.random(12, dtype=np.float32)
print("first 12 samples:")
print(x)
print()

n = 500_000_000

print("rng.standard_exponential")
rng = np.random.default_rng(seed)
x = rng.standard_exponential(size=n, dtype=np.float32)
x1 = x[-1]
x = rng.standard_exponential(size=n, dtype=np.float32)
x2 = x[-1]
print(f"last sample of {n:10d}:", x1)
print(f"last sample of {2*n:10d}:", x2)

print()

k = 2.5
print(f"rng.standard_gamma, k={k}")
rng = np.random.default_rng(seed)
x = rng.standard_gamma(2.5, size=n, dtype=np.float32)
print("first 14 samples:")
print(x[:14])
print(f"last sample of {n}:", x[-1])

The output for the current main development branch:

numpy version 1.22.0.dev0+1733.g8dbd507fb

seed: 98765432109

rng.random
first 12 samples:
[0.754159   0.72002673 0.00234556 0.49236786 0.16807711 0.845093
 0.06266415 0.48290312 0.80823255 0.9720112  0.01573467 0.9534826 ]

rng.standard_exponential
last sample of  500000000: 0.3884329
last sample of 1000000000: 2.664465

rng.standard_gamma, k=2.5
first 14 samples:
[2.9261618 2.168996  2.3661127 2.06449   4.889103  2.145251  2.651206
 2.109355  1.3617952 2.1510322 1.2934842 0.8435856 5.8445168 2.0458326]
last sample of 500000000: 0.46255943

The output for this pull request:

numpy version 1.22.0.dev0+1688.ge5af24d51

seed: 98765432109

rng.random
first 12 samples:
[0.754159   0.7200268  0.00234556 0.49236786 0.16807711 0.845093
 0.06266421 0.48290312 0.80823255 0.97201127 0.01573473 0.9534826 ]

rng.standard_exponential
last sample of  500000000: 0.3884329
last sample of 1000000000: 0.71304655

rng.standard_gamma, k=2.5
first 14 samples:
[2.9261618 2.168996  2.3661127 2.06449   4.889103  2.145251  2.651206
 2.109355  1.3617952 2.1510322 1.2934842 0.8435856 5.8445168 2.0458326]
last sample of 500000000: 4.6183786

You can see the small variation in the ULP of the output of rng.random.

The outputs for rng.standard_exponential and rng.standard_gamma show no differences at first, but eventually, the difference in the ULP of the values generated by next_float cause a different branch to be taken in their iterative algorithms, resulting in large changes in the stream of variates.

@seberg seberg merged commit 1995e2c into numpy:main Nov 12, 2021
@seberg
Copy link
Member

seberg commented Nov 12, 2021

Thanks @WarrenWeckesser! As far as I see, this only affects the new API, since this is dtype=np.float32 and the old API does not support for this all functions. So there are no stream-compat concerns.

However, since the streams do change I am removing the backport candidate label. Please just re-add if you disagree!

@seberg seberg removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 12, 2021
@WarrenWeckesser WarrenWeckesser deleted the float32-rand-unused-bit branch November 12, 2021 19:42
@bashtage
Copy link
Contributor

This seems to be nearly an enhancement, even though is also a bug. While the intent was clearly to provide the maximum number of random bits, the nature of the bug only resulted in slightly less random values in most plausible scenarios.

ahirner added a commit to MoonVision/moonbox-docker that referenced this pull request Apr 16, 2022
zoj613 added a commit to zoj613/polyagamma that referenced this pull request Jun 22, 2022
The formula to convert a 32 bit random integer to a random float32,

    (((rng)->next_uint32((rng)->state) >> 9) * (1.0f / 8388608.0f))

shifts by one bit too many, resulting in uniform float32 samples always
having a 0 in the least significant bit. The formula is corrected to

    (((rng)->next_uint32((rng)->state) >> 8) * (1.0f / 16777216.0f))

See numpy/numpy#20314 for more details.
zoj613 added a commit to zoj613/polyagamma that referenced this pull request Jun 23, 2022
The formula to convert a 32 bit random integer to a random float32,

    (((rng)->next_uint32((rng)->state) >> 9) * (1.0f / 8388608.0f))

shifts by one bit too many, resulting in uniform float32 samples always
having a 0 in the least significant bit. The formula is corrected to

    (((rng)->next_uint32((rng)->state) >> 8) * (1.0f / 16777216.0f))

See numpy/numpy#20314 for more details.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wasted bit in random float32 generation
4 participants