BUG: stats.noncentral_t: incorrect pdf values #20693

tk-yoshimura · 2024-05-10T18:18:00Z

Describe your issue.

When the non-centrality parameter is large, minute pdfs do not round to zero and become incorrect.

_continuous_distns.py L7751
When the signs of the hypergeometric function values trm1 and trm2 are different, a digit loss may occur and a positive number as small as 1e-15 times the value of trm1 may be obtained.

0.000000000000000000e+00,
0.000000000000000000e+00,
6.903232523204403334e-20,
5.246638030092020843e-20,
2.507426003902358323e-20,
0.000000000000000000e+00,
2.306273340248685816e-21,
0.000000000000000000e+00,
3.065027443987395148e-22,
3.030573275870970357e-23,
0.000000000000000000e+00,
0.000000000000000000e+00,
0.000000000000000000e+00,
1.607309915758858139e-26,
0.000000000000000000e+00,
2.968184438906829402e-29,
1.289912430713787116e-30,
0.000000000000000000e+00,
0.000000000000000000e+00,
6.875264498813224083e-36,
0.000000000000000000e+00,
1.698743765495028586e-39,
5.899645661409607616e-43,
0.000000000000000000e+00,
1.118820687756764921e-48,
0.000000000000000000e+00,
2.881362839750965093e-54,
1.130519003827435810e-57,
0.000000000000000000e+00,
1.125565060192690696e-60,
1.255559309353209344e-59,
2.451913987471196361e-58,
9.946708461085608711e-57,
9.865746568566806907e-55,
2.608508011669959246e-52,
1.751251717366865741e-49,
2.480353133417280902e-46,
5.651909548871659366e-43,
1.547696961932354756e-39,
3.917040533065069146e-36,
7.451906963902804195e-33,
9.230455588900687233e-30,
6.830776102987085846e-27,
2.903097821643950697e-24,
7.048475233244896531e-22,
9.951342974018210129e-20,
8.434849830519383079e-18,
4.465006069250835433e-16,
1.540119041121871242e-14,
3.611718422442535243e-13,
5.997215072309180812e-12,
7.322834569147971679e-11,
6.805884677873872274e-10,
4.966580900143197686e-09,
2.925747070795927217e-08,
1.425925306582944727e-07,
5.875729668320043319e-07,
2.086553802153421788e-06,
6.493728050836794094e-06,
1.797471724152075438e-05,
4.482912166825650464e-05,
1.018901280862772772e-04,
2.131690774824222302e-04,
4.141541644587887834e-04,
7.530377199797195272e-04,

Reproducing Code Example

import numpy as np
import scipy.stats as stats

x = np.arange(-32, 33) / 4
dist = stats.nct(8, 16)
pdf = dist.pdf(x)

Error message

(N/A)

SciPy/NumPy/Python version and system information

1.13.0 1.26.4 sys.version_info(major=3, minor=9, micro=5, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /c/opt/64/include
    lib directory: /c/opt/64/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=24
    pc file directory: c:/opt/64/lib/pkgconfig
    version: 0.3.26.dev
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /c/opt/64/include
    lib directory: /c/opt/64/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=24
    pc file directory: c:/opt/64/lib/pkgconfig
    version: 0.3.26.dev
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.12.0
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.10
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  pythran:
    include directory: C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-z8a5b3_f\overlay\Lib\site-packages/pythran
    version: 0.15.0
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: windows
  cross-compiled: false
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: windows
Python Information:
  path: C:\Users\runneradmin\AppData\Local\Temp\cibw-run-0mr_3znw\cp39-win_amd64\build\venv\Scripts\python.exe
  version: '3.9'

dschmitz89 · 2024-05-11T16:25:59Z

@jzmaddock: not really a bug report, but would Boost be an alternative here? Thanks in advance!

mdhaber · 2024-05-11T17:59:06Z

A comment in the code suggests that Boost was considered but noted accuracy issues in the left tail - "see gh-16591". It's worth considering again though.

jzmaddock · 2024-05-14T16:35:33Z

@dschmitz89 : the smoothness issue in the left tail should be fixed now: values returned may be truncated to zero in order to avoid returning total garbage, but none the less, as the value -> zero the number of correct digits steadily decreases. I don't see any way to avoid this - whatever calculation method is used, exquisite cancellation appears to be inevitable :(

dschmitz89 · 2024-05-14T20:17:58Z

@jzmaddock ; thanks for the quick reply. scipy currently fails already at -1 for this case (df=8, nct=16):

Details

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-10, 0, 500)
dist = stats.nct(8, 16)
pdf = dist.pdf(x)
plt.semilogy(x, pdf)
plt.show()

Any chance Boost would be at least better even if not perfect?

jzmaddock · 2024-05-15T18:55:26Z

I was about to say that we're fine, but actually we're not: when you zoom in enough there are little pockets that have underflowed to zero with correct values either side :(

Seems like the algorithm is constantly teetering on the edge of underflow... I need to investigate some more.

…ects nc-F via beta). See scipy/scipy#20693.

jzmaddock · 2024-05-16T17:30:19Z

This should ultimately be fixed in boostorg/math#1134 which will be Boost-1.86 now.

mdhaber · 2024-05-20T19:48:50Z

@dschmitz89 do you have time to do another version bump?

dschmitz89 · 2024-05-21T06:06:15Z

@mdhaber will look into version bump and using Boost for noncentral_t.pdf in the coming weeks.

jzmaddock · 2024-05-21T08:28:53Z

will look into version bump and using Boost for noncentral_t.pdf in the coming weeks.

Note that fixes for the non-central-t are only in Boost develop at present, it'll be the next Boost release before they're generally available.

dschmitz89 · 2024-05-22T05:01:50Z

@jzmaddock : I did some first tests. Boost now achieves similar or better precision by truncating to 0 than scipy, thanks!

One idea that might help more in the left tail: use an asymptotic expansion. In the past, Wolfram Alpha has proven useful to me to generate the expansions. Here, it produces a lengthy formula using Hermite polynomials instead of the hypergeometric function $_1F_1$. I cannot comment on its accuracy here yet as these functions are outside my expertise but maybe you get at least a gut feeling looking at the formula?

jzmaddock · 2024-05-22T08:16:08Z

but maybe you get at least a gut feeling looking at the formula?

Not really without implementing it, but my guess is it's derived from the 1F1 equivalence and therefore has the same issues... I do note that the remainder is O(1/x^9) which while looking good superficially, remember that in extreme cases the problems start by x = -1 !

Will ponder some more...

jzmaddock · 2024-05-23T16:49:22Z

Question: what is the use case for these very small results?

At present the Boost PDF function here is nice and smooth and then truncates to zero (this is the v=8, mu=16 case):

However, I can evaluate further into the tail by using exp_sinh integration of the integral form of the PDF. In fact it seems to work rather well... the only wart being to figure out when to switch methods.

jzmaddock · 2024-05-23T16:54:00Z

Ah... let me take that back, zooming right in close, things don't look quite so good:

We could always truncate sooner I guess...

Refs scipy/scipy#20693

tk-yoshimura added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label May 10, 2024

j-bowhay added the scipy.stats label May 10, 2024

tk-yoshimura changed the title ~~BUG: stats.noncetral_t: incorrect pdf values~~ BUG: stats.noncentral_t: incorrect pdf values May 11, 2024

jzmaddock added a commit to boostorg/math that referenced this issue May 16, 2024

Correct spurious underflow issues in non-central beta and t (also eff…

c69c393

…ects nc-F via beta). See scipy/scipy#20693.

jzmaddock mentioned this issue May 16, 2024

Correct spurious underflow issues in non-central beta and t (also eff… boostorg/math#1134

Merged

jzmaddock added a commit to boostorg/math that referenced this issue May 23, 2024

Prevent passing denormals in calculation.

0b2aa54

Refs scipy/scipy#20693

jzmaddock mentioned this issue May 23, 2024

Prevent passing denormals in calculation. boostorg/math#1141

Open

dschmitz89 linked a pull request May 25, 2024 that will close this issue

MAINT: use boost for stats.nct.pdf #20785

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: stats.noncentral_t: incorrect pdf values #20693

BUG: stats.noncentral_t: incorrect pdf values #20693

tk-yoshimura commented May 10, 2024 •

edited

dschmitz89 commented May 11, 2024

mdhaber commented May 11, 2024

jzmaddock commented May 14, 2024

dschmitz89 commented May 14, 2024 •

edited

jzmaddock commented May 15, 2024

jzmaddock commented May 16, 2024

mdhaber commented May 20, 2024

dschmitz89 commented May 21, 2024

jzmaddock commented May 21, 2024

dschmitz89 commented May 22, 2024 •

edited

jzmaddock commented May 22, 2024

jzmaddock commented May 23, 2024

jzmaddock commented May 23, 2024

BUG: stats.noncentral_t: incorrect pdf values #20693

BUG: stats.noncentral_t: incorrect pdf values #20693

Comments

tk-yoshimura commented May 10, 2024 • edited

Describe your issue.

Reproducing Code Example

Error message

SciPy/NumPy/Python version and system information

dschmitz89 commented May 11, 2024

mdhaber commented May 11, 2024

jzmaddock commented May 14, 2024

dschmitz89 commented May 14, 2024 • edited

jzmaddock commented May 15, 2024

jzmaddock commented May 16, 2024

mdhaber commented May 20, 2024

dschmitz89 commented May 21, 2024

jzmaddock commented May 21, 2024

dschmitz89 commented May 22, 2024 • edited

jzmaddock commented May 22, 2024

jzmaddock commented May 23, 2024

jzmaddock commented May 23, 2024

tk-yoshimura commented May 10, 2024 •

edited

dschmitz89 commented May 14, 2024 •

edited

dschmitz89 commented May 22, 2024 •

edited