Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: stats.noncentral_t: incorrect pdf values #20693

Open
tk-yoshimura opened this issue May 10, 2024 · 13 comments · May be fixed by #20785
Open

BUG: stats.noncentral_t: incorrect pdf values #20693

tk-yoshimura opened this issue May 10, 2024 · 13 comments · May be fixed by #20785
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats

Comments

@tk-yoshimura
Copy link

tk-yoshimura commented May 10, 2024

Describe your issue.

When the non-centrality parameter is large, minute pdfs do not round to zero and become incorrect.

_continuous_distns.py L7751
When the signs of the hypergeometric function values trm1 and trm2 are different, a digit loss may occur and a positive number as small as 1e-15 times the value of trm1 may be obtained.

0.000000000000000000e+00,
0.000000000000000000e+00,
6.903232523204403334e-20,
5.246638030092020843e-20,
2.507426003902358323e-20,

0.000000000000000000e+00,
2.306273340248685816e-21,
0.000000000000000000e+00,
3.065027443987395148e-22,
3.030573275870970357e-23,

0.000000000000000000e+00,
0.000000000000000000e+00,
0.000000000000000000e+00,
1.607309915758858139e-26,
0.000000000000000000e+00,
2.968184438906829402e-29,
1.289912430713787116e-30,

0.000000000000000000e+00,
0.000000000000000000e+00,
6.875264498813224083e-36,
0.000000000000000000e+00,
1.698743765495028586e-39,
5.899645661409607616e-43,

0.000000000000000000e+00,
1.118820687756764921e-48,
0.000000000000000000e+00,
2.881362839750965093e-54,
1.130519003827435810e-57,
0.000000000000000000e+00,
1.125565060192690696e-60,
1.255559309353209344e-59,
2.451913987471196361e-58,
9.946708461085608711e-57,
9.865746568566806907e-55,
2.608508011669959246e-52,
1.751251717366865741e-49,
2.480353133417280902e-46,
5.651909548871659366e-43,
1.547696961932354756e-39,
3.917040533065069146e-36,
7.451906963902804195e-33,
9.230455588900687233e-30,
6.830776102987085846e-27,
2.903097821643950697e-24,
7.048475233244896531e-22,
9.951342974018210129e-20,
8.434849830519383079e-18,
4.465006069250835433e-16,
1.540119041121871242e-14,
3.611718422442535243e-13,
5.997215072309180812e-12,
7.322834569147971679e-11,
6.805884677873872274e-10,
4.966580900143197686e-09,
2.925747070795927217e-08,
1.425925306582944727e-07,
5.875729668320043319e-07,
2.086553802153421788e-06,
6.493728050836794094e-06,
1.797471724152075438e-05,
4.482912166825650464e-05,
1.018901280862772772e-04,
2.131690774824222302e-04,
4.141541644587887834e-04,
7.530377199797195272e-04,

Reproducing Code Example

import numpy as np
import scipy.stats as stats

x = np.arange(-32, 33) / 4
dist = stats.nct(8, 16)
pdf = dist.pdf(x)

Error message

(N/A)

SciPy/NumPy/Python version and system information

1.13.0 1.26.4 sys.version_info(major=3, minor=9, micro=5, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: pkgconfig
    found: true
    include directory: /c/opt/64/include
    lib directory: /c/opt/64/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=24
    pc file directory: c:/opt/64/lib/pkgconfig
    version: 0.3.26.dev
  lapack:
    detection method: pkgconfig
    found: true
    include directory: /c/opt/64/include
    lib directory: /c/opt/64/lib
    name: openblas
    openblas configuration: USE_64BITINT= DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
      NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=24
    pc file directory: c:/opt/64/lib/pkgconfig
    version: 0.3.26.dev
  pybind11:
    detection method: config-tool
    include directory: unknown
    name: pybind11
    version: 2.12.0
Compilers:
  c:
    commands: cc
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  c++:
    commands: c++
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 3.0.10
  fortran:
    commands: gfortran
    linker: ld.bfd
    name: gcc
    version: 10.3.0
  pythran:
    include directory: C:\Users\runneradmin\AppData\Local\Temp\pip-build-env-z8a5b3_f\overlay\Lib\site-packages/pythran
    version: 0.15.0
Machine Information:
  build:
    cpu: x86_64
    endian: little
    family: x86_64
    system: windows
  cross-compiled: false
  host:
    cpu: x86_64
    endian: little
    family: x86_64
    system: windows
Python Information:
  path: C:\Users\runneradmin\AppData\Local\Temp\cibw-run-0mr_3znw\cp39-win_amd64\build\venv\Scripts\python.exe
  version: '3.9'
@tk-yoshimura tk-yoshimura added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label May 10, 2024
@tk-yoshimura tk-yoshimura changed the title BUG: stats.noncetral_t: incorrect pdf values BUG: stats.noncentral_t: incorrect pdf values May 11, 2024
@dschmitz89
Copy link
Contributor

@jzmaddock: not really a bug report, but would Boost be an alternative here? Thanks in advance!

@mdhaber
Copy link
Contributor

mdhaber commented May 11, 2024

A comment in the code suggests that Boost was considered but noted accuracy issues in the left tail - "see gh-16591". It's worth considering again though.

@jzmaddock
Copy link

@dschmitz89 : the smoothness issue in the left tail should be fixed now: values returned may be truncated to zero in order to avoid returning total garbage, but none the less, as the value -> zero the number of correct digits steadily decreases. I don't see any way to avoid this - whatever calculation method is used, exquisite cancellation appears to be inevitable :(

@dschmitz89
Copy link
Contributor

dschmitz89 commented May 14, 2024

@jzmaddock ; thanks for the quick reply. scipy currently fails already at -1 for this case (df=8, nct=16):

Details

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-10, 0, 500)
dist = stats.nct(8, 16)
pdf = dist.pdf(x)
plt.semilogy(x, pdf)
plt.show()

image

Any chance Boost would be at least better even if not perfect?

@jzmaddock
Copy link

I was about to say that we're fine, but actually we're not: when you zoom in enough there are little pockets that have underflowed to zero with correct values either side :(

Seems like the algorithm is constantly teetering on the edge of underflow... I need to investigate some more.

@jzmaddock
Copy link

This should ultimately be fixed in boostorg/math#1134 which will be Boost-1.86 now.

@mdhaber
Copy link
Contributor

mdhaber commented May 20, 2024

@dschmitz89 do you have time to do another version bump?

@dschmitz89
Copy link
Contributor

@mdhaber will look into version bump and using Boost for noncentral_t.pdf in the coming weeks.

@jzmaddock
Copy link

will look into version bump and using Boost for noncentral_t.pdf in the coming weeks.

Note that fixes for the non-central-t are only in Boost develop at present, it'll be the next Boost release before they're generally available.

@dschmitz89
Copy link
Contributor

dschmitz89 commented May 22, 2024

@jzmaddock : I did some first tests. Boost now achieves similar or better precision by truncating to 0 than scipy, thanks!

One idea that might help more in the left tail: use an asymptotic expansion. In the past, Wolfram Alpha has proven useful to me to generate the expansions. Here, it produces a lengthy formula using Hermite polynomials instead of the hypergeometric function $_1F_1$. I cannot comment on its accuracy here yet as these functions are outside my expertise but maybe you get at least a gut feeling looking at the formula?

@jzmaddock
Copy link

but maybe you get at least a gut feeling looking at the formula?

Not really without implementing it, but my guess is it's derived from the 1F1 equivalence and therefore has the same issues... I do note that the remainder is O(1/x^9) which while looking good superficially, remember that in extreme cases the problems start by x = -1 !

Will ponder some more...

@jzmaddock
Copy link

Question: what is the use case for these very small results?

At present the Boost PDF function here is nice and smooth and then truncates to zero (this is the v=8, mu=16 case):

plot

However, I can evaluate further into the tail by using exp_sinh integration of the integral form of the PDF. In fact it seems to work rather well... the only wart being to figure out when to switch methods.

@jzmaddock
Copy link

Ah... let me take that back, zooming right in close, things don't look quite so good:

plot

We could always truncate sooner I guess...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants