
MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty #20694

Merged
merged 30 commits into scipy:main on May 28, 2024

Conversation

mdhaber
Contributor

@mdhaber mdhaber commented May 10, 2024

Reference issue

Address stats part of gh-19805

What does this implement/fix?

Eliminates inconsistent errors and excessive warnings emitted by most stats reducing functions when 1D input is too small. Documents required sample sizes. Emits a single warning when NaN outputs are generated without NaNs in the input or when nan_policy='omit'.

Return values (usually NaNs) are already correct for the most part; this just deals with warnings and errors.
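The consolidation works roughly like this: suppress the low-level warnings from the underlying computation, then emit a single informative warning if the result is NaN. Below is a minimal stdlib-only sketch, not SciPy's actual implementation; `skew_1d` and `_unchecked_skew` are hypothetical names.

```python
import math
import warnings

def skew_1d(sample):
    """Hypothetical sketch: compute a reduction while suppressing
    low-level warnings, then emit one consolidated warning if the
    sample was too small to produce a finite result."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence e.g. "Mean of empty slice"
        result = _unchecked_skew(sample)
    if math.isnan(result):
        warnings.warn(
            "One or more sample arguments is too small; all returned "
            "values will be NaN. See documentation for sample size "
            "requirements.",
            UserWarning, stacklevel=2)
    return result

def _unchecked_skew(sample):
    # Stand-in for the real computation; returns NaN for degenerate input.
    n = len(sample)
    if n == 0:
        return math.nan
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    return math.nan if m2 == 0 else m3 / m2 ** 1.5
```

The actual decorator handles N-D arrays, multiple samples, and `nan_policy`; this only shows the warn-once pattern.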

Examples below are for skew, but the behavior of most reducing functions is adjusted.

Before:

import numpy as np
from scipy import stats

stats.skew([])  # too noisy
# /usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py:1193: RuntimeWarning: Mean of empty slice.
#   mean = a.mean(axis, keepdims=True)
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:121: RuntimeWarning: invalid value encountered in divide
#   ret = um.true_divide(
# /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
#   return _methods._mean(a, axis=axis, dtype=dtype,
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
#   ret = ret.dtype.type(ret / rcount)
# nan

stats.skew([np.nan], nan_policy='omit')  # too noisy
# /usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py:1193: RuntimeWarning: Mean of empty slice.
#   mean = a.mean(axis, keepdims=True)
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:121: RuntimeWarning: invalid value encountered in divide
#   ret = um.true_divide(
# /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
#   return _methods._mean(a, axis=axis, dtype=dtype,
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
#   ret = ret.dtype.type(ret / rcount)
# nan

stats.skew([[]], axis=1)  # silent
# array([nan])

stats.skew([[np.nan]], nan_policy='omit', axis=1)  # silent
# array([nan])

After: (maybe I should change "is too small" → "contains too few observations")

import numpy as np
from scipy import stats

stats.skew([])
# UserWarning: One or more sample arguments is too small; all returned values will be NaN. See documentation for sample size requirements.
#   stats.skew([])
# nan

stats.skew([np.nan], nan_policy='omit')
# After omitting NaNs, one or more sample arguments is too small; all returned values will be NaN. See documentation for sample size requirements.
#  stats.skew([np.nan], nan_policy='omit')
# nan

stats.skew([[]], axis=1)
# UserWarning: All axis-slices of one or more sample arguments are too small; all elements of returned arrays will be NaN. See documentation for sample size requirements.
#  stats.skew([[]], axis=1)
# array([nan])

stats.skew([[np.nan]], nan_policy='omit', axis=1)
# UserWarning: After omitting NaNs, one or more axis-slices of one or more sample arguments is too small; corresponding elements of returned arrays will be NaN. See documentation for sample size requirements.
#  stats.skew([[np.nan]], nan_policy='omit', axis=1)
# array([nan])

Additional information

Consider ignoring whitespace changes and reviewing the commits separately. Each has a distinct purpose summarized by its commit message.

@rgommers @h-vetinari all I'd ask for now (if you're still interested in gh-19805) is to 1) do some before/after testing to see whether you prefer the new behavior (almost all reducing functions are fair game) and 2) help me answer these two questions.

Questions before I continue:

  • Initially, I changed all the tests that looked for a function-specific error message or warning; now they look for the standard behavior. It took a lot of time, but I think it was important to investigate each case. That said, should I just eliminate these function-specific tests? I would want to add a generic test anyway, and that would make these scattered tests redundant.
  • Should the generic test (which looks for the correct warning and return value) check every function, or should I just check, say, a few representative functions (e.g. wilcoxon with one argument, mannwhitneyu with two arguments, and kruskal with three arguments)?
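For reference, the shape of such a generic, parametrized check could look like the following. This is a self-contained sketch: the reducing functions are trivial stand-ins (named `*_like`), not the real `wilcoxon`/`mannwhitneyu`/`kruskal`.

```python
import math
import warnings

def _too_small(*samples):
    # Shared stand-in behavior: warn once and return NaN for empty input.
    if any(len(s) == 0 for s in samples):
        warnings.warn("sample is too small; returning NaN",
                      UserWarning, stacklevel=2)
        return math.nan
    return 0.0  # placeholder statistic

# Stand-ins for reducing functions taking 1, 2, and 3 samples.
wilcoxon_like = lambda x: _too_small(x)
mannwhitneyu_like = lambda x, y: _too_small(x, y)
kruskal_like = lambda x, y, z: _too_small(x, y, z)

def check_small_sample_warning(func, n_args):
    """Generic check: empty samples trigger exactly one standard
    warning, and the returned value is NaN."""
    samples = [[] for _ in range(n_args)]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        result = func(*samples)
    assert len(w) == 1 and issubclass(w[0].category, UserWarning)
    assert math.isnan(result)

for func, n_args in [(wilcoxon_like, 1), (mannwhitneyu_like, 2),
                     (kruskal_like, 3)]:
    check_small_sample_warning(func, n_args)
```

In the real test suite this loop would naturally become a `pytest.mark.parametrize` over the decorated functions.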

To do:

  • fix special cases (e.g. mode, f_oneway)
  • fix _axis_nan_policy test - it's very broken because it actually looked for the warnings/errors produced by each function
  • add test that correct warning is emitted in each case
  • add function-specific warning information

For a follow-up - fix for array-API.

We still need to adjust a lot of things about the documentation. Some documentation assumes the samples are 1D, whereas in others it is careful to refer to axis-slices of ND arrays. I would prefer that we write the API documentation using 1D language but consistently link to a tutorial that describes how axis/nan_policy/keepdims work with N-D arrays. That's for a separate PR.

Also, most hypothesis tests can easily get a method argument that computes more accurate p-values instead of returning NaN for small samples (e.g. like pearsonr and anderson_ksamp now have). That can be a series of follow-up PRs.

@mdhaber mdhaber added scipy.stats maintenance Items related to regular maintenance tasks labels May 10, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent behavior when sample is too small or empty WIP/MAINT: stats: enforce consistent reducing function warning when sample is too small or empty May 10, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent reducing function warning when sample is too small or empty WIP/MAINT: stats: enforce consistent warning from reducing functions when sample is too small or empty May 11, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent warning from reducing functions when sample is too small or empty WIP/MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty May 11, 2024
@mdhaber
Contributor Author

mdhaber commented May 20, 2024

@h-vetinari @rgommers @ilayn As participants in gh-19805, if you could suggest answers to these two questions and compare the before/after behavior, I'd appreciate it.

  • Many stats functions had tests that looked for a function-specific error message or warning when there were too few observations. I changed each of these tests to look for the new, standard behavior instead. It took a lot of time, but I think it was important to investigate each case. That said, should I just eliminate these function-specific tests? I need to add a generic test that looks for the new behavior anyway, and that would make these scattered tests redundant.
  • The generic test will look for the correct warning and return value. Should it check every function, or should I just check, say, a few representative functions (e.g. wilcoxon with one argument, mannwhitneyu with two arguments, and kruskal with three arguments)?

This PR is going to accumulate merge conflicts, and it would be nice to get it into the release candidate to get wider feedback on the elimination of the ad hoc warnings/errors and new standardized warnings.

Member

@h-vetinari h-vetinari left a comment

Generally this looks really nice! :)

I changed each of these tests to look for the new, standard behavior instead.

Love this!

That said, should I just eliminate these function-specific tests? I need to add a generic test that looks for the new behavior anyway, and that would make these scattered tests redundant.

Fine by me to just do a generic test (modulo caveats below)

Should it check every function, or should I just check, say, a few representative functions

Parametrization is easy and if the tests run fast, I don't see a problem to run this for (almost) every function.

Review threads (resolved) on: scipy/stats/_hypotests.py, scipy/stats/_morestats.py, scipy/stats/_stats_py.py, scipy/stats/tests/test_axis_nan_policy.py
@rgommers
Member

Agree with @h-vetinari's answers to the two questions - choices LGTM.

@mdhaber mdhaber marked this pull request as ready for review May 24, 2024 00:52
@mdhaber mdhaber requested a review from tupui as a code owner May 24, 2024 00:52
@mdhaber mdhaber requested a review from h-vetinari May 24, 2024 00:52
@mdhaber
Contributor Author

mdhaber commented May 24, 2024

OK, I think this is now doing what I intended to do here. A few limitations:

  • It only emits the generic warning for now rather than appending a function-specific part of the message. I can try to append the function-specific part as described above in a follow-up.
  • The issue with calling f_oneway with too few arguments is resolved, but the problem existed before this PR, and it applies to other functions that accept *args, so I think there is a more general fix to be applied. I can fix that in the same follow-up.
  • It doesn't improve the behavior for other array API backends; those still emit warnings or raise errors as before. We'll want to rework the decorator significantly for array API support (and maybe actually modify most functions to handle nan_policy with n-D arrays/axis natively using something like WIP: stats.masked_array: array API compatible masked arrays #20363), and we'll fix this then.
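The `*args` issue mentioned above comes down to an explicit count check, since a decorator cannot infer a fixed arity from the signature. A hypothetical sketch (not the actual SciPy code; `f_oneway_like` is a stand-in, and the per-group size threshold here is illustrative):

```python
import math
import warnings

def f_oneway_like(*samples):
    """Stand-in for an *args reducing function such as f_oneway,
    which needs at least two groups to be meaningful."""
    if len(samples) < 2:
        # Arity must be validated explicitly for *args functions.
        raise TypeError("at least two samples are required")
    if any(len(s) < 2 for s in samples):
        warnings.warn(
            "One or more sample arguments is too small; all returned "
            "values will be NaN. See documentation for sample size "
            "requirements.", UserWarning, stacklevel=2)
        return math.nan
    return 0.0  # placeholder for the real F statistic
```

The general fix would move the `len(samples)` check into shared machinery so every `*args` function behaves the same way.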

One remaining question: by design, this is going to produce some warnings where there were none before. Should we make the SmallSampleWarning public to make them easier to filter?
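Making the class public would let users filter by category instead of matching message text. A sketch with a stand-in class (the name placement and base class here are assumptions, not SciPy's definition):

```python
import math
import warnings

class SmallSampleWarning(RuntimeWarning):
    """Stand-in for a public small-sample warning class."""

def tiny_reduction(sample):
    # Minimal stand-in reducer: warn and return NaN on empty input.
    if len(sample) == 0:
        warnings.warn("sample is too small; returning NaN",
                      SmallSampleWarning, stacklevel=2)
        return math.nan
    return sum(sample) / len(sample)

# Filtering by category is one line, and robust to message wording:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SmallSampleWarning)
    tiny_reduction([])  # silenced
```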

@mdhaber mdhaber changed the title WIP/MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty May 24, 2024
@mdhaber mdhaber added this to the 1.14.0 milestone May 25, 2024
@mdhaber mdhaber modified the milestones: 1.14.0, 1.15.0 May 28, 2024
@mdhaber
Contributor Author

mdhaber commented May 28, 2024

@tylerjereddy This is ready, but I went ahead and re-milestoned. Don't want the new warnings to cause grief in the release process.

Member

@h-vetinari h-vetinari left a comment

I think this is a clear improvement, and it would be a pity to miss the release (not to mention you having to keep fixing conflicts all the time). I'd vote for including this in 1.14.

Member

@h-vetinari h-vetinari left a comment

Re-review just to make this thread more visible

Review thread (resolved) on: scipy/stats/tests/test_axis_nan_policy.py
@mdhaber
Contributor Author

mdhaber commented May 28, 2024

Not sure why this doesn't show up in the main thread.

I clicked on the link to the outdated commit and commented there. In short, that's all set.

I wouldn't oppose a squash merge if you'll help field complaints about the new warnings (although I think we're removing many more than we're adding, and the ones we're adding are informative, so maybe this won't be necessary).

Also, I think it would be a good idea to add the function-specific information to the warning message when the "small sample" threshold is an unusual number, like 7 (as opposed to the more obvious 0-2). I can work on that as an immediate follow-up, though.

Up to you @h-vetinari. In any case, thanks for the review.

@mdhaber mdhaber removed this from the 1.15.0 milestone May 28, 2024
@h-vetinari
Member

Bombs away!

@h-vetinari h-vetinari merged commit 8593006 into scipy:main May 28, 2024
31 checks passed
@h-vetinari h-vetinari added this to the 1.14.0 milestone May 28, 2024
@h-vetinari
Member

Could you please write a release-note, and make sure it shows up in #20784?

@mdhaber mdhaber mentioned this pull request May 28, 2024
@mdhaber
Contributor Author

mdhaber commented May 28, 2024

Sure; please commit https://github.com/scipy/scipy/pull/20784/files#r1616624810 if it looks good to you.

@h-vetinari
Member

Sure; please commit https://github.com/scipy/scipy/pull/20784/files#r1616624810 if it looks good to you.

Thank you! I'll leave the committing to Tyler. :)
