
MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty #20694

Merged
merged 30 commits into scipy:main on May 28, 2024

Conversation

mdhaber
Contributor

@mdhaber mdhaber commented May 10, 2024

Reference issue

Address stats part of gh-19805

What does this implement/fix?

Eliminates inconsistent errors and excessive warnings emitted by most stats reducing functions when 1D input is too small. Documents required sample sizes. Emits a single warning when NaN outputs are generated without NaNs in the input or when nan_policy='omit'.

Return values (usually NaNs) are already correct for the most part; this just deals with warnings and errors.
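The consolidation works roughly like this: suppress the low-level warnings from the underlying computation, then emit a single informative warning if the result is NaN. Below is a minimal stdlib-only sketch, not SciPy's actual implementation; `skew_1d` and `_unchecked_skew` are hypothetical names.

```python
import math
import warnings

def skew_1d(sample):
    """Hypothetical sketch: compute a reduction while suppressing
    low-level warnings, then emit one consolidated warning if the
    sample was too small to produce a finite result."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence e.g. "Mean of empty slice"
        result = _unchecked_skew(sample)
    if math.isnan(result):
        warnings.warn(
            "One or more sample arguments is too small; all returned "
            "values will be NaN. See documentation for sample size "
            "requirements.",
            UserWarning, stacklevel=2)
    return result

def _unchecked_skew(sample):
    # Stand-in for the real computation; returns NaN for degenerate input.
    n = len(sample)
    if n == 0:
        return math.nan
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    return math.nan if m2 == 0 else m3 / m2 ** 1.5
```

The actual decorator handles N-D arrays, multiple samples, and `nan_policy`; this only shows the warn-once pattern.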

Examples below are for skew, but the behavior of most reducing functions is adjusted.

Before:

import numpy as np
from scipy import stats

stats.skew([])  # too noisy
# /usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py:1193: RuntimeWarning: Mean of empty slice.
#   mean = a.mean(axis, keepdims=True)
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:121: RuntimeWarning: invalid value encountered in divide
#   ret = um.true_divide(
# /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
#   return _methods._mean(a, axis=axis, dtype=dtype,
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
#   ret = ret.dtype.type(ret / rcount)
# nan

stats.skew([np.nan], nan_policy='omit')  # too noisy
# /usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py:1193: RuntimeWarning: Mean of empty slice.
#   mean = a.mean(axis, keepdims=True)
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:121: RuntimeWarning: invalid value encountered in divide
#   ret = um.true_divide(
# /usr/local/lib/python3.10/dist-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
#   return _methods._mean(a, axis=axis, dtype=dtype,
# /usr/local/lib/python3.10/dist-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
#   ret = ret.dtype.type(ret / rcount)
# nan

stats.skew([[]], axis=1)  # silent
# array([nan])

stats.skew([[np.nan]], nan_policy='omit', axis=1)  # silent
# array([nan])

After: (maybe I should change "is too small" → "contains too few observations")

import numpy as np
from scipy import stats

stats.skew([])
# UserWarning: One or more sample arguments is too small; all returned values will be NaN. See documentation for sample size requirements.
#   stats.skew([])
# nan

stats.skew([np.nan], nan_policy='omit')
# After omitting NaNs, one or more sample arguments is too small; all returned values will be NaN. See documentation for sample size requirements.
#  stats.skew([np.nan], nan_policy='omit')
# nan

stats.skew([[]], axis=1)
# UserWarning: All axis-slices of one or more sample arguments are too small; all elements of returned arrays will be NaN. See documentation for sample size requirements.
#  stats.skew([[]], axis=1)
# array([nan])

stats.skew([[np.nan]], nan_policy='omit', axis=1)
# UserWarning: After omitting NaNs, one or more axis-slices of one or more sample arguments is too small; corresponding elements of returned arrays will be NaN. See documentation for sample size requirements.
#  stats.skew([[np.nan]], nan_policy='omit', axis=1)
# array([nan])

Additional information

Consider ignoring whitespace changes and reviewing the commits separately. Each has a distinct purpose summarized by its commit message.

@rgommers @h-vetinari all I'd ask for now (if you're still interested in gh-19805) is to 1) do some before/after testing to see whether you prefer the new behavior (almost all reducing functions are fair game) and 2) help me answer these two questions.

Questions before I continue:

  • Initially, I changed all the tests that looked for a function-specific error message or warning; now they look for the standard behavior. It took a lot of time, but I think it was important to investigate each case. That said, should I just eliminate these function-specific tests? I would want to add a generic test anyway, and that would make these scattered tests redundant.
  • Should the generic test (which looks for the correct warning and return value) check every function, or should I just check, say, a few representative functions (e.g. wilcoxon with one argument, mannwhitneyu with two arguments, and kruskal with three arguments)?
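For reference, the shape of such a generic, parametrized check could look like the following. This is a self-contained sketch: the reducing functions are trivial stand-ins (named `*_like`), not the real `wilcoxon`/`mannwhitneyu`/`kruskal`.

```python
import math
import warnings

def _too_small(*samples):
    # Shared stand-in behavior: warn once and return NaN for empty input.
    if any(len(s) == 0 for s in samples):
        warnings.warn("sample is too small; returning NaN",
                      UserWarning, stacklevel=2)
        return math.nan
    return 0.0  # placeholder statistic

# Stand-ins for reducing functions taking 1, 2, and 3 samples.
wilcoxon_like = lambda x: _too_small(x)
mannwhitneyu_like = lambda x, y: _too_small(x, y)
kruskal_like = lambda x, y, z: _too_small(x, y, z)

def check_small_sample_warning(func, n_args):
    """Generic check: empty samples trigger exactly one standard
    warning, and the returned value is NaN."""
    samples = [[] for _ in range(n_args)]
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        result = func(*samples)
    assert len(w) == 1 and issubclass(w[0].category, UserWarning)
    assert math.isnan(result)

for func, n_args in [(wilcoxon_like, 1), (mannwhitneyu_like, 2),
                     (kruskal_like, 3)]:
    check_small_sample_warning(func, n_args)
```

In the real test suite this loop would naturally become a `pytest.mark.parametrize` over the decorated functions.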

To do:

  • fix special cases (e.g. mode, f_oneway)
  • fix _axis_nan_policy test - it's very broken because it actually looked for the warnings/errors produced by each function
  • add test that correct warning is emitted in each case
  • add function-specific warning information

For a follow-up - fix for array-API.

We still need to adjust a lot of things about the documentation. Some documentation assumes the samples are 1D, whereas in others it is careful to refer to axis-slices of ND arrays. I would prefer that we write the API documentation using 1D language but consistently link to a tutorial that describes how axis/nan_policy/keepdims work with N-D arrays. That's for a separate PR.

Also, most hypothesis tests can easily get a method argument that computes more accurate p-values instead of returning NaN for small samples (e.g. like pearsonr and anderson_ksamp now have). That can be a series of follow-up PRs.

@mdhaber mdhaber added scipy.stats maintenance Items related to regular maintenance tasks labels May 10, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent behavior when sample is too small or empty WIP/MAINT: stats: enforce consistent reducing function warning when sample is too small or empty May 10, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent reducing function warning when sample is too small or empty WIP/MAINT: stats: enforce consistent warning from reducing functions when sample is too small or empty May 11, 2024
@mdhaber mdhaber changed the title WIP/MAINT: stats: enforce consistent warning from reducing functions when sample is too small or empty WIP/MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty May 11, 2024
@mdhaber
Contributor Author

mdhaber commented May 20, 2024

@h-vetinari @rgommers @ilayn As participants in gh-19805, if you could suggest answers to these two questions and compare the before/after behavior, I'd appreciate it.

  • Many stats functions had tests that looked for a function-specific error message or warning when there were too few observations. I changed each of these tests to look for the new, standard behavior instead. It took a lot of time, but I think it was important to investigate each case. That said, should I just eliminate these function-specific tests? I need to add a generic test that looks for the new behavior anyway, and that would make these scattered tests redundant.
  • The generic test will look for the correct warning and return value. Should it check every function, or should I just check, say, a few representative functions (e.g. wilcoxon with one argument, mannwhitneyu with two arguments, and kruskal with three arguments)?

This PR is going to accumulate merge conflicts, and it would be nice to get it into the release candidate to get wider feedback on the elimination of the ad hoc warnings/errors and new standardized warnings.

Member

@h-vetinari h-vetinari left a comment

Generally this looks really nice! :)

I changed each of these tests to look for the new, standard behavior instead.

Love this!

That said, should I just eliminate these function-specific tests? I need to add a generic test that looks for the new behavior anyway, and that would make these scattered tests redundant.

Fine by me to just do a generic test (modulo caveats below)

Should it check every function, or should I just check, say, a few representative functions

Parametrization is easy and if the tests run fast, I don't see a problem to run this for (almost) every function.

Review threads (resolved) on: scipy/stats/_hypotests.py, scipy/stats/_morestats.py, scipy/stats/_stats_py.py, scipy/stats/tests/test_axis_nan_policy.py
@rgommers
Member

Agree with @h-vetinari's answers to the two questions - choices LGTM.

@mdhaber mdhaber marked this pull request as ready for review May 24, 2024 00:52
@mdhaber mdhaber requested a review from tupui as a code owner May 24, 2024 00:52
@mdhaber mdhaber requested a review from h-vetinari May 24, 2024 00:52
@mdhaber
Contributor Author

mdhaber commented May 24, 2024

OK, I think this is now doing what I intended to do here. A few limitations:

  • It only emits the generic warning for now rather than appending a function-specific part of the message. I can try to append the function-specific part as described above in a follow-up.
  • The issue with calling f_oneway with too few arguments is resolved, but the problem existed before this PR, and it applies to other functions that accept *args, so I think there is a more general fix to be applied. I can fix that in the same follow-up.
  • It doesn't improve the behavior for other array API backends; those still emit warnings or raise errors as before. We'll want to rework the decorator significantly for array API support (and maybe actually modify most functions to handle nan_policy with n-D arrays/axis natively using something like WIP: stats.masked_array: array API compatible masked arrays #20363), and we'll fix this then.
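The `*args` issue mentioned above comes down to an explicit count check, since a decorator cannot infer a fixed arity from the signature. A hypothetical sketch (not the actual SciPy code; `f_oneway_like` is a stand-in, and the per-group size threshold here is illustrative):

```python
import math
import warnings

def f_oneway_like(*samples):
    """Stand-in for an *args reducing function such as f_oneway,
    which needs at least two groups to be meaningful."""
    if len(samples) < 2:
        # Arity must be validated explicitly for *args functions.
        raise TypeError("at least two samples are required")
    if any(len(s) < 2 for s in samples):
        warnings.warn(
            "One or more sample arguments is too small; all returned "
            "values will be NaN. See documentation for sample size "
            "requirements.", UserWarning, stacklevel=2)
        return math.nan
    return 0.0  # placeholder for the real F statistic
```

The general fix would move the `len(samples)` check into shared machinery so every `*args` function behaves the same way.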

One remaining question: by design, this is going to produce some warnings where there were none before. Should we make the SmallSampleWarning public to make them easier to filter?
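Making the class public would let users filter by category instead of matching message text. A sketch with a stand-in class (the name placement and base class here are assumptions, not SciPy's definition):

```python
import math
import warnings

class SmallSampleWarning(RuntimeWarning):
    """Stand-in for a public small-sample warning class."""

def tiny_reduction(sample):
    # Minimal stand-in reducer: warn and return NaN on empty input.
    if len(sample) == 0:
        warnings.warn("sample is too small; returning NaN",
                      SmallSampleWarning, stacklevel=2)
        return math.nan
    return sum(sample) / len(sample)

# Filtering by category is one line, and robust to message wording:
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SmallSampleWarning)
    tiny_reduction([])  # silenced
```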

@mdhaber mdhaber changed the title WIP/MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty MAINT: stats: make reducing functions emit consistent warning when sample is too small or empty May 24, 2024
@mdhaber mdhaber added this to the 1.14.0 milestone May 25, 2024
@mdhaber mdhaber modified the milestones: 1.14.0, 1.15.0 May 28, 2024
@mdhaber
Contributor Author

mdhaber commented May 28, 2024

@tylerjereddy This is ready, but I went ahead and re-milestoned. Don't want the new warnings to cause grief in the release process.

Member

@h-vetinari h-vetinari left a comment

I think this is a clear improvement, and it would be a pity to miss the release (not to mention you having to keep fixing conflicts all the time). I'd vote for including this in 1.14.

Member

@h-vetinari h-vetinari left a comment

Re-review just to make this thread more visible

Review thread (resolved) on: scipy/stats/tests/test_axis_nan_policy.py
@mdhaber
Contributor Author

mdhaber commented May 28, 2024

Not sure why this doesn't show up in the main thread.

I clicked on the link to the outdated commit and commented there. In short, that's all set.

I wouldn't oppose a squash merge if you'll help field complaints about the new warnings (although I think we're removing many more than we're adding, and the ones we're adding are informative, so maybe this won't be necessary).

Also, I think it would be a good idea to add the function-specific information to the warning message when the "small sample" threshold is an unusual number, like 7 (as opposed to the more obvious 0-2). I can work on that as an immediate follow-up, though.

Up to you @h-vetinari. In any case, thanks for the review.

@mdhaber mdhaber removed this from the 1.15.0 milestone May 28, 2024
@h-vetinari
Member

Bombs away!

@h-vetinari h-vetinari merged commit 8593006 into scipy:main May 28, 2024
31 checks passed
@h-vetinari h-vetinari added this to the 1.14.0 milestone May 28, 2024
@h-vetinari
Member

Could you please write a release-note, and make sure it shows up in #20784?

@mdhaber mdhaber mentioned this pull request May 28, 2024
@mdhaber
Contributor Author

mdhaber commented May 28, 2024

Sure; please commit https://github.com/scipy/scipy/pull/20784/files#r1616624810 if it looks good to you.

@h-vetinari
Member

Sure; please commit https://github.com/scipy/scipy/pull/20784/files#r1616624810 if it looks good to you.

Thank you! I'll leave the committing to Tyler. :)
