CI: fail slow tests (not --full) #20672

Merged: 22 commits merged into scipy:main on May 26, 2024
Conversation

@mdhaber (Contributor) commented May 9, 2024

Reference issue

Follow-up to gh-20480

What does this implement/fix?

This PR causes CI to report a failure if a test not marked slow or xslow takes over 1s to complete. Exceptions can be made using @pytest.mark.fail_slow.
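
For reference, an exception looks like this (a minimal sketch; the test name and the 2-second threshold are illustrative, not taken from this PR):

    import pytest

    # Hypothetical example: allow this test up to 2 seconds before
    # pytest-fail-slow reports it as a failure, overriding the global
    # --fail-slow=1.0 threshold used in the CI job.
    @pytest.mark.fail_slow(2)
    def test_something_inherently_slow():
        ...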

Additional information

The intent is to distribute the job of keeping test times reasonable to all PR authors (rather than requiring maintainers to perform cleanups periodically).

[skip cirrus] [skip circle]
@mdhaber added the CI label (Items related to the CI tools such as CircleCI, GitHub Actions or Azure) on May 9, 2024.

  - name: Build
    run: |
      python dev.py build --with-scipy-openblas

  - name: Test
    run: |
-     python dev.py test -j2
+     python dev.py test -j2 -- --durations=0 --durations-min=0.25 --fail-slow=1.0
A contributor commented:

FWIW, I think the original experiment of adding the one CI job with pytest-fail-slow below, with a 5-second global threshold to catch truly egregious single test cases, was a good thing.

That said, I'm less keen on adding another CI job and a second timing threshold to be aware of. This is going to be subjective of course, but it seems to me that we'd just be discouraging one form of total test-time creep, while there are many others, like overparametrization, that add up as well.

At some point, the code reviewer will have to look the tests over on a case-by-case basis to see what makes sense and what doesn't. I think the 5-second cutoff for "you should take a look at that" makes sense, but an additional 1-second cutoff starts to make the automatic catching a bit complex/picky for my taste. A single 1.1-second test may well be preferable to a parametrized test with 37 cases of 0.8 seconds each, and I don't know that I'd want CI weighing in on that this way.

That said, I don't care that much. In general, I'd probably opt for just putting the exception markers on anything that gets flagged in spatial rather than hiding the tests in the full/slow mode, which I often don't use when iterating locally.

The author (@mdhaber) replied:

The total time of the spatial tests on the list is only 10s, so that seems OK to me to make an exception and give them 5s.

@andyfaff (Contributor) commented:

I just re-ran the duration test after the DE tweaks I made and was surprised to see no change. What's interesting is how the test run was set up: I expected the PR to be merged into the tip of scipy/main, currently 7e69656, before the tests ran.
However, the PR was merged into bf645b7, which predates the commits that tweaked the tests.

@andyfaff (Contributor) commented:

@sturlamolden I'm looking over long test durations and came across test_cobyla_threadsafe, which is one of the longer tests in optimize. I wanted to check why there's a sleep in the objective functions. The sleep calls make the test run much longer than it otherwise would. Is it there to try to ensure both threads run simultaneously?

@mdhaber (Contributor, author) commented May 10, 2024

@andyfaff I'll merge main manually tonight to see what effect that had.

@mdhaber left a review comment:

Oops, guess I forgot to run this file locally.

Review threads (outdated, resolved): scipy/linalg/tests/test_decomp.py (x2), scipy/linalg/tests/test_matfuncs.py
[skip cirrus] [skip circle]
@andyfaff (Contributor) commented:

Something weird is going on with two of those Windows tests. The build step is taking twice as long (~24 min) as the other (11 min).

The fast build uses python -m build and no Pythran. The slower builds use python dev.py and Pythran. Not sure which is the issue there.

@mdhaber (Contributor, author) commented May 10, 2024

Hmm, I didn't change that part. This run is from April 29, before the first of these fail_slow PRs was merged, and it was taking a long time then, too (e.g. 25 minutes for the fail slow, full, py3.10/npMin, dev.py job):
https://github.com/scipy/scipy/actions/runs/8872344008/job/24356554416
I'm not sure about that either.

@andyfaff (Contributor) commented:

Not using Pythran cuts the time from 24 minutes to ~18, but I don't think that's the whole story.

@sturlamolden (Contributor) replied:

@sturlamolden I'm looking over long test durations and came across test_cobyla_threadsafe, which is one of the longer tests in optimize. I wanted to check why there's a sleep in the objective functions. The sleep calls make the test run much longer than it otherwise would. Is it there to try to ensure both threads run simultaneously?

It is to force the scheduler to release the remainder of the time slice and allow another thread to execute. Otherwise we are not testing concurrent access to cobyla.

@sturlamolden (Contributor) added:

It might be that sleep(0) is sufficient, but I think we needed sleep(0.1) to actually trigger the segfault with the original code.
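
For context, the pattern under discussion looks roughly like this (a minimal sketch, not the actual test_cobyla_threadsafe; the objective, constraint, and thread count are invented for illustration):

    import threading
    import time

    from scipy.optimize import fmin_cobyla

    def test_threadsafe_sketch():
        def objective(x):
            # The sleep yields the remainder of the time slice so the two
            # threads genuinely interleave inside COBYLA, rather than one
            # finishing before the other starts.
            time.sleep(0.1)
            return x[0] ** 2 + x[1] ** 2

        def constraint(x):
            return x[0] + x[1] - 1.0

        def run():
            fmin_cobyla(objective, [1.0, 1.0], [constraint], rhoend=1e-3)

        threads = [threading.Thread(target=run) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # a crash here would indicate non-thread-safe code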

@mdhaber (Contributor, author) commented May 19, 2024

@rgommers I modified tests and made exceptions mostly as recommended by maintainers of each subpackage, although sometimes I just gave tests an exception rather than marking them as slow. I think this accounts for most tests that take >0.5s, but the threshold for failure is 1s, so we should be pretty robust w.r.t. variation in execution time.

The only question I have is why macOS tests / Conda & umfpack/scikit-sparse, fast, py3.11/npAny, dev.py (3.11) failed in the previous run. It looks like it failed due to pytest-fail-slow, but I don't see why pytest would be running with that option.

@mdhaber changed the title from "WIP/CI: fail slow tests (not --full)" to "CI: fail slow tests (not --full)" on May 19, 2024.
@rgommers (Member) replied:

The only question I have is why macOS tests / Conda & umfpack/scikit-sparse, fast, py3.11/npAny, dev.py (3.11) failed in the previous run. It looks like it failed due to pytest-fail-slow, but I don't see why pytest would be running with that option.

Why wouldn't it? The test reads:

@pytest.mark.fail_slow(5)
def test_cobyla_threadsafe():

https://github.com/jwodder/pytest-fail-slow?tab=readme-ov-file#failing-slow-tests says: "To cause a specific test to fail if it takes too long to run, apply the fail_slow marker to it, with the desired cutoff time as the argument:". So the marker is honored whenever pytest-fail-slow is installed, which it is in that job.

@mdhaber (Contributor, author) commented May 19, 2024

I see. I had assumed the marker was only activated by a command-line option, like --slow, but I see now that --fail-slow just sets the default failure threshold.

@rgommers (Member) replied:

Indeed. I think we should not over-use @pytest.mark.fail_slow. We're still counting on it not being installed for most users/packagers, because many of the tests that carry the mark would otherwise fail on slow machines or under QEMU.

@mdhaber (Contributor, author) commented May 19, 2024

As a follow-up, I can look into options for using it only when a command line argument is passed or an environment variable is set so that it would work like I had envisioned. Or we could simply remove it from environment.yml.
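
One possible shape for the environment-variable approach (a hypothetical conftest.py sketch; the SCIPY_FAIL_SLOW variable name and the approach itself are invented here, not part of this PR):

    # conftest.py (hypothetical sketch)
    import os

    import pytest

    def pytest_collection_modifyitems(config, items):
        # Only enforce per-test time limits when CI opts in via an
        # environment variable; local runs and packagers are unaffected.
        if os.environ.get("SCIPY_FAIL_SLOW") != "1":
            return
        for item in items:
            if "slow" not in item.keywords and "xslow" not in item.keywords:
                # Fail any unmarked test that takes longer than 1 second.
                item.add_marker(pytest.mark.fail_slow(1.0))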

@rgommers (Member) replied:

Or we could simply remove it from environment.yml.

This sounds fine to me for now, easiest to do that and merge this PR.

We only really want pytest-fail-slow to be used in certain
CI jobs; this is the simple way to get that.

[skip circle] [skip cirrus]
@mdhaber requested a review from rgommers as a code owner on May 19, 2024, 14:37.
@mdhaber (Contributor, author) commented May 19, 2024

OK. I thought about just commenting it out with a note and/or adding a tip to the developer documentation, but for now I just removed it so as not to hold this up.

@mdhaber (Contributor, author) commented May 19, 2024

Oops @steppi, I didn't add an exception for test_round.py::test_add_round_up and test_round.py::test_add_round_down like you suggested because I meant to ask about them. They take about 0.01s on my machine; any idea why they take almost a full second on CI?

I'll mark it for now and maybe file an issue about the marked tests.

Review threads (outdated, resolved): scipy/integrate/tests/test__quad_vec.py (x2), doc/source/dev/contributor/continuous_integration.rst
@mdhaber (Contributor, author) commented May 19, 2024

Ok, that should do it.

@mdhaber (Contributor, author) commented May 25, 2024

@rgommers it sounded like you might be interested in merging this one. If so, let me know when you'll have a chance for a last look, and I'll re-run CI before then to make sure there are no new slow tests to adjust before merging.

@rgommers (Member) replied:

Yes indeed - now or tomorrow should work.

@mdhaber (Contributor, author) commented May 25, 2024

Oops. I had applied an exception to test_low_dim_no_ls instead of test_high_dim_no_ls, and test_high_dim_no_ls happened to be one of those borderline tests.

@rgommers (Member) left a review:

This now looks about as clean as it is going to get, and all the new instructions in the docs are clear and work as advertised in my testing.

I'm still a little worried that the number of flaky failures is going to be on the high side, but there is only one way to find out, which is to start using this. So in it goes - thanks Matt!

@rgommers merged commit b863eb9 into scipy:main on May 26, 2024. All 31 checks passed.