CI: fail slow tests (not --full) #20672

Merged: 22 commits merged into scipy:main on May 26, 2024
Conversation

@mdhaber (Contributor) commented May 9, 2024

Reference issue

Follow-up to gh-20480

What does this implement/fix?

This PR causes CI to report a failure if a test not marked slow or xslow takes over 1s to complete. Exceptions can be made using @pytest.mark.fail_slow.
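
For reference, an exception looks like this (a minimal sketch; the test name and the 2-second threshold are illustrative, not taken from this PR):

    import pytest

    # Hypothetical example: allow this test up to 2 seconds before
    # pytest-fail-slow reports it as a failure, overriding the global
    # --fail-slow=1.0 threshold used in the CI job.
    @pytest.mark.fail_slow(2)
    def test_something_inherently_slow():
        ...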

Additional information

The intent is to distribute the job of keeping test times reasonable to all PR authors (rather than requiring maintainers to perform cleanups periodically).

[skip cirrus] [skip circle]
@mdhaber added the CI label (Items related to the CI tools such as CircleCI, GitHub Actions or Azure) on May 9, 2024.

  - name: Build
    run: |
      python dev.py build --with-scipy-openblas

  - name: Test
    run: |
-     python dev.py test -j2
+     python dev.py test -j2 -- --durations=0 --durations-min=0.25 --fail-slow=1.0
A contributor commented:

FWIW, I think the original experiment of adding the one CI job with pytest-fail-slow below, with a 5-second global threshold to catch truly egregious single test cases, was a good thing.

That said, I'm less keen on adding another CI job and a second timing threshold to be aware of. This is going to be subjective of course, but it seems to me that we'd just be discouraging one form of total test-time creep, while there are many others, like overparametrization, that add up as well.

At some point, the code reviewer will have to look the tests over on a case-by-case basis to see what makes sense and what doesn't. I think the 5-second cutoff for "you should take a look at that" makes sense, but an additional 1-second cutoff starts to make the automatic catching a bit complex/picky for my taste. A single 1.1-second test may well be preferable to a parametrized test with 37 cases of 0.8 seconds each, and I don't know that I'd want CI weighing in on that this way.

That said, I don't care that much. In general, I'd probably opt for just putting the exception markers on anything that gets flagged in spatial rather than hiding the tests in the full/slow mode, which I often don't use when iterating locally.

The author (@mdhaber) replied:

The total time of the spatial tests on the list is only 10s, so that seems OK to me to make an exception and give them 5s.

@andyfaff (Contributor) commented:

I just re-ran the duration test after the DE tweaks I made and was surprised to see no change. What's interesting is how the test run was set up: I expected the PR to be merged into the tip of scipy/main, currently 7e69656, before the tests ran.
However, the PR was merged into bf645b7, which predates the commits that tweaked the tests.

@andyfaff (Contributor) commented:

@sturlamolden I'm looking over long test durations and came across test_cobyla_threadsafe, which is one of the longer tests in optimize. I wanted to check why there's a sleep in the objective functions. The sleep calls make the test run much longer than it otherwise would. Is it there to try to ensure both threads run simultaneously?

@mdhaber (Contributor, author) commented May 10, 2024

@andyfaff I'll merge main manually tonight to see what effect that had.

@mdhaber left a review comment:

Oops, guess I forgot to run this file locally.

Review threads (outdated, resolved): scipy/linalg/tests/test_decomp.py (x2), scipy/linalg/tests/test_matfuncs.py
[skip cirrus] [skip circle]
@andyfaff (Contributor) commented:

Something weird is going on with two of those Windows tests. The build step is taking twice as long (~24 min) as the other (11 min).

The fast build uses python -m build and no Pythran. The slower builds use python dev.py and Pythran. Not sure which is the issue there.

@mdhaber (Contributor, author) commented May 10, 2024

Hmm, I didn't change that part. This run is from April 29, before the first of these fail_slow PRs was merged, and it was taking a long time then, too (e.g. 25 minutes for the fail slow, full, py3.10/npMin, dev.py job):
https://github.com/scipy/scipy/actions/runs/8872344008/job/24356554416
I'm not sure about that either.

@andyfaff (Contributor) commented:

Not using Pythran cuts the time from 24 minutes to ~18, but I don't think that's the whole story.

@sturlamolden (Contributor) replied:

@sturlamolden I'm looking over long test durations and came across test_cobyla_threadsafe, which is one of the longer tests in optimize. I wanted to check why there's a sleep in the objective functions. The sleep calls make the test run much longer than it otherwise would. Is it there to try to ensure both threads run simultaneously?

It is to force the scheduler to release the remainder of the time slice and allow another thread to execute. Otherwise we are not testing concurrent access to cobyla.

@sturlamolden (Contributor) added:

It might be that sleep(0) is sufficient, but I think we needed sleep(0.1) to actually trigger the segfault with the original code.
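
For context, the pattern under discussion looks roughly like this (a minimal sketch, not the actual test_cobyla_threadsafe; the objective, constraint, and thread count are invented for illustration):

    import threading
    import time

    from scipy.optimize import fmin_cobyla

    def test_threadsafe_sketch():
        def objective(x):
            # The sleep yields the remainder of the time slice so the two
            # threads genuinely interleave inside COBYLA, rather than one
            # finishing before the other starts.
            time.sleep(0.1)
            return x[0] ** 2 + x[1] ** 2

        def constraint(x):
            return x[0] + x[1] - 1.0

        def run():
            fmin_cobyla(objective, [1.0, 1.0], [constraint], rhoend=1e-3)

        threads = [threading.Thread(target=run) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # a crash here would indicate non-thread-safe code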

@mdhaber (Contributor, author) commented May 19, 2024

@rgommers I modified tests and made exceptions mostly as recommended by maintainers of each subpackage, although sometimes I just gave tests an exception rather than marking them as slow. I think this accounts for most tests that take >0.5s, but the threshold for failure is 1s, so we should be pretty robust w.r.t. variation in execution time.

The only question I have is why macOS tests / Conda & umfpack/scikit-sparse, fast, py3.11/npAny, dev.py (3.11) failed in the previous run. It looks like it failed due to pytest-fail-slow, but I don't see why pytest would be running with that option.

@mdhaber changed the title from "WIP/CI: fail slow tests (not --full)" to "CI: fail slow tests (not --full)" on May 19, 2024.
@rgommers (Member) replied:

The only question I have is why macOS tests / Conda & umfpack/scikit-sparse, fast, py3.11/npAny, dev.py (3.11) failed in the previous run. It looks like it failed due to pytest-fail-slow, but I don't see why pytest would be running with that option.

Why wouldn't it? The test reads:

@pytest.mark.fail_slow(5)
def test_cobyla_threadsafe():

https://github.com/jwodder/pytest-fail-slow?tab=readme-ov-file#failing-slow-tests says: "To cause a specific test to fail if it takes too long to run, apply the fail_slow marker to it, with the desired cutoff time as the argument:". So the marker is honored whenever pytest-fail-slow is installed, which it is in that job.

@mdhaber (Contributor, author) commented May 19, 2024

I see. I had assumed the marker was only activated by a command-line option, like --slow, but I see now that --fail-slow just sets the default failure threshold.

@rgommers (Member) replied:

Indeed. I think we should not over-use @pytest.mark.fail_slow. We're still counting on it not being installed for most users/packagers, because many of the tests that carry the mark would otherwise fail on slow machines or under QEMU.

@mdhaber (Contributor, author) commented May 19, 2024

As a follow-up, I can look into options for using it only when a command line argument is passed or an environment variable is set so that it would work like I had envisioned. Or we could simply remove it from environment.yml.
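
One possible shape for the environment-variable approach (a hypothetical conftest.py sketch; the SCIPY_FAIL_SLOW variable name and the approach itself are invented here, not part of this PR):

    # conftest.py (hypothetical sketch)
    import os

    import pytest

    def pytest_collection_modifyitems(config, items):
        # Only enforce per-test time limits when CI opts in via an
        # environment variable; local runs and packagers are unaffected.
        if os.environ.get("SCIPY_FAIL_SLOW") != "1":
            return
        for item in items:
            if "slow" not in item.keywords and "xslow" not in item.keywords:
                # Fail any unmarked test that takes longer than 1 second.
                item.add_marker(pytest.mark.fail_slow(1.0))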

@rgommers (Member) replied:

Or we could simply remove it from environment.yml.

This sounds fine to me for now, easiest to do that and merge this PR.

We only really want pytest-fail-slow to be used in certain
CI jobs; this is the simple way to get that.

[skip circle] [skip cirrus]
@mdhaber requested a review from rgommers as a code owner on May 19, 2024, 14:37.
@mdhaber (Contributor, author) commented May 19, 2024

OK. I thought about just commenting it out with a note and/or adding a tip to the developer documentation, but for now I just removed it so as not to hold this up.

@mdhaber (Contributor, author) commented May 19, 2024

Oops @steppi, I didn't add an exception for test_round.py::test_add_round_up and test_round.py::test_add_round_down like you suggested because I meant to ask about them. They take about 0.01s on my machine; any idea why they take almost a full second on CI?

I'll mark it for now and maybe file an issue about the marked tests.

Review threads (outdated, resolved): scipy/integrate/tests/test__quad_vec.py (x2), doc/source/dev/contributor/continuous_integration.rst
@mdhaber (Contributor, author) commented May 19, 2024

Ok, that should do it.

@mdhaber (Contributor, author) commented May 25, 2024

@rgommers it sounded like you might be interested in merging this one. If so, let me know when you'll have a chance for a last look, and I'll re-run CI before then to make sure there are no new slow tests to adjust before merging.

@rgommers (Member) replied:

Yes indeed - now or tomorrow should work.

@mdhaber (Contributor, author) commented May 25, 2024

Oops. I had applied an exception to test_low_dim_no_ls instead of test_high_dim_no_ls, and test_high_dim_no_ls happened to be one of those borderline tests.

@rgommers (Member) left a review:

This now looks about as clean as it is going to get, and all the new instructions in the docs are clear and work as advertised in my testing.

I'm still a little worried that the number of flaky failures is going to be on the high side, but there is only one way to find out, which is to start using this. So in it goes - thanks Matt!

@rgommers merged commit b863eb9 into scipy:main on May 26, 2024. All 31 checks passed.