[BUG] long test collection time and test timeouts #6344
Comments
Skips tests for the `VARMAX` forecaster until #6344 is resolved.
VARMAX
test timeouts
@yarnabrina, I think we should also add a test enforcing a 2 min upper bound on test collection time, or similar.
According to this, coverage report generation takes 80 seconds? That is odd - why would a coverage report be generated if we only collect?
Can we get the profiling for all components, with and without […]?

Question: where/how are you generating these plots, and how do I read them? I'm not familiar with py-spy.
I think/guess pytest-cov is enabled by default? https://github.com/sktime/sktime/blob/main/setup.cfg#L17-L19
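For context, pytest-cov is typically enabled "by default" via `addopts` in the pytest configuration, which applies to every pytest invocation, including `--collect-only`. A hypothetical sketch of what such a `setup.cfg` section might look like (the actual lines in the sktime repo may differ):

```ini
[tool:pytest]
# addopts are applied to every pytest invocation, including --collect-only,
# which would explain coverage work happening during pure collection
addopts =
    --cov sktime
    --cov-report=xml
```

If that is the case, passing `--no-cov` (provided by pytest-cov) on the command line overrides it for a single run.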
I pasted the exact command - you need to run it from […]
The horizontal bars are time spent in function calls; lines in files are also given. So you are looking for public functions or similar, to see how much time is spent in there. The further down you go, the deeper the function calls are nested. In the produced SVG, opened in a browser, you can also magnify by subsetting on one of the bars - click on one.
I think that will not be as useful as investigating a component where times are short, such as classification. This is because the times there are already unexpectedly high - they should be in the range of 10s of seconds. My hunch is that identifying the issue there may also shed light on the other modules (there the times are just longer, so one diagnostic iteration takes longer).
Part of this seems to be import chains and import coupling:
Runtimes of historical versions for the classification line:
I think I understand the issue - it is […]. If we replace […], for the entirety of sktime we get to 110 secs with […]
I think caching indeed greatly cuts the time: #6357
Bumps in test time are explained by added logic to […]
Caching gets this down to 5 sec
Can you please tell me which PR/changes I should apply locally to test your final changes and get timings? I am also interested to know what the timings are for you after these dependency and caching changes when you include sktime/tests along with sktime/classification, as that's what added the big time increase for me.
(not a reply to the above, just a note following CI runs. Reply is in the next post.) Timeouts occur for the following classes:
This PR: #6357
All tests in […]
Sorry, while I do see some improvements, I cannot reproduce anything close to this.

modified `setup.cfg`
`pip freeze` before running below commands
branch
@yarnabrina, I think we have different dependencies, so the number of tests differs? The CRPS failure is a bug that got fixed in a newer commit - objects are not hashable, so we need to get the class. My timings on branch […]:
I should also note that my timings significantly increase if I run with […]
Then, timings are:
I've noted the huge bump in time from coverage logic already here, in the profiler logs: #6344 (comment). What are your timings with […]?
... maybe we should also turn off […]?

Coverage is off anyway, due to incremental/differential testing, see #5090
I shared my setup.cfg and pip freeze above; all my reported times are without coverage. It's not even installed in my environment. I am really shocked seeing the difference in number of tests and timings, especially the number of tests. I think I only have statsmodels, pmdarima and statsforecast installed, maybe neuralforecast as well. Just these are increasing it from 6575 to 79603? This seems a bit too much. I definitely don't have any classification dependencies (not intentionally, at least), and I have way more tests (36502/568) than you (1713/246). Does it match your expectations?
Where is your […]? Have you turned off incremental testing? That might explain the diff in tests.
Timings and number of tests with […]:
Please click on the arrows - I made those collapsible, as those are big files. And yes, I believe I have those subset per OS and "only run changed" etc. turned off. Otherwise it won't be a full benchmark, I think? That explains the difference in the number of tests, I guess, so that's a relief. I have an unrelated question: why do you see any tests for your branch at all? You haven't modified forecasting or classification, right - shouldn't it be 0 then? Or are there some exceptions to the increment-only flag?
The flag impacts only tests from the […]
Ah, thanks for explaining. Did not see these at first. The large difference is explained by your turning off of differential/incremental testing.
…ilities (#6357)

Towards #6344. Speeds up test collection by `lru_cache`-ing all test switch utilities called from `run_test_for_class`. This gets test collection time for the classification module to 5 seconds, compared to historical test collection times:

* 0.22.0 - 12 sec for 301 tests
* 0.24.0 - 11 sec for 294 tests
  * in 0.24.2, parent classes were included in the diff check
* 0.25.0 - 38 sec for 355 tests
* 0.26.0 - 38 sec for 355 tests
  * in 0.26.1, the condition on dep change in `pyproject` was added
* 0.27.0 - 56 sec for 363 tests
* 0.28.1 - 72 sec for 246 tests
* this PR: 5 sec for 246 tests

(timings from `python -m pytest sktime/classification -n0 --collect-only`)

For all of `sktime`, the test collection time is down to 40 sec for me locally, from >20 minutes.
Also skips tests for `pmdarima` `ARIMA`, in relation to #6344
Removes the test step from the release action - temporary, to allow a quick release. Issue #6344 causes mac-13 to run longer than 6 hours and then time out - there are no known failures, just too-long runtimes at the moment. To be reverted.
Update: as @yarnabrina noted, there is an issue with long test times and long test collection times. This can be reproduced locally.

python -m pytest -n0 --collect-only

used to take <1 min; now it is at 10 min. Using

py-spy record -o profile.svg -- python -m pytest -n0 --collect-only

for profiling now.

Related: #4900
VARMAX
seems to be sporadically timing out with >10 min durations. Unsure what is causing this - previous failures were reported for some other tests, see #2997, #3176.
For testing, the skips in #6345 need to be reverted.