[BUG] long test collection time and test timeouts #6344

Open
fkiraly opened this issue Apr 26, 2024 · 26 comments
Labels
bug Something isn't working module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting

Comments

@fkiraly
Collaborator

fkiraly commented Apr 26, 2024

Update: as @yarnabrina noted, there is an issue with long test times and long test collection times. This can be reproduced locally.

python -m pytest -n0 --collect-only used to take <1min, now it is at 10min.

Using py-spy record -o profile.svg -- python -m pytest -n0 --collect-only for profiling now.

Related: #4900


VARMAX seems to be sporadically timing out with >10min durations.

Unsure what is causing this - previous failures were reported for some other tests, see #2997, #3176.

For testing, the skips in #6345 need to be reverted.

@fkiraly fkiraly added bug Something isn't working module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting labels Apr 26, 2024
fkiraly added a commit that referenced this issue Apr 26, 2024
Skips tests for the `VARMAX` forecaster until #6344 is resolved.
@fkiraly fkiraly changed the title [BUG] VARMAX test timeouts [BUG] long test collection time and test timeouts Apr 28, 2024
@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

@yarnabrina, I think we should also add a test, or similar guard, enforcing a 2-minute upper bound on test collection time.
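A minimal sketch of what such a guard could look like, assuming it runs as a separate CI step rather than inside the test suite itself (the 120-second bound, function name and paths are illustrative only):

```python
import subprocess
import time


def check_collection_time(max_seconds: float = 120.0) -> None:
    """Fail if collecting the sktime test suite takes longer than max_seconds."""
    start = time.monotonic()
    # collect only, single worker, quiet output
    subprocess.run(
        ["python", "-m", "pytest", "sktime", "-n0", "--collect-only", "-q"],
        check=True,
        capture_output=True,
    )
    elapsed = time.monotonic() - start
    if elapsed > max_seconds:
        raise RuntimeError(f"collection took {elapsed:.0f}s, bound is {max_seconds:.0f}s")


if __name__ == "__main__":
    check_collection_time()
```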

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Result of test collection for classification, using

py-spy record -o profile_classification.svg -- python -m pytest sktime/classification -n0 --collect-only - 2 minutes (!)

[attached flamegraph: profile_classification]

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

according to this, coverage report generation takes 80 seconds? That is odd, why would a coverage report be generated, if we only collect?

@yarnabrina
Collaborator

Can we get the profiling for all components, with and without sktime/tests?

Question: where/how are you generating these plots, and how do I read it? I'm not familiar with py-spy?

@yarnabrina
Collaborator

yarnabrina commented Apr 28, 2024

according to this, coverage report generation takes 80 seconds? That is odd, why would a coverage report be generated, if we only collect?

I think/guess pytest-cov is enabled by default?

https://github.com/sktime/sktime/blob/main/setup.cfg#L17-L19
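For reference, those lines add the coverage flags to pytest's default options - roughly the following (paraphrased; the exact flags are also quoted further down in this thread):

```
[tool:pytest]
addopts =
    --cov sktime
    --cov-report xml
    --cov-report html
```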

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Question: where/how are you generating these plots

I pasted the exact command - you need to run it from sktime root, in an env that has py-spy installed.

and how do I read it? I'm not familiar with py-spy?

The horizontal bars represent time spent in function calls; lines in files are also given. So you are looking for public functions or similar, to see how much time is spent in them. The further down you go, the more deeply the function calls are nested.

In the SVG produced, viewed in a browser, you can also zoom into one of the bars - click on it to magnify that subtree.

Can we get the profiling for all components, with and without sktime/tests?

I think that will not be as useful as investigating a component where times are short, such as classification. This is because the times there are already unexpectedly high - they should be in the range of tens of seconds. My hunch is that identifying the issue there may also shed light on the other modules (the times there are just longer, so one diagnostic iteration takes more time).

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

I think/guess pytest-cov is enabled by default?

yes, in setup.cfg.

I removed the lines to check - and collecting classification tests still takes 2 minutes. That is very odd.
[attached flamegraph: profile_classification_no_cov]

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Part of this seems to be import chains and import coupling:
#6354

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Pinging @tarpas, @art049 - in case you have any ideas on what to do here?

The timeouts started occurring around a month or so ago.

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Collection runtimes of historical versions for the classification module:

  • 0.22.0 - 12 sec for 301 tests
  • 0.24.0 - 11 sec for 294 tests
  • 0.25.0 - 38 sec for 355 tests
  • 0.26.0 - 38 sec for 355 tests
  • 0.27.0 - 56 sec for 363 tests

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

I think I understand the issue - it is run_test_for_class. The logic deciding which tests to run has been expanded over the versions.

If we replace run_test_for_class with only _required_deps_present, the time for the classification module shrinks to 5 secs, for 352 tests at the current version.

For the entirety of sktime, we get to 110 secs with _required_deps_present, and 22 secs without - that is probably the current floor for optimizing run_test_for_class.

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

I think caching indeed greatly cuts the time: #6357
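The gist of the caching approach in #6357, as a hedged sketch - illustrative names only, assuming the switch utilities depend just on the estimator class, not the actual sktime internals:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def _required_deps_present(cls) -> bool:
    """Soft-dependency check for an estimator class, computed once per class."""
    # placeholder for the expensive check against installed packages / pyproject
    return True


def run_test_for_class(cls) -> bool:
    """Decide at collection time whether tests for cls should run."""
    # each cached helper is evaluated once per class rather than once per collected test
    return _required_deps_present(cls)  # plus further cached conditions (diff checks, etc.)
```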

@fkiraly
Collaborator Author

fkiraly commented Apr 28, 2024

Bumps in collection time are explained by logic added to run_test_for_class:

  • 0.22.0 - 12 sec for 301 tests
  • 0.24.0 - 11 sec for 294 tests
    • in 0.24.2, parent classes were included in diff check
  • 0.25.0 - 38 sec for 355 tests
  • 0.26.0 - 38 sec for 355 tests
    • in 0.26.1, the condition on dep change in pyproject was added
  • 0.27.0 - 56 sec for 363 tests

Caching gets this down to 5 sec

@yarnabrina
Collaborator

Can you please tell me which PR/changes I should apply locally to test your final changes and get timings?

I am also interested to know what the timings are for you, after these dependency and caching changes, when you include sktime/tests along with sktime/classification, as that's what added the big time increase for me.

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

(not a reply to the above, just a note following CI runs. The reply is in the next post.)

Timeouts occur for the following classes:

  • AutoARIMA
  • SARIMAX
  • VARMAX
  • StatsModelsARIMA
  • StatsForecastAutoARIMA
  • BATS
  • TBATS
  • ShapeletLearningClassifierTslearn

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

Can you please tell me which PR/changes I should apply locally to test your final changes and get timings?

This PR: #6357

I am also interested to know what the timings are for you, after these dependency and caching changes, when you include sktime/tests along with sktime/classification, as that's what added the big time increase for me.

All tests in sktime, including the above, take 22 sec to collect with this PR. I think it can be further reduced using the other speedup PR, but I would not add them to 0.29.0.

@yarnabrina
Collaborator

All tests in sktime, including the above, take 22 sec to collect with this PR.

Sorry, while I do see some improvements, I cannot reproduce anything close to this.

modified `setup.cfg`
[aliases]
test = pytest

[tool.isort]
profile = "black"
multi_line_output = 3

[tool:pytest]
# ignore certain folders and pytest warnings
addopts =
    --ignore build_tools
    --ignore examples
    --ignore docs
filterwarnings =
    ignore::UserWarning
    ignore:numpy.dtype size changed
    ignore:numpy.ufunc size changed

[flake8]
# Default flake8 3.5 ignored flags
ignore = E121, E123, E126, E226, E24, E704, W503, W504
# inline with Black code formatter
max-line-length = 88
exclude =
    sktime/_contrib/*
extend-ignore =
    # See https://github.com/PyCQA/pycodestyle/issues/373
    E203

[metadata]
description_file = README.md
long_description_content_type = text/markdown

[check-manifest]
ignore =
    .binder/**
    .all-contributorsrc
    .coveragerc
    examples/**
    build_tools/**
    __check_build/**
    docs/**
    Makefile
    CODEOWNERS
    CONTRIBUTING.md
    *.yaml
    *.yml

[isort]
profile = black

[pydocstyle]
convention = numpy
match = (?!test_).*\.py

[bdist_wheel]
universal = false

[sdist]
formats = gztar
`pip freeze` before running below commands
adagio==0.2.4
aiohttp==3.9.5
aiosignal==1.3.1
alembic==1.13.1
antlr4-python3-runtime==4.11.1
appdirs==1.4.4
async-timeout==4.0.3
attrs==23.2.0
autoflake==2.3.1
certifi==2024.2.2
cfgv==3.4.0
charset-normalizer==3.3.2
click==8.1.7
cloudpickle==3.0.0
colorlog==6.8.2
coreforecast==0.0.8
distlib==0.3.8
exceptiongroup==1.2.1
filelock==3.13.4
frozenlist==1.4.1
fs==2.4.16
fsspec==2024.3.1
fugue==0.8.7
fugue-sql-antlr==0.2.0
greenlet==3.0.3
identify==2.5.35
idna==3.7
iniconfig==2.0.0
Jinja2==3.1.3
joblib==1.4.0
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
lightning-utilities==0.11.2
llvmlite==0.42.0
Mako==1.3.3
MarkupSafe==2.1.5
mpmath==1.3.0
msgpack==1.0.8
multidict==6.0.5
networkx==3.3
neuralforecast==1.7.1
nodeenv==1.8.0
numba==0.59.1
numpy==1.26.4
optuna==3.6.1
packaging==24.0
pandas==2.2.2
patsy==0.5.6
platformdirs==4.2.0
pluggy==1.4.0
pre-commit==3.7.0
protobuf==5.26.1
pyarrow==15.0.2
pyflakes==3.2.0
pytest==8.1.1
python-dateutil==2.9.0.post0
pytorch-lightning==2.2.2
pytz==2024.1
pyupgrade==3.15.2
PyYAML==6.0.1
qpd==0.4.4
ray==2.11.0
referencing==0.34.0
requests==2.31.0
rpds-py==0.18.0
scikit-base==0.7.7
scikit-learn==1.4.2
scipy==1.13.0
six==1.16.0
-e git+https://github.com/yarnabrina/sktime-fork.git@2bb0c1bb454f2e7851611b12f6425a4f19841d17#egg=sktime
SQLAlchemy==2.0.29
sqlglot==23.10.0
statsforecast==1.7.4
statsmodels==0.14.2
sympy==1.12
tensorboardX==2.6.2.2
threadpoolctl==3.4.0
tokenize-rt==5.2.0
tomli==2.0.1
torch==2.2.2+cpu
torchmetrics==1.3.2
tqdm==4.66.2
triad==0.9.6
typing_extensions==4.11.0
tzdata==2024.1
urllib3==2.2.1
utilsforecast==0.1.5
virtualenv==20.25.3
yarl==1.9.4

branch main (commit 2bb0c1bb45)

component forecasting (command: python -m pytest sktime/forecasting --co)

25.88 seconds (43669 tests)

component forecasting with tests framework (command: python -m pytest sktime/forecasting sktime/tests --co)

322.94 seconds (79603 tests)

component classification (command: python -m pytest sktime/classification --co)

7.15 seconds (568 tests)

component classification with tests framework (command: python -m pytest sktime/classification sktime/tests --co)

299.09 seconds (36502 tests)

branch cache-test-switch (commit c9e842c558)

component forecasting (command: python -m pytest sktime/forecasting --co)

12.09 seconds (40400 tests)
failed with error ERROR sktime/forecasting/model_evaluation/tests/test_evaluate.py - TypeError: unhashable type: 'CRPS'

component forecasting with tests framework (command: python -m pytest sktime/forecasting sktime/tests --co)

239.78 seconds (79603 tests)
failed with error ERROR sktime/forecasting/model_evaluation/tests/test_evaluate.py - TypeError: unhashable type: 'CRPS'

component classification (command: python -m pytest sktime/classification --co)

4.53 seconds (568 tests)

component classification with tests framework (command: python -m pytest sktime/classification sktime/tests --co)

224.60 seconds (36502 tests)

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

@yarnabrina, I think we have different dependencies, so the number of tests differs?

The CRPS failure is a bug that got fixed in a newer commit - objects are not hashable, so we need to get the class.
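To illustrate the failure mode and the fix (a hedged sketch, not the exact sktime code): lru_cache needs hashable arguments, and some estimator instances such as CRPS() are not hashable, so the cached helpers should be keyed on the class rather than the instance:

```python
def _to_class(obj):
    """Normalize an estimator instance or class to the class, for use as a cache key."""
    return obj if isinstance(obj, type) else type(obj)


# previously, passing an unhashable instance straight into the lru_cache-d helpers
# raised "TypeError: unhashable type"; normalizing to the class first avoids this
```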

My timings on branch cache-test-switch (9e842c558ea4e4163582e1be3e50bcafb68f1b6)

python -m pytest sktime/forecasting --co - 9 secs, 5108 tests
python -m pytest sktime/forecasting sktime/tests --co - 20 secs, 6575 tests
python -m pytest sktime/classification --co - 6 secs, 246 tests
python -m pytest sktime/classification sktime/tests --co - 17 secs, 1713 tests

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

I should also note that my timings significantly increase if I run with pytest-cov present and the "coverage" flags enabled, i.e.,

    --cov sktime
    --cov-report xml
    --cov-report html

Then, timings are:

python -m pytest sktime/forecasting --co - 32 secs, 5108 tests
python -m pytest sktime/forecasting sktime/tests --co - 44 secs, 6575 tests
python -m pytest sktime/classification --co - 25 secs, 246 tests
python -m pytest sktime/classification sktime/tests --co - 51 secs, 1713 tests

I've noted the huge bump in time from coverage logic already here, in the profiler logs: #6344 (comment)

What are your timings with pytest-cov off?

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

... maybe we should also turn off pytest-cov in the CI tests, and run it only on test-all?

Coverage numbers are off anyway, due to incremental/differential testing - see #5090

@yarnabrina
Collaborator

I shared my setup.cfg and pip freeze above; all my reported times are without coverage. It's not even installed in my environment.

I am really shocked to see the difference in the number of tests and timings, especially the number of tests. I think I only have statsmodels, pmdarima and statsforecast installed, maybe neuralforecast as well. Are just these increasing it from 6575 to 79603? That seems a bit too much.

I definitely don't have any classification dependencies (not intentionally at least), and I have way more tests (36502/568) than you (1713/246). Does this match your expectations?

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

I shared my setup.cfg

Where is your setup.cfg? Cannot find it, sorry.

Have you turned off incremental testing? That might explain the diff in tests.

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

Timings and number of tests with --only_changed_modules False, cache-test-switch branch:

  • python -m pytest sktime/forecasting sktime/tests --co, coverage turned on: 75190 tests, 455 secs
  • python -m pytest sktime/forecasting sktime/tests --co, coverage turned off: 75190 tests, 240 secs

@yarnabrina
Collaborator

modified setup.cfg
pip freeze before running below commands

Please click on the arrows - I made those collapsible, as they are big files.

And yes, I believe I have the per-OS subsetting, the only-run-changed-modules flag, etc. turned off. Otherwise it would not be a full benchmark, I think?

That explains the difference in the number of tests, I guess, so that's a relief. I have an unrelated question: why do you see any tests at all for your branch? You haven't modified forecasting or classification, right - shouldn't it be 0 then? Or are there some exceptions to the incremental-only flag?

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

why do you see even any tests for your branch? You haven't modified forecasting or classification right, shouldn't it be 0 then?

The flag impacts only tests from the TestAll... classes, and tests decorated with run_test_for_class pytest skips. All other tests are still collected and run.
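For context, that decorator pattern looks roughly like this (a hedged sketch; the import path and example estimator are for illustration only):

```python
import pytest

from sktime.classification.dummy import DummyClassifier
from sktime.tests.test_switch import run_test_for_class


@pytest.mark.skipif(
    not run_test_for_class(DummyClassifier),
    reason="run only if the class or its dependencies changed in this PR",
)
def test_dummy_classifier_behaviour():
    """Estimator-specific test, skipped by the incremental-testing flag when unaffected."""
    ...
```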

@fkiraly
Collaborator Author

fkiraly commented Apr 29, 2024

Please click on the arrows,

Ah, thanks for explaining. Did not see these at first.

The large difference is explained by your turning off differential/incremental testing.

fkiraly added a commit that referenced this issue Apr 30, 2024
…ilities (#6357)

Towards #6344.

Speeds up test collection by `lru_cache`-ing all test switch utilities
called from `run_test_for_class`.

This gets test collection time for the classification module to 5
seconds, compared to historical test times:

* 0.22.0 - 12 sec for 301 tests
* 0.24.0 - 11 sec for 294 tests
    * in 0.24.2, parent classes were included in diff check
* 0.25.0 - 38 sec for 355 tests
* 0.26.0 - 38 sec for 355 tests
    * in 0.26.1, the condition on dep change in `pyproject` was added
* 0.27.0 - 56 sec for 363 tests
* 0.28.1 - 72 sec for 246 tests
* this PR: 5 sec for 246 tests

(timings from `python -m pytest sktime/classification -n0
--collect-only`)

For all of `sktime`, the test collection time is down to 40sec for me
locally, from >20 minutes.
fkiraly added a commit that referenced this issue Apr 30, 2024
This PR adds temporary skips for estimators involved in timeouts, see
#6344, until the problem is fully diagnosed and addressed.
fkiraly added a commit that referenced this issue May 1, 2024
Also skips tests for `pmdarima` `ARIMA`, in relation to #6344
fkiraly added a commit that referenced this issue May 1, 2024
Removes test step from the release action - temporary to allow quick
release.

Issue #6344 causes mac-13 to run longer than 6 hours and then time out -
there are no known failures, just too long runtimes at the moment.

To be reverted.