[ENH] set random seed in TestAllForecasters data generation - potential solution for sporadic failures #6382

Draft · wants to merge 1 commit into base: main
Conversation

@fkiraly (Collaborator) commented on May 3, 2024

@benHeid suggested that sporadic test failures and long test times in #6344 could be related to LU decomposition or similar issues in ARIMA - compare #6201.

This PR aims to help with diagnosis by setting the random seed in data generation in TestAllForecasters. This should greatly reduce the number of different time series occurring and hopefully make the failure behaviour - if impacted - deterministic.
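
For illustration, a minimal sketch of the idea - a fixed seed in the random series generation used by the test scenarios. The helper name, signature, and seed value below are simplified placeholders, not the actual sktime code:

```python
import numpy as np

RANDOM_SEED = 42  # hypothetical fixed seed shared by all generated test series


def _make_test_series(n_timepoints=50, random_state=RANDOM_SEED):
    """Generate a reproducible random-walk series for forecaster tests."""
    rng = np.random.default_rng(random_state)
    return np.cumsum(rng.normal(size=n_timepoints))
```

With the seed fixed, every test run sees the same series, so a failure that depends on the data should either always occur or never occur.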

FYI @yarnabrina

The CI should run all forecaster tests because of the "test class has changed" criterion.

@fkiraly added the module:forecasting, do not merge, and diagnostics labels on May 3, 2024
@fkiraly (Collaborator, Author) commented on May 3, 2024

hm, no failures but long runtimes - StatsForecastAutoTBATS and StatsForecastAutoTheta specifically.

@benHeid (Contributor) commented on May 5, 2024

hm, no failures but long runtimes - StatsForecastAutoTBATS and StatsForecastAutoTheta specifically.

Was this one run or multiple runs?

After taking a look at the runtimes, based on the one action run I observe:

  • macOS is slow on Python 3.8; the other Python versions seem to be much faster.
  • In some tests, I think on Ubuntu, the HFForecaster is quite slow (taking more than 50 secs). Not sure what the reason is. I suppose the download speed should not be the issue since the model used for testing shouldn't be that large.

@fkiraly (Collaborator, Author) commented on May 5, 2024

I suppose the download speed should not be the issue since the model used for testing shouldn't be that large.

Maybe it is latency? Perhaps this is some DDoS protection kicking in, not allowing too many downloads from the same IP address?

This is perhaps related to a new problem I have been seeing: sometimes when I try to access the logs, my virus scanner says the IP has been blacklisted.

Hypothesis:

  • there are many Hugging Face model downloads - each individual fit in the test suite triggers one
  • Hugging Face detects this with a DDoS filter - a false positive, but it causes the IP to be blocked temporarily, or blacklisted in shared IP blacklists used by antivirus providers
  • this causes the model load to fail or hang.

Could it be this, @benHeid, @yarnabrina?

If yes, it may indicate we need to think carefully about testing of Hugging Face based models - we had similar issues with downloads earlier, so we moved them out to a separate "downloads" CI element. Only this time, the downloads are attached to models.

@benHeid (Contributor) commented on May 5, 2024

Hypothesis:

  • there are many Hugging Face model downloads - each individual fit in the test suite triggers one
  • Hugging Face detects this with a DDoS filter - a false positive, but it causes the IP to be blocked temporarily, or blacklisted in shared IP blacklists used by antivirus providers
  • this causes the model load to fail or hang.

Could it be this, @benHeid, @yarnabrina?

We need to test whether there are really that many downloads. As far as I know, Hugging Face caches downloads, so once a model is downloaded it shouldn't be downloaded again.
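
One way to check this locally would be to inspect the Hugging Face cache after a test run - a minimal sketch, assuming huggingface_hub is installed (this is just a diagnostic snippet, not part of the test suite):

```python
from huggingface_hub import scan_cache_dir

# list the repos currently in the local Hugging Face cache
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    print(f"{repo.repo_id}: {repo.nb_files} files, {repo.size_on_disk} bytes")
```

Running the suite once to warm the cache and then re-running it with HF_HUB_OFFLINE=1 set would also show whether the cache is actually being reused or whether repeated downloads happen.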

@benHeid (Contributor) commented on May 5, 2024

Regarding the long runtimes of the auto models from StatsForecast: I suppose there is something strange with Python 3.8 (and Mac). Furthermore, I think this issue is not located in sktime:

Local measurement of the unit test execution time with different initializations of the model (four runs each, in seconds):

  • Python 3.8.19: 237.05, 227.80, 171.04, 161.69
  • Python 3.10.13: 65.56, 56.14, 43.60, 46.76

Measurements of the direct fit time (20 fits on a random time series of length 1000). I executed it multiple times, since the numbers fluctuate a lot:

  • Python 3.8: 24 - 54 sec
  • Python 3.10: 11 - 22 sec
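
For context, a rough sketch of this kind of direct-fit timing, assuming statsforecast is installed and using AutoTheta as a stand-in for the auto models (the data distribution and season_length are arbitrary choices, not the exact benchmark used):

```python
import time

import numpy as np
from statsforecast.models import AutoTheta

rng = np.random.default_rng(0)
y = rng.uniform(low=10.0, high=20.0, size=1000)  # random series of length 1000

start = time.perf_counter()
for _ in range(20):  # 20 fits, as in the measurement above
    AutoTheta(season_length=12).fit(y)
print(f"20 fits took {time.perf_counter() - start:.1f} sec")
```

Running the same script under Python 3.8 and Python 3.10 environments allows the same comparison as above.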
