Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] load_forecastingdata with wind_4_seconds_dataset fails #6388

Open
benHeid opened this issue May 5, 2024 · 7 comments
Open

[BUG] load_forecastingdata with wind_4_seconds_dataset fails #6388

benHeid opened this issue May 5, 2024 · 7 comments
Labels
bug Something isn't working module:datasets&loaders data sets and data loaders
Projects

Comments

@benHeid
Copy link
Contributor

benHeid commented May 5, 2024

load_forecastingdata("wind_4_seconds_dataset", return_type="pd_multiindex_hier") fails with KeyError: '4_seconds'
To Reproduce

from sktime.datasets import load_forecastingdata
load_forecastingdata("solar_4_seconds_dataset", return_type="pd_multiindex_hier")

Expected behavior
Should return the time series instead of failing.

@benHeid benHeid added bug Something isn't working module:datasets&loaders data sets and data loaders labels May 5, 2024
@benHeid benHeid added this to Needs triage & validation in Bugfixing via automation May 5, 2024
@benHeid benHeid changed the title [BUG] load_forecastingdata with "wind_4_seconds_dataset" fails [BUG] load_forecastingdata with wind_4_seconds_dataset fails May 5, 2024
@fkiraly fkiraly moved this from Needs triage & validation to Reproduced/confirmed in Bugfixing May 6, 2024
@fkiraly
Copy link
Collaborator

fkiraly commented May 6, 2024

I can confirm this on windows, python 3.11, current main.

It seems to me the data source has changed, and is no longer adhering to the original specification?

This is the failure cause:

image

Who is maintaining the specification or the module currently?

My guess is, @achieveordie, @hazrulakmal, @yarnabrina? Perhaps @ciaran-g?

@fkiraly
Copy link
Collaborator

fkiraly commented May 6, 2024

Plus, why did the tests not catch this - any ideas, @yarnabrina?

@yarnabrina
Copy link
Collaborator

We skip datasets folder in both new and old CI, if I remember correctly. Unless there's a change in that folder, those are not run in regular CI.

That being said, the CRON job did not fail so may be we have to check if this is covered or not.

https://github.com/sktime/sktime/actions/workflows/test_datasets.yml

@fkiraly
Copy link
Collaborator

fkiraly commented May 6, 2024

Hm, looks like there is no test that would actually attempt downloads from forecastingdata? We should at least make spot checks.

@achieveordie
Copy link
Collaborator

I was not involved in this part of the codebase. It doesn't seem like there's any change in the dataset (the last change appears to be in 2020) and the previous code changes were made by @hazrulakmal almost a year ago.

I suspect that frequency='4_seconds' was never incorporated in the code or tests and was found by @benHeid when he manually called this dataset.

@achieveordie
Copy link
Collaborator

Hm, looks like there is no test that would actually attempt downloads from forecastingdata? We should at least make spot checks.

The closest test we have seems to be test_load_forecastingdata() from test_datadownload.py but that only checks for a "UnitTest" file.

There's also a test called test_check_link_downloadable() from the same file but it only checks whether the link is active. It is also marked as expected to fail in #5462.

@achieveordie
Copy link
Collaborator

The easiest solution is to add "4_seconds": "4s" in freq_map although I'm not sure if there are more such datasets that might warrant a more programmatic approach (dynamically creating freq_map based on a regex).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working module:datasets&loaders data sets and data loaders
Projects
Bugfixing
Reproduced/confirmed
Development

No branches or pull requests

4 participants