[BUG] `load_forecastingdata` with `wind_4_seconds_dataset` fails #6388

benHeid · 2024-05-05T15:36:56Z

`load_forecastingdata("wind_4_seconds_dataset", return_type="pd_multiindex_hier")` fails with `KeyError: '4_seconds'`.
To Reproduce

from sktime.datasets import load_forecastingdata
load_forecastingdata("wind_4_seconds_dataset", return_type="pd_multiindex_hier")

Expected behavior
Should return the time series instead of failing.

fkiraly · 2024-05-06T09:09:59Z

I can confirm this on windows, python 3.11, current main.

It seems to me the data source has changed, and is no longer adhering to the original specification?

This is the failure cause:

Who is maintaining the specification or the module currently?

My guess is, @achieveordie, @hazrulakmal, @yarnabrina? Perhaps @ciaran-g?

fkiraly · 2024-05-06T09:10:36Z

Plus, why did the tests not catch this - any ideas, @yarnabrina?

yarnabrina · 2024-05-06T10:23:02Z

We skip the datasets folder in both the new and old CI, if I remember correctly. Unless there is a change in that folder, those tests are not run in regular CI.

That said, the cron job did not fail, so maybe we need to check whether this is covered at all.

https://github.com/sktime/sktime/actions/workflows/test_datasets.yml

fkiraly · 2024-05-06T10:29:11Z

Hm, looks like there is no test that would actually attempt downloads from forecastingdata? We should at least make spot checks.

achieveordie · 2024-05-06T10:50:39Z

I was not involved in this part of the codebase. It doesn't seem like there's any change in the dataset (the last change appears to be in 2020) and the previous code changes were made by @hazrulakmal almost a year ago.

I suspect that frequency='4_seconds' was never incorporated in the code or tests and was found by @benHeid when he manually called this dataset.

achieveordie · 2024-05-06T10:56:04Z

> Hm, looks like there is no test that would actually attempt downloads from forecastingdata? We should at least make spot checks.

The closest test we have seems to be test_load_forecastingdata() from test_datadownload.py but that only checks for a "UnitTest" file.

There's also a test called test_check_link_downloadable() from the same file but it only checks whether the link is active. It is also marked as expected to fail in #5462.
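A spot check of the kind suggested above could be sketched roughly as follows. This is only an illustration: `spot_check_loader` is a hypothetical helper, not an existing sktime function, and it takes the loader as an argument so that the sketch itself needs neither sktime nor network access.

```python
def spot_check_loader(loader, dataset_names, **loader_kwargs):
    """Call ``loader`` once per dataset name and collect failures.

    Hypothetical helper for illustration only, not part of sktime.
    Returns a dict mapping each failing dataset name to the exception raised.
    """
    failures = {}
    for name in dataset_names:
        try:
            loader(name, **loader_kwargs)
        except Exception as exc:  # broad on purpose: any failure is a finding
            failures[name] = exc
    return failures
```

In a scheduled test one would presumably pass `load_forecastingdata` as `loader`, a small hand-picked list of names such as `"wind_4_seconds_dataset"`, and `return_type="pd_multiindex_hier"`, then assert that the returned dict is empty. Which datasets are small enough to download in CI is an open question.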

achieveordie · 2024-05-06T11:15:07Z

The easiest solution is to add `"4_seconds": "4s"` to `freq_map`, although I'm not sure whether there are more such datasets, which might warrant a more programmatic approach (dynamically creating `freq_map` based on a regex).
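A minimal sketch of the regex-based idea, to make the trade-off concrete. Everything here is an assumption for illustration: the function name `tsf_freq_to_alias`, the two lookup tables, and the set of frequency strings are hypothetical and do not reflect the actual contents of `freq_map` in sktime. The alias spellings follow the `"4s"` suggestion above and may need adjusting for the pandas version in use.

```python
import re

# Hypothetical lookup tables, not the actual freq_map from sktime.
NAMED_FREQS = {"daily": "D", "weekly": "W"}
UNIT_ALIASES = {"seconds": "s", "minutes": "min"}

# Matches strings of the form "<count>_<unit>", e.g. "4_seconds".
_MULTIPLE = re.compile(r"^(\d+)_([a-z]+)$")


def tsf_freq_to_alias(freq):
    """Translate a frequency string into a pandas-style offset alias."""
    if freq in NAMED_FREQS:
        return NAMED_FREQS[freq]
    match = _MULTIPLE.match(freq)
    if match and match.group(2) in UNIT_ALIASES:
        count, unit = int(match.group(1)), match.group(2)
        return f"{count}{UNIT_ALIASES[unit]}"
    # Fail loudly with a readable message instead of a bare KeyError.
    raise ValueError(f"Unsupported frequency string: {freq!r}")
```

With these tables, `tsf_freq_to_alias("4_seconds")` gives `"4s"`, and an unknown string raises a `ValueError` that names the offending value, which is easier to diagnose than `KeyError: '4_seconds'`.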

benHeid added the `bug` (Something isn't working) and `module:datasets&loaders` (data sets and data loaders) labels May 5, 2024

benHeid added this to Needs triage & validation in Bugfixing via automation May 5, 2024

benHeid changed the title ~~[BUG] load_forecastingdata with "wind_4_seconds_dataset" fails~~ [BUG] load_forecastingdata with wind_4_seconds_dataset fails May 5, 2024

fkiraly moved this from Needs triage & validation to Reproduced/confirmed in Bugfixing May 6, 2024
