Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] HolidayFeatures crashes if dataframe doesn't contain specified date #6399

Closed
guimalo opened this issue May 8, 2024 · 4 comments
Closed
Labels
bug Something isn't working module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects

Comments

@guimalo
Copy link

guimalo commented May 8, 2024

The following code:

from sktime.transformations.series.holiday import HolidayFeatures
from datetime import date

calendar = {
        date(2024, 12, 25): 'Natal',
        date(2023, 12, 25): 'Natal',
        date(2022, 12, 25): 'Natal',
        date(2021, 12, 25): 'Natal',
        date(2024, 11, 29): 'Black Friday',
        date(2023, 11, 29): 'Black Friday',
        date(2022, 11, 29): 'Black Friday',
        date(2021, 11, 29): 'Black Friday',
        date(2024, 10, 1): 'Dia Internacional do Café',
        date(2023, 10, 1): 'Dia Internacional do Café',
        date(2022, 10, 1): 'Dia Internacional do Café',
        date(2021, 10, 1): 'Dia Internacional do Café',
        date(2024, 5, 12): 'Dia das Mães',
        date(2023, 5, 12): 'Dia das Mães',
        date(2022, 5, 12): 'Dia das Mães',
        date(2021, 5, 12): 'Dia das Mães',
        date(2024, 8, 11): 'Dia dos Pais',
        date(2023, 8, 11): 'Dia dos Pais',
        date(2022, 8, 11): 'Dia dos Pais',
        date(2021, 8, 11): 'Dia dos Pais',
        date(2024, 3, 15): 'Semana do Consumidor',
        date(2023, 3, 15): 'Semana do Consumidor',
        date(2022, 3, 15): 'Semana do Consumidor',
        date(2021, 3, 15): 'Semana do Consumidor',
    }
holiday_transformer = HolidayFeatures(calendar=calendar, holiday_windows = {
        "Natal": (5, 2),
        "Black Friday": (5, 2),
        "Dia Internacional do Café": (5, 2),
        "Dia das Mães": (5, 2),
        "Dia dos Pais": (5, 2),
        "Semana do Consumidor": (0, 6)
        }
)

import pandas as pd
ix = pd.date_range("2022-01-01", end="2022-05-31")
X = pd.Series(14, index=ix)
holiday_transformer.fit_transform(X)

Gives the error: ValueError: holiday: Natal not found in calendar.

Upon describing this issue on the discord, @fkiraly suggested that it might be related to the fact that X does not contain one of the holidays in calendar.

@VyomkeshVyas could you kindly look into it? It may be related with functionalities that you wrote.

Thank you.

@guimalo guimalo added the bug Something isn't working label May 8, 2024
@fkiraly fkiraly added the module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing label May 8, 2024
@fkiraly
Copy link
Collaborator

fkiraly commented May 8, 2024

made minor edits to include the fully reproducible example, hope that is ok.

Confirmed on current main, windows, python 3.11

@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation May 8, 2024
@fkiraly fkiraly moved this from Needs triage & validation to Reproduced/confirmed in Bugfixing May 8, 2024
@VyomkeshVyas
Copy link
Contributor

@guimalo As per the current logic, "holidays" present in the "holiday_windows" argument is expected to be present in the index of test dataframe as well, otherwise ValueError is raised.

@fkiraly
Copy link
Collaborator

fkiraly commented May 10, 2024

@VyomkeshVyas, thanks for clarifying!

I would still consider this a "bug" in the sense that it is not obvious that the estimator will break (e.g., from docstring), and when it breaks it does not give clear feedback.

It is also not expected, as the holiday_windows could in-principle intersect with the observed period, even if the event itself is not there.

I do think, hence, that the optimal state is:

  • allow events outside the observed period - otherwise you will have to update the calendar continuously, and the estimator will fail for naive windowing or backtesting (!), because some events end up outside windows or folds
    • a naive solution would be just to subset the calendar
  • ensure that holiday_windows intersecting with observed data are properly treated
    • that might be more difficult, especially for non-contiguous dates. A simple solution might be to "grow" the holiday dates by their windows, then check if intersections come up empty

The above is my "optimal" resolution, the "minimal" would be a clear error message that explains to the user what they have to do. I would still consider that quite sub-optimal because the calendar has to vary with the data.

@guimalo guimalo closed this as completed May 13, 2024
Bugfixing automation moved this from Reproduced/confirmed to Fixed/resolved May 13, 2024
@fnhirwa
Copy link
Contributor

fnhirwa commented May 16, 2024

I am going to start looking into this item if no one is currently working on it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects
Bugfixing
Fixed/resolved
Development

No branches or pull requests

4 participants