[ENH] Syntetos/Boylan ADI/CV feature extractor for different types of demand (intermittent etc) #6286

Closed
fkiraly opened this issue Apr 11, 2024 · 8 comments · Fixed by #6336
Labels
enhancement Adding new functionality good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to sktime module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing

Comments

@fkiraly
Collaborator

fkiraly commented Apr 11, 2024

Derived feature request from discussion with @ggjx22 in #6279.

The request is to implement the Syntetos/Boylan expert classification of time series, from Syntetos/Boylan (2005), "The accuracy of intermittent demand estimates", IJF.

Good first issue, should be simple to implement, so no need to interface from anywhere - recipe is here: https://www.sktime.net/en/latest/developer_guide/add_estimators.html

I would specify the estimator as follows:

Type

Series-to-primitives transformer. Per-instance.

Parameters

  • optional parameters "adi_threshold", "cv_threshold", default values are 1.32, 0.49, as in the paper.
  • optional parameter features, by default all features are computed. If not None, list of str, each entry must be one of "adi", "cv2", "class".

Behaviour

Computes three features or a subset thereof, as columns of the return of transform:

  • adi - average demand interval. This is the same as last index minus first index, divided by number of non-zero values minus one. For time-like indices, the unit should be in number of periods. Not sure what to do for non-periodic - if freq is unavailable, I would just drop the index.
    • there are some random references on the internet which give adi as simply the fraction of non-zero values. Afaik that is not accurate in comparison to the original reference: the "minus one" does not cancel.
  • cv2 - this is just variance/(mean squared), but taken on the sample of values that are non-zero, in the series. The reference uses the biased estimator for variance, i.e., divide by number of values (not minus one)
  • class - derived class, string column, depending on whether adi <= adi_threshold and cv2 <= cv_threshold. Yes/yes is called "smooth", yes/no "erratic", no/yes "intermittent", no/no "lumpy", by the authors.
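
For illustration, the three features described above could be computed for a single series roughly as follows. This is only a sketch under stated assumptions, not the sktime implementation: the function name `adi_cv2_class` is hypothetical, the series is assumed to have a plain positional index (so "last index minus first index" is the length minus one), demand values are assumed non-negative, and `cv_threshold` is assumed to be compared against `cv2`.

```python
def adi_cv2_class(y, adi_threshold=1.32, cv_threshold=0.49):
    """Hypothetical helper (not the sktime API): adi, cv2 and class for one series."""
    values = [float(v) for v in y]
    nonzero = [v for v in values if v != 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero values to compute adi")

    # adi: last index minus first index, divided by (number of non-zero values - 1).
    # Assumption: positional index 0..n-1, so last index - first index = n - 1.
    adi = (len(values) - 1) / (len(nonzero) - 1)

    # cv2: variance / mean squared, on the non-zero values only,
    # using the biased variance estimator (divide by count, not count - 1).
    mean = sum(nonzero) / len(nonzero)
    var = sum((v - mean) ** 2 for v in nonzero) / len(nonzero)
    cv2 = var / mean**2

    # Assumption: cv_threshold is applied to cv2.
    low_adi = adi <= adi_threshold
    low_cv2 = cv2 <= cv_threshold
    labels = {
        (True, True): "smooth",
        (True, False): "erratic",
        (False, True): "intermittent",
        (False, False): "lumpy",
    }
    return {"adi": adi, "cv2": cv2, "class": labels[(low_adi, low_cv2)]}
```

For example, `adi_cv2_class([0, 2, 0, 0, 4, 0, 6])` has 7 points and 3 non-zero values, so adi is 6 / 2 = 3.0, which is above the default threshold. The non-zero values 2, 4, 6 give a cv2 of about 0.167, which is below it, so the class is "intermittent".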
@fkiraly fkiraly added implementing algorithms Implementing algorithms, estimators, objects native to sktime module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing enhancement Adding new functionality good first issue Good for newcomers labels Apr 11, 2024
@shlok191
Contributor

shlok191 commented Apr 12, 2024

@fkiraly, I am trying to get a deeper understanding of time series, and I would love to work on this enhancement. If it is okay, could I take a crack at this? Thank you so much!

@fkiraly
Collaborator Author

fkiraly commented Apr 12, 2024

Absolutely! That's what good first issues are for!

Let us know if you need any help with the "new estimator" guide, or if you have suggestions for improvement.

@fkiraly
Collaborator Author

fkiraly commented Apr 16, 2024

do you need any help, @shlok191? Happy to review a draft PR if you have partial code

@shlok191
Contributor

@fkiraly, I am so sorry about the delay! I have some midterms this week and the prior which took up all of my time! Would it be okay if I could make a PR in a couple of days?

@fkiraly
Collaborator Author

fkiraly commented Apr 16, 2024

sure, take your time, there's no rush!

Just wanted to make sure you're not stuck somewhere.
I only wanted to check if you need help.

@shlok191
Contributor

Thank you so much! I'll come back with an update soon and communicate if I run into any road-blocks :)

@shlok191
Contributor

shlok191 commented Apr 25, 2024

@fkiraly, I'm sorry about the delay, I just got done with my final exams! I've made a first PR related to this and I'll make sure to complete this by this week. I've got all the free time now! 😄

@fkiraly
Collaborator Author

fkiraly commented Apr 25, 2024

great! I'm sure @ggjx22 is looking forward to it!

fkiraly pushed a commit that referenced this issue May 19, 2024
#### Reference Issues/PRs
Fixes #6286. See also #6279 for more information about the original
request!

#### What does this implement/fix? Explain your changes.
This PR implements a feature extractor that has the capability to
process time series data representing
demand over time into one of 4 categories (smooth, intermittent,
erratic, lumpy) based on the guidelines
detailed in the paper: **"The accuracy of Intermittent Demand Estimates"
by J. Boylan, A. Syntetos.**
 
#### Did you add any tests for the change?

Yes, I added 3 test parameters with their own `ADI` and `CV` threshold
values to test how varying thresholds
can impact classification. I also set some thresholds to 0.0 to see how
that might impact the labels given!