[ENH] Syntetos/Boylan ADI/CV feature extractor for different types of demand (intermittent etc) #6286
Comments
@fkiraly, I am trying to get a deeper understanding of time series, and I would love to work on this enhancement. If it is okay, could I take a crack at this? Thank you so much!
Absolutely! That's what good first issues are for! Let us know if you need any help with the "new estimator" guide, or if you have suggestions for improvement.
do you need any help, @shlok191? Happy to review a draft PR if you have partial code
@fkiraly, I am so sorry about the delay! I have some midterms this week and the prior which took up all of my time! Would it be okay if I could make a PR in a couple of days?
sure, take your time, there's no rush! Just wanted to make sure you're not stuck somewhere.
Thank you so much! I'll come back with an update soon and communicate if I run into any road-blocks :)
@fkiraly, I'm sorry about the delay, I just got done with my final exams! I've made a first PR related to this and I'll make sure to complete this by this week. I've got all the free time now! 😄
great! I'm sure @ggjx22 is looking forward to it!
#### Reference Issues/PRs
Fixes #6286. See also #6279 for more information about the original request!

#### What does this implement/fix? Explain your changes.
This PR implements a feature extractor that has the capability to process time series data representing demand over time into one of 4 categories (smooth, intermittent, erratic, lumpy) based on the guidelines detailed in the paper: **"The accuracy of Intermittent Demand Estimates" by J. Boylan, A. Syntetos.**

#### Did you add any tests for the change?
Yes, I added 3 test parameters with their own `ADI` and `CV` threshold values to test how varying thresholds can impact classification. I also set some thresholds to 0.0 to see how that might impact the labels given!
Derived feature request from discussion with @ggjx22 in #6279.
The request is to implement the Syntetos/Boylan expert classification of time series, from Syntetos/Boylan (2005), The accuracy of intermittent demand estimates, IJF.
Good first issue, should be simple to implement, so no need to interface from anywhere - recipe is here: https://www.sktime.net/en/latest/developer_guide/add_estimators.html
I would specify the estimator as follows:
Type
- Series-to-primitives transformer. Per-instance.

Parameters
- `"adi_threshold"`, `"cv_threshold"`, default values are 1.32, 0.49, as in the paper.
- `None`, list of str, must contain `"adi"`, `"cv2"`, `"class"`.

Behaviour
Computes three features or a subset thereof, as columns of the return of `transform`:
- `adi` - average demand interval. This is the same as last index minus first index, divided by number of non-zero values minus one. For time-like indices, the unit should be in number of periods. Not sure what to do for non-periodic: if `freq` is unavailable, I would just drop the index. Defining `adi` as simply the fraction of non-zero values is, afaik, not accurate in comparison to the original reference: the "minus one" does not cancel.
- `cv2` - this is just variance/(mean squared), but taken on the sample of values in the series that are non-zero. The reference uses the biased estimator for variance, i.e., divide by number of values (not minus one).
- `class` - derived class, string column, depending on whether `adi <= adi_threshold` and `cv2 <= cv_threshold`. Yes/yes is called `"smooth"`, yes/no `"erratic"`, no/yes `"intermittent"`, no/no `"lumpy"`, by the authors.
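
To make the behaviour above concrete, here is a minimal pure-Python sketch for a single series. It is not the sktime implementation: the function name `adi_cv2_class` and the returned dict are hypothetical, and it assumes the index has been dropped, so "last index minus first index" is read as the series length minus one (in periods). It also assumes the second threshold is compared against `cv2`, matching the 0.49 default.

```python
def adi_cv2_class(y, adi_threshold=1.32, cv_threshold=0.49):
    """Hypothetical helper: compute adi, cv2 and class for one demand series.

    Assumes a plain sequence of numbers with the index dropped,
    so positions 0..n-1 stand in for the number of periods.
    """
    values = [float(v) for v in y]
    nonzero = [v for v in values if v != 0]
    if len(nonzero) < 2:
        # "number of non-zero values minus one" would be zero or negative
        raise ValueError("need at least two non-zero values to compute adi")

    # adi: (last index - first index) / (number of non-zero values - 1)
    adi = (len(values) - 1) / (len(nonzero) - 1)

    # cv2: biased variance over squared mean, on non-zero values only
    mean = sum(nonzero) / len(nonzero)
    var = sum((v - mean) ** 2 for v in nonzero) / len(nonzero)  # divide by n
    cv2 = var / mean**2

    # class: yes/yes smooth, yes/no erratic, no/yes intermittent, no/no lumpy
    labels = {
        (True, True): "smooth",
        (True, False): "erratic",
        (False, True): "intermittent",
        (False, False): "lumpy",
    }
    label = labels[(adi <= adi_threshold, cv2 <= cv_threshold)]

    return {"adi": adi, "cv2": cv2, "class": label}


if __name__ == "__main__":
    print(adi_cv2_class([1, 0, 2, 0, 3]))
```

For `[1, 0, 2, 0, 3]` this gives `adi = 4 / 2 = 2.0` and `cv2 = (2/3) / 4 ≈ 0.167`, so with the default thresholds the series is labelled `"intermittent"` (no/yes). Series with fewer than two non-zero values are rejected here; how the real estimator should handle them is left open.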