Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Adding ADI/CV Feature Extractor #6336

Merged
merged 20 commits into from May 19, 2024

Conversation

shlok191
Copy link
Contributor

@shlok191 shlok191 commented Apr 26, 2024

Reference Issues/PRs

Fixes #6286. See also #6279 for more information about the original request!

What does this implement/fix? Explain your changes.

This PR implements a feature extractor that has the capability to process time series data representing
demand over time into one of 4 categories (smooth, intermittent, erratic, lumpy) based on the guidelines
detailed in the paper: "The accuracy of Intermittent Demand Estimates" by J. Boylan, A. Syntetos.

Does your contribution introduce a new dependency? If yes, which one?

The PR only requires pandas, which I do not believe will be a new dependency!

What should a reviewer concentrate their feedback on?

This is my first time implementing a transformer, so I would really appreciate any feedback on how I specifically am processing my input and output types and if the way I calculate the class labels is the way someone with more experience would do it!

Did you add any tests for the change?

Yes, I added 3 test parameters with their own ADI and CV threshold values to test how varying thresholds
can impact classification. I also set some thresholds to 0.0 to see how that might impact the labels given!

Requests for next steps

I am really hoping to test out this estimator locally (I am sure I must have some mistakes right now) so I would really appreciate some advice on some test data I could potentially try out the estimator on and if there might be a recommended approach to testing the new feature.

This would also tie to the remaining items in my PR checklist, and will help me complete this issue!

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
  • Optionally, for added estimators: I've added myself and possibly to the maintainers tag - do this if you want to become the owner or maintainer of an estimator you added.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! This is almost ready, there are small but blocking issues:

  • this is a transformer, so it should be in the transformations module. I would suggest transformations.series.adi_cv as a file to put it
  • for testing, we need to add tests, since the test suite data currently does not generate expressly data in the four classes. It should not be too difficult to make synthetic data that fits in either of the four categories. For consistency, I would suggest a new file transformations.series.tests.test_adi_cv.
  • I would make features the first parameter, as that is probably the most "interesting" for users
  • does the ADI value in your formula have the fraction the wrong way round?
  • the changes to the data loaders seem unrelated but potentially valid - can you kindly make these in a different PR, and describe what these are about?

@fkiraly
Copy link
Collaborator

fkiraly commented Apr 26, 2024

Just to help concretely with synthetic generation:

  • for high CV2, sampling from normal and squaring should work
  • for low CV2, constant or almost constant should work
  • for high intermittency, set many values to 0

@shlok191
Copy link
Contributor Author

@fkiraly, I just pushed a new commit that addresses the suggestions! I had some points that I wanted to quickly discuss:

  1. I have updated the location of the transformer to the appropriate location: sktime/transformations/series

  2. I have also added tests! I have added a file sktime/transformations/series/tests/test_adi_cv.py that generates time Series of all 4 formats. For maintaining low variance, I set standard deviation to 0.25 to the np.random.normal() function, I hope that is okay!` Also, for high variance, I set standard deviation to 1.0 and square the generated values like you suggested

  3. I have made features the first parameter for the transformer's initialization now, I do agree that it will likely be more interesting for users. 😄

  4. I have reversed the numerator and denominator for my ADI calculations

  5. I accidentally reverted the merge from [BUG] Fix tsf data error log and make it more precise #6258. I had made my first commit from VS Code which did not show me the code quality tests. So, I attempted to revert my commit and make them from the command prompt but I accidentally reverted back the [BUG] Fix tsf data error log and make it more precise #6258 commit as well. Sorry about that, and I will fix it in the next commit when I add examples and API documentation!

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! Great addition!

  • the changes to the datasets module are stlil present. If git behaviour is confusing, you can copy the files from main over the modified files and commit. PRs are squashed, so the individual commits around those files will not be visible.
  • your get_test_params seems non-compliant. If you add a new estimator, please always check that check_estimator runs.

@shlok191
Copy link
Contributor Author

Thank you! I've made a new commit that adds in the reverted commit. Also, I believe I have fixed the error originating from the get_test_params() function. I really hope that it works now!

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Some comments:

  • please do not delete other people from the "all contributors" file
  • you are still making changes to the data loaders file. Kindly move these to a separate PR.
  • your test test_adi_cv is not runing - I would suggest you try running it locally on your computer before you push
  • the failure related to get_params is genuine. Parameters of the estimator should be written to self in __init__ and not changed afterwards.

@shlok191
Copy link
Contributor Author

shlok191 commented May 4, 2024

@fkiraly, I have some updates!

  1. Sorry about the changes to the contributors file. I think I'm doing something wrong with how the PR merges with main which is leading to this issue with certain files. However, I copied all files from main this time. I still see some changes in .all-contributorsrc which I did not make, but I'll try to resolve them if they are still different from main.

  2. I have worked out the changes in test_adi_cv and I tested it locally before pushing it. The test seem to be running okay locally at least!

  3. Yes, I'm starting to work on the get_params issue as well. I'm hoping that once I can get the tests working I'll start making progress on that end

  4. Could we possibly get on a call sometime to discuss adding test parameters? I'm struggling to think of some good ones but I think talking this through with you will be helpful for me to understand how I can find good edge cases! 😃

EDIT: From the CI, it seems that the tests are sporadic and only work sometimes...I'll look further into this!

@fkiraly
Copy link
Collaborator

fkiraly commented May 4, 2024

Great! If you have any question about resolving the get_params issue, feel free to follow up.

@shlok191
Copy link
Contributor Author

shlok191 commented May 6, 2024

@fkiraly, I believe that my code's working better now!

For my test data generation, I increased the variance associated with lumpy / erratic Series to assist with the test cases. I have also made sure not to modify any of the variables from the __init__ function of the transformer which has helped with the get_test_params related errors.

I'll start work on the final items in the checklists!

@fkiraly fkiraly added enhancement Adding new functionality module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing labels May 7, 2024
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice!

This could be merged, I am just requesting improved documentation.

  • docstring should mention the standard thresholds in the preamble.
  • docstring should be clearer on the return of transform, e.g., what is in the columns
  • the paper should be cited with its full reference in a References section, see other estimators that do that
  • the estimator should be added to the API reference, docs/source/api_reference/transformations.rst, I would add it in the "Summarization" section of Series-to-Features transformers

@shlok191
Copy link
Contributor Author

shlok191 commented May 7, 2024

Got it! I'll make the changes and push them soon! :)

@shlok191
Copy link
Contributor Author

shlok191 commented May 9, 2024

@fkiraly, I just added in the required changes for the documentation, included the research paper's citation and added the changes to the docstrings like you requested!

Update: I understand that there are some failing checks but I am not sure if they are stemming from the ADICVTransformer. I am not sure of the reason for these...have you seen them before?

Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that there are some failing checks but I am not sure if they are stemming from the ADICVTransformer. I am not sure of the reason for these...have you seen them before?

These are probably related to the transformer, what it means is that somewhere in the file you used characters that cannot be parsed by python.

I suspect the apostrophes in the citation you added, and that removing them will fix the issue.

fkiraly
fkiraly previously approved these changes May 10, 2024
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixed it

@shlok191
Copy link
Contributor Author

shlok191 commented May 10, 2024

@fkiraly, thanks! Closing the PR now.

@shlok191 shlok191 closed this May 10, 2024
@fkiraly
Copy link
Collaborator

fkiraly commented May 10, 2024

closed by mistake, should get merged

@fkiraly fkiraly reopened this May 10, 2024
fkiraly
fkiraly previously approved these changes May 11, 2024
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving again.

Also added a full description in the docstring, and fixed an error with the "minus one" in the computation of the ADI.

@shlok191
Copy link
Contributor Author

@fkiraly, I have added a very rough first commit for the compositor discussed in #6279. I have implemented the _fit and _predict functions and am going to try and make progress on additional functions.

Could you please let me know if my approach for this problem seems okay? Also, please let me know if you'd like me to work in a new branch. I can try and make a fork from the primitives_transformer branch or possibly make a branch off main again. Thank you!

@fkiraly fkiraly merged commit 76725c5 into sktime:main May 19, 2024
1 of 2 checks passed
@fkiraly
Copy link
Collaborator

fkiraly commented May 19, 2024

@fkiraly, I have added a very rough first commit for the compositor discussed in #6279. I have implemented the _fit and _predict functions and am going to try and make progress on additional functions.

Could you please let me know if my approach for this problem seems okay? Also, please let me know if you'd like me to work in a new branch. I can try and make a fork from the primitives_transformer branch or possibly make a branch off main again. Thank you!

@shlok191, that's great!

Yes, it would be great if you could work on this on a new branch. To make things easier, I have merged the parts with the ADI/CV transformer.

If you do not update from the branch, you should be able to continue on it.
If you do update, you may have to revert my last commit, to get your changes back.

@fkiraly
Copy link
Collaborator

fkiraly commented May 19, 2024

Regarding the approach - this is nice, but I was imagining it more general.

That is, I thought it could take any series-to-primitives transformer, including the ADI/CV transformer - the transformer should be also an argument, assume it produces categories out of the time series in a column, first. ADI/CV could be a default though.

@shlok191
Copy link
Contributor Author

Thank you so much for the merge, I'll make a new branch and copy over any new changes I made 🙂
Also, got it. I can definitely make changes to make this more general. I'll get to the updates!

@shlok191 shlok191 deleted the primitives_transformer branch May 19, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding new functionality module:transformations transformations module: time series transformation, feature extraction, pre-/post-processing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] Syntetos/Boylan ADI/CV feature extractor for different types of demand (intermittent etc)
3 participants