[DOC] Feedback thread on tutorial notebooks #1447

Open
fkiraly opened this issue Sep 24, 2021 · 34 comments
Labels
documentation Documentation & tutorials good first issue Good for newcomers

Comments

@fkiraly
Collaborator

fkiraly commented Sep 24, 2021

This issue is for collecting feedback on the tutorial notebooks.

Any feedback is highly appreciated, positive or critical; can be high-level (e.g., too long where, helpful why, confusing how), or concrete, e.g., content that should be added, changed, removed, shortened, structured differently, etc.

@fkiraly
Collaborator Author

fkiraly commented Sep 24, 2021

from #1446, we ought to include sections on tags and estimator lookup in all tutorials;
we ought to include a section on multivariate forecasting in the forecasting tutorial

@fkiraly fkiraly added the good first issue Good for newcomers label Oct 8, 2021
@fkiraly fkiraly pinned this issue Oct 8, 2021
@fkiraly
Collaborator Author

fkiraly commented Jan 8, 2022

@gepitis, did you want to post some feedback on the tutorial notebooks? We've opened this just for you! (well, and also others who may like to provide such feedback, but you triggered the issue)

@shubhamkarande13
Contributor

Hi all!
I have gone through the first Forecasting notebook which is intended for people new to forecasting.
One suggestion for the notebooks: would it be possible to add a short definition or glossary section, for easy reference to terms that a beginner like me finds difficult to understand?

For example: terms like forecasting horizon, prediction intervals etc.

@fkiraly
Collaborator Author

fkiraly commented Feb 16, 2022

For example: terms like forecasting horizon, prediction intervals etc.

Great idea, @shubhamkarande13!

There is already a glossary in the sktime docs, but it's a bit incomplete and non-formal:
https://www.sktime.org/en/latest/glossary.html

I was working on a sci reference, but got sidetracked by maintenance efforts.

@claudia-hm

Hello!
I've just gone through the Loading and working with data in sktime tutorial and thought I could provide some feedback. Overall, the tutorial is easy to read and provides many examples of transforming time series datasets into different typical formats. Some comments:

  • All examples use an integer time index, but time series very often come with a datetime index. I would appreciate an explanation of which date formats .ts files accept, together with an example.
  • I believe this tutorial would be a great place to introduce the sktime built-in datasets and the load_<dataset> methods, which I found very useful.

Regards!

@shubhamkarande13
Contributor

For example: terms like forecasting horizon, prediction intervals etc.

Great idea, @shubhamkarande13!

There is already a glossary in the sktime docs, but it's a bit incomplete https://www.sktime.org/en/latest/glossary.html and non-formal.

I was working on a sci reference, but got sidetracked by maintenance efforts.

@fkiraly I would like to contribute by adding relevant terms to the glossary and learn in the process! Please let me know how we can move forward!

@fkiraly
Collaborator Author

fkiraly commented Feb 23, 2022

@claudia-hm, thanks for the feedback!

I just looked, and I feel the tutorials are outdated and duplicate parts of the "user guide".
For instance, the "data formats" explained in the tutorial are old, see a newer notebook here:
https://github.com/alan-turing-institute/sktime/blob/main/examples/AA_datatypes_and_datasets.ipynb
Let me open an issue about this!

@fkiraly
Collaborator Author

fkiraly commented Feb 23, 2022

@claudia-hm, I've just reviewed the three different places where there are notebooks or user guides...
There are actually three places 😱 how did this happen?

Anyway, I've suggested a plan for how this could be improved, which is a nice contribution opportunity: #2127

@fkiraly
Collaborator Author

fkiraly commented Feb 23, 2022

@shubhamkarande13, thanks!

@fkiraly I would like to contribute by adding relevant terms to the glossary and learn in the process! Please let me know how we can move forward!

May I suggest a very similar alternative? The tutorials and user guides are a bit of a mess, see #2127. Would you like to start by moving them to a clean state? We could then raise the level of the work step by step, from splicing/merging existing content up to writing the missing tutorials.

@h-t-w

h-t-w commented Apr 8, 2022

Hi.

I am currently working through the forecasting tutorial. Quick observations:

NOTE: at current time (v0.9x), forecasting of multivariate time series is a stable functionality, but not covered in this tutorial. Contributions to extend the tutorial are welcome.

NOTE: if your favourite format is not properly converted or coerced, kindly consider to contribute that functionality to sktime.

This being said, the first note is also most likely not correct anymore, given that there is a section on multivariate ts.

  • Referenced sktime version. In several places there are references to v0.9x. Given that the current version is > 0.9, it is unclear to me whether the mentioned limitations/issues are still present or not. I understand that documentation tends to lag behind other development tasks, but if the tutorial is meant for new users, then it should reflect the capabilities and quirks of the latest version, because this is generally the one people will install.

Best

Edit:
linked issues
clarification comment on Notes
version numbers in tutorial

@fkiraly
Collaborator Author

fkiraly commented May 9, 2022

@h-t-w, thanks for your feedback!

I've updated the forecaster tutorial here: #2620

Feedback on the updated notebook would be appreciated.

@hilalgenc

hilalgenc commented Jul 2, 2022

Hello,

I read through the Loading data into sktime tutorial, and I have a few suggestions.

  1. Replace "A .ts file include two main parts: * header information * data" with "A .ts file includes two main parts: 1) header information and 2) data".
  2. This one is a question regarding instances. What is the maximum number of dimensions?
  3. The section "Representing data with .ts files" talks about the differences between a @timestamps label of True and False. A possible update would be to add headings (i.e. @timestamps=True and @timestamps=False) with the relevant description for each condition below its respective heading.
  4. As far as I understand, the word obser refers to observation.

Best regards

@fkiraly
Collaborator Author

fkiraly commented Jul 2, 2022

Thanks, @hilal-g!

@CadePGCM

Can you add a notebook that shows the functionality/use of MultiplexTransformer, OptionalPassThrough, TransformerPipeline, FeatureUnion, YtoX, etc.?

Something exploring the space of transformation unions / compositions would be extremely helpful. This also might make sense as general functions to add to the library (union all transformations in the library), or some kind of (greedy/genetic?) algorithm to span the combinations space.

@fkiraly
Collaborator Author

fkiraly commented Aug 12, 2022

Can you add a notebook that shows the functionality/use of MultiplexTransformer, OptionalPassThrough, TransformerPipeline, FeatureUnion, YtoX, etc.?

Yes! I was working on it:
#1705
but there is always something more urgent - bugs, releases, docstrings, admin, etc... :-(

Help is appreciated (search for "good first issues" 😄 ) - anything that takes pressure off the various other places makes it more likely that we write nice notebooks.

Meanwhile, the docstrings should, hopefully, be helpful.

Something exploring the space of transformation unions / compositions would be extremely helpful.

You can already use things like ForecastingGridSearchCV and MultiplexForecaster to do AutoML in this space; see the forecasting tutorial.

@fkiraly
Collaborator Author

fkiraly commented Sep 11, 2022

@CadePGCM, good news for you!

We are writing a transformers & pipelines tutorial together with @miraep8 and the pywatts team (@benHeid, @kalebphipps, @SMEISEN) in preparation for a potential presentation at pydata.
Notebook and video recording should be ready by Christmas 😄

And thanks for the suggestion, this is frequently requested indeed!

@janjbat

janjbat commented Jul 28, 2023

Hello,

I am currently working on a forecasting task where I would like to run hyperparameter tuning and backtesting simultaneously. I might have overlooked it, but so far I could not find anything similar in the docs or tutorials.
The main idea is to do a weekly hyperparameter tuning (taking the past 12 weeks of data and splitting it into training (11 weeks) and test/validation (1 week) sets) and to use the tuned model to produce daily forecasts for the whole next week, refitting the already tuned model each day. At the end of that week, the process is repeated. It would be much appreciated if you could suggest or provide examples of how backtesting can be implemented with sktime for such use cases.

Thank you and best regards,

B
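
The weekly-tuning, daily-refit scheme described above can be written down as a schedule of date windows before any model is involved. The following is a minimal sketch in plain Python, not sktime code: the function name `weekly_tuning_schedule`, the half-open window convention, and the reading that each day's refit uses all data before that day to forecast that same day are assumptions made for illustration.

```python
from datetime import date, timedelta


def weekly_tuning_schedule(first_forecast_day, n_cycles,
                           history_weeks=12, validation_weeks=1):
    """Yield one dict per weekly cycle (hypothetical helper, not sktime API).

    All windows are half-open intervals [start, end).
    """
    week = timedelta(weeks=1)
    for cycle in range(n_cycles):
        week_start = first_forecast_day + cycle * week
        history_start = week_start - history_weeks * week
        validation_start = week_start - validation_weeks * week
        # Assumed interpretation: on each day of the forecast week, the tuned
        # model is refitted on data strictly before that day and forecasts it.
        daily_refits = [
            {"fit_before": week_start + timedelta(days=d),
             "forecast_day": week_start + timedelta(days=d)}
            for d in range(7)
        ]
        yield {
            "tune_train": (history_start, validation_start),
            "tune_validate": (validation_start, week_start),
            "daily_refits": daily_refits,
        }


# Two consecutive weekly cycles, starting on an arbitrary Monday.
schedule = list(weekly_tuning_schedule(date(2023, 7, 31), n_cycles=2))
first = schedule[0]
print(first["tune_train"], first["tune_validate"])
```

Each entry could then drive a tuning step on the `tune_train`/`tune_validate` windows, followed by seven refit-and-predict steps with the tuned parameters held fixed.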

@3lle4

3lle4 commented Nov 15, 2023

Hello!
I'm doing some of your tutorials and I found a small bug in 02c_classification_multivariate_inceptiontime:
In the current version, the last cell does not work. It seems that the parameter name nb_filters of the InceptionTimeClassifier() has been changed to n_filters.

Best
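
For readers who hit the same rename on different installed versions, one possible workaround is to look up which keyword the constructor accepts before calling it. This is only a sketch: `build_with_renamed_kwarg` and the two stand-in classes are hypothetical and do not come from sktime; only the parameter names nb_filters and n_filters are taken from the report above.

```python
import inspect


def build_with_renamed_kwarg(cls, candidate_names, value, **other_kwargs):
    """Instantiate cls, passing value under the first name its __init__ accepts.

    Hypothetical helper for illustration; not part of sktime.
    """
    accepted = inspect.signature(cls.__init__).parameters
    for name in candidate_names:
        if name in accepted:
            return cls(**{name: value}, **other_kwargs)
    raise TypeError(f"{cls.__name__} accepts none of {candidate_names}")


class OldStyleClassifier:
    """Stand-in for a release that still uses the old parameter name."""

    def __init__(self, nb_filters=32):
        self.nb_filters = nb_filters


class NewStyleClassifier:
    """Stand-in for a release that uses the new parameter name."""

    def __init__(self, n_filters=32):
        self.n_filters = n_filters


names = ["n_filters", "nb_filters"]  # preferred (new) name first
old = build_with_renamed_kwarg(OldStyleClassifier, names, 16)
new = build_with_renamed_kwarg(NewStyleClassifier, names, 16)
```

In a tutorial notebook, simply updating the cell to the new parameter name is the cleaner fix; a lookup like this is mainly useful in user code that must run against several versions.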

@fkiraly
Collaborator Author

fkiraly commented Nov 17, 2023

I'm doing some of your tutorials and I found a small bug

Thanks for reporting. It's odd that this did not get caught by the tests, afaik they run all notebook cells. We'll look into this.

@jmwhyte

jmwhyte commented Dec 3, 2023

Following some of my recent Discord comments on panel data and time-series classification, I've prepared the following file. I'm a novice Python programmer, so it might be a bit flabby, but there are two points that are probably of interest to other novices.
AA_JW_issues(1).ipynb.zip

@jmwhyte

jmwhyte commented Jan 16, 2024

With so many functions available for time-series classification, it would be good to have some more guidance, say in the introductory text at the top of help pages. Here's one good example of a useful tip, from the page for ComposableTimeSeriesForestClassifier:
"Parameters:

estimator: Pipeline 
A pipeline consisting of series-to-tabular transformations and a decision tree classifier as final estimator. 

"
We know that there is a section on "series-to-tabular transformations", so we might realise that this is the place to start reading.

Compare this with something like the page for TimeSeriesForestClassifier:
"A time series forest is an ensemble of decision trees built on random intervals. Overview: Input n series length m."

If we have panel data, we can maybe work out that we need to combine our multivariate data for each instance into a univariate series. So, we can use ColumnConcatenator. But is there anything else we can do? How would we work out the valid choices of transformers to pipeline together? As we try to extend our knowledge from sklearn methods to sktime, do we need to be careful to use an sktime pipeline, rather than an sklearn one?

I'm sure that once people gain more experience, this is easier to work out. But at the start, with so many different approaches (and other issues around reproducibility, see other comments), there's quite a knowledge gap to bridge just getting started.

I suspect it might be possible to give other guidelines on which conditions are suitable for certain choices of transformer/classifier? Maybe some choices are better depending on whether you have thousands of data points or just hundreds (and find that some transformers produce NaN that stop the fitting process)?
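
To make the "combine multivariate data for each instance into a univariate series" step from the comment above concrete, here is a minimal sketch of the idea in plain Python. The nested-list layout (instances, then variables, then time points) and the helper `concatenate_columns` are assumptions for illustration; this is not how sktime's ColumnConcatenator is implemented, and sktime uses its own data containers.

```python
def concatenate_columns(instance):
    """Join the variables of one multivariate instance end to end.

    Hypothetical helper: `instance` is a list of variables, each a list
    of observations over time.
    """
    return [value for variable in instance for value in variable]


# Two instances, each with two variables observed at three time points.
panel = [
    [[1, 2, 3], [10, 20, 30]],
    [[4, 5, 6], [40, 50, 60]],
]

# Each instance becomes one long univariate series of length 2 * 3.
univariate_panel = [concatenate_columns(instance) for instance in panel]
print(univariate_panel)
```

The result has one series per instance, which is the shape a univariate classifier expects; the cost is that the classifier no longer knows where one variable ends and the next begins.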

@fkiraly
Collaborator Author

fkiraly commented Jan 16, 2024

@jmwhyte, are you thinking of sth like this

https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

just for sktime?

@fkiraly
Collaborator Author

fkiraly commented Jan 16, 2024

FYI @marrov - similar to what you have suggested as well

@jmwhyte

jmwhyte commented Jan 16, 2024

Something like that sklearn diagram would be a good start, but I am sure there is also benefit in more detail at the function level.

@madhuri723

Hi,
When I was learning from the tutorial, I found ForecastingHorizon a bit confusing, especially the 'is_relative' part. Other forecasting tools usually need just one thing, but this one was different. It took me a bit to figure out that 'is_relative' is there to make things easier. I suggest making the explanation of 'is_relative' clearer so that it's easier for everyone to understand.
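
As a rough illustration of the distinction: a relative horizon counts steps ahead of the last observed time point (the cutoff), while an absolute horizon names the time points themselves. The sketch below uses a plain integer time index; the helper names `relative_to_absolute` and `absolute_to_relative` and the cutoff value are hypothetical and are not the ForecastingHorizon API.

```python
def relative_to_absolute(relative_steps, cutoff):
    """Convert steps ahead of the cutoff into absolute time points."""
    return [cutoff + step for step in relative_steps]


def absolute_to_relative(absolute_points, cutoff):
    """Convert absolute time points into steps ahead of the cutoff."""
    return [point - cutoff for point in absolute_points]


cutoff = 143  # illustrative: index of the last observation in the training data

relative_fh = [1, 2, 3]  # "the next three periods", whatever the cutoff is
absolute_fh = relative_to_absolute(relative_fh, cutoff)

# The same absolute points map back to the same relative steps.
round_trip = absolute_to_relative(absolute_fh, cutoff)
print(absolute_fh, round_trip)
```

The practical difference is that a relative horizon can be reused unchanged when the training data grows, whereas an absolute horizon always points at the same fixed time points.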

@fkiraly
Collaborator Author

fkiraly commented Mar 12, 2024

thanks, useful feedback, @madhuri723. Let's see if we can improve the explanation.

@madhuri723

@fkiraly I can take up this work.

@fkiraly
Collaborator Author

fkiraly commented Mar 13, 2024

@madhuri723, thanks! Please go ahead!

@Prakruthi12345

Hi, I have gone over the introductory notebook (00_sktime_intro), and it was very informative! I have come up with a couple of suggestions for the notebook and have included them in this document: https://docs.google.com/document/d/1w4lI7m5kWupSZCipr7N7HRPGbNimKAWCnzcdTvnVni4/edit

Thanks, and please let me know what you think!

@fkiraly
Collaborator Author

fkiraly commented Mar 28, 2024

@Prakruthi12345, nice!

Could you please move these suggestions to a GitHub issue?

@Prakruthi12345

@fkiraly will do, thanks!

@iamSathishR
Contributor

iamSathishR commented Apr 4, 2024

Hi there! In the code snippet of section 2.2.4 (Time Series Classification - simple evaluation vignette), the import statement for KNeighborsTimeSeriesClassifier (from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier) is duplicated, appearing on both line 1 and line 9. I suggest removing the redundant import on line 9 to avoid confusion.

@fkiraly
Collaborator Author

fkiraly commented Apr 5, 2024

thanks! Would you like to make a PR?
Otherwise we can do it later too.

@iamSathishR
Contributor

thanks! Would you like to make a PR? Otherwise we can do it later too.

Sure! I will open a PR.
