New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Specification of xticks for matplotlib axis returned by plot_series does not work as expected #5895
Comments
I hacked the plot_series() function in sktime/utils/plotting.py in a way that fixes this specific problem. (attached |
ah, sorry, missed this - could you open a pull request with this? You can mark it as draft if incomplete or you would like a reivew first. |
I decided to look into how this was treated in other packages. I found the plot_timeseries() function in pytimetk. That function takes a parameter 'engine', which you can set to one of 'plotly', 'plotnine', 'matplotlib'. They have done a nice job. The bug I found with sktime's plot_series() does not occur in plot_timeseries(). plotnine is a port of R's ggplot2 package to python, and is built on top of matplotlib. I am wondering if it makes sense to convert plot_series() to be a wrapper to plot_timeseries at some lower level. I have to investigate this further. I may be tied up in the next few weeks, though, on other projects. |
I would tend towards not doing that, in order to keep core dependencies small and the set of important soft dependencies too - if the package were dedicated to time series ploting and a gold standard, then yes - but the various packages menioned don't seem to be that popular at the moment? Perhaps using |
Hi @fkiraly, I started working on this. I went the direction of using plotnine and, so far, have hacked a version of plot_series() that replaces the matplotlib calls by plotnine calls. I had planned to use the method ggplot.draw() which converts the plotnine object to a matlib object so that this change would not be a breaking change for existing users who manipulate the matplotlib object returned by plot_series. Unfortunately, I discovered that this doesn't really work for a few reasons (this was confirmed by the author of plotnine). This got me thinking that the pytimetk approach I mentioned above could work. Namely, add a new parameter to plot_series, named 'engine' which would default to 'matplotlib' and would introduce no change to the existing implementation when called that way. However, setting engine='plotnine' would return a plotnine object (class ggplot I think). If this approach would be acceptable to you, it would also meet my needs for another project I am working on and I will continue to work on it. Otherwise, I am not sure what to do. The bug in the current implementation relates to how the x-axis is determined (based on all the components in the plot such as the time series, the forecasts, the prediction intervals) and I think that so much effort has gone into this question at the lower level (e.g. in plotnine lower levels) there is no reason to poorly re-implement some buggy version at the level of plot_series. |
Hm, I suppose that makes sense. Having an Of course we should be very careful with changing signatures etc, as So if we can make sure the plots look very similar, and soft dependencies are mangaed well, personally I would not have too many problems with this - as long as we handle things consistently. Perhaps there is a more general discussion to be had about plotting. |
Great. Adding a named parameter with default value does not change the signature (in the sense that this will not break existing code). |
The intention is to have a couple of standardized, simple plotting functions, mostly for tutorials and simple illustratoin. When Probably current state of the ecosystem is closer to having sth like this, but it's still not perfect imo. I don't think it makes sense to start replacing other plotting packages, that's out of the scope of |
I may have a much simpler approach. My first simple test seems to validate it. As I stated yesterday, mature graphics packages know how to define the x-axis given multiple objects (e.g. curves) to plot. This functionality should be left to the graphics package (which is not the case currently in plot_series). Additionally, a great feature of pandas is its use of an index for DataFrames or Series. The index should not be ignored or replaced (which it sort of is currently in plot_series). I ripped out the use of seaborn (except for the colors) and ripped out the union of the indexes across the series, converted them to timestamp (if necessary), and used standard plt.plot() calls. Seems to work just fine in an admittedly limited test. Will now work on a more complete version. |
hm, interesting. It is very possible that things have changed over the last 5 years in the upstream dependencies. |
I have completed a replacement version of plotting.py in sktime.utils that seems to work. I did not touch the function plot_windows() as this seems to be deprecated. It is not referred to anywhere in the documentation. (In fact, the latest documentation defines its own version as part of an example.) I also did not touch the function plot_calibration() which uses plotting methods from pandas. I believe I need to do a pull request. Some guidance would be appreciated. Thanks |
Great! How about you open the pull request with your current state of the code base, and we take it from there? |
Pull request opened |
plot_series does some internal manipulations that seems to confuse matplotlib.dates utilities such as YearLocator()
The first code block below creates a plot that puts an xtick and xticklabel at every Jan. 1. YearLocator() finds these points. It works.
The code calls plot() and scatter() directly. In the succeeding code block the plotting is done with plot_series(). After that the YearLocator() function does not work as expected.
The text was updated successfully, but these errors were encountered: