Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Specification of xticks for matplotlib axis returned by plot_series does not work as expected #5895

Open
ericjb opened this issue Feb 5, 2024 · 13 comments · May be fixed by #6404 or #6416
Open
Labels
bug Something isn't working module:plotting&utilities utilities including plottinng
Projects

Comments

@ericjb
Copy link

ericjb commented Feb 5, 2024

plot_series does some internal manipulations that seems to confuse matplotlib.dates utilities such as YearLocator()

The first code block below creates a plot that puts an xtick and xticklabel at every Jan. 1. YearLocator() finds these points. It works.
The code calls plot() and scatter() directly. In the succeeding code block the plotting is done with plot_series(). After that the YearLocator() function does not work as expected.

import matplotlib.dates as mdates
y = load_airline()
y.index = y.index.to_timestamp()
fig1b, ax1b = plt.subplots()
ax1b.plot(y, label="all")
y_jan = y[y.index.month == 1]
plt.scatter(y_jan.index, y_jan.values, label="Jan", marker="o")
plt.legend()
years = mdates.YearLocator()   # every year
ax1b.xaxis.set_major_locator(years)
ax1b.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.show()

import matplotlib.dates as mdates
fig4, ax4 = plt.subplots()
ax4 = plot_series(y, y_jan, labels=["all", "Jan"], markers=["", "o"], ax=ax4)
plt.legend()
years = mdates.YearLocator()   # every year
ax4.xaxis.set_major_locator(years)
ax4.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.xticks(rotation=90)
plt.show()

@ericjb ericjb added the bug Something isn't working label Feb 5, 2024
@fkiraly fkiraly added the module:plotting&utilities utilities including plottinng label Feb 5, 2024
@fkiraly fkiraly added this to Needs triage & validation in Bugfixing via automation Feb 5, 2024
@ericjb
Copy link
Author

ericjb commented Feb 8, 2024

I hacked the plot_series() function in sktime/utils/plotting.py in a way that fixes this specific problem. (attached
plotting.py.gz
) Someone more experienced with matplotlib should review and improve it. The main point is that plot_series() was not following the matplotlib convention of using date2num and num2date from matplotlib.dates. My version follows this convention (in some cases) and thus standard matplotlib functions - as in my MWE above - should now work as expected.

@fkiraly
Copy link
Collaborator

fkiraly commented Feb 15, 2024

ah, sorry, missed this - could you open a pull request with this? You can mark it as draft if incomplete or you would like a reivew first.

@fkiraly fkiraly moved this from Needs triage & validation to fixing in Bugfixing Feb 15, 2024
@ericjb
Copy link
Author

ericjb commented Feb 17, 2024

I decided to look into how this was treated in other packages. I found the plot_timeseries() function in pytimetk. That function takes a parameter 'engine', which you can set to one of 'plotly', 'plotnine', 'matplotlib'. They have done a nice job. The bug I found with sktime's plot_series() does not occur in plot_timeseries(). plotnine is a port of R's ggplot2 package to python, and is built on top of matplotlib. I am wondering if it makes sense to convert plot_series() to be a wrapper to plot_timeseries at some lower level. I have to investigate this further. I may be tied up in the next few weeks, though, on other projects.

@fkiraly
Copy link
Collaborator

fkiraly commented Feb 17, 2024

I am wondering if it makes sense to convert plot_series() to be a wrapper to plot_timeseries at some lower level.

I would tend towards not doing that, in order to keep core dependencies small and the set of important soft dependencies too - if the package were dedicated to time series ploting and a gold standard, then yes - but the various packages menioned don't seem to be that popular at the moment? Perhaps using plotnine directly might be a compromise, though personally I'd still prefer clean direct use of matplotlib.

@ericjb
Copy link
Author

ericjb commented Mar 10, 2024

Hi @fkiraly, I started working on this. I went the direction of using plotnine and, so far, have hacked a version of plot_series() that replaces the matplotlib calls by plotnine calls. I had planned to use the method ggplot.draw() which converts the plotnine object to a matlib object so that this change would not be a breaking change for existing users who manipulate the matplotlib object returned by plot_series. Unfortunately, I discovered that this doesn't really work for a few reasons (this was confirmed by the author of plotnine).

This got me thinking that the pytimetk approach I mentioned above could work. Namely, add a new parameter to plot_series, named 'engine' which would default to 'matplotlib' and would introduce no change to the existing implementation when called that way. However, setting engine='plotnine' would return a plotnine object (class ggplot I think).

If this approach would be acceptable to you, it would also meet my needs for another project I am working on and I will continue to work on it. Otherwise, I am not sure what to do. The bug in the current implementation relates to how the x-axis is determined (based on all the components in the plot such as the time series, the forecasts, the prediction intervals) and I think that so much effort has gone into this question at the lower level (e.g. in plotnine lower levels) there is no reason to poorly re-implement some buggy version at the level of plot_series.

@fkiraly
Copy link
Collaborator

fkiraly commented Mar 10, 2024

Hm, I suppose that makes sense.

Having an engine arg or backend makes sense, if it meets your use case.

Of course we should be very careful with changing signatures etc, as plot_series is one of the widely used methods by users and in tutorials, even if it does not beling to the "core machinery" of sktime.

So if we can make sure the plots look very similar, and soft dependencies are mangaed well, personally I would not have too many problems with this - as long as we handle things consistently. Perhaps there is a more general discussion to be had about plotting.

@ericjb
Copy link
Author

ericjb commented Mar 10, 2024

Great. Adding a named parameter with default value does not change the signature (in the sense that this will not break existing code).
I think the central question for sktime w.r.t. plotting is what is the intended goal. There are packages devoted to graphics (matplotlib, plotnine, plotly, ...) and sktime cannot (and should not) compete with these. Providing convenience functions to make things easy for users for "vanilla" and "typical" cases is great, but there has to be care that these are not simply a layer over the backend that blocks the user from the full functionality of the backend.

@fkiraly
Copy link
Collaborator

fkiraly commented Mar 10, 2024

The intention is to have a couple of standardized, simple plotting functions, mostly for tutorials and simple illustratoin.

When sktime started, there was no one-liner for "plot me this series", so we wrote our own. More generally "plot this common object" with lots of sensible defaults is what utils.plotting should provide.

Probably current state of the ecosystem is closer to having sth like this, but it's still not perfect imo.

I don't think it makes sense to start replacing other plotting packages, that's out of the scope of sktime.

@ericjb
Copy link
Author

ericjb commented Mar 11, 2024

I may have a much simpler approach. My first simple test seems to validate it. As I stated yesterday, mature graphics packages know how to define the x-axis given multiple objects (e.g. curves) to plot. This functionality should be left to the graphics package (which is not the case currently in plot_series). Additionally, a great feature of pandas is its use of an index for DataFrames or Series. The index should not be ignored or replaced (which it sort of is currently in plot_series). I ripped out the use of seaborn (except for the colors) and ripped out the union of the indexes across the series, converted them to timestamp (if necessary), and used standard plt.plot() calls. Seems to work just fine in an admittedly limited test. Will now work on a more complete version.

@fkiraly
Copy link
Collaborator

fkiraly commented Mar 12, 2024

hm, interesting. It is very possible that things have changed over the last 5 years in the upstream dependencies. plot_series was originally created as a demo and tutorial showcase utility, at the very beginning.

@ericjb
Copy link
Author

ericjb commented Mar 15, 2024

I have completed a replacement version of plotting.py in sktime.utils that seems to work. I did not touch the function plot_windows() as this seems to be deprecated. It is not referred to anywhere in the documentation. (In fact, the latest documentation defines its own version as part of an example.) I also did not touch the function plot_calibration() which uses plotting methods from pandas. I believe I need to do a pull request. Some guidance would be appreciated. Thanks

@fkiraly
Copy link
Collaborator

fkiraly commented Mar 15, 2024

Some guidance would be appreciated. Thanks

Great! How about you open the pull request with your current state of the code base, and we take it from there?
Sounds like a neat contribution!

@ericjb
Copy link
Author

ericjb commented Mar 15, 2024

Pull request opened

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment