[ENH] Refactor base series annotator #6265

Alex-JG3 · 2024-04-06T09:20:54Z

Reference Issues/PRs

See #3214 .

What does this implement/fix? Explain your changes.

Refactors the BaseSeriesAnnotator class:

Removes the fmt and labels attributes.
Adds the learning_type and task attributes.
Adds default predict and transform methods.
Adds default predict_points and predict_segments methods.
Adds methods for converting between dense and sparse output formats.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

Did you add any tests for the change?

Any other comments?

PR checklist

For all contributions

I've added myself to the list of contributors with any new badges I've earned :-)
How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
See here for full badge reference
Optionally, for added estimators: I've added myself and possibly to the maintainers tag - do this if you want to become the owner or maintainer of an estimator you added.
See here for further details on the algorithm maintainer role.
The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.

For new estimators

I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured dependency isolation, see the estimator dependencies guide.

ClaSP uses `fmt` to change the output format, so it has been kept as an attribute of the `ClaSPSegmentation` class.

fkiraly · 2024-04-06T18:23:00Z

sktime/annotation/adapters/_pyod.py

    """

    _tags = {"python_dependencies": "pyod"}

-    def __init__(self, estimator, fmt="dense", labels="indicator"):
+    def __init__(self, estimator, learning_type="unsupervised"):


isn't pyod always unsupervised?

fkiraly · 2024-04-06T19:30:58Z

As discussed in the last developer meeting, the approach using tags is showcased here: #6271

This allows keeping all "property"-like attributes in one inspectable place.
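A minimal sketch of the tag idea mentioned above, not sktime's actual implementation. Only the `_tags` class-dict pattern comes from the diff in this thread; the class names, the `get_class_tag` helper as written here, and the tag values are hypothetical and chosen for illustration.

```python
class HypotheticalBaseAnnotator:
    """Toy base class; stands in for a base annotator, not sktime's real one."""

    # Class-level defaults; the tag names follow the PR description.
    _tags = {"task": "None", "learning_type": "None"}

    @classmethod
    def get_class_tag(cls, name, default=None):
        """Look up a tag, letting subclasses override parent values."""
        merged = {}
        # Walk the MRO from the most generic class to the most specific one.
        for klass in reversed(cls.__mro__):
            merged.update(vars(klass).get("_tags", {}))
        return merged.get(name, default)


class HypotheticalOutlierAnnotator(HypotheticalBaseAnnotator):
    """Toy subclass; the tag values are assumptions for illustration."""

    _tags = {"task": "anomaly_detection", "learning_type": "unsupervised"}


print(HypotheticalOutlierAnnotator.get_class_tag("learning_type"))
```

Because the properties live in one dictionary per class, they can be inspected without instantiating the estimator and do not need to appear in `__init__`.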

fkiraly · 2024-04-11T19:24:57Z

PS: if you migrate to tags, you need to merge the parts of #6271 as well that add tags to the registry.

fkiraly · 2024-05-18T20:49:25Z

I am currently reviewing and will write a longer review.

Regarding the last question, would a solution not listed clear things up: transforming the last interval to [4, 6)?

Alex-JG3 · 2024-05-18T22:16:45Z

> I am currently reviewing and will write a longer review.
>
> Regarding the last question, would a solution not listed clear things up: transforming the last interval to [4, 6)?

For that example, yes, but I think it becomes less clear if we have an index that is not equally spaced. For example, taking the example from before and making it into a proper pandas series we get,

the interval of the final segment could be [4, 6) and that would be sensible. But suppose the index is not equally spaced.

For example, I don’t think an interval of [4.1, 5.1) would be sensible for the third segment. For completeness, the sparse format of the series above in the current implementation would be,

[0.1, 2.2) 1
[2.2, 4.1) 2
[4.1, 4.3) 3
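To make the dense and sparse formats concrete, here is a hedged sketch of a dense-to-sparse conversion that reproduces the three intervals listed above. The function name `dense_to_sparse` and the intermediate index values 1.5 and 3.0 are assumptions; the thread only shows the interval boundaries, and this is not the code from the PR.

```python
import pandas as pd


def dense_to_sparse(y_dense):
    """Group runs of equal labels into left-closed, right-open intervals."""
    # A new segment starts wherever the label differs from the previous one.
    starts_new = y_dense != y_dense.shift()
    segment_id = starts_new.cumsum()
    index_values = pd.Series(y_dense.index, index=y_dense.index)
    starts = index_values.groupby(segment_id).first()
    labels = y_dense.groupby(segment_id).first()
    # Each segment ends where the next one begins; the last segment ends
    # at the final index value, which is the behaviour questioned above.
    ends = starts.shift(-1).fillna(y_dense.index[-1])
    intervals = pd.IntervalIndex.from_arrays(
        starts.to_numpy(), ends.to_numpy(), closed="left"
    )
    return pd.Series(labels.to_numpy(), index=intervals)


# Hypothetical dense series with an unequally spaced index.
y_dense = pd.Series([1, 1, 2, 2, 3, 3], index=[0.1, 1.5, 2.2, 3.0, 4.1, 4.3])
y_sparse = dense_to_sparse(y_dense)
print(y_sparse)
```

Note that the last interval, [4.1, 4.3), does not contain the final index value 4.3, which is the inconsistency the discussion is about.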


fkiraly

Very nice!

I think this will be the foundation of finally sorting this module out.
I also like how this removes boilerplate from the concrete classes.

The one worry I have is about removal of arguments and that the changes here are not deprecation safe.
The module is clearly marked as "experimental", so we could do that. I wonder, though, whether we could instead leave the old args in place and raise warnings that they will disappear.

fkiraly · 2024-05-20T01:28:25Z

> For that example yes but I think it becomes less clear if we have an index that is not equally spaced.

I see. How about making the last interval run over by the last difference, then?
In the unequally spaced example, that would be [4.1, 4.5). It is a bit of a hack, but it is a heuristic that seems to agree with a good solution in the regularly spaced case.
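The "run over by the last difference" heuristic can be sketched as follows. The helper name `last_interval_end` is hypothetical, and the index values other than 4.1 and 4.3 are assumed for illustration.

```python
import pandas as pd


def last_interval_end(index):
    """Extend past the final index value by the last observed spacing."""
    if len(index) < 2:
        raise ValueError("need at least two index values to infer a spacing")
    return index[-1] + (index[-1] - index[-2])


regular = pd.Index([0, 1, 2, 3, 4, 5])
irregular = pd.Index([0.1, 1.5, 2.2, 3.0, 4.1, 4.3])

# Regular spacing: the last interval ends at 6, giving [4, 6).
print(last_interval_end(regular))
# Irregular spacing: the end is about 4.5 (up to floating point rounding).
print(round(last_interval_end(irregular), 10))
```

With this choice the final index value always lies strictly inside the last interval, so a left-closed, right-open convention can cover every point of the series.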

I would also like to summon @VascoSch92 who might have a useful opinion here, given both sequences and sets are involved at the same time - hope you enjoy this 😁

The greater question is coming up with mutually inverse functions from a value representation to a segment representation, as above. This is not possible if the target index is not known (only partial inverses exist), but the functions assume we know X.index - see above, discussion starts here in a self-contained form: #6265 (comment)
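As an illustration of why knowing the target index matters, here is a hedged sketch of the inverse direction, from segments back to per-point labels. The function name `sparse_to_dense` and the example index are assumptions, not the PR's code; it relies on `pandas.IntervalIndex.get_indexer`, which returns -1 for values that fall in no interval.

```python
import pandas as pd


def sparse_to_dense(y_sparse, index):
    """Label each value of `index` with the segment that contains it."""
    # Position of the containing interval, or -1 if no interval covers it.
    positions = y_sparse.index.get_indexer(index)
    labels = y_sparse.to_numpy()[positions]
    dense = pd.Series(labels, index=index)
    # Mask points that were not covered by any segment.
    return dense.where(positions >= 0)


index = pd.Index([0.1, 1.5, 2.2, 3.0, 4.1, 4.3])

# Last interval extended by the last difference: every point is covered.
extended = pd.IntervalIndex.from_breaks([0.1, 2.2, 4.1, 4.5], closed="left")
y_dense = sparse_to_dense(pd.Series([1, 2, 3], index=extended), index)

# Last interval ending at the final index value: the point 4.3 is lost.
truncated = pd.IntervalIndex.from_breaks([0.1, 2.2, 4.1, 4.3], closed="left")
y_partial = sparse_to_dense(pd.Series([1, 2, 3], index=truncated), index)
print(y_dense)
print(y_partial)
```

Under these assumptions the extended last interval gives a round trip that recovers all six labels, while the truncated one leaves the final point unlabelled.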

VascoSch92 · 2024-05-20T12:33:35Z

> I would also like to summon @VascoSch92 who might have a useful opinion here, given both sequences and sets are involved at the same time - hope you enjoy this 😁

:-)

Just some thoughts from my side:

I think the solution of @Alex-JG3 is good. What you are doing here is representing the values of a column/series in an intelligent way, i.e., you want to group intervals of contiguous indices where the column/series is constant.

After that, whether you consider closed/open/right-closed/... intervals is just a matter of taste, but it should be stated clearly in the documentation and/or in the docstring.

The choice of left-closed, right-open intervals (for example [4.1, 4.5)) seems legitimate, as it is also what Python does with lists and slicing, i.e., a list starts at index 0 and ends at index length-1.

> For that example yes but I think it becomes less clear if we have an index that is not equally spaced.

I see your point, but

the index of a df is always monotonically increasing (is that correct? I'm not sure now)
you said that you consider intervals of contiguous indices with the same value.

Therefore, it should be clear.

fkiraly · 2024-05-20T13:19:30Z

> the index of a df is always monotonically increasing (is that correct? I'm not sure now)

Not for pandas.DataFrame in general, but we can enforce this in our output assumptions (should be tested, let's make a note of this!)
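A small sketch of how such an output assumption could be enforced and tested. The function name `check_monotonic_index` is hypothetical, and requiring a strictly increasing index (no duplicates) is an assumption that goes slightly beyond what is stated in the thread.

```python
import pandas as pd


def check_monotonic_index(y):
    """Raise if the index of `y` is not strictly increasing."""
    index = y.index
    # is_monotonic_increasing allows ties, so uniqueness is checked as well.
    if not (index.is_monotonic_increasing and index.is_unique):
        raise ValueError("annotation output must have a strictly increasing index")
    return y


ok = pd.Series([1, 1, 2], index=[0.1, 2.2, 4.1])
bad = pd.Series([1, 1, 2], index=[2.2, 0.1, 4.1])

check_monotonic_index(ok)  # passes silently
try:
    check_monotonic_index(bad)
except ValueError as err:
    print(err)
```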


Alex-JG3 · 2024-05-20T17:59:13Z

> the index of a df is always monotonically increasing (is that correct? I'm not sure now)

No, the index is not currently always monotonically increasing, but it should be. In fact, I think some of the new methods will break if the index is not monotonically increasing. We need to enforce this.

I have added deprecation warnings in ClaSP and the PyOD annotator. I think those are the only two classes that are affected by deprecation. Currently, the deprecation message feels very brief:

The fmt argument is going to be removed.

Is this sufficient? Do we want to point to a specific version in which these arguments will be removed?

fkiraly · 2024-05-20T23:35:17Z

> Is this sufficient? Do we want to point to a specific version in which these arguments will be removed?

I would say sth like

f"Warning from {type(self).__name__}: fmt argument will be removed in 0.31.0. For behaviour equivalent to current fmt=a usage, do X; for fmt=b usage, do Y",

using warn from sktime.utils.warnings.
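A hedged sketch of what a versioned deprecation warning along these lines could look like. To stay self-contained it uses the standard library `warnings.warn` rather than sktime's own `warn` utility, and the class name and the `"deprecated"` sentinel default are hypothetical; the migration hints ("do X", "do Y") are left out because the thread does not spell them out.

```python
import warnings


class HypotheticalAnnotator:
    """Toy estimator; the class and its sentinel default are illustrative only."""

    def __init__(self, fmt="deprecated"):
        self.fmt = fmt
        # Warn only when the caller actually passes the old argument.
        if fmt != "deprecated":
            warnings.warn(
                f"Warning from {type(self).__name__}: fmt argument will be "
                "removed in 0.31.0.",
                DeprecationWarning,
                stacklevel=2,
            )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    HypotheticalAnnotator(fmt="dense")

print(caught[0].message)
```

Naming the removal version and the emitting class in the message lets users see where the warning comes from and by when they need to migrate.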

Alex-JG3 · 2024-05-21T19:03:44Z

I have rewritten the deprecation warning. I have also removed the deprecation warning for labels for the pyod adapter since the changes to BaseSeriesAnnotator do not deal with scores properly yet. This will come in a future PR.

If everyone else is happy are we ready to merge?

Alex-JG3 · 2024-05-22T16:54:29Z

Could someone please rerun the CI/CD pipeline? It looks like the docs job has failed because it ran out of time to build. I don't think this is a result of changes introduced in this PR.

fkiraly · 2024-05-22T18:50:24Z

Sure - rerunning

fkiraly · 2024-05-22T19:01:07Z

Restarted - I also updated the deprecation messages and added release manager notes.

Question, it seems we want to remove the labels param in PyODAnnotator too - should there be a deprecation warning as well?


fkiraly

I think we are ready to merge for release with 0.30.0.

@Alex-JG3, it would be great if you could check the updated warnings and release manager comments.

Alex-JG3 and others added 2 commits April 4, 2024 16:39

Add functions to check for valid tasks and learning types

cbb611f

Merge branch 'sktime:main' into refactor_base_series_annotator

5eb5c5c

Alex-JG3 requested review from achieveordie, benHeid, fkiraly and yarnabrina as code owners April 6, 2024 09:20

Alex-JG3 marked this pull request as draft April 6, 2024 09:21

Alex-JG3 added 6 commits April 6, 2024 11:47

Remove fmt and dense and add learning_type and task

54f3087

Remove fmt and labels from PyOD adapter

0290a87

Add task and learning_type to __init__

26c064c

ClaSP uses `fmt` to change the output format, so it has been kept as an attribute of the `ClaSPSegmentation` class.

Remove fmt and labels and add task and learning_type

dc7095f

Change task and learning_type for the HMM classes

a4cbb45

Remove fmt and labels, add task and learning_type

0f750e2

fkiraly reviewed Apr 6, 2024

moved to tag

e7895a7

fkiraly mentioned this pull request Apr 6, 2024

[ENH] Refactor base series annotator - design with tags #6271

Closed

fkiraly added the module:annotation, enhancement, and API design labels Apr 6, 2024

Alex-JG3 added 3 commits April 10, 2024 21:29

Fix whitespace

737cf3d

Convert PyOD to tags

48f3c55

Convert to tags

b960cdc

Alex-JG3 added 6 commits April 11, 2024 21:53

Start a function to convert from sparse to dense

d9e0908

Initialise a default transform method

9b0e9ca

Update docstring example

9920265

Add method for converting from dense to sparse formats

26795a3

Update docstring examples

25d40d7

Fix typos

fe0ab78

fkiraly added 2 commits May 19, 2024 19:34

Merge branch 'main' into pr/6265

2f7cfc1

Merge branch 'refactor_base_series_annotator' of https://github.com/A…

b8f8ee6

…lex-JG3/sktime into pr/6265

fkiraly previously approved these changes May 20, 2024

Add fmt argument back in for backward compatibility

d8c865d

Alex-JG3 dismissed fkiraly’s stale review via d8c865d May 20, 2024 17:13

Alex-JG3 added 4 commits May 20, 2024 18:20

Add deprecation warning.

e86d7c8

Add the fmt and labels arguments back in for backwards compatibility

fe617bd

Warn that fmt and labels will be deprecated

Remove DeprecationWarning import

d319798

Add fmt back to _predict_scores

4ce5d5e

Alex-JG3 added 2 commits May 21, 2024 19:57

Rewrite deprecation warning for fmt argument

c519a2e

Fix formatting

43f22a8

fkiraly added 3 commits May 22, 2024 19:56

clasp warn

ada50a9

pyod

bf2feda

Update _pyod.py

379e077

fkiraly added 4 commits May 22, 2024 20:21

Merge branch 'main' into pr/6265

030678d

linting

66f2bab

Update base.py

fdaf7af

Revert "Update base.py"

6cd11b3

This reverts commit fdaf7af.

fkiraly approved these changes May 23, 2024
