[ENH] classification test scenario with three classes #6374

Open: wants to merge 3 commits into main

@fkiraly (Collaborator) commented May 1, 2024

This adds a classification test scenario with three classes.

Previously, only two classes were tested.

@fkiraly added the labels module:classification (classification module: time series classification) and enhancement (Adding new functionality) on May 1, 2024
"X_univariate": True,
"X_unequal_length": False,
"is_enabled": True,
"n_classes": 3,

Would there be any harm in going to more than three classes? I understand that we are interested in multi-class prediction rather than three-class prediction, so e.g., five classes might catch more errors. Or perhaps this test can be made parametric with anywhere from three to e.g., 15 classes to increase coverage further (at the cost of more runtime)?
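A minimal sketch of what a parametric variant could look like, using only the standard library. The helper names `make_balanced_labels` and `scenario_params` and the `"y"` key are hypothetical and not part of this PR; only the `"n_classes"` key is taken from the diff above. In a pytest suite, the same range could be passed to `pytest.mark.parametrize` instead of the plain loop.

```python
def make_balanced_labels(n_classes, n_per_class=4):
    """Hypothetical helper: integer class labels, n_per_class of each class."""
    return [label for label in range(n_classes) for _ in range(n_per_class)]


def scenario_params(class_counts=range(3, 16)):
    """Yield one hypothetical scenario dict per class count (3 to 15 by default)."""
    for n_classes in class_counts:
        yield {"n_classes": n_classes, "y": make_balanced_labels(n_classes)}


# Spot-check a few class counts: every class must actually appear in the labels.
for params in scenario_params([3, 5, 15]):
    labels = params["y"]
    assert len(set(labels)) == params["n_classes"]
    print(params["n_classes"], "classes,", len(labels), "instances")
```

The number of instances grows linearly with the number of classes here, which is where the runtime cost of a wider range comes from.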

@fkiraly (Collaborator, Author) commented May 2, 2024

Yes, you are right, of course. As you say, it is the runtime that worries me, and also the minimum sample size requirement.

Some classifiers run a grid search internally, with 5-fold cross-validation as the default. So you'd want to see at least 3 instances of each class in every training fold. Since a training fold holds about 4/5 of the data, that gets you to n_instances = n_classes * 4 (or better 5 or 6).
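The sample size argument can be checked with a few lines of arithmetic. This is a sketch under the assumption that the internal cross-validation uses stratified folds, so that a test fold takes at most ceil(m / k) of the m instances of a class; the function names are illustrative and not part of sktime.

```python
import math


def min_instances_per_class(n_folds=5, min_train_per_class=3):
    """Smallest per-class count m such that every training fold of a
    stratified n_folds split keeps at least min_train_per_class instances.

    Assumption: a test fold holds at most ceil(m / n_folds) instances of a class.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    m = min_train_per_class
    while m - math.ceil(m / n_folds) < min_train_per_class:
        m += 1
    return m


def min_instances(n_classes, n_folds=5, min_train_per_class=3):
    """Total instances needed if all classes are balanced."""
    return n_classes * min_instances_per_class(n_folds, min_train_per_class)


for n_classes in (3, 5, 15):
    print(n_classes, "classes ->", min_instances(n_classes), "instances")
```

With the defaults this gives 4 instances per class, so 12 instances for three classes and 20 for five.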

Many classifiers scale between the second and third power in the number of instances, and the scenario is run on each of a large number of classifiers. So going from 3 to about 4 classes roughly doubles the runtime caused by this scenario, I would guess, and this scenario already accounts for about 1/3 or 1/4 of the total classifier runtime.
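The runtime estimate can be reproduced as a back-of-the-envelope calculation. The sketch assumes that the number of instances grows in proportion to the number of classes and that runtime follows a pure power law in the number of instances; `relative_runtime` is an illustrative name, and real classifiers will deviate from this model.

```python
def relative_runtime(n_classes_old, n_classes_new, exponent):
    """Runtime ratio under the assumption runtime ~ n_instances ** exponent,
    with n_instances proportional to the number of classes."""
    return (n_classes_new / n_classes_old) ** exponent


# Compare quadratic and cubic scaling for a few candidate class counts.
for n_new in (4, 5):
    low = relative_runtime(3, n_new, 2)
    high = relative_runtime(3, n_new, 3)
    print(f"3 -> {n_new} classes: {low:.2f}x to {high:.2f}x")
```

Under these assumptions, going from 3 to 4 classes costs a factor between 16/9 (about 1.8) and 64/27 (about 2.4), in line with the rough doubling guessed above.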

We can of course check empirically how much it really is, if you would like, I do not mind. Though I wonder whether 5 classes give that much more coverage than 3. My gut feeling is that if something breaks with 5, it already breaks with 3, e.g., the one-hot encoder example.

Labels: enhancement (Adding new functionality), module:classification (classification module: time series classification)
2 participants