Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Move _more_tags to "developer API" via __sklearn_tags__ #28910

Open
adrinjalali opened this issue Apr 29, 2024 · 4 comments
Open

RFC Move _more_tags to "developer API" via __sklearn_tags__ #28910

adrinjalali opened this issue Apr 29, 2024 · 4 comments
Labels
API Developer API Third party developer API related RFC

Comments

@adrinjalali
Copy link
Member

As a part of making it easier and more "standard" to write scikit-learn estimators by third party developers, we have been slowly developing a "developer API" kind of thing, which are useful for third party developers, but not end users of the estimators.

Some of the work has been:

  • __sklearn_clone__
  • __metadata_request__fit, ...
  • get_metadata_routing

What I'm proposing here, is to create a new __sklearn__tags__ method instead of the existing _more_tags.

We've had a lot of discussions when we designed the current system, which goes through the MRO and the _more_tags adds to the tag set instead of returning the tags. Now the question is do we want to keep the current system or do we want __sklearn_tags__ to return the estimator's tags instead, and call parent's __sklearn_tags__ inside itself? As in, instead of:

class Estimator(BaseEstimator):
    ...
    def _more_tags(self):
        return {...}

to have:

class Estimator(BaseEstimator):
    ...
    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        # update tags
        return return tags

Inside our tags, we also have some starting with _ such as _xfail, and the question is do we want to make those public.

cc @scikit-learn/core-devs

@adrinjalali adrinjalali added API RFC Developer API Third party developer API related labels Apr 29, 2024
@lorentzenchr
Copy link
Member

+1 on

do we want __sklearn_tags__ to return the estimator's tags

and removing _more_tags

@thomasjpfan
Copy link
Member

I am +1. I updated #22606 that implements __sklearn_tags__.

@glemaitre
Copy link
Member

Inside our tags, we also have some starting with _ such as _xfail, and the question is do we want to make those public.

I don't see the reason why we need to have it public.

Making the switch to __sklearn_tags__ would be a step to make the system public which is fine with me. However, I think that we also need to revisit the tags and their meaning. I'm fine with most of them but I have the following issues:

  • multioutput_only and multioutput could be confusing. We should have a single tag. An alternative solution is to have a tag that contains a set of string defining the type of problems supported, e.g. target_type = {"binary", "multiclass"}
  • X_types might be expanded. Maybe the refactoring of _convert_container in RFC _convert_container #28681 could help at designing a set of tags.

@adrinjalali
Copy link
Member Author

I agree, before merging #22606 I think it'd be nice to clean up our tags. I'll work on it and open PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Developer API Third party developer API related RFC
Development

No branches or pull requests

4 participants