Q ExponentiatedGradient results vary using the same model and data #1261

Open
CarlaFernandez opened this issue Jul 20, 2023 · 19 comments
Labels: question (Further information is requested)

@CarlaFernandez

Problem

Hello,
I'm using ExponentiatedGradient to perform mitigation on a CatBoost classifier. Each time I run the fit and obtain the MetricFrame for my sensitive features, I get different results. I would like to know if this is normal, or if I may be missing some seed parameter somewhere.

Reproducible example

I'm working with the census data, and this is my code:

from catboost import CatBoostClassifier
from fairlearn.metrics import MetricFrame
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from functools import partial
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score, recall_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.validation import check_is_fitted
import fairlearn
import pandas as pd

data = fetch_openml(data_id=1590, as_frame=True)
X = data.data
y = LabelEncoder().fit_transform(data.target) 

params = {
    "iterations": 100,
    "learning_rate": 0.1,
    "depth": 7,
    "loss_function": 'Logloss',
    "eval_metric": 'Accuracy',
    "verbose": False,
    "random_state": 42
}

categorical_features = list(X.select_dtypes(include=['category', 'object']).columns)
numeric_features = list(X.select_dtypes(include=['int64', 'float64']).columns)
for feat in categorical_features:
    X[feat] = X[feat].astype(str)
    
X.loc[:, categorical_features] = X.loc[:, categorical_features].fillna("Nulo")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

class CatBoostClassifierAdapter(BaseEstimator, ClassifierMixin):
    def __init__(self, cat_features=None, text_features=None, **kwargs):
        self.cat_features = cat_features
        self.text_features = text_features
        if "sample_weight" in kwargs.keys():
            kwargs["class_weights"] = kwargs["sample_weight"]
            kwargs.pop("sample_weight")
        self.model = CatBoostClassifier(**kwargs)
        
    def fit(self, X, y, **kwargs):
        self.model.fit(
            X, y, 
            cat_features=self.cat_features, 
            text_features=self.text_features,
            verbose=False, **kwargs
        )
        return self.model
    
    def predict(self, X):
        check_is_fitted(self.model)
        return self.model.predict(X)

model = CatBoostClassifierAdapter(cat_features=categorical_features, **params)

expgrad = ExponentiatedGradient(
    model,
    constraints=DemographicParity(),
    max_iter=5
)

expgrad.fit(X_train, y_train, sensitive_features=X_train[['sex', 'race']])

predictions = expgrad.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")

sensitive_features_test = X_test[['sex', 'race']]

recall_func = partial(recall_score, zero_division=0.0)
recall_func.__name__ = "recall"

precision_func = partial(precision_score, zero_division=0.0)
precision_func.__name__ = "precision"

metrics = MetricFrame(metrics={"recall": recall_func, "count": fairlearn.metrics.count},
                      y_true=y_test,
                      y_pred=predictions,
                      sensitive_features=sensitive_features_test)

by_group_df = metrics.by_group.reset_index()
by_group_df
@romanlutz
Member

Yes, you're missing the random_state parameter on predict:


https://fairlearn.org/v0.8/api_reference/fairlearn.reductions.html#fairlearn.reductions.ExponentiatedGradient.predict
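
For example, with your snippet above:

predictions = expgrad.predict(X_test, random_state=42)  # fixed seed -> reproducible predictions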

Please reopen if this doesn't fully address your question.

@romanlutz changed the title from "ExponendiatedGradient results vary using the same model and data" to "Q ExponentiatedGradient results vary using the same model and data" on Jul 21, 2023
@romanlutz added the "question" (Further information is requested) label on Jul 21, 2023
@CarlaFernandez
Author

Hello @romanlutz, indeed by setting the random_state parameter I can obtain consistent predictions using the same fitted instance of ExponentiatedGradient. However, if I refit the ExponentiatedGradient I lose reproducibility again. I can partly solve it by setting a NumPy seed before fitting; that way the results are reproducible on a single system, even after restarting my Jupyter kernel, but not if I move, for example, from my Windows to my Linux PC.
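
(Roughly what I mean, reusing the snippet above; the seed value is arbitrary:)

import numpy as np

np.random.seed(42)  # seed NumPy's global RNG before fitting
expgrad.fit(X_train, y_train, sensitive_features=X_train[['sex', 'race']])
predictions = expgrad.predict(X_test, random_state=42)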

I was especially interested in this kind of reproducibility because I'm writing a technical post, and I would have liked readers to be able to re-execute the notebook and obtain exactly the same results. But I guess it's just the nature of the algorithm that makes that impossible.

Thanks anyway!

@romanlutz
Member

@MiroDudik this sounds like a question for you.

@romanlutz romanlutz reopened this Jul 24, 2023
@MiroDudik
Member

@CarlaFernandez -- there shouldn't be any randomness in the exponentiated gradient during fitting. We tried hard to only use deterministic optimizers. My guess is that what you see is somehow coming from CatBoostClassifier and/or possibly from how we clone the classifiers internally here:

estimator = clone(estimator=self.estimator, safe=False)

So, one idea would be to compare whether the following three classifiers are returning the same model:

from sklearn import clone

# your code setting up CatBoostClassifierAdapter and params

model1 = CatBoostClassifierAdapter(cat_features=categorical_features, **params)
model2 = CatBoostClassifierAdapter(cat_features=categorical_features, **params)
model3 = clone(estimator=model1, safe=False)

model1.fit(...)
model2.fit(...)
model3.fit(...)

@CarlaFernandez
Author

Hello @MiroDudik, how would I go about checking that model1, model2 and model3 are the same model? I can see that all of them return the same predictions; is that enough to ensure they are the same model? (I would think so.)

@hildeweerts
Contributor

Hi @CarlaFernandez, I am not super familiar with the catboost library, but you could inspect the calculated feature importances or plot the tree to check whether the models are the same.
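
For instance, something along these lines after fitting the models from Miro's snippet (a rough sketch; I'm assuming CatBoost exposes get_feature_importance() on the fitted model):

import numpy as np

# compare the fitted CatBoost models inside the wrappers
fi1 = model1.model.get_feature_importance()
fi3 = model3.model.get_feature_importance()
print(np.allclose(fi1, fi3))  # False suggests the cloned model diverged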

@CarlaFernandez
Author

Thanks Hilde, good idea! Doing that, I see that the feature importances change after cloning:
[screenshot: feature importances differ after cloning]

@MiroDudik
Member

MiroDudik commented Aug 1, 2023

It seems that sklearn.clone doesn't work as expected. Can you also compare with the following model?

import copy

model4 = copy.deepcopy(model1)  # do this before model1 is fitted!!!

Thanks so much for helping us chase this bug!

@CarlaFernandez
Author

Using deepcopy, the model gets copied correctly!
[screenshot: feature importances match after deepcopy]

@MiroDudik
Member

We have a bugfix. We'll just need to come up with a small regression test that would not create a dependence on catboost...

@hildeweerts
Contributor

I suspect that there is some kind of bug in CatBoostClassifier.get_params() related to the random_state parameter, which is used as a synonym for the random_seed parameter.

For some reason I'm having issues installing catboost, so @CarlaFernandez could you let me know what you get when you do:

model1.get_params()

You get a very similar result if you delete random_state from the get_params() of a RandomForestClassifier:

import copy
from sklearn import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
X = data.data
y = data.target

class RandomForestClassifierTest(RandomForestClassifier):
    def get_params(self, **kwargs):
        params = super().get_params(**kwargs)
        del params['random_state']
        return params

model1 = RandomForestClassifierTest(random_state=0)
model2 = RandomForestClassifierTest(random_state=0)
model3 = clone(estimator=model1, safe=False)
model4 = copy.deepcopy(model1)

print(model1.fit(X, y).feature_importances_)
print(model2.fit(X, y).feature_importances_)
print(model3.fit(X, y).feature_importances_)
print(model4.fit(X, y).feature_importances_)

which outputs:

[0.09090795 0.02453104 0.46044474 0.42411627]
[0.09090795 0.02453104 0.46044474 0.42411627]
[0.10589901 0.02723903 0.42592277 0.44093919]
[0.09090795 0.02453104 0.46044474 0.42411627]

@CarlaFernandez
Author

Hi @hildeweerts, this is the output I obtain from model1.get_params():

{'cat_features': ['workclass',
  'education',
  'marital-status',
  'occupation',
  'relationship',
  'race',
  'sex',
  'native-country'],
 'text_features': None}

I guess what you wanted to see are the hyperparameters for the actual catboost model, not the adapter, so here they are (model1.model.get_params()):

{'iterations': 100,
 'learning_rate': 0.1,
 'depth': 7,
 'loss_function': 'Logloss',
 'verbose': False,
 'eval_metric': 'Accuracy',
 'random_state': 42}

model1, model2 and model4 all have those same params. model3 returns an empty dictionary, but I think that's expected given how sklearn's clone works.

I'm attaching my requirements.txt file in case it helps you with the catboost installation or debugging the problem:
requirements.zip

Thank you :)

@MiroDudik
Member

MiroDudik commented Aug 3, 2023

The fact that model3.model.get_params() returns an empty dictionary seems to be a symptom of a bug. Perhaps this line of clone is the culprit:

https://github.com/scikit-learn/scikit-learn/blob/7f9bad99d6e0a3e8ddf92a7e5561245224dab102/sklearn/base.py#L105

What does this line return:

model1.get_params(deep=False)          # similarly from model2,3,4
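
For context, clone essentially does the following with that call (a simplified sketch that glosses over nested estimators and sklearn's safety checks):

# roughly what sklearn.base.clone(estimator, safe=False) boils down to
klass = estimator.__class__
params = estimator.get_params(deep=False)  # only parameters declared explicitly in __init__
new_estimator = klass(**params)            # re-instantiate from those params alone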

BTW, even if the implementation of CatBoostClassifier or CatBoostClassifierAdapter somehow breaks things by not implementing all the requirements of the sklearn estimator API, I would still be in favor of replacing clone with deepcopy. We would like to be robust to slightly non-compliant implementations of estimators (as long as they support fit and predict). I am thinking about any use case where people want to write a quick wrapper around some non-sklearn classifier/regressor. Wrapping fit and predict is usually straightforward, but it's easy to make mistakes around the handling of parameters and cloning (or not even be aware that it's something the wrapper should pay attention to).
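
Concretely, the change would amount to something like this in place of the clone call quoted in my earlier comment (a sketch, not the final patch):

import copy

estimator = copy.deepcopy(self.estimator)  # instead of clone(estimator=self.estimator, safe=False)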

@MiroDudik
Member

@CarlaFernandez -- together with @romanlutz and @hildeweerts, we played a bit with your implementation, and the issue is in how get_params is implemented by BaseEstimator (and inherited by CatBoostClassifierAdapter). More precisely, get_params only returns the explicit keyword parameters of the constructor (not **kwargs), and only those parameters get cloned (with some caveats) by clone. This means that the only two parameters that get cloned in CatBoostClassifierAdapter are cat_features and text_features. If you want to work with the current fairlearn code, you might want to refactor CatBoostClassifierAdapter as follows:

  1. replace **kwargs in __init__ with a dictionary argument (so you would pass params directly rather than **params to CatBoostClassifierAdapter)
  2. in __init__ only save the arguments into fields, and postpone any parameter processing (including construction of CatBoostClassifier) to fit (see the sketch below)
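
Something along these lines (untested, just to illustrate the shape; it keeps the check_is_fitted call from your original wrapper):

class CatBoostClassifierAdapter(BaseEstimator, ClassifierMixin):
    def __init__(self, cat_features=None, text_features=None, params=None):
        # only store the constructor arguments; no processing, no model yet
        self.cat_features = cat_features
        self.text_features = text_features
        self.params = params

    def fit(self, X, y, **kwargs):
        # construct the underlying CatBoost model at fit time
        self.model = CatBoostClassifier(**(self.params or {}))
        self.model.fit(
            X, y,
            cat_features=self.cat_features,
            text_features=self.text_features,
            verbose=False, **kwargs
        )
        return self.model

    def predict(self, X):
        check_is_fitted(self.model)
        return self.model.predict(X)

# pass the params as a plain dict rather than as **params
model = CatBoostClassifierAdapter(cat_features=categorical_features, params=params)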

That said, I still think we should replace clone with deepcopy (for the reason I mentioned in my previous comment). Are there any reasons why we shouldn't do it? [tagging @adrinjalali]

@romanlutz
Member

I agree with @MiroDudik on replacing clone with deepcopy, but curious what @adrinjalali thinks as well.

Adding a bit to what @MiroDudik wrote regarding the code. Perhaps we can resolve this for your particular problem @CarlaFernandez, too, so that you don't have to wait for a release.

Can you elaborate on the purpose of the adapter class?
It seems like a sample_weight arg in the constructor is remapped to class_weights. Usually, sample_weight is passed in fit, though. Also, it doesn't seem like it's actually used. For the code below I've removed that part.
I also moved some things around and got it to work just fine as far as I can tell. Essentially: no kwargs in the wrapper constructor (as @MiroDudik wrote, you can make it a dict or add them individually as I did below, your choice), and no model initialization in the wrapper constructor. The model constructor args go on the wrapper constructor explicitly and are passed to the model constructor in the fit method.

class CatBoostClassifierAdapter(BaseEstimator, ClassifierMixin):
    def __init__(self, cat_features=None, text_features=None,
                 iterations=100, learning_rate=0.1, depth=7,
                 loss_function='Logloss', eval_metric="Accuracy",
                 verbose=False, random_state=42):
        self.cat_features = cat_features
        self.text_features = text_features
        self.model = None
        self.iterations = iterations
        self.learning_rate = learning_rate
        self.depth = depth
        self.loss_function = loss_function
        self.eval_metric = eval_metric
        self.verbose = verbose
        self.random_state = random_state

    def fit(self, X, y):
        self.model = CatBoostClassifier(
            iterations=self.iterations,
            learning_rate=self.learning_rate,
            depth=self.depth,
            loss_function=self.loss_function,
            eval_metric=self.eval_metric,
            verbose=self.verbose,
            random_state=self.random_state)
        self.model.fit(
            X, y,
            cat_features=self.cat_features,
            text_features=self.text_features,
            verbose=self.verbose
        )
        return self.model

    def predict(self, X):
        check_is_fitted(self.model)
        return self.model.predict(X)

model1 = CatBoostClassifierAdapter(cat_features=categorical_features)
model2 = CatBoostClassifierAdapter(cat_features=categorical_features)
model3 = clone(estimator=model1, safe=False)
model4 = copy.deepcopy(model1)

model1.fit(X_train, y_train)
model2.fit(X_train, y_train)
model3.fit(X_train, y_train)
model4.fit(X_train, y_train)

y1 = model1.predict(X_test)
y2 = model2.predict(X_test)
y3 = model3.predict(X_test)
y4 = model4.predict(X_test)

print((y1 == y2).all())
print((y1 == y3).all())
print((y1 == y4).all())

# prints True True True

And if I look into the wrapper's model (e.g., model1.model.__dict__) for each of them, they are identical too.

BTW, I ran the same code with the unchanged wrapper class and y3 was different, just as one would expect given what's been explained here so far.

@CarlaFernandez I hope that makes sense. Feel free to respond if you have thoughts, comments, or questions. I'd love to hear if that unblocks you 🙂

@CarlaFernandez
Author

Hello @romanlutz and all, thanks for the support on this issue.

I originally created the adapter class simply because I thought I needed my model to inherit from sklearn's BaseEstimator to make it work with fairlearn. Reading the docs more carefully, I now see it just needed to implement fit and predict. I also didn't notice that sample_weight was already a valid parameter for CatBoostClassifier, hence the reassignment in my adapter's constructor.

I now think my issue was more due to a lack of attention to detail while reading than anything else! But I still think it is a really useful thing to document, since people might be using custom models and need to pass extra parameters to the constructor. And of course the proposed fix of replacing clone with deepcopy is a nice find!

Thanks again and regards

@romanlutz
Member

It's great feedback, thanks for being so responsive and providing a sample that is easy to reproduce.

@fairlearn/fairlearn-maintainers this is a good reminder to make our requirements of estimators more visible in the docs.

@adrinjalali
Member

I really wouldn't replace clone with deepcopy.

  • clone creates a fresh unfitted estimator, whereas deepcopy would copy a fitted estimator.
  • There is a lot of discussion going on regarding cloning and random state, which we also don't want to have to deal with here
  • Users can customize cloning behavior by implementing __sklearn_clone__ (a minimal sketch follows this list): https://scikit-learn.org/dev/developers/develop.html#cloning
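
For illustration, a minimal, hypothetical sketch of that hook (it requires scikit-learn >= 1.3); the wrapper decides for itself what counts as a copy, so parameters that its get_params() does not report are not silently dropped:

import copy

class CatBoostClassifierAdapter(BaseEstimator, ClassifierMixin):
    # ... __init__, fit, predict as in the earlier adapter ...

    def __sklearn_clone__(self):
        # Called by sklearn.clone() when defined; return whatever this wrapper
        # considers a valid copy of itself. A deep copy is the bluntest option
        # (note that, unlike a standard clone, it also carries over fitted state).
        return copy.deepcopy(self)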

So I'd suggest we document well what we expect, how the estimator works, and how people could potentially fix this issue.

@hildeweerts
Contributor

Thanks @adrinjalali!

I'm not sure whether this was already discussed during one of the calls while I was on vacation, but @MiroDudik, what do you think given Adrin's points?
