
Can't use ExponentiatedGradient with GridSearchCV #1196

Open
tmcarvalho opened this issue Jan 24, 2023 · 6 comments
Labels
documentation question Further information is requested

Comments

@tmcarvalho

I want all the results of GridSearchCV, so I need its cv_results_. However, when I pass the GridSearchCV to ExponentiatedGradient and call fit, cv_results_ is not available after fitting.

Here is my code:

gs = GridSearchCV(
    estimator=model,
    param_grid=param,
    cv=RepeatedKFold(n_splits=5, n_repeats=2),
    scoring=scoring,
    return_train_score=True,
    refit='roc_auc_curve',
    n_jobs=-1)
mitigator = ExponentiatedGradient(gs, constraints=EqualizedOdds())

mitigator.fit(X_train, y_train, sensitive_features=X_train[set_sa])

And here is the issue.

print(mitigator.estimator)

GridSearchCV(cv=RepeatedKFold(n_repeats=2, n_splits=5, random_state=None),
             estimator=RandomForestClassifier(), n_jobs=-1,
             param_grid={'max_depth': [4, 7, 10],
                         'n_estimators': [100, 250, 500]},
             refit='roc_auc_curve', return_train_score=True,
             scoring={'acc': 'accuracy', 'bal_acc': 'balanced_accuracy',
                      'f1': 'f1', 'f1_weighted': 'f1_weighted',
                      'gmean': make_scorer(geometric_mean_score),
                      'roc_auc_curve': make_scorer(roc_auc_score, needs_proba=True, max_fpr=0.001)})
print(mitigator.estimator.cv_results_)

AttributeError: 'GridSearchCV' object has no attribute 'cv_results_'

Any suggestion?

@romanlutz
Member

Excellent question @tmcarvalho! It appears the gaping hole in our documentation for ExponentiatedGradient is responsible for this. If we had a proper user guide, it would tell you that estimator is just the original one you passed in. ExponentiatedGradient clones it in each of its iterations and saves the resulting fitted GridSearchCV objects under predictors_. There's an example at https://fairlearn.org/main/auto_examples/plot_credit_loan_decisions.html#reductions-approach-to-unfairness-mitigation with the following code:

    exp_grad_est = ExponentiatedGradient(
        estimator=estimator,
        sample_weight_name='classifier__sample_weight',
        constraints=EqualizedOdds(difference_bound=epsilon),
    )
    exp_grad_est.fit(X_train, y_train, sensitive_features=A_train)
    predictors = exp_grad_est.predictors_

The last line is the one of interest: you can see predictors_ being extracted from the ExponentiatedGradient object.
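To connect this back to the original question: the gs object you pass in is never fitted itself; ExponentiatedGradient fits clones of it, so cv_results_ lives on the entries of predictors_, not on gs. Here is a minimal sklearn-only sketch of that cloning behavior (two iterations simulated by hand, no fairlearn required; the estimator and data are placeholders):

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)

gs = GridSearchCV(DecisionTreeClassifier(), {"max_depth": [2, 3]}, cv=2)

# What ExponentiatedGradient does internally: the original gs is never
# fitted; a clone is fitted in each iteration and stored in predictors_.
predictors_ = [clone(gs).fit(X, y) for _ in range(2)]

print(hasattr(gs, "cv_results_"))           # False -- the original stays unfitted
print("mean_test_score" in predictors_[0].cv_results_)  # True -- the clones carry the results
```

With a fitted mitigator, the corresponding access pattern would be `mitigator.predictors_[0].cv_results_`.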

There's another wrinkle, though. ExponentiatedGradient runs for some number of iterations until convergence; call that n. After that, it has n predictors_ (in your case of type GridSearchCV) and associated weights_. Any predictions it makes are probabilistic, with weights_ being the probabilities with which the corresponding model is picked to make the prediction. That means calling predict with the same input multiple times can give different outputs. This randomization is required to fulfill the fairness constraints.

Typically, very few of the models actually have non-zero weights, so it's unlikely that you end up choosing randomly among hundreds of models. Additionally, you could just inspect the models in predictors_ and evaluate whether one of them on its own is useful for your purposes. You could use plot_model_comparison to compare them.
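The randomized prediction scheme described above can be sketched without fairlearn. This is a simplified illustration, not fairlearn's actual implementation: the names predictors_ and weights_ mirror the fitted attributes, but the predictors here are stand-in constant classifiers:

```python
import numpy as np

# Stand-ins for the n fitted predictors_; in fairlearn these would be the
# GridSearchCV clones fitted in each iteration.
predictors_ = [
    lambda X: np.zeros(len(X), dtype=int),
    lambda X: np.ones(len(X), dtype=int),
]

# weights_: the probability of each predictor being chosen for a prediction.
weights_ = np.array([0.25, 0.75])

def randomized_predict(X, rng):
    """For each sample, draw which predictor answers, then return its label."""
    all_preds = np.stack([p(X) for p in predictors_])  # (n_predictors, n_samples)
    chosen = rng.choice(len(predictors_), size=len(X), p=weights_)
    return all_preds[chosen, np.arange(len(X))]

rng = np.random.default_rng(0)
X = np.zeros((5, 2))
print(randomized_predict(X, rng))  # repeated calls on the same X can differ
```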

Let me know how it goes and if you have further questions!

@romanlutz romanlutz added question Further information is requested documentation labels Jan 25, 2023
@tmcarvalho
Author

Thank you @romanlutz for your answer! It worked for me.

@romanlutz
Member

Excellent. I think we should leave this issue open as a reminder to add documentation for this.

@tmcarvalho
Author

Hi @romanlutz
Could you help me with another question, please? -_-
Experiments using ExponentiatedGradient with GridSearchCV take a very long time. For one dataset, GridSearchCV alone takes at most 1 h, but when I pass it to ExponentiatedGradient it runs for more than 6 h; I killed the process before it finished, so I don't even know how long it would take.
Do you have any idea why this happens?

@MiroDudik
Member

It sounds like you are tuning hyperparameters inside the exponentiated gradient loop, which refits the full grid search in every iteration. I would recommend tuning the hyperparameters on your base model (in your example, the RandomForestClassifier) without any unfairness mitigation, and then using the resulting hyperparameters in the call to exponentiated gradient.
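That suggestion can be sketched as follows. The grid, data, and scoring are placeholders, and the final fairlearn step is shown commented out so the tuning part stands on its own:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Step 1: tune the base model alone, without any mitigation.
gs = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [4, 7], "n_estimators": [50, 100]},
    cv=3,
)
gs.fit(X, y)
print(gs.best_params_)

# Step 2: fix the tuned hyperparameters in a fresh estimator.
tuned = RandomForestClassifier(random_state=0, **gs.best_params_)

# Step 3 (requires fairlearn): pass the tuned estimator, not the
# GridSearchCV, to ExponentiatedGradient.
# mitigator = ExponentiatedGradient(tuned, constraints=EqualizedOdds())
# mitigator.fit(X, y, sensitive_features=A)
```

This way each ExponentiatedGradient iteration fits a single model instead of re-running the whole cross-validated search.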

@CarlaFernandez
Hello @MiroDudik, just commenting to say that it would be really useful to have the information from the Credit loan decisions example notebook in the ExponentiatedGradient user guide entry. I've been looking for this everywhere and didn't find it until now!
Thank you very much :)
