
Can't use ExponentiatedGradient with GridSearchCV #1196

Open
tmcarvalho opened this issue Jan 24, 2023 · 6 comments
Labels
documentation question Further information is requested

Comments

@tmcarvalho

I want all the results of GridSearchCV, so I need its cv_results_. However, when I pass the GridSearchCV to ExponentiatedGradient and call fit, cv_results_ is not available after fitting.

Here is my code:

gs = GridSearchCV(
    estimator=model,
    param_grid=param,
    cv=RepeatedKFold(n_splits=5, n_repeats=2),
    scoring=scoring,
    return_train_score=True,
    refit='roc_auc_curve',
    n_jobs=-1)
mitigator = ExponentiatedGradient(gs, constraints=EqualizedOdds())

mitigator.fit(X_train, y_train, sensitive_features=X_train[set_sa])

And here is the issue.

print(mitigator.estimator)

GridSearchCV(cv=RepeatedKFold(n_repeats=2, n_splits=5, random_state=None),
             estimator=RandomForestClassifier(), n_jobs=-1,
             param_grid={'max_depth': [4, 7, 10],
                         'n_estimators': [100, 250, 500]},
             refit='roc_auc_curve', return_train_score=True,
             scoring={'acc': 'accuracy', 'bal_acc': 'balanced_accuracy',
                      'f1': 'f1', 'f1_weighted': 'f1_weighted',
                      'gmean': make_scorer(geometric_mean_score),
                      'roc_auc_curve': make_scorer(roc_auc_score, needs_proba=True, max_fpr=0.001)})
print(mitigator.estimator.cv_results_)

AttributeError: 'GridSearchCV' object has no attribute 'cv_results_'

Any suggestion?

@romanlutz
Member

Excellent question @tmcarvalho! It appears the gaping hole in our documentation for ExponentiatedGradient is responsible for this. If we had a proper user guide, it would tell you that estimator is just the original one you passed in. ExponentiatedGradient clones it in each of its iterations and saves the resulting fitted GridSearchCV objects under predictors_. There's an example at https://fairlearn.org/main/auto_examples/plot_credit_loan_decisions.html#reductions-approach-to-unfairness-mitigation with the following code:

    exp_grad_est = ExponentiatedGradient(
        estimator=estimator,
        sample_weight_name='classifier__sample_weight',
        constraints=EqualizedOdds(difference_bound=epsilon),
    )
    exp_grad_est.fit(X_train, y_train, sensitive_features=A_train)
    predictors = exp_grad_est.predictors_

The last line is the one of interest: you can see predictors_ being extracted from the ExponentiatedGradient object.
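To connect this back to the original question: the gs object you pass in is never fitted itself; ExponentiatedGradient fits clones of it, so cv_results_ lives on the entries of predictors_, not on gs. Here is a minimal sklearn-only sketch of that cloning behavior (two iterations simulated by hand, no fairlearn required; the estimator and data are placeholders):

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)

gs = GridSearchCV(DecisionTreeClassifier(), {"max_depth": [2, 3]}, cv=2)

# What ExponentiatedGradient does internally: the original gs is never
# fitted; a clone is fitted in each iteration and stored in predictors_.
predictors_ = [clone(gs).fit(X, y) for _ in range(2)]

print(hasattr(gs, "cv_results_"))           # False -- the original stays unfitted
print("mean_test_score" in predictors_[0].cv_results_)  # True -- the clones carry the results
```

With a fitted mitigator, the corresponding access pattern would be `mitigator.predictors_[0].cv_results_`.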

There's another wrinkle, though. ExponentiatedGradient runs for some number of iterations until convergence; call that n. After that, it has n predictors_ (in your case of type GridSearchCV) and associated weights_. Any predictions it makes are probabilistic, with weights_ being the probabilities with which the corresponding model is picked to make the prediction. That means calling predict with the same input multiple times can give different outputs. This randomization is required to fulfill the fairness constraints.

Typically, very few of the models actually have non-zero weights, so it's unlikely that you end up choosing randomly among hundreds of models. Additionally, you could just inspect the models in predictors_ and evaluate whether one of them on its own is useful for your purposes. You could use plot_model_comparison to compare them.
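The randomized prediction scheme described above can be sketched without fairlearn. This is a simplified illustration, not fairlearn's actual implementation: the names predictors_ and weights_ mirror the fitted attributes, but the predictors here are stand-in constant classifiers:

```python
import numpy as np

# Stand-ins for the n fitted predictors_; in fairlearn these would be the
# GridSearchCV clones fitted in each iteration.
predictors_ = [
    lambda X: np.zeros(len(X), dtype=int),
    lambda X: np.ones(len(X), dtype=int),
]

# weights_: the probability of each predictor being chosen for a prediction.
weights_ = np.array([0.25, 0.75])

def randomized_predict(X, rng):
    """For each sample, draw which predictor answers, then return its label."""
    all_preds = np.stack([p(X) for p in predictors_])  # (n_predictors, n_samples)
    chosen = rng.choice(len(predictors_), size=len(X), p=weights_)
    return all_preds[chosen, np.arange(len(X))]

rng = np.random.default_rng(0)
X = np.zeros((5, 2))
print(randomized_predict(X, rng))  # repeated calls on the same X can differ
```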

Let me know how it goes and if you have further questions!

@romanlutz romanlutz added question Further information is requested documentation labels Jan 25, 2023
@tmcarvalho
Author

Thank you @romanlutz for your answer! It worked for me.

@romanlutz
Member

Excellent. I think we should leave this issue open as a reminder to add documentation for this.

@tmcarvalho
Author

Hi @romanlutz
Could you help me with another question, please? -_-
Experiments using ExponentiatedGradient with GridSearchCV take a very long time. For one dataset, GridSearchCV alone takes at most 1 h, but when I pass it to ExponentiatedGradient it runs for more than 6 h; I killed the process before it finished, so I don't even know how long it would take.
Do you have any idea why this happens?

@MiroDudik
Member

It sounds like you are tuning hyperparameters inside the exponentiated gradient loop, which refits the full grid search in every iteration. I would recommend tuning the hyperparameters on your base model (in your example, the RandomForestClassifier) without any unfairness mitigation, and then using the resulting hyperparameters in the call to exponentiated gradient.
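That suggestion can be sketched as follows. The grid, data, and scoring are placeholders, and the final fairlearn step is shown commented out so the tuning part stands on its own:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Step 1: tune the base model alone, without any mitigation.
gs = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [4, 7], "n_estimators": [50, 100]},
    cv=3,
)
gs.fit(X, y)
print(gs.best_params_)

# Step 2: fix the tuned hyperparameters in a fresh estimator.
tuned = RandomForestClassifier(random_state=0, **gs.best_params_)

# Step 3 (requires fairlearn): pass the tuned estimator, not the
# GridSearchCV, to ExponentiatedGradient.
# mitigator = ExponentiatedGradient(tuned, constraints=EqualizedOdds())
# mitigator.fit(X, y, sensitive_features=A)
```

This way each ExponentiatedGradient iteration fits a single model instead of re-running the whole cross-validated search.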

@CarlaFernandez
Hello @MiroDudik, just commenting to say that it would be really useful to have the information from the Credit loan decisions example notebook in the ExponentiatedGradient user guide entry. I've been looking for this everywhere and didn't find it until now!
Thank you very much :)
