data can be loaded only once #1210

Open
anilsh opened this issue Feb 11, 2023 · 2 comments
Labels
Waiting for OP's Response: Waiting for original poster's response, and will close if that doesn't happen for a while.

Comments


anilsh commented Feb 11, 2023

When I try to run GridSearch twice, or ExponentiatedGradient after GridSearch, the constraints object raises the following error:

AssertionError: data can be loaded only once

The full stack trace is:

File ~\OneDrive - EY\fairness2\FSRM-shEYzam-repos\shazamlib\group_fairness.py:313, in GroupFairness.reduction_grid_search(self, base_model)
    311 # train model 
    312 print ('Training base model on specified constraints..')
--> 313 model_gridsearch.fit(self.X_train, self.y_train, sensitive_features=self.S_train)
    314 self.inprocess['model'] = model_gridsearch
    316 # make predictions

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_grid_search\grid_search.py:143, in GridSearch.fit(self, X, y, **kwargs)
    141 # Prep the parity constraints and objective
    142 logger.debug("Preparing constraints and objective")
--> 143 self.constraints.load_data(X, y, **kwargs)
    144 objective = self.constraints.default_objective()
    145 objective.load_data(X, y, **kwargs)

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_moments\utility_parity.py:333, in DemographicParity.load_data(self, X, y, sensitive_features, control_features)
    331 base_event = pd.Series(data=_ALL, index=y_train.index)
    332 event = _merge_event_and_control_columns(base_event, cf_train)
--> 333 super().load_data(X, y_train, event=event, sensitive_features=sf_train)

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_moments\utility_parity.py:146, in UtilityParity.load_data(self, X, y, sensitive_features, event, utilities)
    123 def load_data(
    124     self,
    125     X,
   (...)
    130     utilities=None,
    131 ):
    132     """Load the specified data into this object.
    133 
    134     This adds a column `event` to the `tags` field.
   (...)
    144 
    145     """
--> 146     super().load_data(X, y, sensitive_features=sensitive_features)
    147     self.tags[_EVENT] = event
    148     if utilities is None:

File ~\Anaconda3\envs\sheyzam-fairness-env\lib\site-packages\fairlearn\reductions\_moments\moment.py:42, in Moment.load_data(self, X, y, sensitive_features)
     30 def load_data(self, X, y: pd.Series, *, sensitive_features: pd.Series = None):
     31     """Load a set of data for use by this object.
     32 
     33     Parameters
   (...)
     40         The sensitive feature vector (default None)
     41     """
---> 42     assert self.data_loaded is False, "data can be loaded only once"
     43     if sensitive_features is not None:
     44         assert isinstance(sensitive_features, pd.Series)

AssertionError: data can be loaded only once
@hildeweerts added the "Waiting for OP's Response" label on Apr 17, 2023.
@hildeweerts (Contributor) commented:

Hi @anilsh. Please follow the instructions from the bug report template to print all dependencies.

@romanlutz (Member) commented:

This is by design, AFAIK. If we allowed loading multiple times, you could have something like:

constraint = DemographicParity()
eg = ExponentiatedGradient(constraints=constraint, ...)
eg.fit(...)  # calls load_data and sets fields internal to the moment
constraint.load_data(different_data)

In other words, one could mess up the constraint object in weird ways. I can see two changes we could make:

  1. Perhaps load_data should be _load_data to avoid giving people the impression that it's something they could use (?), and
  2. perhaps we should clone the constraints object before using it internally. That way, we could pass the same constraints object to several different mitigators without corrupting it in the process.

@MiroDudik wdyt?
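The cloning idea in suggestion 2 can be sketched without Fairlearn at all. The toy `Moment` below reproduces the same load-once guard from `moment.py`, and the toy `Mitigator` deep-copies the constraint before loading data into it, so the caller's object is never consumed and can be reused across mitigators. The class names and the use of `copy.deepcopy` are illustrative assumptions, not Fairlearn's actual API (Fairlearn might prefer an sklearn-style `clone`):

```python
import copy

class Moment:
    """Toy stand-in for fairlearn's Moment: data may be loaded only once."""
    def __init__(self):
        self.data_loaded = False

    def load_data(self, X, y):
        # Same guard that produces the error reported in this issue.
        assert self.data_loaded is False, "data can be loaded only once"
        self.X, self.y = X, y
        self.data_loaded = True

class Mitigator:
    """Toy mitigator that clones its constraint before using it (suggestion 2)."""
    def __init__(self, constraints):
        self.constraints = constraints

    def fit(self, X, y):
        # Work on a private copy so the caller's constraint stays pristine.
        self._constraints = copy.deepcopy(self.constraints)
        self._constraints.load_data(X, y)

constraint = Moment()
Mitigator(constraint).fit([1, 2], [0, 1])
Mitigator(constraint).fit([3, 4], [1, 0])  # second fit succeeds: no AssertionError
```

Without the `deepcopy`, the second `fit` would trip the `data can be loaded only once` assertion exactly as in the traceback above, because both mitigators would call `load_data` on the same shared object.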
