Issue with UtilityParity indexing #1338

Open
adrinjalali opened this issue Jan 24, 2024 · 4 comments

Comments

adrinjalali (Member) commented Jan 24, 2024

We have this code:

        self.pos_basis = pd.DataFrame()
        self.neg_basis = pd.DataFrame()
        self.neg_basis_present = pd.Series(dtype="float64")
        zero_vec = pd.Series(0.0, self.index)
        i = 0

        for e in event_vals:
            # Constraints on the final group are redundant, so they are not
            # included in the basis.
            for g in group_vals[:-1]:
                self.pos_basis[i] = 0 + zero_vec
                self.neg_basis[i] = 0 + zero_vec
                self.pos_basis[i]["+", e, g] = 1
                self.neg_basis[i]["-", e, g] = 1
                self.neg_basis_present.at[i] = True
                i += 1

which causes a few issues due to indexing and copy-on-write (a minimal sketch of the pitfall follows the rewritten code below), so a better way of writing the same code seems to be:

        self.neg_basis_present = pd.Series(dtype="float64")
        col_count = len(event_vals) * (len(group_vals) - 1)
        self.pos_basis = pd.DataFrame(0.0, index=self.index, columns=range(col_count))
        self.neg_basis = pd.DataFrame(0.0, index=self.index, columns=range(col_count))

        i = 0

        for e in event_vals:
            # Constraints on the final group are redundant, so they are not
            # included in the basis.
            for g in group_vals[:-1]:
                self.pos_basis.loc[("+", e, g), i] = 1
                self.neg_basis.loc[("-", e, g), i] = 1
                self.neg_basis_present.at[i] = True
                i += 1
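
For context, here is a minimal sketch of the copy-on-write pitfall the original code runs into (assuming pandas 2.x; this is not fairlearn code, and the toy sign/event/group_id keys just mimic the index in the outputs below). With copy-on-write enabled, the chained assignment df[i][key] = 1 writes to a temporary copy of the column and never reaches the frame, whereas a single .loc call does:

    import pandas as pd

    # Opt in to copy-on-write explicitly so the behaviour does not depend on the
    # pandas version being used.
    pd.set_option("mode.copy_on_write", True)

    index = pd.MultiIndex.from_tuples(
        [("+", "label=0", "0,0"), ("+", "label=0", "1,1")],
        names=["sign", "event", "group_id"],
    )
    df = pd.DataFrame(0.0, index=index, columns=[0])

    # Chained assignment, as in the original code: the write lands on a temporary
    # copy of column 0 and never reaches df.
    df[0]["+", "label=0", "0,0"] = 1
    print(df[0].sum())  # 0.0

    # Single .loc call, as in the rewrite: the write sticks.
    df.loc[("+", "label=0", "0,0"), 0] = 1
    print(df[0].sum())  # 1.0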

However, the original code produces:

                         0    1    2    3    4    5    6    7    8    9
sign event   group_id                                                  
+    label=0 0,0       1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             1,1       0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             2,2       0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
     label=1 1,1       0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
             3,3       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
             4,4       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
             5,5       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
-    label=0 0,0       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             1,1       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             2,2       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
     label=1 1,1       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             3,3       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             4,4       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             5,5       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

while the rewritten code produces:

                         0    1    2    3    4    5    6    7    8    9
sign event   group_id                                                  
+    label=0 0,0       1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             1,1       0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             2,2       0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
     label=1 1,1       0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0
             3,3       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0
             4,4       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
             5,5       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
-    label=0 0,0       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             1,1       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             2,2       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
     label=1 1,1       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             3,3       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             4,4       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
             5,5       0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
+    label=0 3,3       NaN  NaN  NaN  1.0  NaN  NaN  NaN  NaN  NaN  NaN
             4,4       NaN  NaN  NaN  NaN  1.0  NaN  NaN  NaN  NaN  NaN
     label=1 0,0       NaN  NaN  NaN  NaN  NaN  1.0  NaN  NaN  NaN  NaN
             2,2       NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0  NaN  NaN

I don't really understand this part of the code. It seems @MiroDudik wrote it, but I'm not sure whether he has time to check. Anybody? @fairlearn/fairlearn-maintainers

Note that this change is necessary with newer pandas releases.

romanlutz (Member) commented

Do you have the example code which produces this output? It looks like the last 4 rows are added in the second one. Reading the code, it's not obvious to me why that is. Might need to step through.

adrinjalali (Member, Author) commented

This test is the one generating the data for the above matrices: TestEqualizedOdds::test_many_sensitive_feature_groups_warning

Here is the code to trigger the issue: #1339, although the issue was originally triggered when the previous pandas PR was merged, so to compare you need to check out the PR where pyarrow was added, or anything before that.

riedgar-ms (Member) commented

I have a feeling this is related to the trouble I'm having with #1351. That is eventually failing due to a vector being the wrong size for multiplication... and the reason it's the wrong size seems to be a few NaN entries.
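
A rough sketch of that failure mode, assuming the extra NaN-filled rows shown above are what reach the multiplication (the sizes 14, 4 and 10 are made up to mirror the tables in this issue): once the basis frame gains rows that a vector built from the original index does not have, the shapes no longer line up.

    import numpy as np
    import pandas as pd

    # 14 rows mirrors the original index above; 4 extra all-NaN rows mirrors the
    # rows appended in the second output.
    basis = pd.DataFrame(0.0, index=range(14), columns=range(10))
    extra = pd.DataFrame(np.nan, index=range(14, 18), columns=range(10))
    basis_with_extra_rows = pd.concat([basis, extra])

    weights = np.ones(14)  # built from the original 14-row index

    print(basis.T.values @ weights)  # fine: (10, 14) @ (14,)
    try:
        basis_with_extra_rows.T.values @ weights  # (10, 18) @ (14,)
    except ValueError as exc:
        print("shape mismatch:", exc)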

riedgar-ms (Member) commented

Weirdly, pushing the pandas version back isn't 'fixing' the issue either. I'm not sure why that is :-(
