Comparing base rates with selection rates #1317
-
Hi fairlearn community,

One thing I have seen is that sometimes the distribution of base rates (i.e., the fraction of positive outcomes in the test data) differs from the distribution of selection rates (i.e., the fraction of positive outcomes predicted by a model on the test data). In such a case the selection rates could still satisfy demographic parity, even though they diverge from the base rates. Do you think a useful metric to check for these cases would be the difference between the base rate and the selection rate on the test data, conditioned on the sensitive feature? A minimal sketch of what I have in mind is below. Hopefully that makes sense! Thanks!
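This is a rough sketch of the check, using fairlearn's `MetricFrame` and `selection_rate`; the data is made up for illustration. Since `selection_rate` just takes the mean of the predictions, passing `y_true` in the prediction slot gives the per-group base rate:

```python
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate

# Toy data: hypothetical labels, predictions, and a sensitive feature.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 1, 0])
sensitive = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# Base rate per group: fraction of positives in y_true. Passing y_true
# as the "predictions" makes selection_rate return the base rate.
base = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_true,
                   sensitive_features=sensitive)

# Selection rate per group: fraction of positives in y_pred.
sel = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred,
                  sensitive_features=sensitive)

# The proposed metric: per-group gap between selection rate and base rate.
print(sel.by_group - base.by_group)
```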
-
Hi, @fountaindive - thanks for your question!

When it comes to fairness metrics, it is usually helpful to consider the specific type of harm you're trying to measure. For example, we are usually interested in disparities in selection rates because they can tell us something about how the deployment of the model would affect the distribution of a particular resource (getting hired, receiving a loan, being selected for a care program, etc.). Depending on the causes and consequences of a disparity in the specific context of an application, different interventions could be appropriate (collecting data less affected by measurement bias, deciding to abandon development, etc.).

In the scenario you're describing, it seems differences between the group-specific base rates and selection rates are problematic because they would imply that some sensitive groups receive more or less of a particular resource than their base rate implies. In other words, you would measure the extent to which a group 'misses out' or 'receives unduly' at the group level. This requires the assumption that the base rates accurately correspond to what each group 'deserves', i.e., that the ground truth corresponds to the 'fair' distribution of resources.

However, a difference between base rate and selection rate at the group level could correspond to a host of different scenarios. To give an (unrealistic and crude) example: if you know that 40% of the members of a protected group would benefit from a care program, a selection rate of 0.6 could include all of these people (0.4) plus a number of negatives (0.2), but it could also include only negatives (0.6) and none of the people who would benefit from the program.

In contexts where we can reasonably assume that the ground-truth variable corresponds to a fair distribution, it makes sense, in my opinion, to (also) consider metrics related to predictive performance (disparities in accuracy, false positive rates, false negative rates, or whatever metric corresponds best to the potential harm people are facing). This would allow you to quantify whether the "right" people are receiving the "right" prediction at the individual level, which you can subsequently aggregate to the group level.

In any case, it is always very hard to make any strong statements about what is a sensible fairness metric without considering a specific deployment context. Let me know if this helps :)
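As an illustration, here is a minimal sketch of such group-level predictive-performance metrics with fairlearn's `MetricFrame` (toy data for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, false_positive_rate, false_negative_rate

# Toy data: hypothetical labels, predictions, and a sensitive feature.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 1, 0])
sensitive = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# Per-group predictive performance: are the "right" people getting
# the "right" prediction within each group?
mf = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "fpr": false_positive_rate,
        "fnr": false_negative_rate,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive,
)
print(mf.by_group)      # metric values per sensitive group
print(mf.difference())  # largest between-group gap for each metric
```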
-
Thank you so much for the discussion!