Comparing base rates with selection rates #1317
-
Hi fairlearn community,

One thing I have seen is that sometimes the distribution of base rates (i.e., the fraction of positive outcomes in the test data) differs from the distribution of selection rates (i.e., the fraction of positive outcomes predicted by a model on the test data). In such a case the selection rates could still satisfy demographic parity, even though they diverge from the base rates. Do you think a useful metric to check for these cases would be the difference between the base rate and the selection rate on the test data, conditioned on the sensitive feature? A minimal sketch of what I have in mind is below. Hopefully that makes sense! Thanks!
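This is a rough sketch of the check, using fairlearn's `MetricFrame` and `selection_rate`; the data is made up for illustration. Since `selection_rate` just takes the mean of the predictions, passing `y_true` in the prediction slot gives the per-group base rate:

```python
import numpy as np
from fairlearn.metrics import MetricFrame, selection_rate

# Toy data: hypothetical labels, predictions, and a sensitive feature.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 1, 0])
sensitive = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# Base rate per group: fraction of positives in y_true. Passing y_true
# as the "predictions" makes selection_rate return the base rate.
base = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_true,
                   sensitive_features=sensitive)

# Selection rate per group: fraction of positives in y_pred.
sel = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred,
                  sensitive_features=sensitive)

# The proposed metric: per-group gap between selection rate and base rate.
print(sel.by_group - base.by_group)
```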
-
Hi, @fountaindive - thanks for your question!

When it comes to fairness metrics, it is usually helpful to consider the specific type of harm you're trying to measure. For example, we are usually interested in disparities in selection rates because they can tell us something about how the deployment of the model would affect the distribution of a particular resource (getting hired, receiving a loan, being selected for a care program, etc.). Depending on the causes and consequences of a disparity in the specific context of an application, different interventions could be appropriate (collecting data less affected by measurement bias, deciding to abandon development, etc.).

In the scenario you're describing, it seems differences between the group-specific base rates and selection rates are problematic because they would imply that some sensitive groups receive more or less of a particular resource than their base rate implies. In other words, you would measure the extent to which a group 'misses out' or 'receives unduly' at the group level. This requires the assumption that the base rates accurately correspond to what each group 'deserves', i.e., that the ground truth corresponds to the 'fair' distribution of resources.

However, a difference between base rate and selection rate at the group level could correspond to a host of different scenarios. To give an (unrealistic and crude) example: if you know that 40% of the members of a protected group would benefit from a care program, a selection rate of 0.6 could include all of these people (0.4) plus a number of negatives (0.2), but it could also include only negatives (0.6) and none of the people who would benefit from the program.

In contexts where we can reasonably assume that the ground-truth variable corresponds to a fair distribution, it makes sense, in my opinion, to (also) consider metrics related to predictive performance (disparities in accuracy, false positive rates, false negative rates, or whatever metric corresponds best to the potential harm people are facing). This would allow you to quantify whether the "right" people are receiving the "right" prediction at the individual level, which you can subsequently aggregate to the group level.

In any case, it is always very hard to make any strong statements about what is a sensible fairness metric without considering a specific deployment context. Let me know if this helps :)
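As an illustration, here is a minimal sketch of such group-level predictive-performance metrics with fairlearn's `MetricFrame` (toy data for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, false_positive_rate, false_negative_rate

# Toy data: hypothetical labels, predictions, and a sensitive feature.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 1, 1, 0])
sensitive = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

# Per-group predictive performance: are the "right" people getting
# the "right" prediction within each group?
mf = MetricFrame(
    metrics={
        "accuracy": accuracy_score,
        "fpr": false_positive_rate,
        "fnr": false_negative_rate,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sensitive,
)
print(mf.by_group)      # metric values per sensitive group
print(mf.difference())  # largest between-group gap for each metric
```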
-
Thank you so much for the discussion!