Awesome PrivEx: Privacy-Preserving Explainable AI (PPXAI)

A collection of academic articles, published methodologies, and datasets on the subject of Privacy-Preserving Explainable AI.

A sortable version is available here: https://awesome-privex.github.io/

Please read and cite our paper: arXiv:2404.00673

Nguyen, T.T., Huynh, T.T., Ren, Z., Nguyen, T.T., Nguyen, P.L., Yin, H. and Nguyen, Q.V.H., 2024. A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures. arXiv preprint arXiv:2404.00673.

Citation

@article{nguyen2024survey,
  title={A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures},
  author={Nguyen, Thanh Tam and Huynh, Thanh Trung and Ren, Zhao and Nguyen, Thanh Toan and Nguyen, Phi Le and Yin, Hongzhi and Nguyen, Quoc Viet Hung},
  journal={arXiv preprint arXiv:2404.00673},
  year={2024}
}

Existing Surveys

| Paper Title | Venue | Year |
| --- | --- | --- |
| Adversarial attacks and defenses in explainable artificial intelligence: A survey | Information Fusion | 2024 |
| A Survey of Privacy Attacks in Machine Learning | CSUR | 2023 |
| SoK: Taming the Triangle -- On the Interplays between Fairness, Interpretability and Privacy in Machine Learning | arXiv | 2023 |
| When Machine Learning Meets Privacy: A Survey and Outlook | CSUR | 2021 |
| Explaining Explanations: An Overview of Interpretability of Machine Learning | DSAA | 2018 |
| A Survey of Methods for Explaining Black Box Models | CSUR | 2018 |
| Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning | CCS | 2018 |

Taxonomy

[Taxonomy figure]


Approaches

| Title | Year | Venue | Target Explanations | Attacks | Defenses | Code |
| --- | --- | --- | --- | --- | --- | --- |
| Please Tell Me More: Privacy Impact of Explainability through the Lens of Membership Inference Attack | 2024 | SP | Feature-based | Membership Inference | Differential Privacy, Privacy-Preserving Models, DP-SGD | - |
| On the Privacy Risks of Algorithmic Recourse | 2023 | AISTATS | Counterfactual | Membership Inference | Differential Privacy | - |
| The Privacy Issue of Counterfactual Explanations: Explanation Linkage Attacks | 2023 | TIST | Counterfactual | Linkage | Anonymisation | - |
| Feature-based Learning for Diverse and Privacy-Preserving Counterfactual Explanations | 2023 | KDD | Counterfactual | - | Perturbation | [Code] |
| Private Graph Extraction via Feature Explanations | 2023 | PETS | Feature-based | Graph Extraction | Perturbation | [Code] |
| Privacy-Preserving Algorithmic Recourse | 2023 | ICAIF | Counterfactual | - | Differential Privacy | - |
| Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage | 2023 | ICML-Workshop | Counterfactual | Membership Inference | Differential Privacy | - |
| Probabilistic Dataset Reconstruction from Interpretable Models | 2023 | arXiv | Interpretable Surrogates | Data Reconstruction | - | [Code] |
| DeepFixCX: Explainable privacy-preserving image compression for medical image analysis | 2023 | WIREs-DMKD | Case-based | Identity recognition | Anonymisation | [Code] |
| XorSHAP: Privacy-Preserving Explainable AI for Decision Tree Models | 2023 | Preprint | Shapley | - | Multi-party Computation | - |
| DP-XAI | 2023 | Github | ALE plot | - | Differential Privacy | [Code] |
| Inferring Sensitive Attributes from Model Explanations | 2022 | CIKM | Gradient-based, Perturbation-based | Attribute Inference | - | [Code] |
| Model explanations with differential privacy | 2022 | FAccT | Feature-based | - | Differential Privacy | - |
| DualCF: Efficient Model Extraction Attack from Counterfactual Explanations | 2022 | FAccT | Counterfactual | Model Extraction | - | - |
| Feature Inference Attack on Shapley Values | 2022 | CCS | Shapley | Attribute/Feature Inference | Low-dimensional | - |
| Evaluating the privacy exposure of interpretable global explainers, Privacy Risk of Global Explainers | 2022 | CogMI | Interpretable Surrogates | Membership Inference | - | - |
| Privacy-Preserving Case-Based Explanations: Enabling Visual Interpretability by Protecting Privacy | 2022 | IEEE Access | Example-based | - | Anonymisation | - |
| On the amplification of security and privacy risks by post-hoc explanations in machine learning models | 2022 | arXiv | Feature-based | Membership Inference | - | - |
| Differentially Private Counterfactuals via Functional Mechanism | 2022 | arXiv | Counterfactual | - | Differential Privacy | - |
| Differentially Private Shapley Values for Data Evaluation | 2022 | arXiv | Shapley | - | Differential Privacy | [Code] |
| Exploiting Explanations for Model Inversion Attacks | 2021 | ICCV | Gradient-based, Interpretable Surrogates | Model Inversion | - | - |
| On the Privacy Risks of Model Explanations | 2021 | AIES | Feature-based, Shapley, Counterfactual | Membership Inference | - | - |
| Adversarial XAI Methods in Cybersecurity | 2021 | TIFS | Counterfactual | Membership Inference | - | - |
| MEGEX: Data-Free Model Extraction Attack against Gradient-Based Explainable AI | 2021 | arXiv | Gradient-based | Model Extraction | - | [Code] |
| Robust Counterfactual Explanations for Privacy-Preserving SVM, Robust Explanations for Private Support Vector Machines | 2021 | ICML-Workshop | Counterfactual | - | Private SVM | [Code] |
| When Differential Privacy Meets Interpretability: A Case Study | 2021 | RCV-CVPR | Interpretable Models | - | Differential Privacy | - |
| Differentially Private Quantiles | 2021 | ICML | Quantiles | - | Differential Privacy | [Code] |
| FOX: Fooling with Explanations: Privacy Protection with Adversarial Reactions in Social Media | 2021 | PST | - | Attribute Inference | Privacy-Protecting Explanation | - |
| Privacy-preserving generative adversarial network for case-based explainability in medical image analysis | 2021 | IEEE Access | Example-based | - | Generative Anonymisation | - |
| Interpretable and Differentially Private Predictions | 2020 | AAAI | Locally linear maps | - | Differential Privacy | [Code] |
| Model extraction from counterfactual explanations | 2020 | arXiv | Counterfactual | Model Extraction | - | [Code] |
| Model Reconstruction from Model Explanations | 2019 | FAT* | Gradient-based | Model Reconstruction, Model Extraction | - | - |
| Interpret Federated Learning with Shapley Values | 2019 | __ | Shapley | - | Federated | [Code] |
| Collaborative Explanation of Deep Models with Limited Interaction for Trade Secret and Privacy Preservation | 2019 | WWW | Feature-based | - | Collaborative rule-based model | - |
| Model inversion attacks that exploit confidence information and basic countermeasures | 2015 | CCS | Confidence scores | Reconstruction, Model Inversion | - | - |
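Many of the attacks listed above share a common recipe: compute a statistic over the released explanation and threshold it to decide something private (e.g., training-set membership). The snippet below is a minimal, purely illustrative sketch of that pattern; it is not the method of any specific paper in the table. The variance statistic, the threshold calibration, the function names, and the synthetic attributions are all assumptions made for illustration.

```python
import numpy as np


def explanation_variance(attributions: np.ndarray) -> np.ndarray:
    """Per-sample variance of feature attributions (one plausible attack statistic)."""
    return attributions.var(axis=1)


def membership_inference_attack(attr_candidates: np.ndarray,
                                attr_reference: np.ndarray,
                                quantile: float = 0.5) -> np.ndarray:
    """Toy threshold attack (illustrative only): flag a candidate as a training
    member when the variance of its explanation falls below a threshold
    calibrated on explanations of known non-members.

    attr_candidates: (n_candidates, n_features) attributions for the target points
    attr_reference:  (n_reference, n_features) attributions for known non-members
    """
    threshold = np.quantile(explanation_variance(attr_reference), quantile)
    return explanation_variance(attr_candidates) < threshold  # True = predicted member


# Synthetic attributions, assuming members receive lower-variance explanations
# than non-members (an assumption for demonstration, not a general fact).
rng = np.random.default_rng(0)
members = rng.normal(0.0, 0.5, size=(100, 20))
non_members = rng.normal(0.0, 1.0, size=(100, 20))
predictions = membership_inference_attack(np.vstack([members, non_members]),
                                           attr_reference=non_members)
print("Predicted member rate:", predictions.mean())
```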

Datasets

Type: Image

| Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
| --- | --- | --- | --- | --- |
| MNIST | 70K | 11MB | Counterfactuals, Gradient | 4 |
| CIFAR | 60K | 163MB | Gradient | 4 |
| SVHN | 600K | 400MB+ | Gradient | 1 |
| Food101 | 100K+ | 10GB | Case-based | 1 |
| Flowers102 | 8K+ | 300MB+ | Case-based | 1 |
| Cervical | 8K+ | 46GB+ | Case-based, Interpretable Models | 1 |
| CheXpert | 220K+ | GBs | Black-box | 1 |
| Facial Expression | 12K+ | 63MB | Gradient | 1 |
| Celeb | 200K | GBs | Counterfactuals, Shapley, Gradient, Perturbation | 1 |

Type: Tabular

| Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
| --- | --- | --- | --- | --- |
| Adult | 48K+ | 10MB | Counterfactuals, Shapley | 10+ |
| COMPAS | 7K+ | 25MB | Counterfactuals, Shapley | 2 |
| FICO | 10K+ | ≤ 1MB | Counterfactuals, Shapley | 4 |
| Boston Housing | 500+ | ≤ 1MB | Counterfactuals, Shapley | 1 |
| German Credit | 1K | ≤ 1MB | Counterfactuals, Shapley | 4 |
| Student Admission | 500 | ≤ 1MB | Counterfactuals, Shapley, Gradient, Perturbation | 1 |
| Student Performance | 10K | ≤ 1MB | Counterfactuals, Shapley | 1 |
| GMSC | 150K+ | 15MB | Interpretable models, Counterfactuals | 2 |
| Diabetes | 100K+ | 20MB | Feature-based | 5 |
| Breast Cancer | 569 | < 1MB | Feature-based | 1 |

Type: Graph

| Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
| --- | --- | --- | --- | --- |
| Cora | 2K+ | 4.5MB | Feature-based | 1 |
| Bitcoin | 30K | ≤ 1MB | Counterfactuals | 1 |
| CIC-IDS2017 | 2.8M+ | 500MB | Black-box | 1 |

Type: Text

| Dataset | #Items | Disk Size | Downstream Explanations | #Papers Used |
| --- | --- | --- | --- | --- |
| IMDB Review | 50K | 66MB | Black-box | 1 |

Evaluation Metrics

| Category | Evaluation Metric | Formula/Description | Usage |
| --- | --- | --- | --- |
| Explanation Utility | Counterfactual validity | $\text{Pureness} = \frac{\text{no. value combinations with desired outcome}}{\text{no. value combinations}}$ | Assesses the range of attribute values within k-anonymous counterfactual instances, considering all attributes, including those beyond quasi-identifiers. |
| Explanation Utility | Classification metric | $CM = \frac{\sum_{i=1}^{N} \text{penalty}(tuple_i)}{N}$ | Assesses equivalence classes within anonymised datasets, focusing on class-label uniformity. |
| Explanation Utility | Faithfulness (RDT-Fidelity) | $\mathcal{F}(\mathcal{E}_X)$ (see our paper) | Reflects how often the model's predictions remain unchanged despite perturbations to the input, which suggests that the explanation effectively captures the reasoning behind the model's predictions. |
| Explanation Utility | Sparsity | $H(p) = -\sum_{f \in M} p(f) \log p(f)$ | A complete and faithful explanation should inherently be sparse, focusing only on the small subset of features most predictive of the model's decision. |
| Information Loss | Normalised Certainty Penalty (NCP) | $\text{NCP}(G) = \sum_{i=1}^{d} w_i \cdot \text{NCP}_{A_i}(G)$ | Higher NCP values indicate a greater degree of generalisation and more information loss; this helps assess the balance between data privacy and utility. |
| Information Loss | Discernibility | $C_{DM}(g, k) = \sum_{\forall E\ \text{s.t.}\ \vert E\vert \geq k} \vert E\vert^2 + \sum_{\forall E\ \text{s.t.}\ \vert E\vert < k} \vert D\vert\,\vert E\vert$ | Measures the penalties on tuples in a dataset after k-anonymisation, reflecting how indistinguishable they are post-anonymisation. |
| Information Loss | Approximation Loss | $\mathcal{E}(\hat{\phi}, \mathcal{Z}, f(X)) \triangleq \mathbb{E} [\mathcal{L}(\hat{\phi}, \mathcal{Z}, f(X)) - \mathcal{L}(\phi^*, \mathcal{Z}, f(X))]$ | Measures the error caused by the randomness added to minimise privacy loss, as the expected deviation of the randomised explanation from the best local approximation. |
| Information Loss | Explanation Intersection | The percentage of bits in the original explanation that are retained in the privatised explanation after applying differential privacy. | The higher the better, but due to the privacy-utility trade-off this metric should not reach 100%. |
| Privacy Degree | k-anonymity | A person's information is indistinguishable from that of at least k-1 other individuals. | Refers to the number of individuals in the training dataset to whom a given explanation could potentially be linked. |
| Privacy Degree | Information Leakage | $\Pr\big[\hat{\phi}(\mathbf{z}_i, X, f_D(X)) : \forall i = 1..k\big] \leq e^{\hat{\varepsilon}} \cdot \Pr\big[\hat{\phi}(\mathbf{z}_i, X, f'_D(X)) : \forall i = 1..k\big] + \hat{\delta}$ | If an adversary can access model explanations, they gain no additional information that could help them infer something about the training data beyond what could be learned from the model predictions alone. |
| Privacy Degree | Privacy Budget | The total privacy budget for all queries is fixed at $(\varepsilon, \delta)$. | The explanation algorithm must not exceed the overall budget across all queries; a stricter per-query requirement $(\varepsilon_{min}, \delta_{min})$ is also set. |
| Attack Success | Precision/Recall/F1 | $Prec = \frac{TP}{TP+FP}$, $Rec = \frac{TP}{TP+FN}$, $F1 = 2 \times \frac{Prec \times Rec}{Prec + Rec}$ | Evaluate an attack's effectiveness in correctly and completely identifying the properties it is designed to infer. |
| Attack Success | Balanced Accuracy | $BA = \frac{TPR + TNR}{2}$ | Measures the accuracy of an attack (e.g., membership prediction in membership inference attacks) on a balanced dataset of members and non-members. |
| Attack Success | ROC/AUC | The ROC curve plots the true positive rate against the false positive rate at various threshold settings. | An AUC near 1 indicates a highly successful privacy attack, while an AUC close to 0.5 suggests performance no better than random guessing. |
| Attack Success | TPR at Low FPR | Report the TPR at a fixed, low FPR (e.g., 0.1%). | If an attack can pinpoint even a minuscule fraction of the training dataset with high precision, the attack should be deemed effective. |
| Attack Success | Mean Absolute Error (MAE) | $\ell_1 (\hat{x}, x) = \frac{1}{mn} \sum_{j=1}^{m} \sum_{i=1}^{n} \vert \hat{x}_i^j - x_i^j \vert$ | Gives an overview of how accurately an attack reconstructs private inputs by averaging the absolute differences across all samples and features. |
| Attack Success | Success Rate (SR) | $SR = \frac{\vert \hat{X}_{val} \neq \perp \vert}{mn}$ | The ratio of successfully reconstructed features to the total number of features across all samples. |
| Attack Success | Model Agreement | $\text{Agreement} = \frac{1}{n} \sum_{i=1}^{n} 1_{f_\theta(x_i) = h_\phi(x_i)}$ | Higher agreement indicates that the substitute model is more similar to the original model; when comparing two model extraction methods with equal agreement, the one with the lower standard deviation is preferred. |
| Attack Success | Average Uncertainty Reduction | $Dist(\mathcal{D}^M, \mathcal{D}^{Orig}) = \frac{1}{n \cdot d} \sum_{i=1}^{n} \sum_{k=1}^{d} \frac{H(\mathcal{D}^M_{i,k})}{H(\mathcal{D}_{i,k})}$ | Measures how accurate a data reconstruction attack is via the reduction in uncertainty across all features of all samples in the dataset. |
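As a concrete reference for a few of the formulas above, the snippet below is a minimal numpy sketch of the sparsity entropy $H(p)$, balanced accuracy, TPR at a fixed low FPR, and the reconstruction MAE. Function and variable names are illustrative and not taken from any paper's released code; exact definitions may differ slightly between papers.

```python
import numpy as np


def sparsity_entropy(attributions: np.ndarray) -> float:
    """H(p) = -sum_f p(f) log p(f), with p obtained by normalising |attributions|."""
    p = np.abs(attributions) / np.abs(attributions).sum()
    p = p[p > 0]  # treat 0 * log 0 as 0
    return float(-(p * np.log(p)).sum())


def balanced_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """BA = (TPR + TNR) / 2 for a binary attack (1 = member, 0 = non-member)."""
    tpr = (y_pred[y_true == 1] == 1).mean()
    tnr = (y_pred[y_true == 0] == 0).mean()
    return float((tpr + tnr) / 2)


def tpr_at_fpr(y_true: np.ndarray, scores: np.ndarray, target_fpr: float = 0.001) -> float:
    """TPR when the threshold is set so that at most `target_fpr` of non-members are flagged."""
    threshold = np.quantile(scores[y_true == 0], 1 - target_fpr)
    return float((scores[y_true == 1] > threshold).mean())


def reconstruction_mae(x_hat: np.ndarray, x: np.ndarray) -> float:
    """l1 error averaged over all m samples and n features."""
    return float(np.abs(x_hat - x).mean())
```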

Disclaimer

Feel free to contact us if you have any queries or exciting news. We also welcome all researchers to contribute to this repository and further the knowledge of this field.

If you have other related references, please feel free to create a GitHub issue with the paper information. We will gladly update the repository according to your suggestions. (You can also create pull requests, but it might take some time for us to do the merge.)
