Collection of Technical Solutions by Domain Risk Category
Discrimination & Toxicity
- Local Explainability: The paper uses local explanation techniques, specifically LIME (Local Interpretable Model-agnostic Explanations), to approximate the model's behavior around a given input. LIME fits an interpretable surrogate model, such as a decision tree, that explains the prediction for that specific instance. The local explanation then serves as the "path" in a symbolic-execution-style search: decisions along the decision tree's path are toggled to generate new constraints, and hence new test inputs.
Main Source: https://arxiv.org/pdf/1809.03260
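The toggling step can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the black-box model, the perturbation scheme, and all names are assumptions, and a real constraint solver is replaced by nudging one feature just past the toggled node's threshold.

```python
# Hypothetical sketch: fit a local surrogate decision tree around an
# instance, then toggle each decision on its path to derive new inputs.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in for the black-box model under test (assumption, not from the paper).
def black_box(X):
    return (X[:, 0] > 0.5).astype(int)

x0 = np.array([0.7, 0.2])  # instance being explained

# 1. Sample perturbations around x0 and label them with the black box.
X_local = x0 + rng.normal(scale=0.3, size=(500, 2))
y_local = black_box(X_local)

# 2. Fit an interpretable local surrogate (a shallow decision tree).
surrogate = DecisionTreeClassifier(max_depth=3).fit(X_local, y_local)

# 3. Walk x0's decision path; toggling the comparison at each internal
#    node yields a constraint that a new test input must flip.
t = surrogate.tree_
path = surrogate.decision_path(x0.reshape(1, -1)).indices
candidates = []
for node in path:
    if t.children_left[node] == -1:  # leaf node: nothing to toggle
        continue
    feat, thr = t.feature[node], t.threshold[node]
    x_new = x0.copy()
    eps = 1e-3
    # Move the feature just across the threshold to flip the branch taken.
    x_new[feat] = thr + eps if x0[feat] <= thr else thr - eps
    candidates.append(x_new)

print(len(candidates), "toggled test inputs generated")
```

Each candidate differs from the original instance in exactly one feature, so the new inputs stay close to the explained point while exercising a different branch of the surrogate.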
- FACET is a benchmark for evaluating fairness and performance disparities in computer vision models across demographic attributes.
Main Source: https://arxiv.org/pdf/2309.00035
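Evaluating disparities across demographic attributes, as FACET does, amounts to comparing a performance metric per group and reporting the gap. A minimal illustrative sketch (the function, data, and group labels are made up and are not FACET's tooling):

```python
# Illustrative per-group disparity metric: recall per demographic group
# and the max-min gap across groups.
import numpy as np

def recall_by_group(y_true, y_pred, groups):
    """Return per-group recall and the max-min disparity across groups."""
    recalls = {}
    for g in np.unique(groups):
        m = (groups == g) & (y_true == 1)  # positives in this group
        recalls[g] = float((y_pred[m] == 1).mean()) if m.any() else float("nan")
    vals = [v for v in recalls.values() if not np.isnan(v)]
    disparity = max(vals) - min(vals)
    return recalls, disparity

# Toy labels/predictions with a hypothetical binary group attribute.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

recalls, disparity = recall_by_group(y_true, y_pred, groups)
print(recalls, disparity)
```

The same pattern extends to any metric (precision, detection mAP, etc.): compute it per attribute value, then summarize the spread as the disparity.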
Privacy & Security
- This paper proposes CONFAIDE, a benchmark for evaluating the privacy-reasoning capabilities of large language models (LLMs) in interactive settings. CONFAIDE is grounded in the theory of contextual integrity, which defines privacy as the appropriate flow of information within specific social contexts.
Main Source: https://arxiv.org/pdf/2310.17884