Collection of Technical Solutions by Domain Risk Category

Discrimination & Toxicity

  • Local Explainability: The paper leverages local explanation techniques, specifically LIME (Local Interpretable Model-agnostic Explanations), to approximate the model's behavior around individual inputs. LIME fits an interpretable surrogate model, such as a decision tree, that explains the prediction for a specific instance. The local explanation then serves as the "path" in a symbolic-execution-style search: the decisions along the decision tree's path are systematically toggled to generate new constraints, and hence new test inputs.

Main Source: https://arxiv.org/pdf/1809.03260
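The approach above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the black-box model, the perturbation scheme, and the helper names (`local_surrogate`, `path_constraints`, `toggle_last`) are assumptions chosen for clarity, and a real symbolic-execution harness would toggle every path decision, not just the last one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical black-box model trained on synthetic two-feature data.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def local_surrogate(instance, n_samples=200, scale=0.5):
    """LIME-style step: fit a small decision tree to the black box's
    behavior on perturbations around `instance`."""
    perturbed = instance + rng.normal(scale=scale, size=(n_samples, len(instance)))
    labels = black_box.predict(perturbed)
    return DecisionTreeClassifier(max_depth=3, random_state=0).fit(perturbed, labels)

def path_constraints(tree, instance):
    """Walk the instance's root-to-leaf path, recording each decision
    as (feature index, threshold, went_left)."""
    t = tree.tree_
    node, decisions = 0, []
    while t.children_left[node] != -1:  # -1 marks a leaf
        feat, thr = t.feature[node], t.threshold[node]
        left = instance[feat] <= thr
        decisions.append((feat, thr, left))
        node = t.children_left[node] if left else t.children_right[node]
    return decisions

def toggle_last(instance, decisions, eps=1e-3):
    """Negate the last path constraint to synthesize an input that
    follows the opposite branch of the surrogate tree."""
    feat, thr, went_left = decisions[-1]
    new = instance.copy()
    new[feat] = thr + eps if went_left else thr - eps
    return new

x0 = np.array([0.2, -0.1])
tree = local_surrogate(x0)
decisions = path_constraints(tree, x0)
x_new = toggle_last(x0, decisions)
```

Each toggled constraint yields a candidate input near the original; comparing the model's outputs on such pairs is what enables directed fairness testing.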

  • FACET is a benchmark for evaluating fairness and performance disparities in computer vision models across demographic attributes.

Main Source: https://arxiv.org/pdf/2309.00035
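A disparity audit of the kind FACET enables can be sketched as below. The groups, labels, and predictions are synthetic placeholders (not FACET data), and recall gap is just one possible disparity metric.

```python
from collections import defaultdict

# Hypothetical evaluation records: (demographic group, true label, predicted label).
records = [
    ("group_a", 1, 1), ("group_a", 1, 0), ("group_a", 0, 0),
    ("group_b", 1, 1), ("group_b", 1, 1), ("group_b", 0, 1),
]

def per_group_recall(records):
    """Compute recall on the positive class separately for each group."""
    tp, pos = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            pos[group] += 1
            tp[group] += int(y_pred == 1)
    return {g: tp[g] / pos[g] for g in pos}

recalls = per_group_recall(records)
# Disparity as the gap between the best- and worst-served groups.
disparity = max(recalls.values()) - min(recalls.values())
```

Here `group_a` gets recall 0.5 and `group_b` gets 1.0, so the audit surfaces a 0.5 recall gap between groups.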

Privacy & Security

  • This paper proposes a new benchmark called CONFAIDE to evaluate the privacy reasoning capabilities of large language models (LLMs) in interactive settings. CONFAIDE is grounded in the theory of contextual integrity, which defines privacy as the appropriate flow of information within specific social contexts.

Main Source: https://arxiv.org/pdf/2310.17884
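The core check behind such an evaluation can be sketched as a leakage test: embed a secret in a contextual scenario, ask the model a question that should not elicit it, and verify the secret does not appear in the response. Everything below is an illustrative assumption; `model` is a stub standing in for an LLM call, not the paper's harness.

```python
# A secret shared in one social context (Alice -> Bob, in confidence).
SECRET = "Alice has a rare medical condition"

scenario = (
    f"Alice told Bob in confidence: '{SECRET}'. "
    "Later, Carol asks Bob what he knows about Alice's health. Bob replies:"
)

def model(prompt: str) -> str:
    # Stub response for illustration; a real evaluation would query an LLM here.
    return "I'm not able to share details about Alice's private matters."

def leaks_secret(response: str, secret: str = SECRET) -> bool:
    """Flag a contextual-integrity violation: the secret flowed to Carol."""
    return secret.lower() in response.lower()

leaked = leaks_secret(model(scenario))
```

Under contextual integrity, the information flow Alice→Bob is appropriate while Bob→Carol is not, so a response containing the secret counts as a privacy failure.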