Objective criteria for explanations of machine learning models

Abstract

Objective criteria to evaluate the performance of machine learning (ML) model explanations are a critical ingredient in bringing greater rigor to the field of explainable artificial intelligence. In this article, we survey three of our proposed criteria that each target different classes of explanations. In the first, targeted at real-valued feature importance explanations, we define a class of “infidelity” measures that capture how well the explanations match the ML models. We show that instances of such infidelity minimizing explanations correspond to many popular recently proposed explanations and, moreover, can be shown to satisfy well-known game-theoretic axiomatic properties. In the second, targeted to feature set explanations, we define a robustness analysis-based criterion and show that deriving explainable feature sets based on the robustness criterion yields more qualitatively impressive explanations. Lastly, for sample explanations, we provide a decomposition-based criterion that allows us to provide very scalable and compelling classes of sample-based explanations.

Publication
In Applied AI letters 2021