Publication:
Proxy tasks and subjective measures can be misleading in evaluating explainable AI systems

Date

2020-03-17

Publisher

ACM

Citation

Buçinca, Zana, Phoebe Lin, Krzysztof Gajos, and Elena Glassman. 2020. "Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems." In Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI '20), March 17–20, 2020, Cagliari, Italy, 454–464.

Abstract

Explainable artificially intelligent (XAI) systems form part of sociotechnical systems, e.g., human+AI teams tasked with making decisions. Yet, current XAI systems are rarely evaluated by measuring the performance of human+AI teams on actual decision-making tasks. We conducted two online experiments and one in-person think-aloud study to evaluate two currently common techniques for evaluating XAI systems: (1) using proxy, artificial tasks such as how well humans predict the AI's decision from the given explanations, and (2) using subjective measures of trust and preference as predictors of actual performance. The results of our experiments demonstrate that evaluations with proxy tasks did not predict the results of the evaluations with the actual decision-making tasks. Further, the subjective measures on evaluations with actual decision-making tasks did not predict the objective performance on those same tasks. Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.

Keywords

explanations, artificial intelligence, trust

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
