Show simple item record

dc.contributor.author	Bucinca, Zana
dc.contributor.author	Lin, Phoebe
dc.contributor.author	Gajos, Krzysztof
dc.contributor.author	Glassman, Elena
dc.date.accessioned	2021-04-13T14:29:20Z
dc.date.issued	2020-03-17
dc.identifier.citation	Buçinca, Zana, Lin, Phoebe, Gajos, Krzysztof, and Glassman, Elena. 2020. "Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems." Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI’20), March 17–20, 2020, Cagliari, Italy: 454-64.	en_US
dc.identifier.isbn	9781450371186	en_US
dc.identifier.uri	https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37367251
dc.description.abstract	Explainable artificially intelligent (XAI) systems form part of sociotechnical systems, e.g., human+AI teams tasked with making decisions. Yet, current XAI systems are rarely evaluated by measuring the performance of human+AI teams on actual decision-making tasks. We conducted two online experiments and one in-person think-aloud study to evaluate two currently common techniques for evaluating XAI systems: (1) using proxy, artificial tasks such as how well humans predict the AI's decision from the given explanations, and (2) using subjective measures of trust and preference as predictors of actual performance. The results of our experiments demonstrate that evaluations with proxy tasks did not predict the results of the evaluations with the actual decision-making tasks. Further, the subjective measures on evaluations with actual decision-making tasks did not predict the objective performance on those same tasks. Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.	en_US
dc.description.sponsorship	Engineering and Applied Sciences	en_US
dc.language.iso	en_US	en_US
dc.publisher	ACM	en_US
dash.license	LAA
dc.subject	explanations	en_US
dc.subject	artificial intelligence	en_US
dc.subject	trust	en_US
dc.title	Proxy tasks and subjective measures can be misleading in evaluating explainable AI systems	en_US
dc.type	Conference Paper	en_US
dc.description.version	Version of Record	en_US
dash.depositing.author	Gajos, Krzysztof
dc.date.available	2021-04-13T14:29:20Z
dash.affiliation.other	Harvard John A. Paulson School of Engineering and Applied Sciences	en_US
dc.relation.book	Proceedings of the 25th International Conference on Intelligent User Interfaces	en_US
dc.identifier.doi	10.1145/3377325.3377498
dash.contributor.affiliated	Bucinca, Zana
dash.contributor.affiliated	Glassman, Elena
dash.contributor.affiliated	Gajos, Krzysztof


