Show simple item record

dc.contributor.author	Bucinca, Zana
dc.contributor.author	Lin, Phoebe
dc.contributor.author	Gajos, Krzysztof
dc.contributor.author	Glassman, Elena
dc.date.accessioned	2021-04-13T14:29:20Z
dc.date.issued	2020-03-17
dc.identifier.citation	Buçinca, Zana, Lin, Phoebe, Gajos, Krzysztof, and Glassman, Elena. 2020. "Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems." Proceedings of the 25th International Conference on Intelligent User Interfaces (IUI’20), March 17–20, 2020, Cagliari, Italy: 454-64.	en_US
dc.identifier.isbn	9781450371186	en_US
dc.identifier.uri	https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37367251
dc.description.abstract	Explainable artificially intelligent (XAI) systems form part of sociotechnical systems, e.g., human+AI teams tasked with making decisions. Yet, current XAI systems are rarely evaluated by measuring the performance of human+AI teams on actual decision-making tasks. We conducted two online experiments and one in-person think-aloud study to evaluate two currently common techniques for evaluating XAI systems: (1) using proxy, artificial tasks such as how well humans predict the AI's decision from the given explanations, and (2) using subjective measures of trust and preference as predictors of actual performance. The results of our experiments demonstrate that evaluations with proxy tasks did not predict the results of the evaluations with the actual decision-making tasks. Further, the subjective measures on evaluations with actual decision-making tasks did not predict the objective performance on those same tasks. Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.	en_US
dc.description.sponsorship	Engineering and Applied Sciences	en_US
dc.language.iso	en_US	en_US
dc.publisher	ACM	en_US
dash.license	LAA
dc.subject	explanations	en_US
dc.subject	artificial intelligence	en_US
dc.subject	trust	en_US
dc.title	Proxy tasks and subjective measures can be misleading in evaluating explainable AI systems	en_US
dc.type	Conference Paper	en_US
dc.description.version	Version of Record	en_US
dash.depositing.author	Gajos, Krzysztof
dc.date.available	2021-04-13T14:29:20Z
dash.affiliation.other	Harvard John A. Paulson School of Engineering and Applied Sciences	en_US
dc.relation.book	Proceedings of the 25th International Conference on Intelligent User Interfaces	en_US
dc.identifier.doi	10.1145/3377325.3377498
dash.contributor.affiliated	Bucinca, Zana
dash.contributor.affiliated	Glassman, Elena
dash.contributor.affiliated	Gajos, Krzysztof


