Publication: Quantifying Effects of Automated Noise Auditing Notices and Decision Structuring on Noise, Accuracy, and Fairness in Human Decision-Making
Abstract
Subjective decision-making by a single human decision-maker is pervasive in modern society, and the effects of these decisions in domains like criminal justice are extremely significant. Prior work has aimed to improve human decision-making by developing predictive models that assist judges, but such algorithms often imperil fairness or are met with public hesitation. We thus aim to develop alternative ways of improving human decision-making with the assistance of automation and software but without complex algorithms. Specifically, we aim to reduce noise levels in human decision-making to make decisions more consistent, accurate, and fair. We define noise as the random inconsistencies and errors in decision-making that humans are prone to and that deterministic machines, by definition, avoid. In this work, we tested the effects of automated interpretations of various noise-reducing strategies proposed in the literature on the noise, accuracy, and fairness of human-made decisions.
We recruited 300 participants in total on Mechanical Turk to evaluate criminal defendants' profiles and predict their risk of re-offending. In our control survey, we gave participants no additional information before they decided. In our first experiment, we informed participants of typical noise levels in previous participants' decisions before they submitted their own predictions; in the second, we asked reflective questions to structure the decision-making process before participants submitted decisions; and in our final experiment, we calculated participants' own noise levels using a calibration test and informed them of this noise midway through the experiment. We found that none of our strategies lowered, or affected at all, noise levels or accuracy in decisions. However, the calibration test lowered participants' positive predictive value from 58.9% to 49.5%. Structuring the decision-making process did improve fairness by lowering discrepancies between participants' positive predictive value on Black and white defendants. In contrast, providing participants with the individualized, calibrated noise notice decreased fairness by creating a statistically significant difference between participants' prediction accuracy on white versus Black defendants (52.6% versus 60.2%). Though only the structuring strategy improved group fairness, it was the participants who saw the calibrated noise notices who perceived themselves as fairer. This result implies that the calibrated noise notice intervention could dangerously lead decision-makers to falsely believe that their decision-making has become fairer.
We have thus not yet identified a strategy that lowers noise levels; only one of our strategies improved fairness, and none improved accuracy. However, we believe that continuing to test the effects of more decision hygiene strategies on noise, accuracy, and fairness has valuable potential to improve human decision-making, since such strategies can serve as an alternative to machine learning interventions that avoids their drawbacks.
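The group-fairness comparison described in the abstract rests on positive predictive value (PPV): among defendants predicted to re-offend, the fraction who actually did. The following is a minimal illustrative sketch of computing PPV per group and the gap between two groups. It is not the authors' analysis code; the function names, the record layout, and the toy data (with placeholder group labels "A" and "B") are assumptions made purely for illustration.

```python
from collections import defaultdict


def ppv_by_group(records):
    """Return PPV per group.

    `records` is an iterable of (group, predicted_reoffend, actually_reoffended)
    tuples. This layout is a hypothetical choice for the sketch.
    Groups with no positive predictions are omitted (PPV is undefined there).
    """
    true_pos = defaultdict(int)  # predicted positive and actually positive
    pred_pos = defaultdict(int)  # all positive predictions
    for group, predicted, actual in records:
        if predicted:
            pred_pos[group] += 1
            if actual:
                true_pos[group] += 1
    return {g: true_pos[g] / pred_pos[g] for g in pred_pos}


def ppv_gap(records, group_a, group_b):
    """Absolute difference in PPV between two groups (smaller = more even)."""
    ppv = ppv_by_group(records)
    return abs(ppv[group_a] - ppv[group_b])


# Invented toy data, not drawn from the study.
toy = [
    ("A", True, True),
    ("A", True, False),
    ("A", False, False),
    ("B", True, True),
    ("B", True, True),
    ("B", True, True),
    ("B", True, False),
]

if __name__ == "__main__":
    print(ppv_by_group(toy))
    print(ppv_gap(toy, "A", "B"))
```

In this sketch, a smaller gap corresponds to more even PPV across groups. Whether a given gap is statistically significant would require a separate test, which is not shown here.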