Publication: Multi-Persona Oracles for Fair Classification
Abstract
As machine learning systems are increasingly deployed in high-stakes domains, incorporating fairness constraints into model training has become a central challenge. Most fairness-aware algorithms assume access to an idealized human fairness oracle, a source of supervision that is difficult to obtain at scale. Motivated by theories of value pluralism and drawing on ideas from generative social choice, we introduce the Multi-Persona Oracle Framework, which uses large language model (LLM) personas to simulate diverse, subjective perspectives on fairness, aiming to bridge theory and practice more effectively. We collect pairwise fairness judgments from 815 synthetic judges, each representing a unique combination of personality traits, racial identity, and ideological background. These judgments are elicited using carefully designed prompts and applied to 200 training and 800 test comparisons drawn from the COMPAS Recidivism dataset. We extend a no-regret learning framework for fairness-constrained classification, using the resulting constraint sets to train classifiers and evaluate their generalization across unseen individuals and judges. We analyze generalization patterns at the level of individual judges, demographically grouped personas, and two baselines: a default LLM and an expert fairness-oriented persona. To assess robustness, we sweep over a range of fairness slack parameters γ and report accuracy alongside average and maximum fairness violations on held-out test constraints. In our proof-of-concept case study, we find that training on ensembles of judges yields strong generalization to fairness constraints on out-of-sample holdout sets, due to the complexity of fairness judgments and the nature of the logistic regression model.