Multi-Persona Oracles for Fair Classification

Date

2025-05-27

Published Version

The Harvard community has made this article openly available.

Citation

Acharya, Rhea Lily. 2025. Multi-Persona Oracles for Fair Classification. Bachelor's thesis, Harvard University, Engineering and Applied Sciences.

Abstract

As machine learning systems are increasingly deployed in high-stakes domains, incorporating fairness constraints into model training has become a central challenge. Most fairness-aware algorithms assume access to an idealized human fairness oracle, a source of supervision that is difficult to obtain at scale. Motivated by theories of value pluralism and drawing on ideas from generative social choice, we introduce the Multi-Persona Oracle Framework. It uses large language model (LLM) personas to simulate diverse, subjective perspectives on fairness, with the aim of narrowing the gap between the theory of fairness-aware learning and its practice. We collect pairwise fairness judgments from 815 synthetic judges, each representing a unique combination of personality traits, racial identity, and ideological background. These judgments are elicited with carefully designed prompts and applied to 200 training and 800 test comparisons drawn from the COMPAS Recidivism dataset. We extend a no-regret learning framework for fairness-constrained classification: we use the resulting constraint sets to train classifiers and then evaluate how well those classifiers generalize to unseen individuals and judges. We analyze generalization for individual judges, for demographically grouped personas, and for two baselines: a default LLM and an expert fairness-oriented persona. To assess robustness, we sweep over a range of values of the fairness slack parameter γ and report accuracy alongside average and maximum fairness violations on held-out test constraints. In this proof-of-concept case study, training on ensembles of judges yields strong generalization to fairness constraints on out-of-sample holdout sets, which we attribute to the complexity of fairness judgments and the nature of the logistic regression model.
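The evaluation metrics described in the abstract can be illustrated with a short Python sketch. It is a minimal example under stated assumptions and is not the thesis implementation. It assumes that each pairwise judgment marks two individuals who should receive similar predictions, and that the constraint on a pair (i, j) requires their predicted probabilities to differ by at most the slack γ. A pair's violation is the amount by which its gap exceeds γ. The helper names pool_judges and evaluate, the 0.5 decision threshold, and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical encoding: (i, j) means a judge said individuals i and j
# should be treated similarly.
Constraint = Tuple[int, int]


def pool_judges(judge_constraints: Dict[str, Iterable[Constraint]]) -> List[Constraint]:
    """Union several judges' constraint sets into one ensemble set.

    Pairs are normalized to (min, max), so (i, j) and (j, i) count once.
    """
    pooled = {
        (min(i, j), max(i, j))
        for pairs in judge_constraints.values()
        for i, j in pairs
    }
    return sorted(pooled)


def evaluate(
    probs: Sequence[float],
    labels: Sequence[int],
    constraints: Sequence[Constraint],
    gamma: float,
    threshold: float = 0.5,  # assumed decision threshold
) -> Dict[str, float]:
    """Accuracy plus average and maximum pairwise fairness violation.

    Assumed constraint: |probs[i] - probs[j]| <= gamma. A pair's violation
    is how far its gap exceeds gamma (0.0 when the constraint holds).
    """
    correct = sum(int(p >= threshold) == y for p, y in zip(probs, labels))
    excess = [max(0.0, abs(probs[i] - probs[j]) - gamma) for i, j in constraints]
    return {
        "accuracy": correct / len(labels),
        "avg_violation": sum(excess) / len(excess) if excess else 0.0,
        "max_violation": max(excess, default=0.0),
    }


if __name__ == "__main__":
    # Toy data (hypothetical): two judges, four individuals.
    judges = {"judge_a": [(0, 1), (1, 3)], "judge_b": [(1, 0), (0, 2)]}
    test_pairs = pool_judges(judges)  # [(0, 1), (0, 2), (1, 3)]
    probs = [0.9, 0.2, 0.85, 0.5]
    labels = [1, 0, 1, 0]
    # Sweep the slack parameter, as in a robustness analysis.
    for gamma in (0.0, 0.1, 0.3):
        print(gamma, evaluate(probs, labels, test_pairs, gamma))
```

Pooling several judges' pairs into a single set is a simplified stand-in for training and evaluating on an ensemble of judges. Re-running evaluate for several γ values produces the kind of accuracy-versus-violation sweep the abstract describes. Larger γ can only shrink each pair's excess, so both violation measures never increase as γ grows.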

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.