dc.contributor.advisor: Parkes, David C
dc.contributor.author: Watson-Daniels, Jamelle
dc.date.accessioned: 2024-06-01T13:39:08Z
dc.date.created: 2024
dc.date.issued: 2024-05-07
dc.date.submitted: 2024
dc.identifier.citation: Watson-Daniels, Jamelle. 2024. The Roads Not Taken: Model Multiplicity in Machine Learning. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.
dc.identifier.other: 31294605
dc.identifier.uri: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37378841
dc.description.abstract: In machine learning, model multiplicity is the existence of multiple models that perform equally well for a given prediction task (also known as the "Rashomon effect"). The set of near-optimal models is referred to as the "Rashomon set." Predictive multiplicity examines how predictions change over this set of near-optimal models. If model outputs vary significantly across similar models, this information can offer insight into predictive arbitrariness. In this thesis, I introduce frameworks for evaluating and leveraging predictive multiplicity in different settings. First, I present methods to measure predictive multiplicity in probabilistic classification (predicting the probability of a positive outcome) and develop optimization-based methods to compute these measures efficiently and reliably for convex empirical risk minimization problems. Empirical results show that real-world probabilistic classification tasks can in fact admit competing models that assign substantially different risk estimates. Additionally, I provide insight into how predictive multiplicity arises by analyzing dataset characteristics. Second, I formulate predictive multiplicity analysis in a resource-constrained setting, recognizing that predictive allocation tasks are governed by a resource budget. I also extend the multiplicity framing, outlining the concept of multi-target multiplicity for quantifying the impact of choices made regarding target specification for a given predictive allocation task. With this framework, I demonstrate how to fit separate models that predict the three outcomes of interest independently and how to arrive at a way of ranking patients that results in a more equitable allocation. Third, I investigate the connections between predictive multiplicity and predictive churn, which is the change in predictions before and after a model update made in response to a change in training data.
I present empirical and theoretical results on characterizing churn in terms of the Rashomon set. Results show that churn-unstable points overlap by more than 50 percent with ambiguity points, which points to similarities between the two concepts. Theoretical results characterizing predictive churn between two Rashomon sets, as well as churn between models within one Rashomon set, hinge on the type of Rashomon set. I focus on predictive multiplicity to advocate for transparency in the prediction model training procedure. These methods to evaluate predictive multiplicity, as well as connections with predictive churn, contribute to a larger effort for machine learning researchers to be accountable to the individuals affected by model predictions. As with a person deciding between roads to take while travelling, knowledge of the alternative options (i.e., roads not taken) may provide insight into the significance of the decisions made.
dc.format.mimetype: application/pdf
dc.language.iso: en
dash.license: LAA
dc.subject: Fairness
dc.subject: Machine Learning
dc.subject: Multiplicity
dc.subject: Computer science
dc.title: The Roads Not Taken: Model Multiplicity in Machine Learning
dc.type: Thesis or Dissertation
dash.depositing.author: Watson-Daniels, Jamelle
dc.date.available: 2024-06-01T13:39:08Z
thesis.degree.date: 2024-05
thesis.degree.grantor: Harvard University Graduate School of Arts and Sciences
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
dc.contributor.committeeMember: Ustun, Berk
dc.contributor.committeeMember: Procaccia, Ariel
dc.contributor.committeeMember: Chouldechova, Alexandra
dc.type.material: text
thesis.degree.department: Engineering and Applied Sciences - Applied Math
dc.identifier.orcid: 0000-0003-4711-8789
dash.author.email: jamelle.wd@gmail.com

