Publication: Trustworthy Machine Learning for Medicine
Abstract
Machine learning is achieving significant breakthroughs in various applications, including medical research, from analyzing genomic data to accelerating drug discovery and designing personalized treatment plans for patients. However, as machine learning is applied to such high-stakes domains where model errors and biases can adversely impact human lives, there is a growing focus on building models that not only have high accuracy but are also trustworthy. This dissertation studies three areas of trustworthy machine learning (interpretability, robustness, and safety alignment) and addresses key challenges in each area. In the area of interpretability, we develop a theoretical framework to understand the mathematical properties of explanation methods, elucidating their commonalities and differences, explaining why different methods can generate disagreeing explanations, and providing a principled approach to select among methods. In the area of robustness, we develop algorithms to efficiently estimate a model’s average-case robustness, enabling an accurate and efficient characterization of real-world model behavior for large-scale applications. Lastly, in the area of safety alignment, we develop a novel benchmark dataset and use it to evaluate and improve the medical safety of large language models, finding that publicly available medical large language models do not meet medical safety standards and that fine-tuning them on safety demonstrations can improve their safety while preserving their medical knowledge. Altogether, this research advances the conceptual understanding and practical application of trustworthy machine learning, especially in the medical domain, and paves the way for future research.