Publication: Towards Practical Applications of Machine Learning in Healthcare with Federated Learning
Abstract
Federated Learning (FL) has emerged as a significant tool in healthcare machine learning, enabling institutions to collaboratively train models while maintaining data privacy. This dissertation describes the implementation of a real-world healthcare FL project and addresses the challenge of domain shift for more effective model deployment.
We begin by detailing a practical application of FL during the SARS-CoV-2 pandemic. Twenty institutions collaborated on a healthcare FL study to develop EXAM (EMR CXR AI Model), which predicts the future oxygen requirements of symptomatic patients from vital signs, laboratory data, and chest X-rays. EXAM achieved an Area Under the Curve (AUC) of over 0.92, a 16% improvement in AUC and a 38% increase in generalizability over locally trained models. This project demonstrated FL's ability to enable rapid scientific collaboration without data exchange, producing a model that generalized across heterogeneous, unharmonized datasets and provided the healthcare community with a validated tool to combat COVID-19.
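To illustrate how institutions can train a joint model without exchanging data, the following is a minimal sketch of federated averaging, a common aggregation rule in FL. The abstract does not state which aggregation rule EXAM used, so this is an assumption made for illustration; the function name, site weights, and sample counts are all hypothetical. Each site shares only its locally trained parameters, and the server combines them in proportion to local sample counts.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample counts.

    Illustrative only: a generic federated-averaging step, not the EXAM pipeline.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape: (n_clients, n_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

# Three hypothetical sites, each holding a two-parameter model.
site_weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
site_sizes = [100, 100, 200]  # hypothetical local sample counts

# Only parameters leave each site; raw patient data stays local.
global_weights = federated_average(site_weights, site_sizes)
```

In this sketch the largest site contributes half of the total weight, so the aggregated parameters are pulled toward its local model.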
Next, we address a specialized non-IID FL challenge termed Domain-mixed FL, where each client's data is assumed to be a mixture of several predefined domains. We propose a novel method, FedDAR, which learns a shared domain representation and personalized prediction models in a decoupled manner. Theoretical proofs show that FedDAR achieves linear convergence in simplified settings, and extensive empirical studies on both synthetic and real-world datasets show that it outperforms existing FL methods.
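The domain-mixed setting can be made concrete with a small sketch. It assumes one plausible form of domain-aware aggregation: each client holds one prediction head per domain, and the server averages each domain's head across clients in proportion to how many samples of that domain each client holds. This is not the FedDAR algorithm itself, whose details are not given in this abstract; the function name, array shapes, and numbers are hypothetical.

```python
import numpy as np

def aggregate_domain_heads(client_heads, client_domain_counts):
    """Average each domain's head across clients, weighted per domain.

    client_heads:         shape (n_clients, n_domains, dim)
    client_domain_counts: shape (n_clients, n_domains)
    Assumes every domain appears on at least one client.
    Illustrative only: not the FedDAR update rule.
    """
    heads = np.asarray(client_heads, dtype=float)
    counts = np.asarray(client_domain_counts, dtype=float)
    # Normalize the counts within each domain so the weights sum to 1.
    weights = counts / counts.sum(axis=0, keepdims=True)
    return (weights[:, :, None] * heads).sum(axis=0)  # shape: (n_domains, dim)

# Two hypothetical clients, two domains, one-parameter heads.
client_heads = [[[1.0], [10.0]],
                [[3.0], [20.0]]]
# Client 0 holds no samples from domain 1, so it has no say in that head.
client_domain_counts = [[30, 0],
                        [10, 50]]

domain_heads = aggregate_domain_heads(client_heads, client_domain_counts)
```

Weighting per domain rather than per client means a client with little or no data from a domain has correspondingly little influence on that domain's head.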
Finally, we explore the multi-dimensional domain shift problem prevalent in healthcare ML applications. We introduce a novel strategy using an ensemble of mixtures of experts (EMoE), in which each mixture of experts is tailored to adapt to shifts along a different dimension. This approach is designed to be versatile and robust, and is suitable for both centralized and federated learning settings. Experiments on several real-world datasets show that our method outperforms contemporary domain generalization and personalized federated learning approaches under multi-dimensional domain shift.