Publication:

Bridging interpretable AI methods to systems biology and medical informatics

Date

2023-01-19

Citation

Yuan, Bo. 2023. Bridging interpretable AI methods to systems biology and medical informatics. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Computational modeling of biomedical systems can be used to describe system behaviors and to make therapeutically useful predictions, such as identifying potential drug targets or detecting patients at high risk for certain diseases early. One promising approach is machine learning, which has been shown to be effective for modeling and predicting the behavior of complex systems. Major challenges in applying large-scale machine learning models to biology and medicine are finding global optima in a complex multidimensional space and interpreting the solutions mechanistically. Here we designed specialized machine learning methods to simulate and make predictions for biomedical systems at (i) the molecular level and (ii) the patient level.

(i) Systematic perturbation of cells followed by comprehensive measurements of molecular and phenotypic responses provides informative data resources for constructing computational models of cell biology. We developed a hybrid approach that combines explicit mathematical models of cell dynamics with a machine-learning framework to quantitatively predict cell behavior in response to perturbation of molecular targets. We used these network models of cell biology to predict cellular responses to unseen combinatorial perturbations. Although trained in a completely data-driven manner, independent of prior knowledge, the resulting de novo network models recapitulate known interactions. We conducted a range of power analyses and demonstrated that the approach can be generalized to other cellular systems and is readily applicable to various kinetic models of cell biology.
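As an illustration of what an explicit kinetic network model of cell dynamics can look like, the sketch below integrates a small system of the form dx/dt = eps * tanh(W x + u) - alpha * x, where x is the change in each molecular node, W is an interaction matrix, and u is the applied perturbation. The equation form, the function name `simulate_response`, the three-node network, and all parameter values are assumptions chosen for illustration and are not taken from the dissertation. In a hybrid setup of the kind described above, W would be fitted to perturbation-response data rather than written by hand.

```python
import math

def simulate_response(W, u, alpha=1.0, eps=1.0, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = eps * tanh(W x + u) - alpha * x from x = 0.

    W[i][j] is the assumed effect of node j on node i; u[i] is the
    external perturbation applied to node i.
    """
    n = len(u)
    x = [0.0] * n
    for _ in range(steps):
        # Total input to each node: network interactions plus perturbation.
        drive = [sum(W[i][j] * x[j] for j in range(n)) + u[i] for i in range(n)]
        # One explicit Euler step of the saturating-response ODE.
        x = [x[i] + dt * (eps * math.tanh(drive[i]) - alpha * x[i]) for i in range(n)]
    return x

# Hypothetical 3-node network: node 0 activates node 1, node 1 inhibits node 2.
W = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0],
]
u = [0.5, 0.0, 0.0]  # perturb node 0 only
x_final = simulate_response(W, u)
```

With these illustrative values, the perturbed node settles near tanh(0.5), the activated node follows it upward, and the inhibited node ends below zero. Predicting a combinatorial perturbation amounts to setting more than one entry of `u`.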

(ii) Pancreatic cancer is an aggressive disease that typically presents late, with poor patient outcomes. There is a pronounced medical need for early detection of pancreatic cancer. Here we used nationwide patient registries from the US and Denmark, explicitly trained machine learning models on the time sequence of diseases in patient clinical histories, and tested their ability to predict cancer occurrence in time intervals of 3 to 60 months after risk assessment. We performed an explainability analysis of the trained models to find the predictive disease features they capture. We showed that our models raise the state-of-the-art performance of cancer risk prediction on real-world datasets and provide support for the design of prediction-surveillance programs for high-risk patients. Cross-application of the Danish deep learning model to the US-VA dataset yields lower accuracy, indicating that independent training is required in health systems with different coding practices. The use of AI on real-world clinical records has the potential to shift focus from treatment of late-stage to early-stage cancer, benefiting patients by improving lifespan and quality of life.
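To make the prediction task concrete, the sketch below shows one way a training example could be built from a patient's disease history: the codes recorded up to the assessment date form the input sequence, and one label per time window records whether cancer is diagnosed within that window. The function name `make_example`, the month-based timeline, the specific windows between 3 and 60 months, and the example codes are all hypothetical and serve only as an illustration; the dissertation's actual data pipeline is not described in this abstract.

```python
from bisect import bisect_right

def make_example(events, cancer_month, assess_month, windows=(3, 6, 12, 36, 60)):
    """Build one (history, labels) pair for a single risk assessment.

    events: list of (month, code) tuples sorted by month.
    cancer_month: month of cancer diagnosis, or None if never diagnosed.
    labels[w] is True if cancer occurs within w months after assess_month.
    """
    months = [m for m, _ in events]
    # Keep only events recorded up to and including the assessment month.
    cutoff = bisect_right(months, assess_month)
    history = [code for _, code in events[:cutoff]]
    labels = {
        w: cancer_month is not None
        and assess_month < cancer_month <= assess_month + w
        for w in windows
    }
    return history, labels

# Hypothetical patient: three coded diagnoses, cancer diagnosed at month 60.
events = [(10, "E11"), (40, "K86"), (55, "R10")]
history, labels = make_example(events, cancer_month=60, assess_month=50)
```

Here the event at month 55 is excluded from the input because it falls after the assessment, and only the 12-, 36-, and 60-month labels are positive.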

Keywords

cancer biology, health records, interpretability, machine learning, perturbation biology, scientific machine learning, Biology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
