NOVEL COMPUTATIONAL TOOLS FOR HIGH THROUGHPUT IN-SILICO PROTEIN-PROTEIN INTERACTION SCREENING
Abstract
Cells are complex biochemical systems that have evolved elaborate molecular processes to survive and reproduce. Many of these processes involve multiple proteins working together through direct physical interactions. Historically, identifying protein-protein interactions (PPIs) relied on slow, labor-intensive, and error-prone experimental methods. Recent advances in deep learning have produced new in-silico approaches that can rapidly and accurately predict the structures of proteins and protein complexes from their primary amino acid sequences. One of the most prominent examples is AlphaFold-Multimer (AF-M), a neural network developed by DeepMind in 2022. Like others, we have begun using AF-M to perform large-scale in-silico screens to uncover new PPIs. In my thesis work, I created tools to efficiently conduct AF-M PPI screens and evaluate their results. A core focus of my efforts was the creation of SPOC, a novel machine-learning-based classifier that examines structural predictions alongside experimental omics data to assess the biological relevance of binary AF-M structure predictions. In addition, I built predictomes.org, a web platform that enables users to explore and interpret massive AF-M screening datasets. As an initial proof of principle, I applied these tools to uncover PPIs in human genome maintenance by predicting structures for nearly all possible pairwise combinations of 300 proteins. The predictions were scored with SPOC and released to the community on predictomes.org. This screen uncovered new interactions and provided mechanistic insights into processes ranging from transcription-coupled nucleotide excision repair to DNA replication stalling under stress. In a separate collaborative effort with Dr. Lucas Farnung's lab, I used AF-M to fold the H2A/H2B dimer with nearly all human nuclear proteins to identify proteins that engage a composite nucleosomal surface known as the acidic patch.
I developed an analysis pipeline that identified more than 40 hits, including the E3 ubiquitin ligase SHPRH. The Farnung laboratory used cryo-electron microscopy to solve the structure of SHPRH bound to the nucleosome, revealing an interaction that closely matches the screen's prediction. The repeated success of large-scale AF-M screens has demonstrated their value as powerful hypothesis generators, motivating the development of a proteome-wide structural interactome. However, generating structural models for all ~200 million possible human protein pairs was computationally prohibitive. To address this, I developed KIRC, a classifier that rapidly scores and prioritizes likely interactors based on experimental omics data. Leveraging a GPU cluster donated by NVIDIA Corporation, I used AF-M to model the 1.5 million top-ranked KIRC-nominated pairs and evaluated the resulting models with SPOC. Preliminary analysis suggests that this pipeline yielded more than 34,000 high-confidence interactions, many of which are uncharacterized. In summary, my work has produced a suite of computational tools that streamline large-scale in-silico PPI screening and help biomedical researchers effectively harness advanced machine learning to accelerate discovery. The all-by-all genome maintenance screen, the acidic patch screen, and several additional collaborative projects with labs at Harvard Medical School collectively highlight the power and versatility of this approach.
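To make the scale of these screens concrete, the sketch below counts the unordered pairs in an all-by-all screen and shows one simple way to keep only the highest-scoring candidates before any structure prediction is run. For the 300-protein genome maintenance screen, the count gives an upper bound of 44,850 heteromeric pairs (45,150 if homodimers are included). For the proteome-wide case it assumes roughly 20,000 human proteins, an approximation not stated in this abstract, which gives about 200 million pairs; 1.5 million modeled pairs would then be roughly 0.75% of that space. The function names are hypothetical, and `score_fn` is only a placeholder for an omics-based prioritizer: KIRC's actual features and model are not described here, so this is not its implementation.

```python
import heapq
from itertools import combinations
from math import comb
from typing import Callable, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def count_pairs(n_proteins: int, include_homodimers: bool = False) -> int:
    """Number of unordered pairs in an all-by-all screen of n_proteins.

    Heteromeric pairs are n choose 2; homodimers add one pair per protein.
    """
    pairs = comb(n_proteins, 2)
    return pairs + n_proteins if include_homodimers else pairs


def prioritize_pairs(
    proteins: Iterable[Hashable],
    score_fn: Callable[[Hashable, Hashable], float],
    top_k: int,
) -> List[Pair]:
    """Rank all unordered heteromeric pairs by score_fn and keep the top_k.

    score_fn is a hypothetical stand-in for a prioritization model such as
    KIRC. combinations() yields pairs lazily and heapq.nlargest keeps a heap
    of only top_k items, so memory is bounded by the protein list plus top_k
    rather than by the total number of pairs.
    """
    return heapq.nlargest(top_k, combinations(proteins, 2), key=lambda p: score_fn(*p))


if __name__ == "__main__":
    # Pair counts: the 300-protein screen and an assumed ~20,000-protein proteome.
    print(count_pairs(300))      # 44850
    print(count_pairs(20_000))   # 199990000

    # Made-up toy scores standing in for omics-derived evidence.
    toy_scores = {frozenset(("A", "B")): 0.9, frozenset(("C", "D")): 0.7}

    def toy_score(a, b):
        return toy_scores.get(frozenset((a, b)), 0.0)

    print(prioritize_pairs(["A", "B", "C", "D"], toy_score, top_k=2))
    # [('A', 'B'), ('C', 'D')]
```

Scoring ~200 million pairs through a pure-Python callback would be slow in practice; a production pipeline would more likely vectorize or distribute the scoring step, so treat this only as an illustration of the rank-then-truncate logic.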