Publication:

NOVEL COMPUTATIONAL TOOLS FOR HIGH THROUGHPUT IN-SILICO PROTEIN-PROTEIN INTERACTION SCREENING

Date

2025-07-29

Citation

Schmid, Ernst Walter. 2025. NOVEL COMPUTATIONAL TOOLS FOR HIGH THROUGHPUT IN-SILICO PROTEIN-PROTEIN INTERACTION SCREENING. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Cells are complex biochemical systems that have evolved elaborate molecular processes to survive and reproduce. Many of these processes involve multiple proteins working together via direct physical interactions. Historically, identifying protein-protein interactions (PPIs) relied on slow, labor-intensive, and inaccurate experimental methods. Recent advances in deep learning have led to new in-silico approaches that can rapidly and accurately predict the structure of proteins and protein complexes from primary amino acid sequence. One of the most prominent examples is AlphaFold-Multimer (AF-M), a neural network developed by DeepMind in 2022. Like others, we have begun using AF-M to perform large-scale in-silico screens to uncover new PPIs. In my thesis work, I created tools to efficiently conduct AF-M PPI screens and evaluate the results. A core focus of my efforts was the creation of SPOC, a novel machine-learning-based classifier that examines structural predictions along with experimental omics data to assess the biological relevance of binary AF-M structure predictions. In addition, I built predictomes.org, a web platform that enables users to interact with and interpret massive AF-M screening datasets. As an initial proof of principle, I applied these tools to uncover PPIs in human genome maintenance by predicting structures for nearly all possible pairwise combinations among 300 proteins. The predictions were scored with SPOC and released to the community on predictomes.org. This screen uncovered new interactions and helped reveal mechanistic insights into processes ranging from transcription-coupled nucleotide excision repair to DNA replication stalling during stress. In a separate collaborative effort with Dr. Lucas Farnung’s lab, I folded the H2A/H2B dimer with nearly all human nuclear proteins to identify proteins that engage with a composite nucleosomal surface known as the acidic patch.
I developed an analysis pipeline that identified more than 40 hits, including the E3 ubiquitin ligase SHPRH. The Farnung laboratory used cryo-electron microscopy to solve the structure of SHPRH bound to the nucleosome, revealing an interaction that closely matches the screen’s prediction. The repeated success of large-scale AF-M screens has demonstrated their value as powerful hypothesis generators, motivating development of a proteome-wide structural interactome. However, generating structural models for all ~200 million possible human protein pairs was computationally prohibitive. To address this, I developed KIRC, a classifier that rapidly scores and prioritizes likely interactors based on experimental omics data. Leveraging a GPU cluster donated by NVIDIA Corporation, I used AF-M to model the top 1.5 million KIRC-nominated pairs and evaluated them with SPOC. Preliminary analysis suggests that the pipeline yielded more than 34,000 high-confidence interactions, many of which are uncharacterized. In summary, my work has produced a suite of computational tools that streamline large-scale in-silico PPI screening and help biomedical researchers effectively harness advanced machine learning to accelerate discovery. The all-by-all genome maintenance screen, the acidic patch screen, and several additional collaborative projects with labs at Harvard Medical School collectively highlight the power and versatility of this approach.

Keywords

AlphaFold, predictomes, Protein-protein interaction, SPOC, Bioinformatics, Biochemistry, Biology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
