Publication: Enhancing Protein Sequence Design through Augmented Machine Learning of Hydrogen Bonding Networks
Date
2024-06-12
Citation
Tan, Kevin. 2024. Enhancing Protein Sequence Design through Augmented Machine Learning of Hydrogen Bonding Networks. Bachelor's thesis, Harvard University Engineering and Applied Sciences.
Abstract
Creating proteins that bind tightly and specifically to ligands is a significant challenge for protein sequence prediction. Current machine learning models struggle to design the hydrogen bonding networks in proteins that are crucial for structural stability and ligand affinity. In this thesis, we explore how data on these higher-order interactions can better inform binding site design. We discuss how buried polar residues form interactions with their environment similar to those formed by ligands bound to proteins. We then present a strategy to augment training data with diverse, robust examples of hydrogen bonding networks that satisfy these residues. These data are used to train a graph neural network that selects residues to model explicitly as standalone ligands. Analysis of the model shows that predicted binding site sequences establish more realistic interactions with ligands, even for held-out classes of proteins. This suggests that biasing learning toward hydrogen bonding networks using buried residues can improve the performance of de novo sequence design.
Keywords
Bioinformatics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service