Publication: Enhancing Protein Sequence Design through Augmented Machine Learning of Hydrogen Bonding Networks
Abstract
Creating proteins that bind tightly and specifically to ligands is a significant challenge for protein sequence prediction. Current machine learning models struggle to design hydrogen bonding networks in proteins, which are crucial for structural stability and ligand affinity. In this thesis, we explore how data on these higher-order interactions can better inform binding site design. We discuss how buried polar residues interact with their environment in ways similar to ligands bound to proteins. We then present a strategy to augment training data with diverse, robust examples of hydrogen bonding networks that satisfy these residues. These data are used to train a graph neural network that selects residues to model explicitly as standalone ligands. Analysis of the model demonstrates that predicted binding site sequences establish more realistic interactions with ligands, even for held-out classes of proteins. This suggests that biasing learning toward hydrogen bonding networks using buried residues can improve the performance of de novo sequence design.