Publication:
Enhancing Protein Sequence Design through Augmented Machine Learning of Hydrogen Bonding Networks

Date

2024-06-12

The Harvard community has made this article openly available.

Citation

Tan, Kevin. 2024. Enhancing Protein Sequence Design through Augmented Machine Learning of Hydrogen Bonding Networks. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

Creating proteins that bind tightly and specifically to ligands is a significant challenge for protein sequence prediction. Current machine learning models struggle to design the hydrogen bonding networks that are crucial for protein structural stability and ligand affinity. In this thesis, we explore how data on these higher-order interactions can better inform binding-site design. We discuss how buried polar residues form interactions with their surrounding environment that resemble those of ligands bound to proteins. We then present a strategy to augment training data with diverse, robust examples of hydrogen bonding networks that satisfy these residues. This data is used to train a graph neural network that selects residues to model explicitly as standalone ligands. Analysis of the model shows that its predicted binding-site sequences form more realistic interactions with ligands, even for held-out classes of proteins. This suggests that using buried polar residues to bias learning toward hydrogen bonding networks can improve the performance of de novo sequence design.
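
The abstract does not say how buried polar residues or their hydrogen-bond partners are identified. As a rough, hypothetical illustration only (not the thesis's method), the Python sketch below flags polar residues that look buried, by counting neighbouring C-alpha atoms, and lists polar atoms on other residues that lie within a heavy-atom distance cutoff of their side-chain N/O atoms. The `Atom` record, the function names, and the thresholds (10 Å burial radius, 20 neighbours, 3.5 Å donor-acceptor cutoff) are all assumptions, and the distance-only test ignores hydrogen-bond angles and donor/acceptor roles.

```python
from dataclasses import dataclass
from math import dist

# Hypothetical minimal atom record; a real pipeline would parse PDB/mmCIF files.
@dataclass(frozen=True)
class Atom:
    res_id: int
    res_name: str   # three-letter residue code, e.g. "SER"
    name: str       # PDB atom name, e.g. "CA", "OG"
    element: str    # element symbol, e.g. "C", "N", "O"
    xyz: tuple      # (x, y, z) coordinates in angstroms

POLAR_RESIDUES = {"SER", "THR", "ASN", "GLN", "ASP", "GLU",
                  "HIS", "LYS", "ARG", "TYR", "TRP"}
BACKBONE = {"N", "CA", "C", "O", "OXT"}
POLAR_ELEMENTS = {"N", "O"}

def burial_count(atoms, res_id, radius=10.0):
    """Count C-alpha atoms of other residues within `radius` of this residue's C-alpha."""
    ca = next((a for a in atoms if a.res_id == res_id and a.name == "CA"), None)
    if ca is None:
        return 0
    return sum(1 for a in atoms
               if a.name == "CA" and a.res_id != res_id
               and dist(a.xyz, ca.xyz) <= radius)

def hbond_partners(atoms, res_id, cutoff=3.5):
    """Polar atoms on other residues within `cutoff` of this residue's side-chain N/O atoms.

    Assumption: distance-only criterion; hydrogen-bond angles are not checked.
    """
    side_chain = [a for a in atoms
                  if a.res_id == res_id and a.name not in BACKBONE
                  and a.element in POLAR_ELEMENTS]
    partners = {(a.res_id, a.name)
                for p in side_chain for a in atoms
                if a.res_id != res_id and a.element in POLAR_ELEMENTS
                and dist(a.xyz, p.xyz) <= cutoff}
    return sorted(partners)

def buried_polar_networks(atoms, min_neighbors=20, radius=10.0, cutoff=3.5):
    """Map each buried polar residue that has at least one partner to its partner atoms.

    `min_neighbors` is an assumed burial threshold, not a value from the thesis.
    """
    residues = {a.res_id: a.res_name for a in atoms}
    result = {}
    for res_id, res_name in residues.items():
        if res_name not in POLAR_RESIDUES:
            continue
        if burial_count(atoms, res_id, radius) < min_neighbors:
            continue
        partners = hbond_partners(atoms, res_id, cutoff)
        if partners:
            result[res_id] = partners
    return result
```

Residues returned this way are candidates of the kind the abstract describes: buried polar residues whose environment satisfies them through hydrogen bonds, much as a bound ligand is satisfied. A realistic pipeline would read real coordinates and add angular and donor/acceptor checks.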

Keywords

Bioinformatics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
