Publication: Restricted Boltzmann Machines and Remote Homology Search
Abstract
Determining the evolutionary history, structure, and function of newly discovered proteins helps organize the known protein world. However, establishing evolutionary relationships, or homology, one sequence at a time is both costly and time-consuming. Sequence homology search methods that infer common ancestry with both confidence and sensitivity therefore greatly aid efforts to classify proteins. This thesis investigates the use of restricted Boltzmann machines (RBMs) for sequence homology search, as their structure offers a new approach to profile-based homology search. By comparing the proposed model with established homology search methods on benchmark tests, it examines whether the information RBMs learn, as illustrated by Tubiana et al. in the recent paper "Learning Protein Constitutive Motifs From Sequence Data," can be used for remote homology search. The data and findings serve as a first step in assessing RBMs for sequence homology search and as a lesson in designing benchmark experiments for models that represent sequences differently.