Publication: Towards Learning Regulatory Elements of Promoter Sequences With Deep Learning
Abstract
Promoters play a key role in gene regulation. Although progress has been made in understanding the elements that make up a promoter, identifying all of the regulatory elements in a promoter remains challenging because promoter sequences are highly variable. In this thesis, I aim to identify regulatory elements in promoter regions using deep learning. Specifically, I use a convolutional neural network (CNN) to distinguish genomic sequences that contain a promoter from background sequences generated by one of several null models. I compare the CNN's performance under each null model and perform saliency analysis to visualize what the network has learned. The main result is that the null model must be selected carefully so that the network does not learn confounding signals such as nucleotide biases. Using a dinucleotide shuffle of transcription start site sequences as the null model, I was able to identify known regulatory elements associated with bi-directional promoters.
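To make the dinucleotide-shuffle null model concrete, here is a minimal Python sketch of one common way to generate such background sequences. It is an assumption, not the thesis's own code: the abstract does not say how the shuffle was implemented. The sketch follows the Euler-path approach usually attributed to Altschul and Erickson, and the names `dinucleotide_shuffle` and `dinucleotide_counts` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    """Count every overlapping pair of adjacent letters in seq."""
    return Counter(zip(seq, seq[1:]))


def dinucleotide_shuffle(seq, rng=None):
    """Return a random sequence with the same dinucleotide counts as seq.

    Illustrative sketch only (hypothetical helper, not from the thesis).
    The sequence is treated as a walk on a graph whose edges are its
    dinucleotides; a random walk that reuses every edge exactly once
    gives a shuffled sequence with identical dinucleotide composition.
    """
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq

    # For each letter, collect the letters that follow it in seq.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    last = seq[-1]

    # Choose a final outgoing edge for every letter except the last one,
    # and retry until following those edges always leads to `last`.
    while True:
        last_edge = {n: rng.choice(successors[n])
                     for n in successors if n != last}
        if all(_reaches(n, last, last_edge) for n in last_edge):
            break

    # Shuffle the remaining edges and keep the chosen edge at the end.
    pools = {}
    for n, nxt in successors.items():
        rest = list(nxt)
        if n in last_edge:
            rest.remove(last_edge[n])
            rng.shuffle(rest)
            rest.append(last_edge[n])
        else:
            rng.shuffle(rest)
        pools[n] = rest

    # Walk the graph from the first letter, consuming edges in order.
    out = [seq[0]]
    used = Counter()
    cur = seq[0]
    for _ in range(len(seq) - 1):
        nxt = pools[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)


def _reaches(start, target, last_edge):
    """Check that following last_edge from start arrives at target."""
    seen = set()
    cur = start
    while cur != target and cur not in seen:
        seen.add(cur)
        cur = last_edge[cur]
    return cur == target
```

Because the shuffled sequence keeps the original mononucleotide and dinucleotide composition, a classifier trained against it cannot separate promoters from background using those composition biases alone, which is the kind of confounding signal the abstract warns about. A typical call under these assumptions would be `dinucleotide_shuffle(tss_sequence, random.Random(seed))`, where `tss_sequence` and `seed` are placeholders.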