Genome-Wide Analysis of NET-Seq Data to Understand RNA Polymerase II Behaviour Around Transcription Factor Binding Sites
Citation: Rosen, Leah. 2019. Genome-Wide Analysis of NET-Seq Data to Understand RNA Polymerase II Behaviour Around Transcription Factor Binding Sites. Bachelor's thesis, Harvard College.
Abstract: It has long been known that, in humans, RNA polymerase II (Pol II) transcribes protein-coding DNA into RNA and that proteins called transcription factors (TFs) regulate which genes Pol II transcribes and at what frequency. The interplay between Pol II and TFs is therefore a vital component of how cells with the same genetic material adapt to changing environments and form diverse cell types. Recently, technologies such as native elongating transcript sequencing (NET-seq) have revealed that Pol II frequently pauses while transcribing. While some causes of Pol II pausing are understood, many pauses remain unexplained. This thesis explores whether encountering a TF can explain some Pol II pausing behaviour. Although TFs have been heavily studied, their mechanisms remain largely elusive. Investigating how Pol II behaves around TF binding sites could not only explain a subset of the observed pauses, but could also shed light on the mechanisms by which various TFs act. This thesis builds on previous work by Mayer and di Iulio et al., which used NET-seq data to investigate this question around the binding sites of specific TFs (Mayer et al., 2015).
First, we selected the TF binding sites to study under more stringent criteria. So that the signal is not diluted by sites where Pol II does not encounter the TF, sites were selected only if they passed a NET-seq coverage threshold. Next, ChIP-seq data were used to confirm that the TF of interest is in fact bound in the cell line of interest. Finally, CAGE-seq data were integrated to stringently remove any promoter-proximal pausing that might confound the signal. Using these more stringently selected sites, Bayes factors were used to confirm that the amount of transcription initiation around the experimental sites is within a control range and to identify regions where the amount of Pol II pausing falls outside this control range.
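The three selection steps described above can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the `Site` class, the `select_sites` function, the input layouts, and the threshold values (`min_coverage`, `cage_window`) are all hypothetical and chosen only to show the order of the filters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A candidate TF binding site (hypothetical layout, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def select_sites(sites, coverage, chip_peaks, cage_peaks,
                 min_coverage=10, cage_window=500):
    """Keep sites that pass all three filters, applied in order.

    coverage   : dict mapping Site -> NET-seq read count (assumed precomputed)
    chip_peaks : list of (chrom, start, end) ChIP-seq peaks for the TF
    cage_peaks : list of (chrom, start, end) CAGE-seq initiation peaks
    The default thresholds are illustrative assumptions, not thesis values.
    """
    selected = []
    for site in sites:
        # 1. NET-seq coverage: Pol II must actually traverse the site.
        if coverage.get(site, 0) < min_coverage:
            continue
        # 2. ChIP-seq: the TF must be bound at the site in this cell line.
        bound = any(c == site.chrom and overlaps(site.start, site.end, s, e)
                    for c, s, e in chip_peaks)
        if not bound:
            continue
        # 3. CAGE-seq: drop sites near initiation events, whose
        #    promoter-proximal pausing could confound the signal.
        near_initiation = any(
            c == site.chrom
            and overlaps(site.start - cage_window, site.end + cage_window, s, e)
            for c, s, e in cage_peaks)
        if near_initiation:
            continue
        selected.append(site)
    return selected


# Toy data: one site passes, the other three each fail a different filter.
sites = [
    Site("chr1", 1000, 1010, "+"),  # passes all filters
    Site("chr1", 5000, 5010, "+"),  # low NET-seq coverage
    Site("chr2", 200, 210, "-"),    # no overlapping ChIP-seq peak
    Site("chr2", 9000, 9010, "+"),  # CAGE-seq peak within the window
]
coverage = {sites[0]: 25, sites[1]: 3, sites[2]: 40, sites[3]: 30}
chip_peaks = [("chr1", 900, 1100), ("chr1", 4950, 5050), ("chr2", 8900, 9100)]
cage_peaks = [("chr2", 9300, 9320)]

selected = select_sites(sites, coverage, chip_peaks, cage_peaks)
print(selected)
```

On real data, the interval comparisons would more likely be done with a genome-interval tool than with nested loops; the sketch only makes the filtering logic explicit.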
Interpretable effects on Pol II pausing were identified for a handful of TFs. Pol II pausing is detected at the TF binding site of E2F1, which is biologically plausible given that E2F1 is known to phosphorylate Pol II; in fact, the present observation suggests one mechanism by which E2F1 may achieve this. Further, MafK causes Pol II to pause upstream of its binding site, perhaps reflecting its role in transferring Pol II between genomic locations. MAX causes Pol II to pause both at its binding site and downstream. Interestingly, MAX seems either to cause Pol II to pause downstream or to initiate transcription, which reflects MAX's various functions and may be modulated by MAX's various binding partners. NRF1 shows symmetrical effects on Pol II pausing, reflecting its palindromic binding site. Finally, PRDM1 causes Pol II to pause at the start of its binding site, consistent with previous reports that it causes Pol II to pause. Thus, this thesis uncovers biologically relevant Pol II pausing around TF binding sites that elucidates both the cause of a subset of Pol II pauses and the poorly understood mechanisms by which TFs act.
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364652
- FAS Theses and Dissertations