|dc.description.abstract||The human genome harbors tens of thousands of long non-coding RNAs (lncRNAs): RNAs longer than 200 nucleotides that are reproducibly transcribed but do not encode proteins. While the in vivo roles of a handful of lncRNAs are known and hundreds more have been implicated in disease, the vast majority of lncRNAs remain wholly uncharacterized due to two main issues. First, compared to proteins, our understanding of functional RNA domains is poor. Thus, methods historically used to categorize unannotated proteins cannot be analogously applied to lncRNAs. Second, as transcription is a noisy process, it is likely that some percentage of lncRNAs are transcribed by chance, playing no functional role and imparting no fitness advantage. Therefore, characterizing the non-coding transcriptome will require a novel, integrative framework that is (1) freed from protein-based constraints and (2) able to parse true biological signal from large amounts of noise. In this thesis, I argue that understanding fundamental aspects of RNA biology is key to the eventual goal of categorizing and validating functional lncRNAs. Moreover, I argue that analyzing thousands of transcripts simultaneously is paramount to identify subtle patterns amid noise. I also propose that studying non-coding RNAs can elucidate general biological mechanisms that have been missed by protein-focused studies.
My work has focused on understanding three fundamental aspects of lncRNA biology—regulation, evolution, and function—using computational and high-throughput experimental approaches.
First, in an integrative analysis comparing the regulation of lncRNAs and mRNAs, I find that certain regulatory DNA and RNA motifs are predictive of lncRNA functionality. Second, via high-throughput experiments, I show that lower redundancy in transcription factor binding is associated with the low expression and high tissue-specificity of lncRNAs. Third, also via high-throughput experiments, I show that lncRNA transcription is evolutionarily volatile and associated with motif turnover. Finally, using massive functional screens, I identify a subset of lncRNAs that are required for endoderm differentiation. Collectively, my work provides a foundation for prioritizing putatively functional lncRNAs, while also revealing novel mechanisms underlying transcriptional regulation, gene expression evolution, and differentiation.||