Publication: Bayesian Models to Identify Hidden Patterns with Applications in Biology
Abstract
Technological advances have made it possible to generate massive amounts of data in biology. For example, whole genome sequencing (WGS) has made available (nearly) the entire DNA sequence of various organisms, and single-cell RNA sequencing (scRNA-seq) technology measures the expression levels of tens of thousands of genes in individual cells. These technologies offer the scientific community huge opportunities, as well as challenges, in deciphering hidden information and understanding organisms at the micro level. In this thesis, we focus on modeling categorical data arising from several domains in biology. We first introduce a Bayesian method that uses aligned sequences from extant species on a species tree to infer patterns of DNA substitution rate shifts and to identify candidate elements associated with a convergent phenotype in the presence of discordance between gene trees and the species tree. In the second part, we focus on Bayesian bi-clustering methods, which simultaneously cluster samples and features. We propose four methods for modeling categorical data. These methods are designed to tackle different problems in biology and are of increasing complexity in how features are modeled across object clusters. Although we focus on problems in biology, our methods are broadly applicable.
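The abstract does not specify the likelihoods behind the bi-clustering methods. As a minimal sketch of the general idea only, assume a simple "checkerboard" model in which every (sample cluster, feature cluster) block of a categorical matrix has its own category probabilities drawn from a symmetric Dirichlet prior. The probabilities can then be integrated out analytically, and a candidate pair of partitions is scored by its marginal likelihood. This is an illustrative stand-in, not one of the four methods proposed in the thesis; the function names and the `alpha` parameter are hypothetical.

```python
from collections import Counter
from math import lgamma


def block_log_marginal(counts, num_categories, alpha=1.0):
    """Dirichlet-multinomial log marginal likelihood of one block (hypothetical helper).

    The block's category probabilities theta ~ Dirichlet(alpha, ..., alpha)
    are integrated out, leaving a closed form in the category counts.
    """
    total = sum(counts.values())
    result = lgamma(num_categories * alpha) - lgamma(num_categories * alpha + total)
    for k in range(num_categories):
        result += lgamma(alpha + counts.get(k, 0)) - lgamma(alpha)
    return result


def bicluster_log_marginal(X, row_labels, col_labels, num_categories, alpha=1.0):
    """Score a sample partition and a feature partition jointly (hypothetical helper).

    X is a list of rows with integer categories in 0..num_categories-1;
    row_labels[i] is the cluster of sample i, col_labels[j] the cluster of feature j.
    Each (row cluster, column cluster) block contributes an independent marginal term.
    """
    blocks = {}
    for i, row in enumerate(X):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            blocks.setdefault(key, Counter())[value] += 1
    return sum(block_log_marginal(c, num_categories, alpha) for c in blocks.values())


if __name__ == "__main__":
    X = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [2, 2, 1, 1],
         [2, 2, 1, 1]]
    checkerboard = bicluster_log_marginal(X, [0, 0, 1, 1], [0, 0, 1, 1], num_categories=3)
    one_block = bicluster_log_marginal(X, [0, 0, 0, 0], [0, 0, 0, 0], num_categories=3)
    print(checkerboard, one_block)
```

On this toy matrix, the 2 × 2 checkerboard partition scores higher (about -10.83 versus -18.74) than lumping all samples and features into a single block, which is the kind of comparison a sampler over partitions would make repeatedly. A full Bayesian bi-clustering method would also place priors on the partitions themselves and search or sample over them; the sketch omits both.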