Estimating TMRCA, Modeling the Fixed Pedigree, and the Effect of the Y Chromosome on the Chromatin Landscape
Citation: King, Leandra. 2016. Estimating TMRCA, Modeling the Fixed Pedigree, and the Effect of the Y Chromosome on the Chromatin Landscape. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.
Abstract: This thesis consists of three chapters on different topics.
Chapter 1: We demonstrate the advantages of using information at many unlinked loci to better calibrate estimates of the time to the most recent common ancestor (TMRCA) at a given locus. To this end, we apply a simple empirical Bayes method to estimate the TMRCA. This method is asymptotically optimal, in the sense that the estimator converges to the true value when the number of unlinked loci with available data is large, and it makes no assumptions about demographic history. The algorithm works in two steps. First, we split the sample at each locus into inferred left and right clades, which yields many estimates of the TMRCA that we average to obtain an initial estimate. Second, we use nucleotide sequence data from other unlinked loci to form an empirical distribution that refines this initial estimate.
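The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. `hamming`, `initial_tmrca`, and `shrink` are hypothetical helpers, `mu` is an assumed per-locus, per-generation mutation rate under an infinite-sites model, and the root split is taken as already inferred. The first step relies on the fact that any sequence in the left clade and any sequence in the right clade coalesce at the root, so each cross-clade pair gives its own TMRCA estimate. For brevity, the second step swaps in textbook normal-normal empirical Bayes shrinkage. Unlike the nonparametric empirical distribution described above, this imposes a Gaussian form on the across-loci distribution and assumes known, positive per-locus sampling variances.

```python
from itertools import product
from statistics import fmean, pvariance
from typing import List, Sequence


def hamming(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences of equal length."""
    return sum(ca != cb for ca, cb in zip(a, b))


def initial_tmrca(left: Sequence[str], right: Sequence[str], mu: float) -> float:
    """Step 1 (hypothetical helper): average cross-clade TMRCA estimates.

    Every pair with one sequence in `left` and one in `right` coalesces at the
    root, so under infinite sites its difference count has mean 2 * mu * T.
    `mu` is an assumed per-locus, per-generation mutation rate, so the result
    is in generations.
    """
    per_pair = [hamming(a, b) / (2 * mu) for a, b in product(left, right)]
    return fmean(per_pair)


def shrink(estimates: Sequence[float], sampling_var: Sequence[float]) -> List[float]:
    """Step 2 (hypothetical helper): normal-normal empirical Bayes shrinkage.

    The prior mean and variance are fitted by the method of moments across all
    loci. Each locus is then pulled toward the prior mean, more strongly when
    its (assumed known, positive) sampling variance is large.
    """
    prior_mean = fmean(estimates)
    # Between-locus spread minus the average noise, floored at zero.
    prior_var = max(pvariance(estimates) - fmean(sampling_var), 0.0)
    return [
        prior_mean + prior_var / (prior_var + s2) * (t - prior_mean)
        for t, s2 in zip(estimates, sampling_var)
    ]
```

For example, `initial_tmrca(["AAAA", "AAAT"], ["CAAA", "CATA"], 0.5)` averages the four cross-clade difference counts (1, 2, 2, 3) and returns 2.0. `shrink([1.0, 3.0, 5.0], [1.0, 1.0, 1.0])` keeps 5/8 of each estimate's deviation from the mean of 3.0, giving about 1.75, 3.0, and 4.25. Cross-clade pairs share branches, so their estimates are correlated. Averaging them therefore reduces variance less than the number of pairs alone would suggest.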
Chapter 2: The population-scaled mutation rate is informative about the effective population size and is thus widely used in population genetics. We show that for two sequences, Tajima's estimator (theta-hat), based on the average number of pairwise differences at n unlinked loci, is not consistent: its variance does not vanish even as n approaches infinity. The non-zero variance of theta-hat results from the positive correlation between coalescence times that exists even at unlinked loci, due to the process of Mendelian percolation through a fixed pedigree. We derive this correlation under the discrete-time Wright-Fisher model (DTWF), and we point out the effects leading to this surprising result. In particular, the extent of this correlation depends on whether loci were sampled from the same chromosome (even if very far apart) or from different chromosomes. We also derive a lower bound on the correlation by conditioning on the fixed number of shared ancestors that connect the pedigrees of the two sequences. Finally, we obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogical data. Although the effect we describe is small (of order 1/Ne, where Ne is the effective population size), it is important to recognize this feature of statistical population genomics, which runs counter to commonly held notions about unlinked loci.
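A short calculation shows why a positive correlation keeps the variance from vanishing. It uses a simplified equicorrelation model, which is an assumption made for illustration and not the thesis's DTWF derivation. Let K_i be the number of pairwise differences at locus i, and let T_i be the pair's coalescence time in units of 2N generations, with mean 1, variance sigma^2, and Corr(T_i, T_j) = rho for every i != j. Given the times, each K_i is assumed to be independently Poisson with mean theta T_i.

```latex
\begin{aligned}
\hat\theta &= \frac{1}{n}\sum_{i=1}^{n} K_i,
  \qquad \mathbb{E}[K_i \mid T_i] = \operatorname{Var}(K_i \mid T_i) = \theta T_i,\\
\operatorname{Var}(\hat\theta)
  &= \mathbb{E}\!\left[\operatorname{Var}(\hat\theta \mid \mathbf{T})\right]
   + \operatorname{Var}\!\left(\mathbb{E}[\hat\theta \mid \mathbf{T}]\right)
   = \frac{\theta}{n} + \theta^{2}\operatorname{Var}(\bar T),\\
\operatorname{Var}(\bar T)
  &= \frac{1}{n^{2}}\left(n\sigma^{2} + n(n-1)\rho\sigma^{2}\right)
   = \frac{\sigma^{2}}{n} + \left(1-\frac{1}{n}\right)\rho\sigma^{2},\\
\lim_{n\to\infty}\operatorname{Var}(\hat\theta) &= \theta^{2}\rho\sigma^{2} > 0
  \quad\text{whenever } \rho > 0.
\end{aligned}
```

The mutational noise theta/n averages away, but the shared pedigree term does not. Under the standard coalescent T_i is exponential with mean 1, so sigma^2 is close to 1, and with rho of order 1/Ne the residual variance is of order theta^2/Ne. This is small, but it does not shrink as more loci are added. The single rho used here ignores the thesis's distinction between loci on the same chromosome and loci on different chromosomes.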
Chapter 3: The Drosophila melanogaster Y chromosome is able to affect gene expression across the genome. It has been assumed that it does so by modifying the chromatin landscape. We screen two African and two European Y introgression lines for differential expression and for differential binding of two proteins, Lamin and D1. There is significant intra-population variation in gene expression in the African population, which is surprising given the selective forces at play. Because there are very few SNP differences in African populations, we conclude that the effect of the Y chromosome is driven by other mutational events, such as variation in repetitive regions. We find that differential binding does occur, and the strongest signals for differential binding are in tandem-repeat and centromeric regions. From this we conclude that non-coding RNA likely plays a mediating role in influencing chromatin state, but also that a variety of different mechanisms are probably at play.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:33840695