Person: Blocker, Alexander Weaver
Last Name: Blocker
First Name: Alexander Weaver
Name: Blocker, Alexander Weaver
Search Results
Publication: Distributed and Multiphase Inference in Theory and Practice: Principles, Modeling, and Computation for High-Throughput Science (2013-09-18)
Blocker, Alexander Weaver; Meng, Xiao-Li; Airoldi, Edoardo Maria; Liu, Jun
The rise of high-throughput scientific experimentation and data collection has introduced new classes of statistical and computational challenges. The technologies driving this data explosion are subject to complex new forms of measurement error, requiring sophisticated statistical approaches. Simultaneously, statistical computing must adapt to larger volumes of data and new computational environments, particularly parallel and distributed settings. This dissertation presents several computational and theoretical contributions to these challenges. In chapter 1, we consider the problem of estimating the genome-wide distribution of nucleosome positions from paired-end sequencing data. We develop a modeling approach based on nonparametric templates that controls for variability due to enzymatic digestion. We use this to construct a calibrated Bayesian method to detect local concentrations of nucleosome positions. Inference is carried out via a distributed HMC algorithm that scales linearly in complexity with the length of the genome being analyzed. We provide MPI-based implementations of the proposed methods, stand-alone and on Amazon EC2, which can provide inferences on an entire S. cerevisiae genome in less than 1 hour on EC2. We then present a method for absolute quantitation from LC-MS/MS proteomics experiments in chapter 2. We present a Bayesian model for the non-ignorable missing data mechanism induced by this technology, which includes an unusual combination of censoring and truncation. We provide a scalable MCMC sampler for inference in this setting, enabling full-proteome analyses using cluster computing environments. A set of simulation studies and actual experiments demonstrate this approach's validity and utility.
We close in chapter 3 by proposing a theoretical framework for the analysis of preprocessing under the banner of multiphase inference. Preprocessing forms an oft-neglected foundation for a wide range of statistical and scientific analyses. We provide some initial theoretical foundations for this area, including distributed preprocessing, building upon previous work in multiple imputation. We demonstrate that multiphase inferences can, in some cases, even surpass standard single-phase estimators in efficiency and robustness. Our work suggests several paths for further research into the statistical principles underlying preprocessing.

Publication: Estimating Latent Processes on a Network From Indirect Measurements (Informa UK Limited, 2013)
Airoldi, Edoardo; Blocker, Alexander Weaver
In a communication network, point-to-point traffic volumes over time are critical for designing protocols that route information efficiently and for maintaining security, whether at the scale of an Internet service provider or within a corporation. While technically feasible, the direct measurement of point-to-point traffic imposes a heavy burden on network performance and is typically not implemented. Instead, indirect aggregate traffic volumes are routinely collected. We consider the problem of estimating point-to-point traffic volumes, \(x_t\), from aggregate traffic volumes, \(y_t\), given information about the network routing protocol encoded in a matrix A. This estimation task can be reformulated as finding the solutions to a sequence of ill-posed linear inverse problems, \(y_t=Ax_t\), since the number of origin-destination routes of interest is higher than the number of aggregate measurements available. Here, we introduce a novel multilevel state-space model (SSM) of aggregate traffic volumes with realistic features. We implement a naïve strategy for estimating unobserved point-to-point traffic volumes from indirect measurements of aggregate traffic, based on particle filtering.
We then develop a more efficient two-stage inference strategy that relies on model-based regularization: a simple model is used to calibrate regularization parameters that lead to efficient/scalable inference in the multilevel SSM. We apply our methods to corporate and academic networks, where we show that the proposed inference strategy outperforms existing approaches and scales to larger networks. We also design a simulation study to explore the factors that influence the performance. Our results suggest that model-based regularization may be an efficient strategy for inference in other complex multilevel models. Supplementary materials for this article are available online.

Publication: Ten Simple Rules for the Care and Feeding of Scientific Data (Public Library of Science (PLoS), 2014)
Goodman, Alyssa; Pepe, Alberto; Blocker, Alexander Weaver; Borgman, Christine L.; Cranmer, Kyle; Crosas, Merce; Di Stefano, Rosanne; Gil, Yolanda; Groth, Paul; Hedstrom, Peg; Hogg, David W.; Kashyap, Vinay; Mahabal, Ashish; Siemiginowska, Aneta; Slavkovic, Aleksandra