# Inference and Prediction Problems for Spatial and Spatiotemporal Data

Title: Inference and Prediction Problems for Spatial and Spatiotemporal Data

Author: Cervone, Daniel Leonard

Citation: Cervone, Daniel Leonard. 2015. Inference and Prediction Problems for Spatial and Spatiotemporal Data. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Full Text & Related Files: CERVONE-DISSERTATION-2015.pdf (2.730Mb; PDF)

Abstract: This dissertation focuses on prediction and inference problems for complex spatiotemporal systems. I explore three specific problems in this area, each motivated by real data examples, and discuss the theoretical motivations for the proposed methodology, implementation details, and inference/performance on data of interest.

Chapter 1 introduces a novel time series model that improves the accuracy of lung tumor tracking for radiotherapy. Tumor tracking requires real-time, multiple-step ahead forecasting of a quasi-periodic time series recording instantaneous tumor locations. Our proposed model is a location-mixture autoregressive (LMAR) process that admits multimodal conditional distributions, fast approximate inference using the EM algorithm, and accurate multiple-step ahead predictive distributions. Compared with other families of mixture autoregressive models, LMAR is easier to fit (with a smaller parameter space) and better suited to online inference and multiple-step ahead forecasting, as there is no need for Monte Carlo. Against other candidate models in statistics and machine learning, our model provides superior predictive performance for clinical data.

Chapter 2 develops a stochastic process model for the spatiotemporal evolution of a basketball possession based on tracking data that records each player's exact location at 25 Hz. Our model comprises multiresolution transition kernels that simultaneously describe players' continuous motion dynamics along with their decisions, ball movements, and other discrete actions.
Many such actions occur very sparsely in player $\times$ location space, so we use hierarchical models to share information across different players in the league and disjoint regions on the basketball court, a challenging problem given the scale of our data (over 400 players and 1 billion space-time observations) and the computational cost of inferential methods in spatial statistics. Our framework, in addition to offering valuable insight into individual players’ behavior and decision-making, allows us to estimate the instantaneous expected point value of an NBA possession by averaging over all possible future possession paths.

In Chapter 3, we investigate Gaussian process regression where inputs are subject to measurement error. For instance, in spatial statistics, input measurement errors occur when the geographical locations of observed data are not known exactly. Such sources of error are not special cases of "nugget" or microscale variation, and require alternative methods for both interpolation and parameter estimation. We discuss some theory for Kriging in this regime, as well as using Hybrid Monte Carlo to provide predictive distributions (and parameter estimates, if necessary). Through a simulation study and an analysis of Northern Hemisphere temperature data from the summer of 2011, we show that appropriate methods for incorporating location measurement error are essential to reliable inference in this regime.

Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:17463133