Publication: G-Squared Statistic for Detecting Dependence, Additive Modeling, and Calibration Concordance for Astrophysical Data
Date
2017-08-31
Abstract
We present three topics in this thesis: the G-squared statistic for independence testing, its use in additive modeling, and calibration concordance by multiplicative shrinkage.
Detecting dependence is a fundamental problem. Although the Pearson correlation coefficient is effective for capturing linear dependence, it can be powerless for nonlinear or heteroscedastic patterns. We introduce G-squared to test whether two univariate random variables are independent and to measure the strength of their relationship. For linear relationships with constant error variance, the G-squared statistic is almost identical to R-squared, and it has the intuitive meaning of a piecewise R-squared. We propose two estimators of G-squared and show their consistency. Simulations show that tests based on the G-squared estimators are among the most powerful when compared with several state-of-the-art methods.
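The "piecewise R-squared" intuition can be illustrated with a minimal Python sketch. It is not either of the estimators proposed in the thesis: it assumes a fixed partition into equal-count pieces, whereas the abstract does not say how the pieces are chosen. The names line_rss and piecewise_r2 and the parameter n_pieces are hypothetical, introduced only for this illustration.

```python
def line_rss(xs, ys):
    """Residual sum of squares of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(xs, ys))


def piecewise_r2(x, y, n_pieces):
    """1 - (sum of per-piece line RSS) / (total sum of squares).

    Assumption: the pieces are contiguous, equal-count blocks of the
    data sorted by x. With n_pieces=1 this is the ordinary R-squared.
    """
    n = len(x)
    if n < 2 * n_pieces:
        raise ValueError("need at least two points per piece")
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    my = sum(ys) / n
    tss = sum((b - my) ** 2 for b in ys)
    if tss == 0:
        raise ValueError("y is constant")
    bounds = [k * n // n_pieces for k in range(n_pieces + 1)]
    rss = sum(line_rss(xs[lo:hi], ys[lo:hi])
              for lo, hi in zip(bounds, bounds[1:]))
    return 1.0 - rss / tss


# A symmetric parabola: no linear trend, but a strong nonlinear relationship.
x = [-1 + 2 * i / 199 for i in range(200)]
y = [v * v for v in x]
r2_line = piecewise_r2(x, y, 1)    # ordinary R-squared, close to zero
r2_pieces = piecewise_r2(x, y, 4)  # piecewise version, close to one
```

On the parabola, a single line explains almost none of the variance, while four local lines explain nearly all of it, which is the kind of dependence a correlation-based test misses.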
We consider nonparametric additive modeling of a reference function in which the number of predictor variables can exceed the sample size but the number of nonzero components is comparatively small. For each predictor variable, the additive component is approximated by a B-spline. The G-squared estimated between each predictor and the response helps determine the knots of the B-spline. For variable selection, we apply the adaptive group least absolute shrinkage and selection operator (lasso), treating the spline bases of each predictor as a group; we also implement forward selection to find the subset with the minimum Bayesian information criterion (BIC) value. Empirical studies show that both approaches work well compared with two other methods.
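The forward-selection idea, with each predictor's basis columns entering or leaving as a group and the BIC deciding when to stop, can be sketched as follows. This is only an illustration under stated assumptions: a centered cubic polynomial basis stands in for the B-spline basis, no G-squared-based knot placement is attempted, and the adaptive group lasso is not shown. All function names and the simulated data are hypothetical.

```python
import math
import random


def basis(col):
    """Centered cubic polynomial basis (an assumed stand-in for B-splines)."""
    m = sum(col) / len(col)
    return [[(v - m) ** p for v in col] for p in (1, 2, 3)]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def add_group(resid, ortho, group):
    """Regress on one more group of columns via Gram-Schmidt.

    ortho holds orthonormal vectors spanning the current model;
    resid is the response minus its projection onto that span.
    """
    resid, ortho = list(resid), list(ortho)
    for vec in group:
        v = list(vec)
        for q in ortho:                      # remove already-explained part
            c = dot(v, q)
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-10:                     # column adds nothing new
            continue
        q = [a / norm for a in v]
        c = dot(resid, q)
        resid = [r - c * b for r, b in zip(resid, q)]
        ortho.append(q)
    return resid, ortho


def bic(resid, n_params):
    n = len(resid)
    return n * math.log(dot(resid, resid) / n) + n_params * math.log(n)


def forward_select(columns, y):
    """Greedily add the predictor group that lowers BIC most; stop when none does."""
    n = len(y)
    ortho = [[1 / math.sqrt(n)] * n]         # intercept
    ybar = sum(y) / n
    resid = [v - ybar for v in y]
    groups = [basis(c) for c in columns]
    selected, best = [], bic(resid, 1)
    while len(selected) < len(groups):
        trials = []
        for j, g in enumerate(groups):
            if j not in selected:
                r, o = add_group(resid, ortho, g)
                trials.append((bic(r, len(o)), j, r, o))
        score, j, r, o = min(trials, key=lambda t: t[0])
        if score >= best:
            break
        best, resid, ortho = score, r, o
        selected.append(j)
    return selected


# Simulated data: only predictors 0 and 2 affect the response.
random.seed(0)
n = 200
cols = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(6)]
y = [math.sin(3 * cols[0][i]) + cols[2][i] ** 2 + random.gauss(0, 0.1)
     for i in range(n)]
selected = forward_select(cols, y)
```

Treating the three basis columns of a predictor as one unit mirrors the grouping described above: a predictor is either in the model with its whole basis or out of it.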
Calibration data are often obtained by observing several sources with several instruments. Analyzing such data for proper concordance among the instruments is challenging because the physical source models are not perfectly specified and data quality varies in ways that cannot be fully quantified. We propose a log-normal hierarchical model and, for outliers, a more general log-t model. Both permit imperfection in the multiplicative mean modeling to be captured by the residual variance. Analytical solutions, which take power-shrinkage forms, are given in special cases, and Markov chain Monte Carlo algorithms are adopted for general cases. We apply our method to several data sets and demonstrate that the proposed model provides useful guidance for astrophysicists.
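To see why a log-normal model leads to a "power shrinkage" form, consider a deliberately simplified one-instrument case, which is an assumption made here and not the full hierarchical model of the thesis. Suppose the log of each measured ratio is normal around log B with known standard deviation sigma, and log B has a normal prior centered at the log of a prior value with known standard deviation tau. The posterior mean of log B is then a weighted average of the sample mean of the logs and the prior log value. Exponentiating turns that weighted average into a product of powers. The function name and its parameters below are hypothetical.

```python
import math


def power_shrinkage(ratios, prior_value, sigma, tau):
    """Shrink the geometric mean of `ratios` toward `prior_value`.

    Assumed model (an illustrative simplification):
        log r_k ~ Normal(log B, sigma^2),  log B ~ Normal(log prior_value, tau^2)
    with sigma and tau treated as known.
    Returns (estimate of B, weight placed on the data).
    """
    if sigma <= 0 or tau <= 0:
        raise ValueError("sigma and tau must be positive")
    logs = [math.log(r) for r in ratios]
    n = len(logs)
    log_mean = sum(logs) / n
    # Standard normal-normal posterior weight on the data.
    weight = tau ** 2 / (tau ** 2 + sigma ** 2 / n)
    geo_mean = math.exp(log_mean)
    # exp(w * log_mean + (1 - w) * log prior) = geo_mean^w * prior^(1 - w)
    estimate = geo_mean ** weight * prior_value ** (1.0 - weight)
    return estimate, weight


est, w = power_shrinkage([1.10, 1.20, 1.15], prior_value=1.0,
                         sigma=0.10, tau=0.05)
```

The estimate always lies between the prior value and the geometric mean of the data, and it moves toward the data as tau grows relative to sigma.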
Keywords
Statistics
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service