G-Squared Statistic for Detecting Dependence, Additive Modeling, and Calibration Concordance for Astrophysical Data
Abstract
We present three topics in this thesis: the G-squared statistic for independence testing, its use in additive modeling, and calibration concordance by multiplicative shrinkage.
Detecting dependence is a fundamental problem. Although the Pearson correlation coefficient is effective for capturing linear dependence, it is powerless for nonlinear or heteroscedastic patterns. We introduce G-squared to test whether two univariate random variables are independent and to measure the strength of their relationship. The G-squared statistic is almost identical to R-squared for linear relationships with constant error variance, and has the intuitive meaning of a piecewise R-squared. We propose two estimators of G-squared and show their consistency. Simulations demonstrate that the G-squared estimators are among the most powerful test statistics when compared with several state-of-the-art methods.
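The "piecewise R-squared" intuition can be illustrated with a minimal sketch. It is not either of the thesis's estimators: it assumes a fixed number of equal-count slices and ordinary least squares within each slice, and the function name `piecewise_r2` and the simulated data are hypothetical.

```python
import numpy as np

def piecewise_r2(x, y, n_slices=8):
    """R-squared of separate straight-line fits on equal-count slices of x.

    Illustrative only: the slicing scheme here is fixed in advance,
    which is an assumption of this sketch.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    rss = 0.0
    for idx in np.array_split(np.arange(len(x)), n_slices):
        xs, ys = x[idx], y[idx]
        slope, intercept = np.polyfit(xs, ys, 1)  # least-squares line in this slice
        rss += np.sum((ys - (slope * xs + intercept)) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
y_linear = 2 * x + rng.normal(0, 0.5, 2000)    # linear, constant error variance
y_quad = x ** 2 + rng.normal(0, 0.05, 2000)    # nonlinear, symmetric about zero

pearson_r2_linear = np.corrcoef(x, y_linear)[0, 1] ** 2
pearson_r2_quad = np.corrcoef(x, y_quad)[0, 1] ** 2
pw_r2_linear = piecewise_r2(x, y_linear)
pw_r2_quad = piecewise_r2(x, y_quad)
```

In this simulation the squared Pearson correlation and the piecewise value nearly agree for the linear relationship, while only the piecewise value picks up the quadratic one.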
We consider a nonparametric additive modeling of a reference function where the number of predictor variables can be larger than the sample size, but the number of nonzero components is comparatively small. For each predictor variable, the additive component is approximated by a B-spline. The G-squared estimate between each predictor and the response helps determine the knots of the B-spline. For variable selection, we apply the adaptive group least absolute shrinkage and selection operator (adaptive group lasso), treating the spline bases of each predictor as a group; we also implement forward selection to find the subset with the minimum Bayesian information criterion (BIC) value. Empirical studies show that both approaches work well compared with two other methods.
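The forward-selection idea can be sketched as follows, under loud assumptions: a centered cubic polynomial basis stands in for the B-spline basis, no G-squared-guided knot placement is attempted, and the BIC is the Gaussian least-squares form. The names `basis`, `bic`, and `forward_select` are hypothetical and not taken from the thesis.

```python
import numpy as np

def basis(x, degree=3):
    # Centered polynomial columns x, x^2, x^3: a stand-in for a B-spline basis.
    cols = np.column_stack([x ** d for d in range(1, degree + 1)])
    return cols - cols.mean(axis=0)

def bic(y, design):
    # Gaussian BIC (up to a constant) for a least-squares fit with an intercept.
    n = len(y)
    X = np.column_stack([np.ones(n), design])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

def forward_select(X, y, max_terms=5):
    # Add one predictor's whole basis group at a time while BIC keeps decreasing.
    n, p = X.shape
    groups = [basis(X[:, j]) for j in range(p)]
    selected = []
    best = bic(y, np.empty((n, 0)))
    while len(selected) < max_terms:
        scores = {
            j: bic(y, np.hstack([groups[k] for k in selected + [j]]))
            for j in range(p) if j not in selected
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected

rng = np.random.default_rng(1)
n, p = 150, 200                                  # more predictors than samples
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(np.pi * X[:, 0]) + 2 * X[:, 3] ** 2 + rng.normal(0, 0.2, n)
selected = forward_select(X, y)
```

Capping the number of selected groups with `max_terms` keeps every fitted design matrix far smaller than the sample size, which is what makes the greedy search usable when predictors outnumber observations.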
Calibration data are often obtained by observing several sources with several instruments. Analyzing such data for proper concordance among the instruments is challenging because the physical source models are not perfectly specified and data quality varies in ways that cannot be fully quantified. We propose a log-normal hierarchical model and, to accommodate outliers, a more general log-t model. Both permit imperfection in the multiplicative mean modeling to be captured by the residual variance. Analytical solutions, which take power shrinkage forms, are given in special cases, and Markov chain Monte Carlo algorithms are adopted for general cases. We apply our method to several data sets and demonstrate that the proposed model provides useful guidance for astrophysicists.
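One way to read "power shrinkage" is through standard normal-normal conjugacy on the log scale: a precision-weighted average of logs becomes a power-weighted geometric mean after exponentiating. The sketch below shows only that generic mechanism under assumed known variances; it is not the thesis's model, and `power_shrinkage` and all its parameters are hypothetical names.

```python
import numpy as np

def power_shrinkage(prior_value, data_value, tau2, sigma2, n_obs):
    """Shrink a data-based multiplicative estimate toward a prior value.

    Assumptions of this sketch: a normal prior with variance tau2 on the
    log scale, n_obs log-scale measurements with known variance sigma2,
    and data_value equal to the geometric mean of those measurements.
    """
    w = (1.0 / tau2) / (1.0 / tau2 + n_obs / sigma2)   # weight on the prior
    log_post = w * np.log(prior_value) + (1.0 - w) * np.log(data_value)
    # Equivalent to prior_value**w * data_value**(1 - w).
    return np.exp(log_post), w

estimate, weight = power_shrinkage(
    prior_value=100.0, data_value=121.0, tau2=0.01, sigma2=0.04, n_obs=4
)
```

With these illustrative numbers the prior and the data carry equal precision, so the estimate is the geometric mean of the two values.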
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:39987940
- FAS Theses and Dissertations