Person:

Meng, Xiao-li

Search Results

Now showing 1 - 10 of 28
  • Publication

    Nano-Project Qualifying Exam Process: An Intensified Dialogue between Students and Faculty

    (American Statistical Association, 2010) Blitzstein, Joseph; Meng, Xiao-li

    An effectively designed examination process goes far beyond revealing students’ knowledge or skills. It also serves as a great teaching and learning tool, incentivizing the students to think more deeply and to connect the dots at a higher level. This extends throughout the entire process: pre-exam preparation, the exam itself, and the post-exam period (the aftermath or, more appropriately, afterstat of the exam). As in the publication process, the first submission is essential but still just one piece in the dialogue. Viewing the entire exam process as an extended dialogue between students and faculty, we discuss ideas for making this dialogue induce more inspiration than perspiration, and thereby making it a memorable deep-learning triumph rather than a wish-to-forget test-taking trauma. We illustrate such a dialogue through a recently introduced course in the Harvard Statistics Department, Stat 399: Problem Solving in Statistics, and two recent Ph.D. qualifying examination problems (with annotated solutions). The problems are examples of “nano-projects”: big picture questions split into bite-sized pieces, fueling contemplation and conversation throughout the entire dialogue.

  • Publication

    Decoding the H-likelihood

    (Institute of Mathematical Statistics, 2009) Meng, Xiao-li

  • Publication

    You want me to analyze data I don't have? Are you insane?

    (Editorial Department of the Shanghai Archives of Psychiatry, 2012) Meng, Xiao-li

  • Publication

    A Trio of Inference Problems That Could Win You a Nobel Prize in Statistics (If You Help Fund It)

    (CRC Press, 2014) Meng, Xiao-li

    Statistical inference is a field full of problems whose solutions require the same intellectual force needed to win a Nobel Prize in other scientific fields. Multi-resolution inference is the oldest of the trio. But emerging applications such as individualized medicine have challenged us to the limit: Infer estimands with resolution levels that far exceed those of any feasible estimator. Multi-phase inference is another reality because (big) data are almost never collected, processed, and analyzed in a single phase. The newest of the trio is multi-source inference, which aims to extract information in data coming from very different sources, some of which were never intended for inference purposes. All of these challenges call for an expanded paradigm with greater emphases on qualitative consistency and relative optimality than do our current inference paradigms.

  • Publication

    Comparing Correlated Correlation Coefficients

    (American Psychological Association, 1992) Rosenthal, Robert; Rubin, Donald; Meng, Xiao-li

    The purpose of this article is to provide simple but accurate methods for comparing correlation coefficients between a dependent variable and a set of independent variables. The methods are simple extensions of Dunn & Clark's (1969) work using the Fisher z transformation and include a test and confidence interval for comparing two correlated correlations, a test for heterogeneity, and a test and confidence interval for a contrast among k (>2) correlated correlations. Also briefly discussed is why the traditional Hotelling's t test for comparing correlated correlations is generally not appropriate in practice.

  • Publication

    Enhanced Security Checks at Airports: Minimizing Time to Detection or Probability of Escape?

    (Wiley Blackwell (John Wiley & Sons), 2012) Meng, Xiao-li

    A recent featured article in Significance on enhanced security checks at airports presented an argument that permitted a sampling probability to exceed one. The argument itself therefore cannot be valid, regardless of whether its intended conclusion is justifiable on other grounds. The fact that such an argument could pass the security check of a statistical publication reminds us of the grand challenge we still face: making principled statistical reasoning a routine vocabulary of our civilization. An attempt to correct the argument also demonstrates how oversights can inspire insights: we show that in a design-based context, the impact of prior can persist even if we have the resources to collect an infinite amount of data.

  • Publication

    Thank God That Regressing Y on X is Not the Same as Regressing X on Y: Direct and Indirect Residual Augmentations

    (Informa UK (Taylor & Francis), 2013) Xu, Xiaojin; Meng, Xiao-li; Yu, Yaming

    What does regressing Y on X versus regressing X on Y have to do with MCMC? It turns out that many strategies for speeding up data-augmentation type algorithms can be understood as fostering independence or “de-correlation” between a regression function and the corresponding residual, thereby reducing or even eliminating dependence among MCMC iterates. There are two general classes of algorithms, those corresponding to regressing parameters on augmented data/auxiliary variables and those that operate the other way around. The interweaving strategy (Yu and Meng, 2011, JCGS) provides a general recipe to automatically take advantage of both, and it is the existence of two different types of residuals that makes the interweaving strategy seemingly magical in some cases and promising in general. The concept of residuals—which depends on actual data—also highlights the potential for substantial improvements when data augmentation schemes are allowed to depend on the observed data. At the same time, there is an intriguing phase transition type of phenomenon regarding choosing (partially) residual augmentation schemes, reminding us once more of the prevailing issue of trade-off between robustness and efficiency. This article reports on these latest theoretical investigations (using a class of normal/independence models) and empirical findings (using a posterior sampling for a Probit regression) in the search for effective residual augmentations—and ultimately more MCMC algorithms—that meet the 3-S criterion: simple, stable, and speedy.

  • Publication

    I Got More Data, My Model is More Refined, but My Estimator is Getting Worse! Am I Just Dumb?

    (Informa UK (Taylor & Francis), 2013) Meng, Xiao-li; Xie, Xianchao

    Possibly, but more likely you are merely a victim of conventional wisdom. More data or better models by no means guarantee better estimators (e.g., with a smaller mean squared error), when you are not following probabilistically principled methods such as MLE (for large samples) or Bayesian approaches. Estimating equations are particularly vulnerable in this regard, almost a necessary price for their robustness. These points will be demonstrated via common tasks of estimating regression parameters and correlations, under simple models such as bivariate normal and ARCH(1). Some general strategies for detecting and avoiding such pitfalls are suggested, including checking for self-efficiency (Meng, 1994; Statistical Science) and adopting a guiding working model. Using the example of estimating the autocorrelation \(\rho\) under a stationary AR(1) model, we also demonstrate the interaction between model assumptions and observation structures in seeking additional information, as the sampling interval \(s\) increases. Furthermore, for a given sample size, the optimal \(s\) for minimizing the asymptotic variance of \(\hat{\rho}_{\mathrm{MLE}}\) is \(s = 1\) if and only if \(\rho^2 \le 1/3\); beyond that region the optimal \(s\) increases at the rate of \(\log^{-1}(\rho^{-2})\) as \(\rho\) approaches a unit root, as does the gain in efficiency relative to using \(s = 1\). A practical implication of this result is that the so-called “non-informative” Jeffreys prior can be far from non-informative even for stationary time series models, because here it converges rapidly to a point mass at a unit root as \(s\) increases. Our overall emphasis is that intuition and conventional wisdom need to be examined via critical thinking and theoretical verification before they can be trusted fully.

  • Publication

    H-means image segmentation to identify solar thermal features

    (Institute of Electrical and Electronics Engineers, 2012) Stein, Nathan; Kashyap, Vinay; Meng, Xiao-li; van Dyk, David

    Properly segmenting multiband images of the Sun by their thermal properties will help determine the thermal structure of the solar corona. However, off-the-shelf segmentation algorithms are typically inappropriate because temperature information is captured by the relative intensities in different passbands, while the absolute levels are not relevant. Input features are therefore pixel-wise proportions of photons observed in each band. To segment solar images based on these proportions, we use a modification of k-means clustering that we call the H-means algorithm because it uses the Hellinger distance to compare probability vectors. H-means has a closed-form expression for cluster centroids, so computation is as fast as k-means. Tempering the input probability vectors reveals a broader class of H-means algorithms which include spherical k-means clustering. More generally, H-means can be used anytime the input feature is a probabilistic distribution, and hence is useful beyond image segmentation applications.

  • Publication

    Statistics Can Lie but Can also Correct for Lies: Reducing Response Bias in NLAAS via Bayesian Imputation

    (International Press of Boston, Inc., 2013) Liu, Jingchen; Meng, Xiao-li; Chen, Chih-Nan; Alegria, Margarita

    The National Latino and Asian American Study (NLAAS) is a large scale survey of psychiatric epidemiology, the most comprehensive survey of this kind. A unique feature of NLAAS is its embedded experiment for estimating the effect of alternative orderings of interview questions. The findings from the experiment are not completely unexpected, but nevertheless alarming. Compared to the survey results from the widely used traditional ordering, the self-reported psychiatric service-use rates are often doubled or even tripled under a more sensible ordering introduced by NLAAS. These findings explain certain perplexing empirical findings in literature, but at the same time impose some grand challenges. For example, how can one assess racial disparities when different races were surveyed with different survey instruments that are now known to induce substantial differences? The project documented in this paper is part of an effort to address these questions. It creates models for imputing the original responses had the respondents under the traditional survey not taken advantage of the skip patterns to reduce interview time, which resulted in increased rates of incorrect negative responses over the course of the interview. The imputation modeling task is particularly challenging because of the complexity of the questionnaire, the small sample sizes for subgroups of interests, and the need for providing sensible imputation to whatever sub-population that a future user might be interested in studying. As a case study, we report both our findings and frustrations in our quest for dealing with these common real-life complications.