Bayesian Learning of Relationships
Abstract
Statistics is the art of communicating with the silent truth-teller: data. More legitimate, accurate and powerful inference from data is the endless pursuit of all statisticians. This dissertation presents our efforts in developing new statistical techniques for learning the relationships inside data, from a Bayesian perspective.
Our first focus is on developing a novel hypothesis testing tool to probe a general relationship, either linear or nonlinear. In Chapter 1 we present a Bayesian slicing model for testing the independence between a univariate response variable and a possibly multi-dimensional predictor, with optional conditioning variables. The model identifies the relationship through a latent Bayesian slicing scheme, which makes the relationship quantifiable in the discrete space using a Bayes factor. The Bayes factor can serve as a direct measure of dependency in a Bayesian hypothesis testing framework or as a test statistic in its frequentist counterpart. We discuss the computation of the Bayes factor and compare its performance with existing methods, finding that it performs stably across all investigated dependency patterns and has remarkable power for detecting a mixture of relationships.
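To make the idea of a Bayes factor in the discrete space concrete, the following minimal sketch computes a log Bayes factor for dependence versus independence on an already-sliced contingency table. It is an illustration under stated assumptions, not the model of Chapter 1: the slicing is treated as fixed and observed rather than latent, the predictor is assumed to be categorical, and each bin distribution is given a symmetric Dirichlet prior with a hypothetical concentration parameter `alpha`. All function names are illustrative.

```python
from math import lgamma


def log_multibeta(params):
    """Log of the multivariate beta function B(a_1, ..., a_J)."""
    return sum(lgamma(a) for a in params) - lgamma(sum(params))


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a sequence of bin labels with the given
    bin counts, under a symmetric Dirichlet(alpha) prior (assumed here)."""
    prior = [alpha] * len(counts)
    posterior = [alpha + c for c in counts]
    return log_multibeta(posterior) - log_multibeta(prior)


def log_bayes_factor(table, alpha=1.0):
    """Log Bayes factor of dependence (H1) against independence (H0).

    table[k][j] is the number of observations in predictor group k whose
    response falls in slice j. The slicing is fixed in this sketch.
    """
    # H0: every group shares one distribution over the response slices.
    pooled = [sum(column) for column in zip(*table)]
    log_h0 = log_marginal(pooled, alpha)
    # H1: each group has its own distribution over the response slices.
    log_h1 = sum(log_marginal(row, alpha) for row in table)
    return log_h1 - log_h0


if __name__ == "__main__":
    dependent = [[10, 0], [0, 10]]   # groups occupy different slices
    balanced = [[5, 5], [5, 5]]      # groups look identical
    print(log_bayes_factor(dependent))
    print(log_bayes_factor(balanced))
```

A positive value favors dependence and a negative value favors independence. A latent slicing scheme such as the one described above would additionally average over possible slicings instead of conditioning on a single one.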
The next two chapters use real data to show the merits of Bayesian learners for specific relationships. Chapter 2 presents a Bayesian causal inference of the effectiveness of email marketing, an increasingly important marketing strategy for today's businesses. After applying a propensity-score-based unit-matching technique to alleviate potential confounding, we establish a survival-process-based Bayesian model that quantifies the effects of email marketing campaigns in conjunction with customer characteristics, as well as their interactions, and explicitly addresses the seasonality and heteroscedasticity of customer behavior. We analyze a large email marketing dataset from an online ticket marketplace to evaluate the short- and long-term effectiveness of its email campaigns, both of which are found to be significant, together with the effects of multiple customer characteristics. In particular, a strong positive interaction is uncovered between email offer and purchase recency, suggesting that a more efficient email marketing strategy is to stimulate recently inactive customers with promotional offers.
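As an illustration of propensity-score-based unit matching, the sketch below pairs each treated unit with its closest control unit by propensity score. The specific matching rule is an assumption made for this example (greedy one-to-one nearest-neighbor matching without replacement, with a hypothetical `caliper` threshold); the abstract does not state which matching algorithm Chapter 2 uses, and the propensity scores are taken as already estimated.

```python
def match_units(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, control: dicts mapping a unit id to its (pre-estimated)
    propensity score. Returns a list of (treated_id, control_id) pairs.
    Control units are used at most once; a treated unit whose nearest
    remaining control is farther than `caliper` is left unmatched.
    """
    available = dict(control)  # copy so the caller's dict is untouched
    pairs = []
    # Process treated units in order of increasing score (a deterministic,
    # but otherwise arbitrary, choice made for this sketch).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


if __name__ == "__main__":
    treated = {"t1": 0.30, "t2": 0.70}
    control = {"c1": 0.32, "c2": 0.69, "c3": 0.10}
    print(match_units(treated, control))
```

Matching without replacement keeps the matched control group free of duplicated units, at the cost of possibly discarding treated units that have no sufficiently close control.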
In Chapter 3 we turn to learning the relationship structure among units through a Bayesian clustering method. In genetics, particularly in gene-disease association studies, our method can infer the hidden population structure and thereby control for potential confounding. The so-called Bayesian Bi-Clustering method differs from traditional approaches in that it simultaneously classifies the units and searches for class-specific features, which can reduce the noise level in the data and make the cluster pattern clearer. Building on this, we further develop a Bayesian hierarchical Bi-Clustering method to establish the hierarchy of clusters. These two methods are applied to the International HapMap Project data, a collection of human genotypes across worldwide populations. We find that the inferred genetic populations are in high agreement with the nominal populations. In addition, we can detect the interaction between population and disease based on their common genetic root, which explains the difference in disease susceptibility across populations.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:39987875
- FAS Theses and Dissertations