Publication:

Design-Based Causal Inference: Applications to Social Sciences and Industry

Date

2024-04-12

Citation

Ham, Dae Woong. 2024. Design-Based Causal Inference: Applications to Social Sciences and Industry. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

In today's data-driven world, social scientists and industry data scientists increasingly rely on large-scale, complex experiments (e.g., adaptive sequential experiments and experiments with high-dimensional treatments) and on novel observational causal inference practices, such as combining Difference-in-Difference with matching. Despite this new landscape, there is a mismatch between outdated statistical theory and the practical demands of modern applications. This thesis aims to resolve that mismatch by providing rigorous yet assumption-lean causal inference methods that are directly aligned with these novel settings in industry and social science applications.

Chapter 1: Randomization Tests with High-dimensional Treatments. Many experiments, such as conjoint experiments, have high-dimensional treatments. In this setting, standard causal inference methods, which assume a binary treatment, may not be adequate. For example, a simple difference-in-means, though widely used in political science, is often underpowered to detect treatment effects that arise through complex high-dimensional interactions. In this chapter, we introduce conditional randomization tests that let practitioners use powerful machine learning algorithms to detect a (potentially) high-dimensional treatment effect, while requiring no modeling assumptions for the validity of the approach.
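As a rough illustration of the randomization-test idea, the sketch below runs a plain Fisher-style randomization test on simulated data with two binary factors. It is not the conditional randomization test procedure developed in the chapter. The function names, the simulated data, and the hand-written interaction statistic (a stand-in for a machine-learning-based statistic) are all assumptions made for illustration. The outcome depends only on whether the two factors differ, so the marginal difference-in-means for either factor alone is zero in expectation.

```python
import random

def interaction_stat(treatments, outcomes):
    """Absolute difference in mean outcome between units whose two binary
    factors agree and units whose factors differ (a pure-interaction signal)."""
    agree = [y for (a, b), y in zip(treatments, outcomes) if a == b]
    differ = [y for (a, b), y in zip(treatments, outcomes) if a != b]
    if not agree or not differ:
        return 0.0
    return abs(sum(agree) / len(agree) - sum(differ) / len(differ))

def draw_two_factor(rng, n):
    """Hypothetical design: each factor is an independent fair coin flip."""
    return [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n)]

def randomization_p_value(treatments, outcomes, draw_assignment, stat,
                          n_draws=2000, seed=0):
    """Hold outcomes fixed (sharp null of no effect), re-draw treatments from
    the known design, and compare the re-drawn statistics to the observed one."""
    rng = random.Random(seed)
    observed = stat(treatments, outcomes)
    exceed = 0
    for _ in range(n_draws):
        resampled = draw_assignment(rng, len(outcomes))
        if stat(resampled, outcomes) >= observed:
            exceed += 1
    # The +1 terms count the observed assignment as one of the draws.
    return (1 + exceed) / (1 + n_draws)

# Simulated data: the outcome shifts by 1 only when the two factors differ.
data_rng = random.Random(1)
treatments = draw_two_factor(data_rng, 400)
outcomes = [(1.0 if a != b else 0.0) + data_rng.gauss(0, 1) for a, b in treatments]

p_value = randomization_p_value(treatments, outcomes, draw_two_factor, interaction_stat)
print(f"randomization p-value: {p_value:.4f}")
```

The test's validity here rests only on re-drawing treatments from the same design that generated them; the choice of statistic affects power, not validity.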

Chapter 2: Anytime-valid Causal Inference Through the Design-Based Approach. In most A/B tests, observations arrive over time, and managers want to test the results as new data become available in order to mitigate risk. For example, Netflix ran an A/B test on its sign-up page with 30,000 potential subscribers for over a month; the treatment had a strong negative effect, and Netflix lost approximately 3,000 potential subscribers, or approximately 1 million U.S. dollars of lifetime value, from a single experiment. This issue is exacerbated for companies running numerous experiments on their customers, as prolonged exposure to harmful treatments leads to retention drops and customer dissatisfaction. However, peeking and repeatedly testing with a t-test, for example, leads to uncontrolled type-1 error due to multiple testing. In this chapter, we propose design-based confidence sequences, i.e., sequences of confidence intervals with uniform type-1 error guarantees, that formally allow peeking while remaining statistically valid without any modeling assumptions or regularity assumptions on the outcome.
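The peeking problem described above can be seen in a small simulation. The sketch below is only an illustration under simplifying assumptions: it uses a z-test with known unit variance rather than a t-test, simulates outcomes with no true effect, and does not implement the design-based confidence sequences proposed in the chapter. The function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_error_rates(n_sims=2000, n_obs=200, peek_every=10,
                         z_crit=1.96, seed=0):
    """Simulate A/A tests (no true effect) and return the false positive rate
    when rejecting at any interim look versus only at the final look."""
    rng = random.Random(seed)
    peeking_rejections = 0
    fixed_rejections = 0
    for _ in range(n_sims):
        running_sum = 0.0
        rejected_any_look = False
        for t in range(1, n_obs + 1):
            running_sum += rng.gauss(0.0, 1.0)  # null data: mean 0, variance 1
            if t % peek_every == 0:
                z = running_sum / math.sqrt(t)  # z-statistic at this look
                if abs(z) > z_crit:
                    rejected_any_look = True
        final_z = running_sum / math.sqrt(n_obs)
        peeking_rejections += rejected_any_look
        fixed_rejections += abs(final_z) > z_crit
    return peeking_rejections / n_sims, fixed_rejections / n_sims

peeking_rate, fixed_rate = simulate_error_rates()
print(f"false positive rate with peeking:   {peeking_rate:.3f}")
print(f"false positive rate, single look:   {fixed_rate:.3f}")
```

In this setup the single-look test should reject close to the nominal 5% of the time, while rejecting at any of the 20 looks should produce a noticeably higher false positive rate.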

Chapter 3: Difference-in-Difference and Matching. Applied researchers in both the social sciences and industry commonly perform matching prior to a Difference-in-Difference (DiD) analysis to bolster the plausibility of the "parallel-trends" assumption in observational causal inference settings. For example, analysts regularly use DiD to evaluate the impact of a change when it is difficult to randomize the treatment, e.g., the effect of the introduction of a new law or company policy on revenue. Despite this common practice, it is still unclear when matching prior to a DiD analysis actually reduces bias. In this chapter, we aim to mathematically understand and quantify when this practice is justified, and we give further practical guidelines on when to match or not.
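For readers unfamiliar with DiD, the sketch below computes the basic two-group, two-period estimate on made-up numbers. It does not include a matching step and does not reproduce the chapter's analysis; the function name and the data are hypothetical.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period DiD: the change in the treated group's mean
    outcome minus the change in the control group's mean outcome."""
    def mean(values):
        return sum(values) / len(values)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Made-up revenue figures before and after a hypothetical policy change.
treated_pre, treated_post = [9.0, 11.0], [14.0, 16.0]   # means 10 and 15
control_pre, control_post = [7.0, 9.0], [9.0, 11.0]     # means 8 and 10

estimate = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {estimate}")
```

The estimate is interpretable as a treatment effect only if the control group's change is a good proxy for what the treated group's change would have been without treatment, which is the parallel-trends assumption that matching is meant to make more plausible.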

Keywords

Applied Statistics, Causal Inference, Design-Based Inference, Non-parametric Statistics, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
