Publication:

The Algorithmic Foundations of Private Computational Social Science

Loading...
Thumbnail Image

Date

2022-08-12

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Alabi, Daniel Gbenga. 2022. The Algorithmic Foundations of Private Computational Social Science. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Social scientists, political scientists, economists, and healthcare researchers crucially rely on statistical methods to further the study of individuals, society, and human behavior via inferential analysis. Unfortunately, the naive application and release of resulting statistics could reveal sensitive information about the individuals in the dataset. As a result, we propose and analyze rigorous privacy-preserving methods while showing the limits of such methods either mathematically and/or empirically. The main definition of privacy we will rely on is Differential Privacy (DP). In this thesis, we make algorithmic and analytic contributions to fundamental computational social science methods (such as linear regression and rank aggregation):

  1. DP Statistical Prediction: We prove finite-sample convergence rates for a range of DP ordinary least squares methods. Then, we propose methods, based on nonparametric estimators, that perform well---in terms of minimizing prediction loss---when the dataset size is small. We show, experimentally, that the optimal method interpolates between a nonparametric estimator and a parametric one. In particular, for at least one of the methods proposed, the noise due to privacy will be less than the sampling error for common settings of parameters.

  2. DP Uncertainty Quantification: We prove that a differentially private F-statistic converges to the asymptotic distribution of the non-private F-statistic, the chi-squared distribution. In addition, we obtain the optimal 1/sqrt(n) asymptotic convergence rates for resulting DP sufficient statistics. Using the DP F-statistic and a parametric bootstrap, we construct DP hypothesis tests for testing for a linear relationship and testing for the presence of mixtures. Through experiments, we also show when the DP F-statistic outperforms alternative methods for DP hypothesis testing.

  3. DP Rank Aggregation: In social choice theory, (Kemeny) rank aggregation is a well-studied problem where the goal is to combine rankings from multiple voters into a single ranking on the same set of items. Since rankings can reveal preferences of voters (which a voter might like to keep private), it is important to aggregate preferences in such a way to preserve privacy. We present DP algorithms for rank aggregation in the pure and approximate settings along with distribution-independent utility upper and lower bounds. In addition to bounds in the central model, we also present utility bounds for the local model of differential privacy.

Description

Other Available Sources

Research Data

Keywords

Algorithms, Differential Privacy, Social Science, Computer science, Statistics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories