Person: Honaker, James
Email Address
AA Acceptance Date
Birth Date
Research Projects
Organizational Units
Job Title
Last Name
First Name
Name
Search Results
Publication A Unified Approach to Measurement Error and Missing Data: Overview, Sociological Methods and Research
(Harvard University, 2014) Blackwell, Matthew; Honaker, James; King, GaryAlthough social scientists devote considerable effort to mitigating measurement error during data collection, they often ignore the issue during data analysis. And although many statistical methods have been proposed for reducing measurement error-induced biases, few have been widely used because of implausible assumptions, high levels of model dependence, difficult computation, or inapplicability with multiple mismeasured variables. We develop an easy-to-use alternative without these problems; it generalizes the popular multiple imputation (MI) framework by treating missing data problems as a limiting special case of extreme measurement error, and corrects for both. Like MI, the proposed framework is a simple two-step procedure, so that in the second step researchers can use whatever statistical method they would have if there had been no problem in the first place. We also offer empirical illustrations, open source software that implements all the methods described herein, and a companion paper with technical details and extensions (Blackwell, Honaker, and King, 2014b).
Publication A Unified Approach to Measurement Error and Missing Data: Details and Extensions
(SAGE Publications, 2015) Blackwell, Matthew; Honaker, James; King, GaryWe extend a unified and easy-to-use approach to measurement error and missing data. Black-well, Honaker and King (2014) gives an intuitive overview of the new technique, along with practical suggestions and empirical applications. Here, we other more precise technical details; more sophisticated measurement error model specifications and estimation procedures; and analyses to assess the approach's robustness to correlated measurement errors and to errors in categorical variables. These results support using the technique to reduce bias and increase efficiency in a wide variety of empirical research.
Publication Automating Open Science for Big Data
(SAGE Publications, 2015) Crosas, Merce; King, Gary; Honaker, James; Sweeney, LatanyaThe vast majority of social science research presently uses small (MB or GB scale) data sets. These fixed scale sets are commonly downloaded to the researcher's computer where the analysis is performed locally, and are often shared and cited with well-established technologies, such as the Dataverse Project (see Dataverse.org), to support the published results. The trend towards Big Data - including large scale streaming data - is starting to transform research and has the potential to impact policy-making and our understanding of the social, economic, and political problems that affect human societies. However, this research poses new challenges in execution, accountability, preservation, reuse, and reproducibility. Downloading these data sets to a researcher's computer is infeasible or not practical; hence, analyses take place in the cloud, require unusual expertise, and benefit from collaborative teamwork and novel tool development. The advantage of these data sets in how informative they are also means that they are much more likely to contain highly sensitive personally identifiable information. In this paper, we discuss solutions to these new challenges so that the social sciences can realize the potential of Big Data.
Publication Differential Privacy: A Primer for a Non-Technical Audience
(Vanderbilt University, 2018) Wood, Alexandra; Altman, Micah; Bembenek, Aaron; Bun, Mark; Gaboardi, Marco; Honaker, James; Nissim, Kobbi; O'Brien, David; Steinke, Thomas; Vadhan, SalilDifferential privacy is a formal mathematical framework for quantifying and managing privacy risks. It provides provable privacy protection against a wide range of potential attacks, including those currently unforeseen. Differential privacy is primarily studied in the context of the collection, analysis, and release of aggregate statistics. These range from simple statistical estimations, such as averages, to machine learning. Tools for differentially private analysis are now in early stages of implementation and use across a variety of academic, industry, and government settings. Interest in the concept is growing among potential users of the tools, as well as within legal and policy communities, as it holds promise as a potential approach to satisfying legal requirements for privacy protection when handling personal information. In particular, differential privacy may be seen as a technical solution for analyzing and sharing data while protecting the privacy of individuals in accordance with existing legal or policy requirements for de-identification or disclosure limitation.
This primer seeks to introduce the concept of differential privacy and its privacy implications to non-technical audiences. It provides a simplified and informal, but mathematically accurate, description of differential privacy. Using intuitive illustrations and limited mathematical formalism, it discusses the definition of differential privacy, how differential privacy addresses privacy risks, how differentially private analyses are constructed, and how such analyses can be used in practice. A series of illustrations is used to show how practitioners and policymakers can conceptualize the guarantees provided by differential privacy. These illustrations are also used to explain related concepts, such as composition (the accumulation of risk across multiple analyses), privacy loss parameters, and privacy budgets. This primer aims to provide a foundation that can guide future decisions when analyzing and sharing statistical data about individuals, informing individuals about the privacy protection they will be afforded, and designing policies and regulations for robust privacy protection.