Person: Svoronos, Theodore
Publication: Policy Decisions and Evidence Use among Civil Servants: A Group Decision Experiment in Pakistan
(Center for International Development at Harvard University, 2020-04) Metzger, Laura; Svoronos, Theodore; Khan, Adnan
In a lab-in-field experiment with elite civil servants in Pakistan, we investigate whether groups outperform individuals in a two-staged task which requires effective use of data and evidence. We also study how efficiently groups harness their members’ individual knowledge for problem-solving. We do not find a significant difference in individual (first stage) and group performance (second stage). Yet, groups could have significantly improved their performance during the second stage of the task, had they more efficiently collaborated to retrieve their members’ respective knowledge. Carefully interpreted in the setting of our experiment, our data suggests that diversity in individual knowledge may hamper effective use of data and evidence for decision-making in small groups of policymakers.
Publication: Testing the Validity of the Single Interrupted Time Series Design
(Center for International Development at Harvard University, 2019-07) Baicker, Katherine; Svoronos, Theodore
Given the complex relationships between patients’ demographics, underlying health needs, and outcomes, establishing the causal effects of health policy and delivery interventions on health outcomes is often empirically challenging. The single interrupted time series (SITS) design has become a popular evaluation method in contexts where a randomized controlled trial is not feasible. In this paper, we formalize the structure and assumptions underlying the single ITS design and show that it is significantly more vulnerable to confounding than is often acknowledged and, as a result, can produce misleading results. We illustrate this empirically using the Oregon Health Insurance Experiment, showing that an evaluation using a single interrupted time series design instead of the randomized controlled trial would have produced large and statistically significant results of the wrong sign. We discuss the pitfalls of the SITS design, and suggest circumstances in which it is and is not likely to be reliable.
Publication: Two-Stage Examinations: Can Examinations Be More Formative Experiences?
(Center for International Development at Harvard University, 2018-09) Levy, Dan; Klinger, Mae; Svoronos, Theodore
Two-stage examinations consist of a first stage in which students work individually as they typically do in examinations (stage 1), followed by a second stage in which they work in groups to complete another examination (stage 2), which typically consists of a subset of the questions from the first examination. Data from two-stage midterm and final examinations are used to assess the extent to which individuals improve their performance when collaborating with other students. On average, the group (stage 2) score was about one standard deviation above the individual (stage 1) score. While this difference cannot be interpreted as the causal effect of two-stage examinations on learning, it suggests that individuals experienced substantial performance gains when working in groups in an examination. This average performance gain was comparable with the average difference between the top performer of the group in stage 1 and the group’s stage 1 average, and was equivalent to about two-thirds of the difference between the “super student” score (i.e. the sum of the maximum score for each question in stage 1) and the group’s stage 1 average. This last result suggests that group collaboration takes substantial (albeit partial) advantage of the aggregate knowledge and skills of the group’s individual members. Student feedback about their experience with two-stage examinations reveals that these types of examinations are generally perceived to be more helpful for learning and less stressful than traditional examinations. Finally, using data on group gender compositions, we investigate the potential role of gender dynamics on group efficiency.
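The three stage 1 benchmarks named in this abstract (the group's average, its top performer, and the "super student" score) can be illustrated with a minimal sketch. The function names, the data layout (one list of per-question scores per student), and all numbers below are assumptions made up for illustration; they are not the authors' code or data.

```python
def stage1_benchmarks(scores):
    """Return (group_average, top_performer, super_student) for one group.

    `scores` is a hypothetical layout: one inner list per student,
    holding that student's stage 1 score on each question.
    """
    totals = [sum(student) for student in scores]
    group_average = sum(totals) / len(totals)
    top_performer = max(totals)
    # "Super student": sum over questions of the best score any member achieved.
    super_student = sum(max(question) for question in zip(*scores))
    return group_average, top_performer, super_student


def share_of_gap_closed(stage2_score, group_average, super_student):
    """Fraction of the (super student - group average) gap closed in stage 2."""
    return (stage2_score - group_average) / (super_student - group_average)


# Illustrative, made-up scores for a three-person group on three questions.
group = [
    [3, 0, 2],
    [1, 4, 1],
    [2, 1, 1],
]
avg, top, best = stage1_benchmarks(group)
# A made-up stage 2 group score of 8 would close (8 - 5) / (9 - 5) of the gap.
share = share_of_gap_closed(8, avg, best)
```

Under these assumptions, a share close to 1 would mean the group used nearly all of the knowledge its members held individually, while a share near 0 would mean the group did little better than its average member.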
Publication: Evaluating Health Interventions Over Time: Empirical Tests of the Validity of the Single Interrupted Time Series Design
(2016-08-29) Svoronos, Theodore; Cohen, Jessica; Baicker, Katherine; Levy, Dan
Single interrupted time series (ITS) is a quasi-experimental evaluation design used frequently in the health policy literature. This manuscript investigates the validity of single ITS through two within-study comparisons (WSCs), comparing the results of a randomized controlled trial (RCT) with the results that would have been obtained had a single ITS design been employed.
In Part 1, I discuss the theory underlying both within-study comparisons and single ITS. I propose an assessment framework to determine whether results from a given design should be deemed "concordant" with an RCT for a given intervention. This framework aims to unify metrics for concordance used in the existing literature, and considers both practical and statistical significance. After summarizing best practices of single ITS analysis, I propose two falsification tests to determine whether the single ITS design is well suited for the trend stability of a particular dataset. These tests draw from literature on determining structural breaks in time series data, as well as work on the optimal binning of data in the regression discontinuity design.
In Part 2, I conduct two within-study comparisons for single ITS. The first study evaluates a behavior change campaign in Uganda aimed at increasing uptake of rapid diagnostic tests for malaria. The WSC finds that single ITS estimates are highly concordant with those of the RCT, producing almost identical results in both point estimate and standard error. This result is robust to multiple specifications. The second study evaluates the effect of the expansion of Medicaid on emergency department use in Oregon. In this case, the single ITS estimates are so discordant with the RCT as to produce statistically significant results in the wrong direction. This result is also robust to multiple specification decisions.
In comparing these differing results, I note important differences between the two datasets. The Uganda data passed the falsification tests for trend stability proposed in Part 1, while the Oregon data failed. Additionally, the Oregon sample is likely subject to a manifestation of self-selection known as "Ashenfelter's dip," whereas the Uganda sample is not. The implication of this shift in outcomes just before the intervention's introduction is especially damaging to single ITS, in comparison to traditionally "weaker" pre-post designs.
In Part 3, I attempt to generate hypotheses as to when single ITS should and should not be used. First, samples defined by self-selection are particularly problematic for single ITS analysis. Second, the advantages of relying on time trends must be weighed against the additional strong assumptions that the single ITS design carries with it. Third, trend stability in the pre period is a crucial factor in getting reliable estimates from single ITS. Fourth, the robustness of results in both WSCs suggests that whether to evaluate a given program with a single ITS design is a more important decision than how to implement ITS.
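As a rough illustration of why pre-period trend stability matters, the sketch below implements one simple form of a single ITS estimate: fit a straight line to the pre-intervention outcomes, project it forward, and average the gap between observed and projected post-intervention outcomes. This is an assumed, simplified setup and not the manuscript's actual specification; the function names and the series are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def single_its_effect(series, start):
    """Average post-period gap between observed values and the projected pre-trend.

    `series` holds one outcome per time period; the intervention begins at
    index `start`, so periods 0..start-1 form the pre period.
    """
    intercept, slope = fit_line(list(range(start)), series[:start])
    gaps = [series[t] - (intercept + slope * t) for t in range(start, len(series))]
    return sum(gaps) / len(gaps)


# Made-up series: a stable pre-trend of 10 + 2t, then a true level shift of 5.
stable = [10 + 2 * t for t in range(6)] + [10 + 2 * t + 5 for t in range(6, 10)]
stable_estimate = single_its_effect(stable, start=6)  # recovers the shift of 5

# Same series with a dip in the last pre-intervention period.
dipped = list(stable)
dipped[5] -= 6
dipped_estimate = single_its_effect(dipped, start=6)  # biased away from 5
```

In this toy setup the stable series recovers the built-in shift, while a single depressed observation just before the intervention flattens the fitted pre-trend and inflates the estimate, which is one way a pre-intervention dip can mislead a trend-based design.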