Publication:
Essays on Statistics and Data Science Education

Date

2024-05-07

Citation

Klugman, Emma. 2024. Essays on Statistics and Data Science Education. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Statistics and data science are growing, rapidly evolving, and increasingly important for an informed citizenry in a data-saturated world. In this dissertation, I address two central questions: (1) who is taking statistics? and (2) what are statistics courses teaching? I estimate that 920,000 US students take statistics in high school each year, but this population has not yet been well studied. Using a rich set of survey responses describing 15,727 students’ demographics, career interests and values, STEM identity, grades, and test scores, my first study compares four groups of high-school course-takers: those who take statistics, calculus, both, and neither. I then employ latent profile analysis to shed light on who these students are, showing that students with different profiles take statistics at surprisingly similar rates: statistics is an important part of the academic pathway for a wide range of students and serves a demographically diverse population. In my second study, I build upon tools from natural language processing and psychometric measurement to develop a human-in-the-loop methodology for measuring latent constructs in large text corpora, and present a framework for doing so. I construct a lexicon-based instrument to measure the extent to which syllabi from college statistics and data science courses align with a vision for modernizing instruction set forth in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) project and across 145 journal articles spanning almost a century. In so doing, I illustrate an approach that researchers can take in bringing measurement questions to text data, a method that I believe strikes a useful balance between interpretability, communicability, validity, and scalability. My final study applies these instruments to 32,483 syllabi from US statistics and data science courses taught between 2010 and 2018. I find a modest overall increase in modern approaches over this decade. Finally, I explore differences between institution types using multilevel models, finding that private and four-year institutions, as well as those with higher admissions rates and larger Pell-recipient populations, have more modern syllabi, though two-year institutions and schools serving fewer Pell recipients seem to be gaining ground.

Keywords

Data Science, Data Science Education, Measurement, Psychometrics, Statistics Education, Text Analysis, Education, Statistics, Educational tests & measurements

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
