Multiple Imputation Methods for Latent Profile Analysis in Education and the Behavioral Sciences
WALDMAN-DISSERTATION-2020.pdf (5.163 MB; embargoed until 2022-05-01)
Waldman, Marcus R.
Citation: Waldman, Marcus R. 2020. Multiple Imputation Methods for Latent Profile Analysis in Education and the Behavioral Sciences. Doctoral dissertation, Harvard Graduate School of Education.
Abstract: Researchers in education and the behavioral sciences are increasingly conducting latent profile analysis, focusing investigations on subpopulations of individuals to describe individual differences with greater nuance. At the same time, missing data is practically inevitable, and even modest missingness rates can threaten the validity of inferences. Multiple imputation is a powerful strategy to treat missing data, but it suffers from key limitations in both the imputation and pooling phases when conducting latent profile analysis.
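The pooling phase mentioned above is commonly carried out with Rubin's rules, which combine the estimates and their variances across the m imputed datasets. The following minimal sketch illustrates that standard combining step only; it is not the dissertation's procedure, and the function name is chosen for illustration.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool a scalar estimate across m imputed datasets via Rubin's rules.

    estimates -- the point estimate from each completed-data analysis
    variances -- the corresponding sampling variance from each analysis
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)      # pooled point estimate
    w = statistics.mean(variances)          # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    t = w + (1 + 1 / m) * b                 # total variance
    return q_bar, t
```

The total variance inflates the within-imputation variance by the between-imputation spread, which is what makes the pooled inference reflect missing-data uncertainty.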
In this dissertation, I conduct three studies to address these gaps. In the first study (Chapter 2), I evaluate whether recursive partitioning imputation algorithms better mitigate nonresponse bias than alternative missing data approaches that are common in practice. I find that recursive partitioning imputation algorithms perform well when sample sizes are large (N = 1,200), but not when sample sizes are small (N = 300) or when class separation is weak (i.e., entropy ≈ .74). In response, I propose a hybrid imputation procedure in the second study (Chapter 3); the proposed method embeds a finite mixture model to generate imputations using a joint modeling framework within a larger chained equations procedure. I demonstrate the hybrid imputation procedure using real-world data.
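A chained-equations procedure of the kind described above imputes each variable in turn from a conditional model given the others, cycling until the imputations stabilize. The sketch below shows that generic cycle for two variables using simple linear regressions with Gaussian noise; in the dissertation's hybrid procedure one such conditional step would instead draw from a finite mixture model, which is not shown here. All function names are illustrative assumptions, not the author's implementation.

```python
import random
import statistics

def ols_fit(pairs):
    """Least-squares intercept, slope, and residual SD from (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    a = my - b * mx
    sd = statistics.pstdev([y - (a + b * x) for x, y in pairs])
    return a, b, sd

def mice_two_vars(x, y, n_iter=10, seed=1):
    """Tiny chained-equations (MICE-style) sketch for two variables.

    None marks a missing cell. Each pass regresses one variable on the
    other over rows where it is observed, then fills its missing cells
    with a prediction plus Gaussian noise.
    """
    rng = random.Random(seed)
    x, y = list(x), list(y)
    miss_x = [i for i, v in enumerate(x) if v is None]
    miss_y = [i for i, v in enumerate(y) if v is None]
    for v in (x, y):  # initialize missing cells at the observed mean
        mean = statistics.mean(u for u in v if u is not None)
        for i, u in enumerate(v):
            if u is None:
                v[i] = mean
    for _ in range(n_iter):
        # impute x given y, then y given x, from the completed data
        a, b, sd = ols_fit([(y[i], x[i]) for i in range(len(x))
                            if i not in miss_x])
        for i in miss_x:
            x[i] = a + b * y[i] + rng.gauss(0, sd)
        a, b, sd = ols_fit([(x[i], y[i]) for i in range(len(y))
                            if i not in miss_y])
        for i in miss_y:
            y[i] = a + b * x[i] + rng.gauss(0, sd)
    return x, y
```

Embedding a mixture model in this cycle, as the hybrid procedure does, amounts to replacing one of these regression-based conditional draws with a draw from a fitted finite mixture.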
In the final study (Chapter 4), I scrutinize current practices for conducting finite mixture model selection with missing data. I am not aware of any studies evaluating whether model selection decisions are sensitive to real-world missing data problems. I fill this gap by studying whether selection decisions are sensitive to missing data when either a full information maximum likelihood (FIML) or a multiple imputation strategy is employed. Two findings emerge. First, with regard to FIML, the BIC underextracts the true number of classes relative to the complete-data condition in the presence of small sample sizes and small classes. Second, with regard to multiple imputation, current practices for pooling information criteria result in model selection decisions that poorly replicate the decisions that would have been made had the data been complete. I propose two remedial procedures for future practice, and, using simulations, I show that these two procedures outperform current practices.
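The pooling practice critiqued above is often a naive average of an information criterion over the m imputed datasets, with the model minimizing the averaged criterion selected. The sketch below illustrates that averaging step for the BIC; it depicts the common practice being evaluated, not the dissertation's remedial procedures, and the function names are illustrative.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion for one fitted model."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def pooled_bic_by_averaging(logliks, n_params, n_obs):
    """Average the BIC over m imputed datasets (the 'current practice')."""
    return sum(bic(ll, n_params, n_obs) for ll in logliks) / len(logliks)

def select_model(candidates, n_obs):
    """Pick the candidate with the smallest averaged BIC.

    candidates -- dict mapping a label (e.g., number of classes) to a
    (per-imputation log-likelihoods, number of parameters) pair.
    """
    return min(candidates,
               key=lambda k: pooled_bic_by_averaging(candidates[k][0],
                                                     candidates[k][1], n_obs))
```

Because the BIC is linear in the log-likelihood, averaging BICs is equivalent to plugging the mean log-likelihood into the BIC formula, which ignores between-imputation variability in model fit.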
Citable link to this page: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364538