Publication:

Identifying Predictors of Success in Youth Mental Health Care: A Random Forest Analysis of Evidence-Based Treatment in Outpatient Community Clinics

Date

2025-08-22

Citation

Horn, Rachel. 2025. Identifying Predictors of Success in Youth Mental Health Care: A Random Forest Analysis of Evidence-Based Treatment in Outpatient Community Clinics. Doctoral Dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

Objective: Research has shown dozens of treatment protocols to be effective in addressing mental health problems in youth. However, this research has focused largely on group-level effects, with less work examining which youths respond to a given treatment option. Studies that have tested predictors and candidate moderators of treatment effects have relied largely on regression analyses. Recently, psychological treatment research has begun to incorporate machine learning (ML) techniques, which are presumed to outperform regression analyses; however, few studies have tested this assumption. This dissertation included two studies designed to help fill that gap. Both studies aimed to identify predictors of youth psychotherapy outcomes using traditional regression methods and using ML (specifically, random forest) so that findings from the two methods could be compared. Study 1 used data from youths who were treated with Trauma-Focused Cognitive-Behavioral Therapy (TF-CBT). Study 2 used data from youths who were treated with the Modular Approach to Therapy for Children (MATCH). Both studies addressed three questions: (1) Do youth characteristics predict treatment outcomes with random forest? (2) Do clinician factors predict treatment outcomes with random forest? and (3) Does random forest outperform traditional regression in predicting treatment outcomes from youth characteristics and clinician factors?

Method: Participants were youths ages 7-19 who were treated with TF-CBT (N1 = 5503) or MATCH (N2 = 2292) in outpatient clinics in Connecticut between 2012 and 2022. For each treatment, a random forest model and a regression model were built to predict symptom outcomes using youth demographics (e.g., age at intake), youth clinical features (e.g., baseline symptom severity), clinician demographics (e.g., sex), and clinician professional features (e.g., licensure status). An additional regression model was built to predict symptom outcome using only baseline severity on the outcome measure, to assess whether the additional predictors included in the other two models enhanced predictive accuracy. Variable importance data and a feature selection program, both based on random forest, were used to identify important predictors for questions (1) and (2). Root mean squared error (RMSE) and R² were calculated on a withheld test set to compare model fit for question (3).

Results: In Study 1, TF-CBT outcomes on the Child PTSD Symptom Scale (CPSS) were predicted by 10 youth features and 4 clinician features. The baseline CPSS score was the strongest youth-based predictor, with higher baseline symptoms associated with larger improvement but ultimately higher symptoms after treatment. TF-CBT credential status was the strongest clinician-based predictor, with youths seeing credentialed clinicians demonstrating poorer symptom outcomes than those seeing uncredentialed clinicians. The random forest generated a stronger predictive model for the test set than either regression approach (R² = 0.61, RMSE = 6.48). In Study 2, MATCH outcomes on the youth-reported Ohio Problem Severity Scale were predicted by 12 youth features and 5 clinician features. The baseline report on the outcome measure was the strongest youth-based predictor of outcome, again with youths who were more symptomatic at baseline showing larger improvements but ultimately remaining more symptomatic after treatment. Hours of MATCH training completed was the strongest clinician-based predictor of outcome, with youth symptoms declining as clinician MATCH training increased to 70 hours. The random forest generated a stronger predictive model than either regression approach (R² = 0.45, RMSE = 6.30).

Conclusions: In both studies, random forest outperformed traditional regression analyses, but the margin of benefit varied. Psychologists and other statisticians should consider the purpose of their analyses (namely, predictive accuracy versus feature identification versus effect quantification) when choosing between regression and machine learning methods such as random forest, since the “black box” nature of machine learning means its accuracy comes at the expense of model interpretability. Youth clinical features, especially baseline symptom severity, explained much of the outcome variance. Modular, transdiagnostic treatments may be more challenging to model accurately than more standardized treatments because the course of treatment is individualized for each youth. To explain the variance that remains unaccounted for across the two studies, future work may benefit from including features that were not measured here, such as clinician “soft skills” like warmth and youth features like treatment motivation, and from investigating appropriate outcome measures for analyses of transdiagnostic treatment protocols (i.e., different measures for different presenting problems).
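
As a rough illustration of the test-set comparison described in the Method, the sketch below computes root mean squared error and R² on a withheld test set for a baseline-severity-only regression and for a mean-only reference predictor. It is not the dissertation's code: the data are synthetic, the 80/20 split, the effect sizes, and the helper names (rmse, r_squared, fit_simple_ols) are assumptions made for illustration, and R² is computed here as 1 - SS_res / SS_tot around the test-set mean, which is one common convention and may differ from the one the author used.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted outcomes."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data only: hypothetical score range and effect sizes, not study data.
rng = random.Random(0)
baseline = [rng.uniform(0, 50) for _ in range(500)]
post = [0.6 * b + rng.gauss(0, 6) for b in baseline]

# Withhold the last 20% of cases as the test set (an assumed split ratio).
split = int(0.8 * len(baseline))
x_train, x_test = baseline[:split], baseline[split:]
y_train, y_test = post[:split], post[split:]

# Baseline-only regression: fit on the training set, predict the test set.
intercept, slope = fit_simple_ols(x_train, y_train)
pred_baseline_only = [intercept + slope * x for x in x_test]

# Reference predictor: the training-set mean outcome for every test case.
train_mean = sum(y_train) / len(y_train)
pred_mean_only = [train_mean] * len(x_test)

for name, pred in [("mean only", pred_mean_only), ("baseline only", pred_baseline_only)]:
    print(f"{name}: RMSE={rmse(y_test, pred):.2f}, R2={r_squared(y_test, pred):.2f}")
```

The same two functions could be applied to test-set predictions from any fitted model, including a random forest, so that all models are scored on identical withheld cases.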

Keywords

evidence-based treatment, machine learning, personalized medicine, trauma-focused cognitive behavioral therapy, youth mental health, Clinical psychology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
