A method for identifying predictive markers of mental illness in social media data
Abstract
Undiagnosed mental illness poses a significant health risk. In-person screenings to identify individuals at risk of mental illness are expensive, time-consuming, and often inaccurate. This report presents an array of computational methods that can be used to identify predictive markers of mental illness, specifically depression and Post-Traumatic Stress Disorder, by scanning and interpreting text and images posted on social media. Separate analyses of Twitter and Instagram data are presented. Predictive features were extracted from social media posts (N_Twitter = 279,951, N_Instagram = 43,950) using a variety of techniques, including color analysis, face detection, semantic analysis, and Natural Language Processing. The resulting models successfully discriminated between depressed and healthy content, and compared favorably to general practitioners’ average success rates in diagnosing depression. Results held even when the analysis was restricted to content posted before the first depression diagnosis. In the case of Twitter data, state-space temporal analysis suggests that the onset of depression may be detectable several months prior to diagnosis. These methods offer a data-driven, predictive approach for early screening and detection of mental illness.
Terms of Use
This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
Citable link to this page
http://nrs.harvard.edu/urn-3:HUL.InstRepos:40046396
Collections
- FAS Theses and Dissertations