Publication:

Word Embeddings in Mental Health

Date

2024-06-12

Citation

Thomas Charlon and Tianxi Cai. 2024. "Word Embeddings in Mental Health." Talk presented at R/Medicine 2024.

Abstract

Mental health-related diagnoses have been on the rise in recent years, especially since the pandemic. In the CELEHS laboratory, we analyze electronic health records to help clinicians identify at-risk patients requiring follow-up. In this talk I will present the results of GloVe word embeddings computed on 8,000 open-access suicide-related publications, using the text2vec, opticskxi and sgraph R packages. I developed a novel methodology based on random projections to efficiently find diverse clusters of related concepts in unstructured text data, and I evaluate the results by predicting pairs of related concepts and comparing them to known relationships curated by clinicians. While many biomedical natural language processing approaches focus on the analysis of specific known concepts, such as those indexed by the Unified Medical Language System (UMLS), analyzing the complete text can reveal novel relationships and borderline concepts.

The Diagnostic and Statistical Manual of Mental Disorders (DSM) is the main reference for clinicians diagnosing mental health disorders, and describes the sets of symptoms that form the required diagnostic criteria for each disorder. The DSM emphasizes that many patients are diagnosed with multiple conditions, and that current diagnoses could benefit from multidimensional assessments that take into account the severity, intensity, duration, and combinations of symptoms, to form more precise diagnoses and guide treatment. The DSM also underlines the need for diagnoses that take into account the spectrum and gradients of disorders observed, as in schizoaffective and autism spectrum disorders. To this end, the analysis of unstructured text data can help identify clusters of conditions and enable new multidimensional classifications of mental health disorders.

My talk will first present an overview of the problem as underlined in the DSM and introduce GloVe word embeddings using the text2vec package on a set of 8,000 open-access suicide-related publications. I then demonstrate how to explore the embeddings using vector operations to manually find clusters of related concepts, and in a second step automate the discovery of such clusters using the density-based clustering package opticskxi, and visualize the clusters as graphs using the sgraph network visualization package. Further clusters are then discovered by applying semi-directed vector operations, a novel method inspired by random projections. Finally, I introduce ways to evaluate such clusters against a database of 17,000 known concept pairs curated by clinicians with expert knowledge, by predicting pairs of related concepts using a cut-off on cosine similarities set by a false-positive threshold.
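As a rough illustration of the evaluation step described above, the following minimal sketch shows one way a cosine-similarity cut-off could be derived from a false-positive threshold. It is written in plain Python rather than R, and it is not the code from the talk: the three-dimensional toy vectors, the concept names, the choice of "null" pairs assumed to be unrelated, and the helper names `cosine`, `fpr_threshold` and `predict_pairs` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def fpr_threshold(null_scores, max_fpr):
    """Cut-off such that at most max_fpr of the null scores lie strictly above it."""
    ranked = sorted(null_scores, reverse=True)
    n_allowed = int(max_fpr * len(ranked))       # null pairs allowed above the cut-off
    return ranked[min(n_allowed, len(ranked) - 1)]

def predict_pairs(embeddings, pairs, threshold):
    """Keep the pairs whose cosine similarity is strictly above the cut-off."""
    return [(a, b) for a, b in pairs
            if cosine(embeddings[a], embeddings[b]) > threshold]

# Hypothetical toy embeddings (real GloVe vectors have many more dimensions).
embeddings = {
    "insomnia":       [0.9, 0.1, 0.0],
    "sleep_disorder": [0.8, 0.2, 0.1],
    "anxiety":        [0.1, 0.9, 0.1],
    "panic":          [0.2, 0.8, 0.0],
    "fracture":       [0.0, 0.1, 0.9],
}

# Assumption: these pairs are treated as unrelated and define the null distribution.
null_pairs = [
    ("insomnia", "fracture"),
    ("panic", "fracture"),
    ("anxiety", "fracture"),
    ("insomnia", "panic"),
    ("sleep_disorder", "anxiety"),
]
null_scores = [cosine(embeddings[a], embeddings[b]) for a, b in null_pairs]

# Allow at most 20% of the null pairs to be called "related".
threshold = fpr_threshold(null_scores, max_fpr=0.2)

candidate_pairs = [
    ("insomnia", "sleep_disorder"),
    ("anxiety", "panic"),
    ("sleep_disorder", "fracture"),
]
predicted = predict_pairs(embeddings, candidate_pairs, threshold)
print(round(threshold, 3), predicted)
```

In a real evaluation, the predicted pairs would then be compared against the clinician-curated pairs, and the null pairs would be sampled at a much larger scale than in this toy setting.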

Novel methodologies in natural language processing will enable us to further understand mental health disorders and their interactions. Specific disorders have been associated with personality traits, such as schizophrenia with neuroticism and autism with obsessive-compulsive traits. Modeling such interactions and discovering new ones will enable us to enhance the treatment of mental health disorders and identify clinically actionable features.

Main Sections

00:00 Introduction, Center for Suicide Research and Prevention project
05:42 Text processing and embeddings computation
18:11 Exploration of embeddings, vector operations
29:51 Density-based clustering with OPTICS k-Xi
36:20 Evaluation and knowledge graph generation

More Resources

Center for Suicide Research and Prevention: https://csrp.mgh.harvard.edu/
Git repository of demo: https://gitlab.com/thomaschln/psychclust_rmed24
OPTICS k-Xi density-based clustering R package: https://cran.r-project.org/package=opticskxi
Knowledge graphs R package: https://cran.r-project.org/package=kgraph

Terms of Use

This article is made available under the terms and conditions applicable to Open Access Policy Articles (OAP), as set forth at Terms of Service
