Using Large Language Models to Understand Suicidality in a Social Media-Based Taxonomy of Mental Health Disorders: Linguistic Analysis of Reddit Posts.
AI
LLM
anxiety
artificial intelligence
depression
downstream analyses
explainable AI
explainable artificial intelligence
large language model
mental health
mental health disorder
mental health disorders
natural language processing
online
online discussions
social media
stress
suicide
trauma
web-based discussions
Journal
JMIR Mental Health
ISSN: 2368-7959
Abbreviated title: JMIR Ment Health
Country: Canada
NLM ID: 101658926
Publication information
Publication date:
16 May 2024
History:
received: 8 February 2024
revised: 28 March 2024
accepted: 29 March 2024
medline: 21 May 2024
pubmed: 21 May 2024
entrez: 21 May 2024
Status:
epublish
Abstract
Background
Rates of suicide have increased by over 35% since 1999. Despite concerted efforts, our ability to predict, explain, or treat suicide risk has not significantly improved over the past 50 years.
Objective
The aim of this study was to use large language models to understand natural language use during public web-based discussions (on Reddit) around topics related to suicidality.
Methods
We used large language model-based sentence embedding to extract the latent linguistic dimensions of user postings derived from several mental health-related subreddits, with a focus on suicidality. We then applied dimensionality reduction to these sentence embeddings, allowing them to be summarized and visualized in a lower-dimensional Euclidean space for further downstream analyses. We analyzed 2.9 million posts extracted from 30 subreddits, including r/SuicideWatch, between October 1 and December 31, 2022, and the same period in 2010.
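The two-step pipeline described above (embed each post, then reduce the embeddings to a few dimensions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract names neither the embedding model nor the dimensionality reduction method, so `toy_embed` (a hashed bag-of-words vector) is a hypothetical stand-in for an LLM sentence embedding, PCA by power iteration is an assumed reduction technique, and the example posts are invented.

```python
import hashlib
import math

def toy_embed(text, dim=16):
    """Hashed bag-of-words vector: a hypothetical stand-in for an LLM sentence embedding."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(rows):
    """Subtract the per-dimension mean so PCA captures variance, not offset."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_direction(rows, iters=300):
    """Approximate leading principal direction of centered rows (power iteration)."""
    dim = len(rows[0])
    v = [float(j + 1) for j in range(dim)]  # fixed, non-uniform starting vector
    start_norm = math.sqrt(dot(v, v))
    v = [x / start_norm for x in v]
    for _ in range(iters):
        # Apply X^T X to v without forming the covariance matrix explicitly.
        scores = [dot(row, v) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project_2d(vectors):
    """Project vectors onto their first two (approximate) principal directions."""
    rows = center(vectors)
    pc1 = top_direction(rows)
    # Deflation: remove the PC1 component from each row, then repeat.
    residual = [[x - dot(row, pc1) * p for x, p in zip(row, pc1)] for row in rows]
    pc2 = top_direction(residual)
    return [(dot(row, pc1), dot(row, pc2)) for row in rows]

# Invented example posts, used only to exercise the pipeline.
posts = [
    "I feel anxious before every exam",
    "Work stress keeps me awake at night",
    "Therapy has helped me feel calmer lately",
    "Looking for advice on coping with stress",
    "My anxiety is worse when I skip sleep",
]
coords = project_2d([toy_embed(p) for p in posts])
for post, (x, y) in zip(posts, coords):
    print(f"{x:+.3f} {y:+.3f}  {post}")
```

In practice the stand-in embedding would be replaced by a real sentence-embedding model, and the hand-rolled PCA by a library routine; the structure (one vector per post, then a low-dimensional projection) stays the same.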
Results
Our results showed that, in line with existing theories of suicide, posters in the suicidality community (r/SuicideWatch) predominantly wrote about feelings of disconnection, burdensomeness, hopelessness, desperation, resignation, and trauma. Further, we identified distinct latent linguistic dimensions (well-being, seeking support, and severity of distress) among all mental health subreddits. Many of the resulting subreddit clusters were in line with a statistically driven diagnostic classification system, namely the Hierarchical Taxonomy of Psychopathology (HiTOP), in that they mapped onto its proposed superspectra.
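One simple way to compare communities in a low-dimensional space like the one described above is to average the post vectors of each community and measure the similarity between the resulting centroids. The sketch below assumes that approach; the abstract does not state how the subreddit clusters were derived, and the community names and three-dimensional vectors here are invented placeholders, not data from the study.

```python
import math

def centroid(vectors):
    """Mean vector of a community's post vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy vectors; each row stands for one post in a 3-dimensional space.
toy_posts = {
    "community_a": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "community_b": [[0.85, 0.15, 0.25], [0.9, 0.05, 0.2]],
    "community_c": [[0.1, 0.9, 0.7], [0.2, 0.8, 0.9]],
}
centroids = {name: centroid(vectors) for name, vectors in toy_posts.items()}

def nearest(name):
    """Name of the other community whose centroid is most similar to `name`'s."""
    scored = [
        (cosine(centroids[name], other_centroid), other)
        for other, other_centroid in centroids.items()
        if other != name
    ]
    return max(scored)[1]

for name in centroids:
    print(name, "->", nearest(name))
```

Communities whose centroids are mutually close (here `community_a` and `community_b`) would fall into the same cluster under this scheme; a full analysis would apply a proper clustering algorithm rather than a nearest-neighbor lookup.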
Conclusions
Overall, our findings provide data-driven support for several language-based theories of suicide, as well as dimensional classification systems for mental health disorders. Ultimately, this novel combination of natural language processing techniques can assist researchers in gaining deeper insights about emotions and experiences shared on the web and may aid in the validation and refutation of different mental health theories.
Identifiers
pubmed: 38771256
pii: v11i1e57234
doi: 10.2196/57234
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e57234
Copyright information
© Brian Bauer, Raquel Norel, Alex Leow, Zad Abi Rached, Bo Wen, Guillermo Cecchi. Originally published in JMIR Mental Health (https://mental.jmir.org).