Using Large Language Models to Understand Suicidality in a Social Media-Based Taxonomy of Mental Health Disorders: Linguistic Analysis of Reddit Posts.

Humans; Social Media / statistics & numerical data; Suicide / psychology; Mental Disorders / psychology; Linguistics; Natural Language Processing

AI; LLM; anxiety; artificial intelligence; depression; downstream analyses; explainable AI; explainable artificial intelligence; large language model; mental health; mental health disorder; mental health disorders; natural language processing; online; online discussions; social media; stress; suicide; trauma; web-based discussions

Journal

JMIR mental health

ISSN: 2368-7959

Abbreviated title: JMIR Ment Health

Country: Canada

NLM ID: 101658926

Publication information

Publication date:
16 May 2024

History:

received: 8 February 2024

revised: 28 March 2024

accepted: 29 March 2024

medline: 21 May 2024

pubmed: 21 May 2024

entrez: 21 May 2024

Status: epublish

Abstract

Background

Rates of suicide have increased by over 35% since 1999. Despite concerted efforts, our ability to predict, explain, or treat suicide risk has not significantly improved over the past 50 years.

Objective

The aim of this study was to use large language models to understand natural language use during public web-based discussions (on Reddit) around topics related to suicidality.

Methods

We used large language model-based sentence embedding to extract the latent linguistic dimensions of user postings derived from several mental health-related subreddits, with a focus on suicidality. We then applied dimensionality reduction to these sentence embeddings, allowing them to be summarized and visualized in a lower-dimensional Euclidean space for further downstream analyses. We analyzed 2.9 million posts extracted from 30 subreddits, including r/SuicideWatch, between October 1 and December 31, 2022, and the same period in 2010.
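The two steps described above (sentence embedding, then dimensionality reduction) can be illustrated with a minimal sketch. The abstract does not name the embedding model or the reduction method, so everything here is an assumption: the toy 4-dimensional vectors stand in for real sentence embeddings, principal component analysis (PCA) stands in for whatever reduction the authors applied, and the names `center`, `covariance`, `top_component`, and `pca_project` are hypothetical.

```python
import math
import random

# Hypothetical stand-ins for LLM sentence embeddings: one 4-d vector per post.
# Real embeddings would come from a model and have far more dimensions.
EMBEDDINGS = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.1, 0.9, 0.8, 0.0],
    [0.2, 0.8, 0.9, 0.1],
    [0.5, 0.5, 0.4, 0.9],
]

def center(rows):
    """Subtract the per-dimension mean so the data is centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=500, seed=0):
    """Leading eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: no direction carries any variance
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, eigval

def pca_project(rows, k=2):
    """Project rows onto their first k principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: subtract the component just found so the next one can emerge.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]

# Each post becomes a point in a 2-d Euclidean space that can be plotted or clustered.
points_2d = pca_project(EMBEDDINGS, k=2)
for point in points_2d:
    print([round(x, 3) for x in point])
```

In this toy data the first two vectors stay close together after projection while the third lands far from them, which is the property that makes a low-dimensional view useful for downstream analyses such as grouping similar posts or communities.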

Results

Our results showed that, in line with existing theories of suicide, posters in the suicidality community (r/SuicideWatch) predominantly wrote about feelings of disconnection, burdensomeness, hopelessness, desperation, resignation, and trauma. Further, we identified distinct latent linguistic dimensions (well-being, seeking support, and severity of distress) among all mental health subreddits. Many of the resulting subreddit clusters mapped onto the superspectra proposed by a statistically driven diagnostic classification system, the Hierarchical Taxonomy of Psychopathology (HiTOP).

Conclusions

Overall, our findings provide data-driven support for several language-based theories of suicide, as well as dimensional classification systems for mental health disorders. Ultimately, this novel combination of natural language processing techniques can assist researchers in gaining deeper insights about emotions and experiences shared on the web and may aid in the validation and refutation of different mental health theories.

Identifiers

DOI: 10.2196/57234

PMID: 38771256

pii: v11i1e57234

Publication types

Journal Article

Languages

eng

Pagination

e57234

Authors

Brian Bauer

Raquel Norel

Alex Leow

Zad Abi Rached

Bo Wen

Guillermo Cecchi
