The Use of Natural Language Processing Methods in Reddit to Investigate Opioid Use: Scoping Review.


Journal

JMIR infodemiology
ISSN: 2564-1891
Abbreviated title: JMIR Infodemiology
Country: Canada
NLM ID: 9918249014806676

Publication information

Publication date:
13 Sep 2024
History:
received: 23 07 2023
accepted: 18 06 2024
revised: 01 06 2024
medline: 13 9 2024
pubmed: 13 9 2024
entrez: 13 9 2024
Status: epublish

Abstract

BACKGROUND
The growing availability of big data spontaneously generated by social media platforms allows us to leverage natural language processing (NLP) methods as valuable tools to understand the opioid crisis.
OBJECTIVE
We aimed to understand how NLP has been applied to Reddit (Reddit Inc) data to study opioid use.
METHODS
We systematically searched for peer-reviewed studies and conference abstracts in PubMed, Scopus, PsycINFO, ACL Anthology, IEEE Xplore, and Association for Computing Machinery data repositories up to July 19, 2022. Inclusion criteria were studies investigating opioid use, using NLP techniques to analyze the textual corpora, and using Reddit as the social media data source. We were specifically interested in mapping studies' overarching goals and findings, methodologies and software used, and main limitations.
RESULTS
In total, 30 studies were included, which were classified into 4 nonmutually exclusive overarching goal categories: methodological (n=6, 20% studies), infodemiology (n=22, 73% studies), infoveillance (n=7, 23% studies), and pharmacovigilance (n=3, 10% studies). NLP methods were used to identify content relevant to opioid use among vast quantities of textual data, to establish potential relationships between opioid use patterns or profiles and contextual factors or comorbidities, and to anticipate individuals' transitions between different opioid-related subreddits, likely revealing progression through opioid use stages. Most studies used an embedding technique (12/30, 40%), prediction or classification approach (12/30, 40%), topic modeling (9/30, 30%), and sentiment analysis (6/30, 20%). The most frequently used programming languages were Python (20/30, 67%) and R (2/30, 7%). Among the studies that reported limitations (20/30, 67%), the most cited was the uncertainty regarding whether redditors participating in these forums were representative of people who use opioids (8/20, 40%). The papers were very recent (28/30, 93%), from 2019 to 2022, with authors from a range of disciplines.
CONCLUSIONS
This scoping review identified a wide variety of NLP techniques and applications used to support surveillance and social media interventions addressing the opioid crisis. Despite the clear potential of these methods to enable the identification of opioid-relevant content in Reddit and its analysis, there are limits to the degree of interpretive meaning that they can provide. Moreover, we identified the need for standardized ethical guidelines to govern the use of Reddit data to safeguard the anonymity and privacy of people using these forums.

Identifiers

pubmed: 39269743
pii: v4i1e51156
doi: 10.2196/51156

Chemical substances

Analgesics, Opioid 0

Publication types

Journal Article; Review; Systematic Review

Languages

eng

Citation subsets

IM

Pagination

e51156

Copyright information

©Alexandra Almeida, Thomas Patton, Mike Conway, Amarnath Gupta, Steffanie A Strathdee, Annick Bórquez. Originally published in JMIR Infodemiology (https://infodemiology.jmir.org), 13.09.2024.

Authors

Alexandra Almeida (A)

Scientific Computing Program, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil.
San Diego State University, School of Social Work, San Diego, CA, United States.
Department of Medicine, University of California San Diego, San Diego, CA, United States.

Thomas Patton (T)

Department of Medicine, University of California San Diego, San Diego, CA, United States.

Mike Conway (M)

School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia.

Amarnath Gupta (A)

San Diego Supercomputer Center, University of California San Diego, San Diego, CA, United States.

Steffanie A Strathdee (SA)

Department of Medicine, University of California San Diego, San Diego, CA, United States.

Annick Bórquez (A)

Department of Medicine, University of California San Diego, San Diego, CA, United States.
