Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis.


Journal

Journal of medical Internet research
ISSN: 1438-8871
Abbreviated title: J Med Internet Res
Country: Canada
NLM ID: 100959882

Publication information

Publication date:
18 01 2022
History:
received: 09 05 2021
accepted: 21 11 2021
revised: 01 11 2021
entrez: 18 1 2022
pubmed: 19 1 2022
medline: 27 1 2022
Status: epublish

Abstract

BACKGROUND
Electronic nicotine delivery system (ENDS) brands, such as JUUL, used social media as a key component of their marketing strategy, which led to massive sales growth from 2015 to 2018. During this time, ENDS use rapidly increased among youths and young adults, with flavored products being particularly popular among these groups.
OBJECTIVE
The aim of our study is to develop a named entity recognition (NER) model to identify potential emerging vaping brands and flavors from Instagram post text. NER is a natural language processing task for identifying specific types of words (entities) in text based on the characteristics of the entity and surrounding words.
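As an illustration of how entity labels can be attached to the words of a post, the sketch below converts labeled token spans into BIO tags (B = beginning of an entity, I = inside, O = outside). BIO tagging is a common NER encoding, but the abstract does not state which scheme the authors used, so this is an assumption. The example post, the `FLAVOR` and `BRAND` label names, and the `to_bio` helper are hypothetical.

```python
from typing import List, Tuple


def to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) token spans into BIO tags.

    `end` is exclusive. Spans are assumed not to overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Hypothetical post text, already split into tokens.
tokens = ["new", "blue", "raspberry", "pods", "from", "JUUL"]
# Hypothetical manual annotation: one two-word flavor and one brand.
spans = [(1, 3, "FLAVOR"), (5, 6, "BRAND")]

tags = to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence model such as a conditional random field would then be trained to predict one tag per token, using the token itself and its neighbors as evidence.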
METHODS
NER models were trained on a labeled data set of 2272 Instagram posts coded for ENDS brands and flavors. We compared three types of NER models (conditional random fields, a residual convolutional neural network, and a fine-tuned distilled bidirectional encoder representations from transformers [FTDB] network) to identify brands and flavors in Instagram posts, with precision, recall, and F1 scores as the key model outcomes. We used data from Nielsen scanner sales and Wikipedia to create benchmark dictionaries to determine whether brands from established ENDS brand and flavor lists were mentioned in the Instagram posts in our sample. To prevent overfitting, we performed 5-fold cross-validation and reported the mean and SD of the model validation metrics across the folds.
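The evaluation described above can be sketched roughly as follows: a dictionary look-up baseline, precision, recall, and F1 computed against manual annotations, and a mean and SD taken across folds. This is a minimal sketch under stated assumptions, not the authors' code. The single-token matching rule, the brand `acmevape`, the per-fold scores, and the use of the sample SD are all assumptions made for illustration; the abstract does not specify these details.

```python
from statistics import mean, stdev
from typing import List, Set, Tuple

Entity = Tuple[int, str]  # (token index, lowercased token)


def dictionary_lookup(tokens: List[str], lexicon: Set[str]) -> Set[Entity]:
    """Flag every token that appears in the benchmark lexicon (case-insensitive)."""
    return {(i, t.lower()) for i, t in enumerate(tokens) if t.lower() in lexicon}


def precision_recall_f1(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of predicted entities against gold annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def summarize_folds(fold_scores: List[float]) -> Tuple[float, float]:
    """Mean and sample SD of one metric across cross-validation folds."""
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical post: one brand is in the lexicon, one ("acmevape") is not.
tokens = ["love", "my", "JUUL", "and", "acmevape", "mods"]
lexicon = {"juul"}                       # stand-in for a benchmark brand list
gold = {(2, "juul"), (4, "acmevape")}    # stand-in for manual annotation

predicted = dictionary_lookup(tokens, lexicon)
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

# Hypothetical per-fold precision values from 5-fold cross-validation.
fold_mean, fold_sd = summarize_folds([0.80, 0.70, 0.90, 0.75, 0.85])
print(f"mean={fold_mean:.3f} SD={fold_sd:.3f}")
```

In this toy case the look-up finds the listed brand but misses the unlisted one, so precision is perfect while recall is halved. That is the kind of gap a fixed brand list can show when posts mention brands it does not contain.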
RESULTS
For brands, the residual convolutional neural network exhibited the highest mean precision (0.797, SD 0.084), and the FTDB exhibited the highest mean recall (0.869, SD 0.103). For flavors, the FTDB exhibited both the highest mean precision (0.860, SD 0.055) and the highest mean recall (0.801, SD 0.091). All NER models outperformed the benchmark brand and flavor dictionary look-ups on mean precision, recall, and F1. Of the two benchmark brand lists, the larger Wikipedia list outperformed the Nielsen list in both precision and recall.
CONCLUSIONS
Our findings suggest that NER models correctly identified ENDS brands and flavors in Instagram posts at rates competitive with, or better than, others in the published literature. Brands identified during manual annotation showed little overlap with those in Nielsen scanner data, suggesting that NER models may capture emerging brands with limited sales and distribution. NER models address the challenges of manual brand identification and can be used to support future infodemiology and infoveillance studies. Brands identified on social media should be cross-validated with Nielsen and other data sources to differentiate emerging brands that have become established from those with limited sales and distribution.

Identifiers

pubmed: 35040793
pii: v24i1e30257
doi: 10.2196/30257
pmc: PMC8808345

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e30257

Copyright information

©Rob Chew, Michael Wenger, Jamie Guillory, James Nonnemaker, Annice Kim. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.01.2022.

Authors

Rob Chew (R)

Center for Data Science, RTI International, Research Triangle Park, NC, United States.

Michael Wenger (M)

Center for Data Science, RTI International, Research Triangle Park, NC, United States.

Jamie Guillory (J)

Center for Health Analytics, Media, and Policy, RTI International, Research Triangle Park, NC, United States.

James Nonnemaker (J)

Center for Health Analytics, Media, and Policy, RTI International, Research Triangle Park, NC, United States.

Annice Kim (A)

Center for Health Analytics, Media, and Policy, RTI International, Research Triangle Park, NC, United States.
