Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis.

Adolescent; Electronic Nicotine Delivery Systems; Humans; Infodemiology; Natural Language Processing; Social Media; Vaping; Young Adult

ENDS; brands; flavors; named entity recognition; social media

Journal

Journal of Medical Internet Research

ISSN: 1438-8871

Abbreviated title: J Med Internet Res

Country: Canada

NLM ID: 100959882

Publication information

Publication date:
18 Jan 2022

History:

received: 9 May 2021

revised: 1 Nov 2021

accepted: 21 Nov 2021

entrez: 18 Jan 2022

pubmed: 19 Jan 2022

medline: 27 Jan 2022

Status: epublish

Abstract

BACKGROUND

Electronic nicotine delivery system (ENDS) brands, such as JUUL, used social media as a key component of their marketing strategy, which led to massive sales growth from 2015 to 2018. During this time, ENDS use rapidly increased among youths and young adults, with flavored products being particularly popular among these groups.

OBJECTIVE

The aim of our study is to develop a named entity recognition (NER) model to identify potential emerging vaping brands and flavors from Instagram post text. NER is a natural language processing task for identifying specific types of words (entities) in text based on the characteristics of the entity and surrounding words.
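As an illustration of what "identifying entities in text" looks like in practice, the following is a minimal sketch of token-level labeling. It assumes a BIO tagging scheme (B = beginning of an entity, I = inside, O = outside), which is a common NER convention but is not stated in this abstract. The post text, the brand name "CloudCo", and the helper `bio_tags` are hypothetical.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, label) over token indices to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Hypothetical post; the brand and flavor names are invented for illustration.
tokens = ["Loving", "the", "new", "CloudCo", "mango", "ice", "pods"]
spans = [(3, 4, "BRAND"), (4, 6, "FLAVOR")]
tags = bio_tags(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence model is then trained to predict one tag per token, using features of the token itself and of its neighbors.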

METHODS

NER models were trained on a labeled data set of 2272 Instagram posts coded for ENDS brands and flavors. We compared three types of NER models (conditional random fields, a residual convolutional neural network, and a fine-tuned distilled bidirectional encoder representations from transformers [FTDB] network) on their ability to identify brands and flavors in Instagram posts, with precision, recall, and F1 scores as the key model outcomes. We used data from Nielsen scanner sales and Wikipedia to create benchmark dictionaries to determine whether brands from established ENDS brand and flavor lists were mentioned in the Instagram posts in our sample. To prevent overfitting, we performed 5-fold cross-validation and reported the mean and SD of the model validation metrics across the folds.
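The dictionary benchmark and the reported metrics can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: scoring is done on per-post sets of matched terms (the study may score at the token or span level), matching is case-insensitive on whole words, and the SD is the sample SD (the abstract does not say which form was used). All posts, brand names, dictionary entries, and fold scores are hypothetical.

```python
import re
from statistics import mean, stdev
from typing import List, Set, Tuple


def dictionary_lookup(text: str, terms: Set[str]) -> Set[str]:
    """Return the lowercased dictionary terms found in the text as whole words."""
    lowered = text.lower()
    return {
        t.lower()
        for t in terms
        if re.search(rf"\b{re.escape(t.lower())}\b", lowered)
    }


def precision_recall_f1(
    predicted: List[Set[str]], gold: List[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-post entity sets."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))   # correct matches
    fp = sum(len(p - g) for p, g in zip(predicted, gold))   # spurious matches
    fn = sum(len(g - p) for p, g in zip(predicted, gold))   # missed entities
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical posts, gold labels, and benchmark dictionary.
posts = [
    "New CloudCo pods in stock",
    "VapeNova mango is back",
    "Try the Mistly starter kit",
]
gold = [{"cloudco"}, {"vapenova"}, {"mistly"}]
brand_dictionary = {"CloudCo", "Mistly", "Pods"}  # "Pods" is a deliberately noisy entry

predicted = [dictionary_lookup(post, brand_dictionary) for post in posts]
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Hypothetical per-fold scores from 5-fold cross-validation.
fold_precisions = [0.80, 0.75, 0.85, 0.70, 0.90]
fold_mean, fold_sd = mean(fold_precisions), stdev(fold_precisions)
print(f"mean={fold_mean:.3f} SD={fold_sd:.3f}")
```

In this toy example the dictionary misses "VapeNova" because it is not on the list, which mirrors the limitation of a fixed list when brands are new: a look-up can only find what it already contains.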

RESULTS

For brands, the residual convolutional neural network exhibited the highest mean precision (0.797, SD 0.084), and the FTDB exhibited the highest mean recall (0.869, SD 0.103). For flavors, the FTDB exhibited both the highest mean precision (0.860, SD 0.055) and the highest mean recall (0.801, SD 0.091). All NER models outperformed the benchmark brand and flavor dictionary look-ups on mean precision, recall, and F1. Of the two benchmark brand lists, the larger Wikipedia list outperformed the Nielsen list in both precision and recall.

CONCLUSIONS

Our findings suggest that NER models correctly identified ENDS brands and flavors in Instagram posts at rates competitive with, or better than, others in the published literature. Brands identified during manual annotation showed little overlap with those in Nielsen scanner data, suggesting that NER models may capture emerging brands with limited sales and distribution. NER models address the challenges of manual brand identification and can be used to support future infodemiology and infoveillance studies. Brands identified on social media should be cross-validated with Nielsen and other data sources to differentiate emerging brands that have become established from those with limited sales and distribution.

Identifiers

DOI: 10.2196/30257

PMID: 35040793

PMC: PMC8808345

pii: v24i1e30257

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Pagination

e30257

Copyright information

© Rob Chew, Michael Wenger, Jamie Guillory, James Nonnemaker, Annice Kim. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 18.01.2022.



Authors

Rob Chew

Michael Wenger

Jamie Guillory

James Nonnemaker

Annice Kim
