Extracting Multiple Worries From Breast Cancer Patient Blogs Using Multilabel Classification With the Natural Language Processing Model Bidirectional Encoder Representations From Transformers: Infodemiology Study of Blogs.

Keywords: BERT model; NLP; artificial intelligence; blog post; breast cancer; breast neoplasm; cancer; content analysis; machine learning; model; natural language processing; oncology; patient data; peer support; quality of life; sentiment analysis; social media; social support; text mining

Journal

JMIR cancer
ISSN: 2369-1999
Abbreviated title: JMIR Cancer
Country: Canada
NLM ID: 101666844

Publication information

Publication date:
03 Jun 2022
History:
received: 11 Mar 2022
revised: 10 May 2022
accepted: 23 May 2022
entrez: 03 Jun 2022
pubmed: 04 Jun 2022
medline: 04 Jun 2022
Status: epublish

Abstract

BACKGROUND
Patients with breast cancer have a variety of worries and need multifaceted information support. Their accumulated posts on social media contain rich descriptions of their daily worries concerning issues such as treatment, family, and finances. It is important to identify these issues to help patients with breast cancer to resolve their worries and obtain reliable information.
OBJECTIVE
This study aimed to extract and classify multiple worries from text generated by patients with breast cancer using Bidirectional Encoder Representations From Transformers (BERT), a context-aware natural language processing model.
METHODS
A total of 2272 blog posts by patients with breast cancer in Japan were collected. Five worry labels, "treatment," "physical," "psychological," "work/financial," and "family/friends," were defined and assigned to each post. Multiple labels were allowed. To assess the label criteria, 50 blog posts were randomly selected and annotated by two researchers with medical knowledge. After the interannotator agreement had been assessed by means of Cohen kappa, one researcher annotated all the blogs. A multilabel classifier that simultaneously predicts five worries in a text was developed using BERT. This classifier was fine-tuned by using the posts as input and adding a classification layer to the pretrained BERT. The performance was evaluated for precision using the average of 5-fold cross-validation results.
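The interannotator check can be illustrated with a short calculation. The sketch below computes Cohen kappa for a single worry label, treated as a binary present/absent rating given by two annotators to the same posts. The function name `cohen_kappa` and the toy ratings are hypothetical; they are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two annotators who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both annotators must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: fraction of items that received identical ratings.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the annotators' marginal rates, summed over categories.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy annotations for one label (1 = present, 0 = absent).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In a multilabel setting such as this one, the calculation would be repeated once per label, which is why a separate agreement value is reported for each of the five worries.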
RESULTS
Among the blog posts, 477 included "treatment," 1138 included "physical," 673 included "psychological," 312 included "work/financial," and 283 included "family/friends." The interannotator agreement values were 0.67 for "treatment," 0.76 for "physical," 0.56 for "psychological," 0.73 for "work/financial," and 0.73 for "family/friends," indicating a high degree of agreement. Among all blog posts, 544 contained no label, 892 contained one label, and 836 contained multiple labels. It was found that the worries varied from user to user, and the worries posted by the same user changed over time. The model performed well, though prediction performance differed for each label. The values of precision were 0.59 for "treatment," 0.82 for "physical," 0.64 for "psychological," 0.67 for "work/financial," and 0.58 for "family/friends." The higher the interannotator agreement and the greater the number of posts, the higher the precision tended to be.
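To make the reported metric concrete, the following sketch computes precision separately for each of the five labels and then averages it across folds. It is an illustration under assumptions: the helper names, the toy folds, and the choice to skip folds in which a label was never predicted are not described in the abstract.

```python
LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]


def per_label_precision(y_true, y_pred):
    """Precision per label column: TP / (TP + FP); None when the label was never predicted."""
    result = {}
    for j, label in enumerate(LABELS):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p[j] == 1 and t[j] == 0)
        result[label] = tp / (tp + fp) if (tp + fp) else None
    return result


def average_over_folds(fold_results):
    """Mean precision per label across folds, skipping folds where it is undefined."""
    averaged = {}
    for label in LABELS:
        values = [r[label] for r in fold_results if r[label] is not None]
        averaged[label] = sum(values) / len(values) if values else None
    return averaged


# Hypothetical toy folds: each row is one post, each column one label (1 = present).
fold_1_true = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 0, 0, 0]]
fold_1_pred = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [0, 1, 0, 0, 0]]
fold_2_true = [[1, 0, 0, 1, 0], [0, 1, 0, 0, 1]]
fold_2_pred = [[1, 0, 0, 1, 0], [0, 1, 1, 0, 0]]

fold_results = [
    per_label_precision(fold_1_true, fold_1_pred),
    per_label_precision(fold_2_true, fold_2_pred),
]
mean_precision = average_over_folds(fold_results)
print(mean_precision)
```

Because precision only counts posts for which a label was predicted, it says nothing about posts where a worry was present but missed; recall would be needed for that.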
CONCLUSIONS
This study showed that the BERT model can extract multiple worries from text generated by patients with breast cancer. This is the first application of a multilabel classifier using the BERT model to extract multiple worries from patient-generated text. The results will help to identify the worries of patients with breast cancer and to give them timely social support.
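As a rough picture of how one model can return several worries for the same post, the sketch below assumes that the classification layer yields one score (logit) per label and that each label is decided independently with a sigmoid and a 0.5 threshold. The abstract does not state the activation function or the threshold, so both are assumptions, and the logits are invented for illustration.

```python
import math

LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5):
    """Return every label whose independent probability meets the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for one post, standing in for the output of a classification layer.
example_logits = [-1.2, 2.3, 0.4, -3.0, -0.1]
predicted = predict_labels(example_logits)
print(predicted)
```

Deciding each label independently, rather than with a softmax over all five, is what allows a post to receive no label, one label, or several.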

Identifiers

pubmed: 35657664
pii: v8i2e37840
doi: 10.2196/37840
pmc: PMC9206207

Publication types

Journal Article

Languages

eng

Pagination

e37840

Copyright information

©Tomomi Watanabe, Shuntaro Yada, Eiji Aramaki, Hiroshi Yajima, Hayato Kizaki, Satoko Hori. Originally published in JMIR Cancer (https://cancer.jmir.org), 03.06.2022.

Authors

Tomomi Watanabe (T)

Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan.

Shuntaro Yada (S)

Nara Institute of Science and Technology, Nara, Japan.

Eiji Aramaki (E)

Nara Institute of Science and Technology, Nara, Japan.

Hiroshi Yajima (H)

Mediaid Corporation, Tokyo, Japan.

Hayato Kizaki (H)

Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan.

Satoko Hori (S)

Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan.

MeSH classifications