Leveraging Bayesian networks and information theory to learn risk factors for breast cancer metastasis.


Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
10 Jul 2020
History:
received: 06 Jun 2019
accepted: 02 Jul 2020
entrez: 12 Jul 2020
pubmed: 12 Jul 2020
medline: 21 Aug 2020
Status: epublish

Abstract

BACKGROUND
Although epidemiologic studies have established a few risk factors for metastatic breast cancer (MBC), these risk factors have not proven effective in predicting an individual's risk of developing metastasis. Identifying critical risk factors for MBC therefore remains a major research imperative, one that can lead to advances in breast cancer clinical care. The objective of this research is to leverage Bayesian networks (BNs) and information theory to identify key risk factors for breast cancer metastasis from data.
METHODS
We develop the Markov Blanket and Interactive risk factor Learner (MBIL) algorithm, which learns single and interactive risk factors that have a direct influence on a patient's outcome. We evaluate the effectiveness of MBIL on simulated datasets and compare it with the BN learning algorithms Fast Greedy Search (FGS), the PC algorithm (PC), and the Conservative PC algorithm (CPC). We apply MBIL to learn risk factors for 5-year breast cancer metastasis from a clinical dataset we curated, and we evaluate the learned risk factors by consulting breast cancer experts and the literature. We further evaluate how well MBIL learns risk factors for breast cancer metastasis by comparing it with the BN learning algorithms Necessary Path Condition (NPC) and Greedy Equivalent Search (GES).
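The abstract does not give MBIL's scoring formulas. What follows is therefore only a generic sketch of the information-theoretic quantities that a learner of single and interactive risk factors could compute. It assumes discrete variables and plug-in (count-based) probability estimates. The functions `entropy`, `information_gain`, and `interaction_gain` are hypothetical illustrations. In particular, `interaction_gain` (the joint information minus the two single-factor informations) is a standard synergy measure swapped in for illustration. It is not MBIL's published definition.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of the empirical (plug-in) distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(outcome: Sequence[Hashable], *factors: Sequence[Hashable]) -> float:
    """Mutual information I(Y; X1..Xk) = H(Y) + H(X1..Xk) - H(Y, X1..Xk).

    The factors are treated jointly: each record's factor values are combined
    into one tuple, so several factors act like a single compound variable.
    """
    joint_factors = list(zip(*factors))
    joint_all = list(zip(outcome, *factors))
    return entropy(outcome) + entropy(joint_factors) - entropy(joint_all)


def interaction_gain(outcome, a, b) -> float:
    """Hypothetical synergy measure (not MBIL's definition): I(Y; A,B) - I(Y; A) - I(Y; B).

    Positive values mean A and B say more about Y together than the sum of what
    each says alone; negative values indicate redundancy between A and B.
    """
    return (
        information_gain(outcome, a, b)
        - information_gain(outcome, a)
        - information_gain(outcome, b)
    )


if __name__ == "__main__":
    # XOR-style toy data: neither factor alone predicts y, but together they do.
    a = [0, 0, 1, 1]
    b = [0, 1, 0, 1]
    y = [0, 1, 1, 0]
    print(information_gain(y, a), information_gain(y, b))
    print(information_gain(y, a, b), interaction_gain(y, a, b))
```

Consider XOR-style data where `y = [0, 1, 1, 0]`, `a = [0, 0, 1, 1]`, and `b = [0, 1, 0, 1]`. Each factor alone carries 0 bits about `y`, while the pair carries 1 bit, so `interaction_gain` is 1.0. Pairing a factor with a copy of itself gives a negative value, which flags redundancy rather than interaction.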
RESULTS
On the simulated datasets containing 2000 records, the average Jaccard indices were 0.705, 0.272, 0.228, and 0.147 for MBIL, FGS, PC, and CPC, respectively. MBIL, NPC, and GES all learned that grade and lymph_nodes_positive are direct risk factors for 5-year metastasis. Only MBIL and NPC found that surgical_margins is a direct risk factor, and only NPC found that invasive is a direct risk factor. MBIL learned that HER2 and ER interact to directly affect 5-year metastasis. Neither GES nor NPC learned that HER2 and ER are direct risk factors.
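The Jaccard index measures the overlap between two sets as |A ∩ B| / |A ∪ B|. The abstract does not say exactly which sets are compared. The minimal sketch below assumes that each simulated dataset's score compares the set of learned direct risk factors against the true set from the generating network, and that the reported figure is the mean over datasets. The function names, and the convention of returning 1.0 when both sets are empty, are also assumptions.

```python
from typing import Iterable, Set, Tuple


def jaccard_index(learned: Set[str], truth: Set[str]) -> float:
    """|learned ∩ truth| / |learned ∪ truth|.

    Assumed convention: two empty sets count as perfect agreement (1.0).
    """
    union = learned | truth
    if not union:
        return 1.0
    return len(learned & truth) / len(union)


def average_jaccard(results: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Mean Jaccard index over (learned, truth) pairs, e.g. one pair per simulated dataset."""
    scores = [jaccard_index(learned, truth) for learned, truth in results]
    if not scores:
        raise ValueError("need at least one (learned, truth) pair")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical variable names; not taken from the study's simulated networks.
    true_factors = {"X1", "X2", "X3"}
    learned_factors = {"X1", "X2", "X5"}
    print(jaccard_index(learned_factors, true_factors))
```

For example, suppose the true direct risk factors are {X1, X2, X3} and a method learns {X1, X2, X5} (hypothetical names). The intersection has 2 elements and the union has 4, so the index is 0.5.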
DISCUSSION
The results on simulated datasets indicate that MBIL can learn direct risk factors substantially better than standard Bayesian network learning algorithms. Applying MBIL to a real breast cancer dataset identified both single and interactive risk factors that directly influence breast cancer metastasis; these warrant further investigation.

Identifiers

pubmed: 32650714
doi: 10.1186/s12859-020-03638-8
pii: 10.1186/s12859-020-03638-8
pmc: PMC7350636

Publication types

Journal Article

Language

eng

Citation subsets

IM

Pagination

298

Grants

Agency: BLRD VA
ID: I01 BX003368
Country: United States
Agency: U.S. Department of Defense
ID: W81XWH-19-1-0495
Agency: U.S. National Library of Medicine
ID: R01LM011663

Authors

Xia Jiang (X)

Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Blvd, Pittsburgh, PA, 15217, USA. xij6@pitt.edu.

Alan Wells (A)

Department of Pathology, University of Pittsburgh and Pittsburgh VA Health System, Pittsburgh, PA, USA.
UPMC Hillman Cancer Center, Pittsburgh, PA, USA.

Adam Brufsky (A)

UPMC Hillman Cancer Center, Pittsburgh, PA, USA.
Division of Hematology/Oncology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.

Darshan Shetty (D)

Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Blvd, Pittsburgh, PA, 15217, USA.

Kahmil Shajihan (K)

Department of Biomedical Informatics, University of Pittsburgh School of Medicine, 5607 Baum Blvd, Pittsburgh, PA, 15217, USA.

Richard E Neapolitan (RE)

Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA.

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
1.00
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
1.00
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C
1.00
Humans Yoga Low Back Pain Female Male

MeSH classifications