Analyzing a co-occurrence gene-interaction network to identify disease-gene association.
Biological NLP
Biomedical literature
Disease-gene association
Genetic network
Text mining
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
08 Feb 2019
History:
received: 23 Apr 2018
accepted: 17 Jan 2019
entrez: 10 Feb 2019
pubmed: 10 Feb 2019
medline: 19 Mar 2019
Status: epublish
Abstract
Understanding genetic networks and their role in chronic diseases (e.g., cancer) is an important objective of biological research. In this work, we present a text mining system that constructs a gene-gene interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We recognize interacting genes based on their co-occurrence frequency within the biomedical literature and by employing linear and non-linear rare-event classification models. We analyze the constructed gene network using different network centrality measures to decide on the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network and to identify possible cancer-related genes. We evaluated the top 15 ranked genes for different cancer types (prostate, breast, and lung cancer). The average precision for identifying breast, prostate, and lung cancer genes ranges from 80% to 100%. In a prostate cancer case study, the system predicted an average of 80% prostate-related genes. The results show that our system has the potential to improve the accuracy of predicting gene-gene interactions and disease-gene associations. We also conduct a prostate cancer case study using the threshold property in logistic regression, and we compare our approach with some state-of-the-art methods.
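The pipeline described in the abstract (co-occurrence counting, network construction, centrality ranking, top-k evaluation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The gene symbols, the toy documents, the `min_count` threshold, and the `known` gene set are all hypothetical. A plain count threshold stands in for the rare-event classification models, and only degree and closeness centrality are shown.

```python
from collections import Counter, deque
from itertools import combinations

# Toy input (hypothetical): the gene symbols mentioned in each abstract.
documents = [
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_C"},
    {"GENE_A", "GENE_D"},
    {"GENE_A", "GENE_D"},
    {"GENE_D", "GENE_E"},
    {"GENE_D", "GENE_E"},
    {"GENE_B", "GENE_E"},
]


def cooccurrence_counts(docs):
    """Count how many documents mention each unordered gene pair."""
    counts = Counter()
    for genes in docs:
        counts.update(combinations(sorted(genes), 2))
    return counts


def build_network(counts, min_count=2):
    """Keep pairs seen at least min_count times (an assumed stand-in
    for the classification step) and return an adjacency mapping."""
    adjacency = {}
    for (u, v), n in counts.items():
        if n >= min_count:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes that each gene is directly linked to."""
    n = len(adjacency)
    return {g: len(nbrs) / (n - 1) for g, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """Closeness from breadth-first search distances, scaled by the
    fraction of nodes reachable so disconnected graphs are handled."""
    n = len(adjacency)
    scores = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        reachable = len(dist) - 1
        total = sum(dist.values())
        scores[source] = (reachable / (n - 1)) * (reachable / total) if total else 0.0
    return scores


def precision_at_k(ranking, known_genes, k):
    """Share of the top-k ranked genes found in a reference gene set."""
    top = ranking[:k]
    return sum(g in known_genes for g in top) / len(top) if top else 0.0


network = build_network(cooccurrence_counts(documents))
degree = degree_centrality(network)
closeness = closeness_centrality(network)

# Rank by degree centrality; ties are broken alphabetically for determinism.
ranking = sorted(degree, key=lambda g: (-degree[g], g))

known = {"GENE_A", "GENE_E"}  # hypothetical reference set of disease genes
print(ranking)
print(precision_at_k(ranking, known, k=2))
```

In practice a graph library would also provide the betweenness and eigenvector measures named in the abstract. The standard-library version here only keeps the example self-contained.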
Identifiers
pubmed: 30736752
doi: 10.1186/s12859-019-2634-7
pii: 10.1186/s12859-019-2634-7
pmc: PMC6368766
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
70
Grants
Agency: AARE
ID: 843401