Prediction of breast cancer proteins involved in immunotherapy, metastasis, and RNA-binding using molecular descriptors and artificial neural networks.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
22 05 2020
History:
received: 02 11 2019
accepted: 28 04 2020
entrez: 24 5 2020
pubmed: 24 5 2020
medline: 2 12 2020
Status:
epublish
Abstract
Breast cancer (BC) is a heterogeneous disease involving genomic alterations, protein expression deregulation, signaling pathway alterations, hormone disruption, ethnicity and environmental determinants. Because of this complexity, predicting the proteins involved in BC is an active topic in drug design. This work proposes an accurate classifier for predicting BC proteins using six sets of protein sequence descriptors and 13 machine-learning methods. After univariate feature selection on the mix of five descriptor families, the best classifier was obtained with the multilayer perceptron method (an artificial neural network) and 300 features. The model achieved an area under the receiver operating characteristic curve (AUROC) of 0.980 ± 0.0037 and an accuracy of 0.936 ± 0.0056 (3-fold cross-validation). When the model was applied to 4,504 cancer-associated proteins, the best-ranked cancer immunotherapy proteins related to BC were RPS27, SUPT4H1, CLPSL2, POLR2K, RPL38, AKT3, CDK3, RPS20, RASL11A and UBTD1; the best-ranked metastasis driver proteins related to BC were S100A9, DDA1, TXN, PRNP, RPS27, S100A14, S100A7, MAPK1, AGR3 and NDUFA13; and the best-ranked RNA-binding proteins related to BC were S100A9, TXN, RPS27L, RPS27, RPS27A, RPL38, MRPL54, PPAN, RPS20 and CSRP1. The model predicts several BC-related proteins that should be studied in depth to find new biomarkers and better therapeutic targets. Scripts can be downloaded at https://github.com/muntisa/neural-networks-for-breast-cancer-proteins.
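The workflow summarized in the abstract (univariate feature selection down to 300 features, a multilayer perceptron, and 3-fold cross-validated AUROC) can be illustrated with a minimal scikit-learn sketch. This is not the authors' script: the data are synthetic, and the ANOVA F-test scorer, the scaling step, the hidden layer size, the iteration limit and the random seeds are all assumptions chosen only to keep the example small and runnable.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a protein descriptor matrix (assumption: NOT the paper's data).
X, y = make_classification(
    n_samples=300, n_features=400, n_informative=30, random_state=0
)

# Scale, keep the 300 top-scoring features by a univariate test, then fit an MLP.
# The F-test scorer and the MLP hyperparameters are illustrative assumptions.
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(score_func=f_classif, k=300),
    MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0),
)

# 3-fold stratified cross-validation, scored by area under the ROC curve.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print(f"AUROC: {scores.mean():.3f} ± {scores.std():.3f}")
```

Placing the selector inside the pipeline means the features are re-selected on the training portion of each fold, which avoids leaking test-fold information into the selection step; whether the original study did the same is not stated in this record.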
Identifiers
pubmed: 32444848
doi: 10.1038/s41598-020-65584-y
pii: 10.1038/s41598-020-65584-y
pmc: PMC7244564
Chemical substances
Biomarkers, Tumor
0
RNA
63231-63-0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
8515
References
López-Cortés, A. et al. Breast cancer risk associated with gene expression and genotype polymorphisms of the folate-metabolizing MTHFR gene: a case-control study in a high altitude Ecuadorian mestizo population. Tumor Biol. 36, 6451–6461 (2015).
doi: 10.1007/s13277-015-3335-0
López-Cortés, A. et al. Mutational Analysis of Oncogenic AKT1 Gene Associated with Breast Cancer Risk in the High Altitude Ecuadorian Mestizo Population. Biomed Res. Int. 2018, 7463832 (2018).
pubmed: 30065942
pmcid: 6051326
doi: 10.1155/2018/7463832
Ding, L. et al. Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics. Cell 173, 305–320.e10 (2018).
Guerrero, S. et al. Analysis of Racial/Ethnic Representation in Select Basic and Applied Cancer Research Studies. Sci. Rep. 8, 13978 (2018).
pubmed: 30228363
pmcid: 6143551
doi: 10.1038/s41598-018-32264-x
López-Cortés, A., Guerrero, S., Redal, M. A., Alvarado, A. T. & Quiñones, L. A. State of art of cancer pharmacogenomics in Latin American populations. Int. J. Mol. Sci. 18, 639 (2017).
doi: 10.3390/ijms18060639
Quinones, L. et al. Perception of the Usefulness of Drug/Gene Pairs and Barriers for Pharmacogenomics in Latin America. Curr. Drug Metab. 15, 202–208 (2014).
pubmed: 24524664
doi: 10.2174/1389200215666140202220753
López-Cortés, A. et al. Pharmacogenomics, biomarker network, and allele frequencies in colorectal cancer. Pharmacogenomics Journal. 20, 136–158 (2020).
pubmed: 31616044
doi: 10.1038/s41397-019-0102-4
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA. Cancer J. Clin. 68, 394–424 (2018).
pubmed: 30207593
doi: 10.3322/caac.21492
López-Cortés, A. et al. OncoOmics approaches to reveal essential genes in breast cancer: a panoramic view from pathogenesis to precision medicine. Sci. Rep. 10, 5285 (2020).
pubmed: 32210335
pmcid: 7093549
doi: 10.1038/s41598-020-62279-2
Bailey, M. H. et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371–385.e18 (2018).
Sanchez-Vega, F. et al. Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 173, 321–337.e10 (2018).
Berger, A. C. et al. A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers. Cancer Cell 33, 690–705 (2018).
pubmed: 29622464
pmcid: 5959730
doi: 10.1016/j.ccell.2018.03.014
Koboldt, D. C. et al. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
doi: 10.1038/nature11412
Uhlen, M. et al. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 28, 1248–1250 (2010).
pubmed: 21139605
doi: 10.1038/nbt1210-1248
Uhlén, M. et al. Tissue-based map of the human proteome. Science. 347, 394–403 (2015).
doi: 10.1126/science.347.6217.39-d
Thul, P. J. & Lindskog, C. The human protein atlas: A spatial map of the human proteome. Protein Sci. 27, 233–244 (2018).
pubmed: 28940711
doi: 10.1002/pro.3307
Tsherniak, A. et al. Defining a Cancer Dependency Map. Cell 170, 564–576.e16 (2017).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
pubmed: 29083409
pmcid: 5709193
doi: 10.1038/ng.3984
McFarland, J. M. et al. Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nat. Commun. 9, 1–13 (2018).
doi: 10.1038/s41467-018-06916-5
Ivanov, A. A. et al. The OncoPPi Portal: An integrative resource to explore and prioritize protein-protein interactions for cancer target discovery. Bioinformatics. 34, 1183–1191 (2018).
pubmed: 29186335
doi: 10.1093/bioinformatics/btx743
López-Cortés, A. et al. Gene prioritization, communality analysis, networking and metabolic integrated pathway to better understand breast cancer pathogenesis. Sci. Rep. 8, 16679 (2018).
pubmed: 30420728
pmcid: 6232116
doi: 10.1038/s41598-018-35149-1
Bailey, M. H. et al. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173, 371–385 (2018).
pubmed: 29625053
pmcid: 6029450
doi: 10.1016/j.cell.2018.02.060
Thorn, C. F., Klein, T. E. & Altman, R. B. PharmGKB: The pharmacogenomics knowledge base. Methods Mol. Biol. 1015, 311–320 (2013).
pubmed: 23824865
pmcid: 4084821
doi: 10.1007/978-1-62703-435-7_20
Barbarino, J. M., Whirl-Carrillo, M., Altman, R. B. & Klein, T. E. PharmGKB: A worldwide resource for pharmacogenomic information. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 10, e1417 (2018).
pubmed: 29474005
doi: 10.1002/wics.1417
Tamborero, D. et al. Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med. 10, 25 (2018).
pubmed: 29592813
pmcid: 5875005
doi: 10.1186/s13073-018-0531-8
Cabrera-Andrade, A. Gene Prioritization through Consensus Strategy, Enrichment Methodologies Analysis, and Networking for Osteosarcoma Pathogenesis. Int. J. Mol. Sci. 21, 1–21 (2020).
doi: 10.3390/ijms21031053
Tejera, E. et al. Consensus strategy in genes prioritization and combined bioinformatics analysis for preeclampsia pathogenesis. BMC Med. Genomics 10, 50 (2017).
pubmed: 28789679
pmcid: 5549357
doi: 10.1186/s12920-017-0286-x
Ding, L. et al. Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics. Cell 173, 305–320 (2018).
pubmed: 29625049
pmcid: 5916814
doi: 10.1016/j.cell.2018.03.033
Gao, Q. et al. Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell Rep. 23, 227–238 (2018).
pubmed: 29617662
pmcid: 5916809
doi: 10.1016/j.celrep.2018.03.050
Huang, K.-L. et al. Pathogenic Germline Variants in 10,389 Adult Cancers. Cell 173, 355–370 (2018).
pubmed: 29625052
pmcid: 5949147
doi: 10.1016/j.cell.2018.03.039
Thorsson, V. et al. The Immune Landscape of Cancer. Immunity 48, 812–830 (2018).
pubmed: 29628290
pmcid: 5982584
doi: 10.1016/j.immuni.2018.03.023
Liu, J. et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 173, 400–416 (2018).
pubmed: 29625055
pmcid: 6066282
doi: 10.1016/j.cell.2018.02.052
Reimand, J., Kull, M., Peterson, H., Hansen, J. & Vilo, J. G:Profiler-a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res. 35, 193–200 (2007).
doi: 10.1093/nar/gkm226
Posey, J. E. et al. Resolution of Disease Phenotypes Resulting from Multilocus Genomic Variation. N. Engl. J. Med. 376, 21–31 (2017).
pubmed: 27959697
doi: 10.1056/NEJMoa1516767
Patel, S. J. et al. Identification of essential genes for cancer immunotherapy. Nature 548, 537–542 (2017).
pubmed: 28783722
pmcid: 5870757
doi: 10.1038/nature23477
Manning, G., Whyte, D. B., Martinez, R., Hunter, T. & Sudarsanam, S. The protein kinase complement of the human genome. Science 298, 1912–1934 (2002).
pubmed: 12471243
doi: 10.1126/science.1075762
Bar-Joseph, Z. et al. Genome-wide transcriptional analysis of the human cell cycle identifies genes differentially regulated in normal and cancer cells. Proc. Natl. Acad. Sci. 105, 955–960 (2008).
pubmed: 18195366
doi: 10.1073/pnas.0704723105
Knijnenburg, T. A. et al. Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas. Cell Rep. 23, 239–254.e6 (2018).
Hentze, M. W., Castello, A., Schwarzl, T. & Preiss, T. A brave new world of RNA-binding proteins. Nature Rev. Mol. Cell Biol. 19, 327–341 (2018).
doi: 10.1038/nrm.2017.130
Carvalho-Silva, D. et al. Open Targets Platform: New developments and updates two years on. Nucleic Acids Res. 47, D1056–D1065 (2019).
pubmed: 30462303
doi: 10.1093/nar/gky1133
Golbraikh, A., Wang, X. S., Zhu, H. & Tropsha, A. Predictive QSAR modeling: Methods and applications in drug discovery and chemical risk assessment. in Handbook of Computational Chemistry. https://doi.org/10.1007/978-3-319-27282-5_37 (2017).
Fernández-Blanco, E., Aguiar-Pulido, V., Robert Munteanu, C. & Dorado, J. Random Forest classification based on star graph topological indices for antioxidant proteins. J. Theor. Biol. 317, 331–337 (2013).
pubmed: 23116665
doi: 10.1016/j.jtbi.2012.10.006
Munteanu, C. R. et al. LECTINPred: Web server that uses complex networks of protein structure for prediction of lectins with potential use as cancer biomarkers or in parasite vaccine design. Mol. Inform. 33, 276–285 (2014).
pubmed: 27485774
doi: 10.1002/minf.201300027
Fernandez-Lozano, C. et al. Classification of signaling proteins based on molecular star graph descriptors using Machine Learning models. J. Theor. Biol. 384, 50–58 (2015).
pubmed: 26297890
doi: 10.1016/j.jtbi.2015.07.038
Blanco, J. L., Porto-Pazos, A. B., Pazos, A. & Fernandez-Lozano, C. Prediction of high anti-angiogenic activity peptides in silico using a generalized linear model and feature selection. Sci. Rep. 8, 15688 (2018).
pubmed: 30356060
pmcid: 6200741
doi: 10.1038/s41598-018-33911-z
Wei, L., Zhou, C., Chen, H., Song, J. & Su, R. ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides. Bioinformatics 34, 4007–4016 (2018).
pubmed: 29868903
pmcid: 6247924
Concu, R., Cordeiro, M. N. D. S., Munteanu, C. R. & González-Díaz, H. PTML Model of Enzyme Subclasses for Mining the Proteome of Biofuel Producing Microorganisms. J. Proteome Res. 18, 2735–2746 (2019).
pubmed: 31081631
doi: 10.1021/acs.jproteome.8b00949
Vilar, S., González-Díaz, H., Santana, L. & Uriarte, E. QSAR model for alignment-free prediction of human breast cancer biomarkers based on electrostatic potentials of protein pseudofolding HP-lattice networks. J. Comput. Chem. 16, 2613–2622 (2008).
doi: 10.1002/jcc.21016
Munteanu, C. R., Magalhães, A. L., Uriarte, E. & González-Díaz, H. Multi-target QPDR classification model for human breast and colon cancer-related proteins using star graph topological indices. J. Theor. Biol. 257, 303–311 (2009).
pubmed: 19111559
doi: 10.1016/j.jtbi.2008.11.017
Cao, D. S., Xiao, N., Xu, Q. S. & Chen, A. F. Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics 31, 279–281 (2015).
pubmed: 25246429
doi: 10.1093/bioinformatics/btu624
Hao, J. & Ho, T. K. Machine Learning Made Easy: A Review of Scikit-learn Package in Python Programming Language. Journal of Educational and Behavioral Statistics 44, 348–361 (2019).
doi: 10.3102/1076998619832248
Jolliffe, I. T. Principal Component Analysis, Second Edition. Encycl. Stat. Behav. Sci. (2002).
Russell, S. & Norvig, P. Artificial Intelligence A Modern Approach Third Edition. Pearson (2010).
Cover, T. M. & Hart, P. E. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 13, 21–27 (1967).
Mika, S., Ratsch, G., Weston, J., Scholkopf, B. & Muller, K. R. Fisher discriminant analysis with kernels. in Neural Networks for Signal Processing - Proceedings of the IEEE Workshop (1999).
Patle, A. & Chouhan, D. S. SVM kernel functions for classification. in 2013 International Conference on Advances in Technology and Engineering, ICATE 2013 (2013).
Peduzzi, P., Concato, J., Kemper, E., Holford, T. R. & Feinstein, A. R. A simulation study of the number of events per variable in logistic regression analysis. J. Clin. Epidemiol. 49, 1373–1379 (1996).
pubmed: 8970487
doi: 10.1016/S0895-4356(96)00236-3
White, B. W. & Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Am. J. Psychol. (1963).
Swain, P. H. & Hauska, H. DECISION TREE CLASSIFIER: DESIGN AND POTENTIAL. IEEE Trans Geosci Electron (1977).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System (2016).
Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378 (2002).
doi: 10.1016/S0167-9473(01)00065-2
Hughes, G. F. On the Mean Accuracy of Statistical Pattern Recognizers. IEEE Trans. Inf. Theory 14, 55–63 (1968).
doi: 10.1109/TIT.1968.1054102
Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
Rocco, P. et al. OncoScore: A novel, Internet-based tool to assess the oncogenic potential of genes. Sci. Rep. 7, 46290 (2017).
pmcid: 5384236
doi: 10.1038/s41598-017-14484-9
Zheng, G. et al. HCMDB: The human cancer metastasis database. Nucleic Acids Res. 46, 950–955 (2018).
doi: 10.1093/nar/gkx1008
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002).
doi: 10.1613/jair.953
Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30, 1145–1159 (1997).
doi: 10.1016/S0031-3203(96)00142-2
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, 11 (2013).
doi: 10.1126/scisignal.2004088
Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).
pubmed: 22588877
doi: 10.1158/2159-8290.CD-12-0095
Finotello, F., Rieder, D., Hackl, H. & Trajanoski, Z. Next-generation computational tools for interrogating cancer immunity. Nat. Rev. Genet. 20, 724–746 (2019).
pubmed: 31515541
doi: 10.1038/s41576-019-0166-7
Atsuta, Y. et al. Identification of metallopanstimulin-1 as a member of a tumor associated antigen in patients with breast cancer. Cancer Lett. 182, 101–107 (2002).
pubmed: 12175529
doi: 10.1016/S0304-3835(02)00068-X
Itamochi, H. et al. Whole-genome sequencing revealed novel prognostic biomarkers and promising targets for therapy of ovarian clear cell carcinoma. Br. J. Cancer 5, 717–724 (2017).
doi: 10.1038/bjc.2017.228
Angus, L. et al. The genomic landscape of metastatic breast cancer highlights changes in mutation and signature frequencies. Nat. Genet. 51, 1450–1458 (2019).
pubmed: 31570896
pmcid: 6858873
doi: 10.1038/s41588-019-0507-7
Caicedo, A. et al. MitoCeption as a new tool to assess the effects of mesenchymal stem/stromal cell mitochondria on cancer cell metabolism and function. Sci. Rep. 5, 9073 (2015).
pubmed: 25766410
pmcid: 4358056
doi: 10.1038/srep09073
Aponte, P. M. & Caicedo, A. Stemness in cancer: Stem cells, cancer stem cells, and their microenvironment. Stem Cells International 2017, 5619472 (2017).
pubmed: 28473858
pmcid: 5394399
doi: 10.1155/2017/5619472
Fokas, E., Engenhart-Cabillic, R., Daniilidis, K., Rose, F. & An, H. X. Metastasis: The seed and soil theory gains identity. Cancer and Metastasis Reviews 26, 3–4 (2007).
doi: 10.1007/s10555-007-9088-5
Schell, M. J. et al. A composite gene expression signature optimizes prediction of colorectal cancer metastasis and outcome. Clin. Cancer Res. 22, 734–745 (2016).
pubmed: 26446941
doi: 10.1158/1078-0432.CCR-15-0143
Lee, J. Y. et al. Mutational profiling of brain metastasis from breast cancer: Matched pair analysis of targeted sequencing between brain metastasis and primary breast cancer. Oncotarget 6, 43731–43742 (2015).
pubmed: 26527317
pmcid: 4791262
doi: 10.18632/oncotarget.6192
Bergenfelz, C. et al. S100A9 expressed in ER-PgR-breast cancers induces inflammatory cytokines and is associated with an impaired overall survival. Br. J. Cancer 113, 1234–1243 (2015).
pubmed: 26448179
pmcid: 4647879
doi: 10.1038/bjc.2015.346
García-Cárdenas, J. M. et al. Post-transcriptional Regulation of Colorectal Cancer: A Focus on RNA-Binding Proteins. Front. Mol. Biosci. 6, 1–18 (2019).
Burd, C. G. & Dreyfuss, G. Conserved structures and diversity of functions of RNA-binding proteins. Science 265, 615–621 (1994).
pubmed: 8036511
doi: 10.1126/science.8036511
Lukong, K. E., Chang, K. W., Khandjian, E. W. & Richard, S. RNA-binding proteins in human genetic disease. Trends in Genetics 24, 416–425 (2008).
pubmed: 18597886
doi: 10.1016/j.tig.2008.05.004
Kechavarzi, B. & Janga, S. C. Dissecting the expression landscape of RNA-binding proteins in human cancers. Genome Biol. 15, R14 (2014).
pubmed: 24410894
pmcid: 4053825
doi: 10.1186/gb-2014-15-1-r14
Guerrero, S. et al. In silico analyses reveal new putative Breast Cancer RNA-binding proteins. bioRxiv (2020).
Rodrigues, P. et al. Oxidative stress in susceptibility to breast cancer: Study in Spanish population. BMC Cancer 14, 861 (2014).
pubmed: 25416100
pmcid: 4251690
doi: 10.1186/1471-2407-14-861