Supervised Methods for Biomarker Detection from Microarray Experiments.

Biomarkers; Biomedical Research; Microarray Analysis

Biological validation; Biomarker; Classifier; Data unbalancing; Feature selection; Hyperparameter estimation; Microarray; Model selection; Multiomics; Validation metrics

Journal

Methods in molecular biology (Clifton, N.J.)

ISSN: 1940-6029

Abbreviated title: Methods Mol Biol

Country: United States

NLM ID: 9214969

Publication information

Publication date: 2022

History:

entrez: 2021-12-13

pubmed: 2021-12-14

medline: 2022-01-27

Status: ppublish

Abstract

Biomarkers are valuable indicators of the state of a biological system. Microarray technology has been used extensively to identify biomarkers and to build computational predictive models for disease prognosis, drug sensitivity, and toxicity evaluation. Activation biomarkers can be used to understand the underlying signaling cascades, mechanisms of action, and biological cross talk. Biomarker detection from microarray data requires several considerations from both the biological and the computational points of view. In this chapter, we describe the main methodology used in biomarker discovery and predictive modeling, and we address some of the related challenges. Moreover, we discuss biomarker validation and give some insights into multiomics strategies for biomarker detection.

Identifiers

DOI: 10.1007/978-1-0716-1839-4_8 PMID: 34902125

Chemical substances

Biomarkers 0

Publication types

Journal Article

Languages

eng

Pagination

101-120

Authors

Angela Serra (A)

Luca Cattelani (L)

Michele Fratello (M)

Vittorio Fortino (V)

Pia Anneli Sofia Kinaret (PAS)

Dario Greco (D)
