ExhauFS: exhaustive search-based feature selection for classification and survival regression.


Journal

PeerJ
ISSN: 2167-8359
Abbreviated title: PeerJ
Country: United States
NLM ID: 101603425

Publication information

Publication date:
2022
History:
received: 2022-01-20
accepted: 2022-03-09
entrez: 2022-04-05
pubmed: 2022-04-06
medline: 2022-04-06
Status: epublish

Abstract

Feature selection is one of the main techniques used to prevent overfitting in machine learning applications. The most straightforward approach to feature selection is an exhaustive search: one goes over all possible feature combinations and picks the model with the highest accuracy. This method, together with its optimizations, has been actively used in biomedical research; however, a publicly available implementation has been missing. We present ExhauFS, a user-friendly command-line implementation of the exhaustive search approach for classification and survival regression. Aside from the tool description, we included three application examples in the manuscript to comprehensively review the implemented functionality. First, we executed ExhauFS on a toy cervical cancer dataset to illustrate basic concepts. Then, multi-cohort microarray breast cancer datasets were used to construct gene signatures for 5-year recurrence classification. The vast majority of signatures constructed by ExhauFS passed the 0.65 threshold for sensitivity and specificity on all datasets, including the validation one. Moreover, a number of gene signatures demonstrated reliable performance on an independent RNA-seq dataset without any coefficient re-tuning.
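
The exhaustive search idea described in the abstract can be illustrated with a minimal sketch. This is not the ExhauFS code or its command-line interface: the function names (`nearest_centroid_predict`, `sensitivity_specificity`, `exhaustive_search`), the nearest-centroid classifier, and the toy data are all assumptions made for illustration, and none of the optimizations mentioned in the abstract are included. The sketch enumerates every feature subset of a given size and keeps only the subsets whose sensitivity and specificity both reach a threshold on every dataset, training and validation alike.

```python
from itertools import combinations


def nearest_centroid_predict(train_X, train_y, test_X, features):
    """Assign each test row to the class with the closest centroid,
    looking only at the columns listed in `features`."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    preds = []
    for x in test_X:
        point = [x[f] for f in features]
        # sorted() makes tie-breaking deterministic (smallest label wins).
        preds.append(min(
            sorted(centroids),
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        ))
    return preds


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    return tp / n_pos, tn / n_neg


def exhaustive_search(train, datasets, n_features, k, threshold=0.65):
    """Try every k-feature subset; keep those whose worst sensitivity or
    specificity across all datasets is at least `threshold`."""
    train_X, train_y = train
    passed = []
    for features in combinations(range(n_features), k):
        worst = 1.0
        for X, y in datasets:
            preds = nearest_centroid_predict(train_X, train_y, X, features)
            worst = min(worst, *sensitivity_specificity(y, preds))
        if worst >= threshold:
            passed.append((features, worst))
    return sorted(passed, key=lambda item: -item[1])


# Made-up toy data: column 0 is informative, column 1 is noise,
# column 2 separates the training set but not the validation set.
train = ([[0.1, 5.0, 1.0], [0.2, 1.0, 0.9], [0.9, 4.8, 0.2], [1.0, 1.2, 0.1]],
         [0, 0, 1, 1])
valid = ([[0.15, 1.1, 0.4], [0.05, 4.9, 0.95], [0.95, 5.1, 0.15], [0.85, 0.9, 0.3]],
         [0, 0, 1, 1])

passing = exhaustive_search(train, [train, valid], n_features=3, k=1)
print(passing)  # [((0,), 1.0)]
```

In this toy run only column 0 survives: column 1 carries no signal, and column 2 looks perfect on the training data but drops to a specificity of 0.5 on the validation data. The number of subsets grows combinatorially with the number of features and the subset size, which is why a practical tool needs some way to narrow the candidate pool before enumeration.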

Identifiers

pubmed: 35378930
doi: 10.7717/peerj.13200
pii: 13200
pmc: PMC8976470

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Pagination

e13200

Copyright information

©2022 Nersisyan et al.

Conflict of interest statement

The authors declare there are no competing interests.

Authors

Stepan Nersisyan (S)

Faculty of Biology and Biotechnology, HSE University, Moscow, Russia.

Victor Novosad (V)

Faculty of Biology and Biotechnology, HSE University, Moscow, Russia.
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia.

Alexei Galatenko (A)

Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia.
Moscow Center for Fundamental and Applied Mathematics, Moscow, Russia.

Andrey Sokolov (A)

Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia.
Moscow Center for Fundamental and Applied Mathematics, Moscow, Russia.

Grigoriy Bokov (G)

Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia.
Moscow Center for Fundamental and Applied Mathematics, Moscow, Russia.

Alexander Konovalov (A)

Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia.
Moscow Center for Fundamental and Applied Mathematics, Moscow, Russia.

Dmitry Alekseev (D)

Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, Moscow, Russia.
Moscow Center for Fundamental and Applied Mathematics, Moscow, Russia.

Alexander Tonevitsky (A)

Faculty of Biology and Biotechnology, HSE University, Moscow, Russia.
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry RAS, Moscow, Russia.
Institute of Nanotechnologies of Microelectronics RAS, Moscow, Russia.
