Band-based similarity indices for gene expression classification and clustering.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date: 2021-11-03
History:
received: 2021-06-23
accepted: 2021-10-11
entrez: 2021-11-04
pubmed: 2021-11-05
medline: 2022-01-28
Status: epublish
Abstract
The concept of depth induces an ordering from centre outwards in multivariate data. Most depth definitions are unfeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated to each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We have evaluated the performance of several band-based similarity indices with respect to that of other classical distances in standard classification and clustering tasks in a variety of simulated and real data sets. However, the use of the method is not restricted to these, the extension to other similarity coefficients being straightforward. Our experiments show the benefits of our technique, with some of the selected indices outperforming, among others, the Euclidean distance.
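The construction summarised in the abstract can be illustrated with a minimal sketch. It assumes bands formed by exactly two observations and a brute-force enumeration of all pairs; the function names (`band_indicators`, `modified_band_depth`, `contingency`, `jaccard`, `simple_matching`) are hypothetical and are not taken from the paper, whose own (more efficient) computation and exact definition of the binary matrices may differ. The Jaccard and simple matching coefficients stand in for the "standard similarity indices" mentioned above.

```python
from itertools import combinations

import numpy as np


def band_indicators(data, i):
    """Binary matrix (bands x coordinates) for observation i.

    Entry [b, t] is True when coordinate t of observation i lies inside
    band b, i.e. between the coordinate-wise minimum and maximum of the
    two observations that form the band. Assumption: bands of size two.
    """
    x = data[i]
    rows = []
    for j, k in combinations(range(len(data)), 2):
        lo = np.minimum(data[j], data[k])
        hi = np.maximum(data[j], data[k])
        rows.append((lo <= x) & (x <= hi))
    return np.array(rows)


def modified_band_depth(data, i):
    """Average proportion of coordinates of observation i inside each band."""
    return float(band_indicators(data, i).mean())


def contingency(bx, by):
    """2x2 contingency counts (a, b, c, d) from two binary matrices."""
    a = int(np.sum(bx & by))    # both observations inside the band
    b = int(np.sum(bx & ~by))   # only the first observation inside
    c = int(np.sum(~bx & by))   # only the second observation inside
    d = int(np.sum(~bx & ~by))  # both observations outside
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard coefficient; returns 1.0 when a + b + c is zero."""
    return a / (a + b + c) if (a + b + c) else 1.0


def simple_matching(a, b, c, d):
    """Simple matching coefficient."""
    return (a + d) / (a + b + c + d)


# Toy data: three observations measured on two coordinates.
data = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
table = contingency(band_indicators(data, 0), band_indicators(data, 2))
print(modified_band_depth(data, 1), table, jaccard(*table))
```

In this toy example the middle observation lies inside every band, so it is the deepest, while the two extreme observations share only one band and therefore receive a low band-based similarity. Enumerating all pairs costs on the order of n² bands per observation, so the sketch is only meant for small data sets.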
Identifiers
pubmed: 34732744
doi: 10.1038/s41598-021-00678-9
pii: 10.1038/s41598-021-00678-9
pmc: PMC8566472
Chemical substances
Biomarkers, Tumor
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
21609
Grants
Agency: Ministerio de Ciencia e Innovación
ID: FIS2017-84440-C2-2-P
Agency: Comunidad de Madrid
ID: EPUC3M23
Copyright information
© 2021. The Author(s).