Band-based similarity indices for gene expression classification and clustering.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
03 Nov 2021
History:
received: 23 Jun 2021
accepted: 11 Oct 2021
entrez: 04 Nov 2021
pubmed: 05 Nov 2021
medline: 28 Jan 2022
Status: epublish

Abstract

The concept of depth induces a centre-outward ordering in multivariate data. Most depth definitions are infeasible for dimensions larger than three or four, but the Modified Band Depth (MBD) is a notable exception that has proven to be a valuable tool in the analysis of high-dimensional gene expression data. This depth definition relates the centrality of each individual to its (partial) inclusion in all possible bands formed by elements of the data set. We assess (dis)similarity between pairs of observations by accounting for such bands and constructing binary matrices associated with each pair. From these, contingency tables are calculated and used to derive standard similarity indices. Our approach is computationally efficient and can be applied to bands formed by any number of observations from the data set. We have evaluated the performance of several band-based similarity indices against that of other classical distances in standard classification and clustering tasks on a variety of simulated and real data sets. The method is not restricted to the indices evaluated; extending it to other similarity coefficients is straightforward. Our experiments show the benefits of our technique, with some of the selected indices outperforming, among others, the Euclidean distance.
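The pipeline summarised in the abstract (bands, binary matrices, contingency table, similarity index) can be illustrated with a minimal Python sketch. The abstract does not give the exact construction, so everything here is an assumption made for illustration: bands are formed by pairs of observations only, an observation gets a 1 at a coordinate when its value lies between the minimum and maximum of the band at that coordinate, and the function names `band_indicator`, `contingency`, `jaccard` and `simple_matching` are hypothetical rather than taken from the paper. Jaccard and simple matching stand in for the "standard similarity indices"; the paper may use others.

```python
from itertools import combinations


def band_indicator(x, data):
    """Binary matrix for observation x (assumed construction).

    Rows are bands formed by pairs of observations in `data`;
    columns are coordinates. An entry is 1 when x lies inside the
    band at that coordinate, 0 otherwise.
    """
    rows = []
    for u, v in combinations(data, 2):
        rows.append([
            1 if min(ui, vi) <= xi <= max(ui, vi) else 0
            for xi, ui, vi in zip(x, u, v)
        ])
    return rows


def contingency(bx, by):
    """2x2 contingency counts (a, b, c, d) for two binary matrices.

    a: both 1, b: only the first is 1, c: only the second is 1, d: both 0.
    """
    a = b = c = d = 0
    for row_x, row_y in zip(bx, by):
        for p, q in zip(row_x, row_y):
            if p and q:
                a += 1
            elif p:
                b += 1
            elif q:
                c += 1
            else:
                d += 1
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard index a / (a + b + c); defined as 0.0 when the denominator is 0."""
    denom = a + b + c
    return a / denom if denom else 0.0


def simple_matching(a, b, c, d):
    """Simple matching coefficient (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)


# Toy data: three observations with two coordinates each (illustrative only).
data = [[0, 0], [1, 1], [2, 2]]
bx = band_indicator(data[1], data)
by = band_indicator(data[0], data)
print(contingency(bx, by), jaccard(*contingency(bx, by)))
```

Under these assumptions, the mean of the entries of `band_indicator(x, data)` is the proportion of band-coordinate pairs that contain `x`, which is how the Modified Band Depth with two-observation bands is usually defined. A dissimilarity for clustering could then be taken as one minus the chosen index.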

Identifiers

pubmed: 34732744
doi: 10.1038/s41598-021-00678-9
pii: 10.1038/s41598-021-00678-9
pmc: PMC8566472

Chemical substances

Biomarkers, Tumor 0

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

21609

Grants

Agency: Ministerio de Ciencia e Innovación
ID: FIS2017-84440-C2-2-P
Agency: Comunidad de Madrid
ID: EPUC3M23

Copyright information

© 2021. The Author(s).

Authors

Aurora Torrente (A)

Departamento de Matemáticas, Instituto Gregorio Millán, Universidad Carlos III de Madrid, 28911, Leganés, Spain. etorrent@est-econ.uc3m.es.
