Another look at matrix correlations.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
01 11 2019
History:
received: 24 01 2019
revised: 03 04 2019
accepted: 15 04 2019
pubmed: 14 5 2019
medline: 2 7 2020
entrez: 14 5 2019
Status:
ppublish
Abstract
High-throughput technologies are widely employed in modern biomedical research. They yield measurements of a large number of biomolecules in a single experiment. The number of experiments is usually much smaller than the number of measurements in each experiment. The simultaneous measurements of biomolecules provide a basis for a comprehensive, systems view for describing relevant biological processes. Often it is necessary to determine correlations between the data matrices under different conditions or pathways. However, the techniques for analyzing data with a low number of samples for possible correlations within or between conditions are still in development. Earlier developed correlative measures, such as the RV coefficient, use the trace of the product of data matrices as the most relevant characteristic. However, a recent study has shown that the RV coefficient consistently overestimates the correlations in the case of low sample numbers. To correct for this bias, it was suggested to discard the diagonal elements of the outer products of each data matrix. In this work, a principled approach based on matrix decomposition generates three trace-independent parts for every matrix. These components are unique, and they are used to determine different aspects of correlations between the original datasets. Simulations show that the decomposition removes the high correlation bias and the dependence on the sample number that are intrinsic to the RV coefficient. We then use the correlations to analyze a real proteomics dataset. The Python code can be downloaded from http://dynamic-proteome.utmb.edu/MatrixCorrelations.aspx. Supplementary data are available at Bioinformatics online.
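The two earlier measures mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not implement the three-part decomposition proposed in the paper. It assumes the usual definition of the RV coefficient, trace(S_X S_Y) / sqrt(trace(S_X S_X) trace(S_Y S_Y)) with S_X = X Xᵀ and S_Y = Y Yᵀ, and a bias-corrected variant that zeroes the diagonals of those outer products. The function name `rv_coefficient` and the `drop_diagonal` flag are hypothetical, and any preprocessing such as column centering is left to the caller.

```python
import numpy as np


def rv_coefficient(x, y, drop_diagonal=False):
    """Sketch of the RV coefficient between two data matrices.

    x, y: arrays with samples in rows and variables in columns; both must
    have the same number of rows. Inputs are used as given (no centering).
    drop_diagonal: if True, zero the diagonals of the outer products first,
    as in the bias correction described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (samples)")

    # Sample-by-sample outer products of each data matrix.
    sx = x @ x.T
    sy = y @ y.T

    if drop_diagonal:
        # sx and sy are fresh arrays, so in-place edits do not touch x or y.
        np.fill_diagonal(sx, 0.0)
        np.fill_diagonal(sy, 0.0)

    # For symmetric matrices, trace(sx @ sy) equals the sum of the
    # element-wise product, which avoids forming the matrix product.
    numerator = np.sum(sx * sy)
    denominator = np.sqrt(np.sum(sx * sx) * np.sum(sy * sy))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical example: two unrelated matrices, 5 samples x 1000 variables.
    rng = np.random.default_rng(0)
    a = rng.standard_normal((5, 1000))
    b = rng.standard_normal((5, 1000))
    print("RV:", rv_coefficient(a, b))
    print("RV without diagonals:", rv_coefficient(a, b, drop_diagonal=True))
```

With few samples and many variables, the diagonal of each outer product dominates, so the plain coefficient is expected to be close to 1 even for unrelated matrices, which is the overestimation the abstract refers to. The diagonal-free variant should be noticeably smaller on the same input.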
Identifiers
pubmed: 31081021
pii: 5480130
doi: 10.1093/bioinformatics/btz281
pmc: PMC6853692
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
4748-4753
Grants
Agency: NIGMS NIH HHS
ID: R01 GM112044
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM112131
Country: United States
Agency: NLM NIH HHS
ID: T15 LM007093
Country: United States
Copyright information
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
References
J Stat Softw. 2010;33(1):1-22
pubmed: 20808728
Sci Data. 2016 Mar 15;3:160015
pubmed: 26977904
Bioinformatics. 2005 Mar;21(6):754-64
pubmed: 15479708
Stat Appl Genet Mol Biol. 2011;10:Article 14
pubmed: 21381439
J Proteome Res. 2018 Nov 2;17(11):3740-3748
pubmed: 30265007
Nucleic Acids Res. 2017 Jan 4;45(D1):D362-D368
pubmed: 27924014
Bioinformatics. 2009 Feb 1;25(3):401-5
pubmed: 19073588