Multiblock variable influence on orthogonal projections (MB-VIOP) for enhanced interpretation of total, global, local and unique variations in OnPLS models.
Feature selection
Latent variable interpretation
MB-VIOP
Multiblock variable selection
OnPLS
VIP
Variable importance in multiblock regression
Variable influence on projection
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
03 Apr 2021
History:
received: 16 Jul 2020
accepted: 10 Feb 2021
entrez: 04 Apr 2021
pubmed: 05 Apr 2021
medline: 10 Apr 2021
Status:
epublish
Abstract
BACKGROUND: For multivariate data analysis involving only two input matrices (e.g., X and Y), the previously published methods for variable influence on projection (e.g., VIP […]
RESULTS: This paper presents a method for variable selection in multiblock analysis, called multiblock variable influence on orthogonal projections (MB-VIOP). MB-VIOP is a model-based variable selection method that uses the data matrices, the scores and the normalized loadings of an OnPLS model to rank the input variables of more than two data matrices by their importance, both for simplifying and for interpreting the total multiblock model and, separately, its unique, local and global components. MB-VIOP was tested on three datasets: a synthetic four-block dataset, a real three-block omics dataset related to plant sciences, and a real six-block dataset related to the food industry.
CONCLUSIONS: Three examples (one synthetic and two real-world cases) provide evidence for the usefulness and reliability of MB-VIOP. MB-VIOP assesses reliably and efficiently the importance of both isolated variables and ranges of variables in any type of data. It connects the input variables of different data matrices according to their relevance for the interpretation of each latent variable, which improves the interpretability of each OnPLS model component. In addition, MB-VIOP can handle strong overlap between types of variation, as well as many data blocks of very different dimensionality. Its ability to generate dimensionality-reduced models with high interpretability makes the method well suited to big data mining, multi-omics data integration and any study that requires exploration and interpretation of large streams of data.
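The abstract describes MB-VIOP as ranking variables from the scores and normalized loadings of a fitted model, but this record does not give the MB-VIOP formula. As a point of reference only, the sketch below computes the classic two-block PLS VIP for a single response, the kind of measure the BACKGROUND sentence alludes to. It is not MB-VIOP and not the authors' implementation. The function name `vip_scores`, its argument names and the toy numbers are all assumptions made for illustration.

```python
import math

def vip_scores(scores, weights, y_loadings):
    """Classic two-block PLS VIP for a single response (not MB-VIOP).

    scores:     n x A list of rows (X-scores, one column per component)
    weights:    p x A list of rows (X-weights, one column per component)
    y_loadings: length-A list (response loading of each component)
    """
    n_comp = len(y_loadings)
    n_vars = len(weights)
    # Response sum of squares explained by component a: q_a^2 * (t_a . t_a)
    ssy = [
        y_loadings[a] ** 2 * sum(row[a] ** 2 for row in scores)
        for a in range(n_comp)
    ]
    # Squared length of each weight vector, used to normalise the weights
    w_sq_norm = [sum(row[a] ** 2 for row in weights) for a in range(n_comp)]
    total = sum(ssy)
    return [
        math.sqrt(
            n_vars
            * sum(ssy[a] * weights[j][a] ** 2 / w_sq_norm[a] for a in range(n_comp))
            / total
        )
        for j in range(n_vars)
    ]

# Toy inputs, invented for illustration only (3 samples, 3 variables, 2 components)
scores = [[1.0, 0.5], [-1.0, 0.5], [0.0, -1.0]]
weights = [[0.8, 0.1], [0.6, 0.2], [0.0, 0.9]]
y_loadings = [2.0, 0.5]

vip = vip_scores(scores, weights, y_loadings)
# Variable indices ordered from most to least influential
ranking = sorted(range(len(vip)), key=lambda j: vip[j], reverse=True)
```

Because each weight vector is normalised to unit length, the squared VIP values average to 1 across variables, which is why 1 is commonly used as a cut-off in the two-block case. How MB-VIOP extends this idea to more than two blocks and to separate global, local and unique components is specified in the paper itself, not in this record.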
Identifiers
pubmed: 33812384
doi: 10.1186/s12859-021-04015-9
pii: 10.1186/s12859-021-04015-9
pmc: PMC8019512
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
176