Linked matrix factorization.

data integration; dimension reduction; massive data sets; missing data imputation; principal components analysis

Journal

Biometrics
ISSN: 1541-0420
Abbreviated title: Biometrics
Country: United States
NLM ID: 0370625

Publication information

Publication date:
06 2019
History:
received: 10 02 2018
accepted: 16 08 2018
pubmed: 6 12 2018
medline: 30 1 2020
entrez: 6 12 2018
Status: ppublish

Abstract

Several recent methods address the dimension reduction and decomposition of linked high-content data matrices. Typically, these methods consider one dimension, rows or columns, that is shared among the matrices. This shared dimension may represent common features measured for different sample sets (horizontal integration) or a common sample set with features from different platforms (vertical integration). We introduce an approach for simultaneous horizontal and vertical integration, Linked Matrix Factorization (LMF), for the general case where some matrices share rows (e.g., features) and some share columns (e.g., samples). Our motivating application is a cytotoxicity study with accompanying genomic and molecular chemical attribute data. The toxicity matrix (cell lines
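The idea of combining horizontal and vertical integration can be illustrated with a small sketch. It is not the authors' LMF algorithm, whose details are not given in this record; it is a plain alternating least squares fit under assumed conditions: a central matrix `X` shares its columns with `Y` and its rows with `Z`, there is no missing data, and all three blocks use the same rank. The function name `linked_als` and every variable name are hypothetical.

```python
import numpy as np

def linked_als(X, Y, Z, rank, n_iter=100, seed=0):
    """Illustrative fit of X ~ U V^T, Y ~ Uy V^T, Z ~ U Vz^T.

    X and Z share rows (common row factors U); X and Y share columns
    (common column factors V). Assumed setup, not the published method.
    """
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((X.shape[1], rank))
    Vz = rng.standard_normal((Z.shape[1], rank))
    XZ = np.hstack([X, Z])  # blocks that share rows with X
    XY = np.vstack([X, Y])  # blocks that share columns with X
    history = []
    for _ in range(n_iter):
        # Each update is the exact least squares solution for one block
        # of factors with the others held fixed.
        U = np.linalg.lstsq(np.vstack([V, Vz]), XZ.T, rcond=None)[0].T
        Vz = np.linalg.lstsq(U, Z, rcond=None)[0].T
        Uy = np.linalg.lstsq(V, Y.T, rcond=None)[0].T
        V = np.linalg.lstsq(np.vstack([U, Uy]), XY, rcond=None)[0].T
        loss = (np.sum((X - U @ V.T) ** 2)
                + np.sum((Y - Uy @ V.T) ** 2)
                + np.sum((Z - U @ Vz.T) ** 2))
        history.append(loss)
    return U, V, Uy, Vz, history

# Synthetic example: three noisy low-rank blocks with shared factors.
rng = np.random.default_rng(1)
U0, V0 = rng.standard_normal((30, 2)), rng.standard_normal((20, 2))
Uy0, Vz0 = rng.standard_normal((15, 2)), rng.standard_normal((10, 2))
X = U0 @ V0.T + 0.1 * rng.standard_normal((30, 20))
Y = Uy0 @ V0.T + 0.1 * rng.standard_normal((15, 20))
Z = U0 @ Vz0.T + 0.1 * rng.standard_normal((30, 10))
U, V, Uy, Vz, history = linked_als(X, Y, Z, rank=2)
print(history[0], history[-1])
```

Because each step minimizes the summed squared error over one factor block, the loss recorded in `history` should not increase from one sweep to the next. Handling missing entries, block-specific ranks, or individual (non-shared) structure would need additional machinery that this sketch does not attempt.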

Identifiers

pubmed: 30516272
doi: 10.1111/biom.13010

Chemical substances

Cytotoxins 0

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

582-592

Grants

Agency: NCRR NIH HHS
ID: UL1 RR033183
Country: United States
Agency: NCRR NIH HHS
ID: KL2 RR033182
Country: United States

Copyright information

© 2019 International Biometric Society.

Authors

Michael J O'Connell (MJ)

Department of Statistics, Miami University, Oxford, Ohio 45056.

Eric F Lock (EF)

Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455.
