A compound decision approach to covariance matrix estimation.
compound decision theory
g-modeling
nonparametric maximum likelihood
separable decision rule
Journal
Biometrics
ISSN: 1541-0420
Abbreviated title: Biometrics
Country: United States
NLM ID: 0370625
Publication information
Publication date: 06 2023
History:
received: 20 04 2021
accepted: 18 04 2022
medline: 21 6 2023
pubmed: 3 5 2022
entrez: 2 5 2022
Status: ppublish
Abstract
Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is suboptimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings are common in modern genomics, where covariance matrix estimation is frequently employed as a method for inferring gene networks. To achieve estimation accuracy in these settings, existing methods typically either assume that the population covariance matrix has some particular structure, for example, sparsity, or apply shrinkage to better estimate the population eigenvalues. In this paper, we study a new approach to estimating high-dimensional covariance matrices. We first frame covariance matrix estimation as a compound decision problem. This motivates defining a class of decision rules and using a nonparametric empirical Bayes g-modeling approach to estimate the optimal rule in the class. Simulation results and gene network inference in an RNA-seq experiment in mouse show that our approach is comparable to or can outperform a number of state-of-the-art proposals.
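As a rough illustration of what nonparametric empirical Bayes g-modeling means, the sketch below fits a nonparametric maximum likelihood estimate of a prior in the classical normal-means problem and then plugs it into the posterior mean. This is not the covariance estimator described in the abstract: the setting (observations x_i = theta_i + noise with known unit variance), the function name `npmle_normal_means`, the fixed grid of support points, and the EM iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def npmle_normal_means(x, grid_size=50, n_iter=200, sigma=1.0):
    """Grid-based NPMLE of a mixing distribution for x_i ~ N(theta_i, sigma^2).

    Hypothetical helper for illustration only. Returns the support grid,
    the estimated mixing weights, and the plug-in posterior means.
    """
    x = np.asarray(x, dtype=float)
    # Candidate support points for the unknown prior g (an assumed fixed grid).
    grid = np.linspace(x.min(), x.max(), grid_size)
    # lik[i, k] = normal density of x_i given theta = grid[k].
    lik = np.exp(-0.5 * ((x[:, None] - grid[None, :]) / sigma) ** 2)
    lik /= sigma * np.sqrt(2.0 * np.pi)
    # Start from uniform weights and run EM updates on the weights only.
    w = np.full(grid_size, 1.0 / grid_size)
    for _ in range(n_iter):
        post = lik * w
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)
    # Estimated decision rule: posterior mean of theta_i under the fitted prior.
    post = lik * w
    post /= post.sum(axis=1, keepdims=True)
    theta_hat = post @ grid
    return grid, w, theta_hat

# Usage on simulated data (the two-point prior here is made up for the demo).
rng = np.random.default_rng(1)
theta_true = np.where(rng.random(300) < 0.5, 0.0, 4.0)
x_obs = theta_true + rng.standard_normal(300)
grid, weights, theta_est = npmle_normal_means(x_obs)
```

The same idea, estimating a prior from all coordinates at once and applying the resulting rule to each coordinate, is what makes the problem a compound decision problem; how the paper adapts it to covariance entries is not spelled out in the abstract and is not reproduced here.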
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
1201-1212
Copyright information
© 2022 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society.