On Support Recovery with Sparse CCA: Information Theoretic and Computational Limits.

Canonical Correlation Analysis High Dimension Low Degree Polynomials Support Recovery Variable Selection

Journal

IEEE transactions on information theory
ISSN: 0018-9448
Titre abrégé: IEEE Trans Inf Theory
Pays: United States
ID NLM: 9880925

Informations de publication

Date de publication:
Mar 2023
Historique:
pmc-release: 01 03 2024
medline: 16 10 2023
pubmed: 16 10 2023
entrez: 16 10 2023
Statut: ppublish

Résumé

In this paper, we consider asymptotically exact support recovery in the context of high dimensional and sparse Canonical Correlation Analysis (CCA). Our main results describe four regimes of interest based on information theoretic and computational considerations. In regimes of "low" sparsity we describe a simple, general, and computationally easy method for support recovery, whereas in a regime of "high" sparsity, it turns out that support recovery is information theoretically impossible. For the sake of information theoretic lower bounds, our results also demonstrate a non-trivial requirement on the "minimal" size of the nonzero elements of the canonical vectors that is required for asymptotically consistent support recovery. Subsequently, the regime of "moderate" sparsity is further divided into two subregimes. In the lower of the two sparsity regimes, we show that polynomial time support recovery is possible by using a sharp analysis of a co-ordinate thresholding [1] type method. In contrast, in the higher end of the moderate sparsity regime, appealing to the "Low Degree Polynomial" Conjecture [2], we provide evidence that polynomial time support recovery methods are inconsistent. Finally, we carry out numerical experiments to compare the efficacy of various methods discussed.

Identifiants

pubmed: 37842015
doi: 10.1109/tit.2022.3214201
pmc: PMC10569110
mid: NIHMS1875481
doi:

Types de publication

Journal Article

Langues

eng

Pagination

1695-1738

Subventions

Organisme : NIEHS NIH HHS
ID : P42 ES030990
Pays : United States

Références

J Am Stat Assoc. 2009 Jun 1;104(486):682-693
pubmed: 20617121
BMC Bioinformatics. 2009 Jan 26;10:34
pubmed: 19171069
Stat Sin. 2021 Jan;31(1):333-360
pubmed: 35046630
Biometrics. 2019 Sep;75(3):734-744
pubmed: 30714093
Ann Appl Stat. 2013 Mar 1;7(1):523-542
pubmed: 23745156
Neuroimage. 2003 Jul;19(3):837-45
pubmed: 12880812
PLoS One. 2022 Dec 30;17(12):e0276886
pubmed: 36584096
J Neural Eng. 2009 Aug;6(4):046002
pubmed: 19494422
Biostatistics. 2009 Jul;10(3):515-34
pubmed: 19377034
Neural Comput. 2004 Dec;16(12):2639-64
pubmed: 15516276
Neuroimage. 2010 Apr 15;50(3):1004-16
pubmed: 20083207
Stat Appl Genet Mol Biol. 2008;7(1):Article3
pubmed: 18241193

Auteurs

Nilanjana Laha (N)

Department of Statistics, Texas A&M University, College Station, TX 77843.

Rajarshi Mukherjee (R)

Department of Biostatistics, Harvard T. H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115.

Classifications MeSH