HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS.

Factor models; heterogeneity; penalized regression; prediction

Journal

Statistica Sinica
ISSN: 1017-0405
Abbreviated title: Stat Sin
Country: China (Republic : 1949- )
NLM ID: 101473244

Publication information

Publication date:
Jan 2023
History:
medline: 19 10 2023
pubmed: 19 10 2023
entrez: 19 10 2023
Status: ppublish

Abstract

In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition into heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model, which ignores the data heterogeneity, and a group-specific model, which fits each subgroup separately. We prove estimation and prediction consistency for our proposed estimators and show that they have better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and that the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.
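
The decomposition described in the abstract can be illustrated with one possible formalization. This is a sketch under stated assumptions, not the authors' exact specification: the abstract gives no formulas, so every symbol below (the subpopulation index $m$, loadings $B_m$, latent factors $f$, idiosyncratic components $u$, and coefficients $\gamma_m$ and $\beta$) is notation introduced here for illustration.

```latex
% Assumed notation: subject i in subpopulation m = 1, ..., M.
% Covariates follow a factor structure with group-specific loadings B_m:
\[
  x_i^{(m)} = B_m f_i^{(m)} + u_i^{(m)} .
\]
% Response: a heterogeneous term driven by the latent factors, with
% group-specific coefficients gamma_m, plus a homogeneous term whose
% coefficients beta are shared across all subpopulations:
\[
  y_i^{(m)} = \bigl(f_i^{(m)}\bigr)^{\top} \gamma_m
            + \bigl(u_i^{(m)}\bigr)^{\top} \beta
            + \varepsilon_i^{(m)} .
\]
% For comparison, a global model y = x^T beta + eps expands to
\[
  \bigl(x_i^{(m)}\bigr)^{\top} \beta
  = \bigl(f_i^{(m)}\bigr)^{\top} B_m^{\top} \beta
  + \bigl(u_i^{(m)}\bigr)^{\top} \beta ,
\]
% so, under this notation, it is the special case gamma_m = B_m^T beta.
```

Under this assumed notation, a group-specific model would instead replace $\beta$ with a separate $\beta_m$ for each subpopulation, which is one way to read the abstract's claim that the proposed model sits between the two extremes.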

Identifiers

pubmed: 37854586
doi: 10.5705/ss.202020.0145
pmc: PMC10583735
mid: NIHMS1892524

Publication types

Journal Article

Languages

eng

Pagination

27-53

Grants

Agency: NIA NIH HHS
ID: R01 AG073259
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM126550
Country: United States

Authors

Peiyao Wang (P)

University of North Carolina at Chapel Hill.

Quefeng Li (Q)

University of North Carolina at Chapel Hill.

Dinggang Shen (D)

ShanghaiTech University.
Shanghai United Imaging Intelligence Co.
Korea University.

Yufeng Liu (Y)

University of North Carolina at Chapel Hill.
