Bag of little bootstraps for massive and distributed longitudinal data.

EMR; bags of little bootstraps; big data; linear mixed models; longitudinal data; parallel and distributed computing

Journal

Statistical analysis and data mining
ISSN: 1932-1864
Abbreviated title: Stat Anal Data Min
Country: United States
NLM ID: 101492808

Publication information

Publication date:
Jun 2022
History:
entrez: 3 6 2022
pubmed: 4 6 2022
medline: 4 6 2022
Status: ppublish

Abstract

Linear mixed models are widely used for analyzing longitudinal datasets, and inference for their variance component parameters relies on the bootstrap method. However, health systems and technology companies routinely generate massive longitudinal datasets that make the traditional bootstrap method infeasible. To solve this problem, we extend the highly scalable bag of little bootstraps method for independent data to longitudinal data and develop a highly efficient Julia package, MixedModelsBLB.jl. Simulation experiments and real data analysis demonstrate the favorable statistical performance and computational advantages of our method compared to the traditional bootstrap method. For the statistical inference of variance components, it achieves a 200-fold speedup at the scale of 1 million subjects (20 million total observations), and it is the only currently available tool that can handle more than 10 million subjects (200 million total observations) on desktop computers.
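
For orientation, the bag of little bootstraps idea for independent data, which the abstract takes as its starting point, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MixedModelsBLB.jl implementation and not the paper's longitudinal extension: the function name `blb_standard_error`, its parameters, and the choice of the sample mean as the estimator are all hypothetical and chosen only for brevity.

```python
import random
import statistics


def blb_standard_error(data, n_subsets=5, subset_exponent=0.7,
                       n_resamples=50, seed=0):
    """Estimate the standard error of the sample mean with a
    bag-of-little-bootstraps scheme (illustrative sketch only)."""
    rng = random.Random(seed)
    n = len(data)
    # Each "little" subset has b = n^gamma distinct points, with b << n.
    b = max(2, int(n ** subset_exponent))
    subset_estimates = []
    for _ in range(n_subsets):
        # Draw the subset without replacement from the full data.
        subset = rng.sample(data, b)
        resampled_means = []
        for _ in range(n_resamples):
            # Resample n points from the b distinct values. Only the b
            # counts are needed, so a full-size dataset is never built.
            counts = [0] * b
            for idx in rng.choices(range(b), k=n):
                counts[idx] += 1
            weighted_mean = sum(c * x for c, x in zip(counts, subset)) / n
            resampled_means.append(weighted_mean)
        # Spread of the resampled estimates within this subset.
        subset_estimates.append(statistics.stdev(resampled_means))
    # Average the per-subset assessments to get the final estimate.
    return statistics.fmean(subset_estimates)


if __name__ == "__main__":
    random.seed(1)
    sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
    print("BLB standard error of the mean:", blb_standard_error(sample))
    print("Textbook value sd/sqrt(n):",
          statistics.stdev(sample) / len(sample) ** 0.5)
```

The design point the sketch tries to convey is that each resample touches only `b` distinct observations while still mimicking a sample of size `n`, which is what makes the scheme easy to distribute across workers. How the paper adapts the resampling to subjects with repeated measurements is not reproduced here.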

Identifiers

pubmed: 35656342
doi: 10.1002/sam.11563
pmc: PMC9159544
mid: NIHMS1774207

Publication types

Journal Article

Languages

eng

Pagination

314-321

Grants

Agency: NIDDK NIH HHS
ID: K01 DK106116
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG006139
Country: United States
Agency: NHLBI NIH HHS
ID: R21 HL150374
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM141798
Country: United States

Conflict of interest statement

The authors declare no conflicts of interest.

Authors

Xinkai Zhou (X)

Department of Biostatistics, University of California, Los Angeles, California, USA.

Jin J Zhou (JJ)

Department of Medicine, University of California, Los Angeles, California, USA.

Hua Zhou (H)

Department of Biostatistics, University of California, Los Angeles, California, USA.
Department of Computational Medicine, University of California, Los Angeles, California, USA.

MeSH classifications