A New Pipeline for the Normalization and Pooling of Metabolomics Data.
cancer epidemiology
metabolites
metabolomics
normalization
pooling
technical variability
Journal
Metabolites
ISSN: 2218-1989
Abbreviated title: Metabolites
Country: Switzerland
NLM ID: 101578790
Publication information
Publication date: 17 Sep 2021
History:
received: 15 Jul 2021
revised: 10 Sep 2021
accepted: 13 Sep 2021
entrez: 26 Sep 2021
pubmed: 27 Sep 2021
medline: 27 Sep 2021
Status: epublish
Abstract
Pooling metabolomics data across studies is often desirable to increase the statistical power of an analysis. However, this raises methodological challenges, because several preanalytical and analytical factors can introduce differences in measured concentrations and in variability between datasets. Specifically, different studies may use different sample types (e.g., serum versus plasma), collected, processed, and stored according to different protocols, and assayed in different laboratories on different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a sequence of steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; and (iii) application of linear mixed models to remove unwanted variability, including that due to the samples' study of origin and batch, while preserving biological variation and accounting for potential differences in residual variances across studies. This pipeline was applied to targeted metabolomics data acquired with Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of the metabolomics measurements indicated that the pipeline improved the comparability of data across studies. Our pipeline can be adapted to normalize other molecular data, including biomarker and proteomics data, and could be used to pool molecular datasets, for example in international consortia, to limit the biases introduced by inter-study variability. This versatility makes our work of potential interest to molecular epidemiologists.
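The three numbered steps in the abstract can be illustrated with a compact NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the 20% missingness threshold, half-minimum imputation, the 80% cumulative-variance rule for retaining principal components, and the helper names `dummies`, `exclude_and_impute`, `pc_pr2`, and `remove_batch_effects` are all hypothetical, and outlier removal is omitted. The PC-PR2 function follows the general idea of regressing each retained principal component on all covariates and averaging each covariate's partial R² with eigenvalue weights. For step (iii), the sketch swaps the linear mixed models described above for an ordinary least-squares fit with fixed batch indicators, and it does not model study-specific residual variances.

```python
import numpy as np


def dummies(labels):
    """Reference-coded 0/1 indicator columns for a categorical covariate."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[1:]).astype(float)


def exclude_and_impute(X, max_missing=0.2):
    """Step (i), simplified. X: samples x metabolites, NaN marks missing values.
    HYPOTHETICAL rule: drop metabolites with more than `max_missing` missing,
    then fill remaining NaNs with half of each metabolite's observed minimum."""
    keep = np.isnan(X).mean(axis=0) <= max_missing
    X = X[:, keep].copy()
    half_min = np.nanmin(X, axis=0) / 2
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = half_min[cols]
    return X, keep


def _rss(design, y):
    """Residual sum of squares of an OLS fit of y on the design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def pc_pr2(X, covariates, var_threshold=0.8):
    """Step (ii): PC-PR2 sketch. `covariates` maps name -> n x m design columns.
    Returns each covariate's partial R2, averaged over the retained principal
    components with weights equal to their eigenvalues."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # center and scale
    U, s, _ = np.linalg.svd(Xs, full_matrices=False)
    eig = s ** 2 / (n - 1)  # PC variances (eigenvalues)
    # Keep the smallest number of PCs explaining >= var_threshold of variance.
    frac = np.cumsum(eig) / eig.sum()
    k = min(int(np.searchsorted(frac, var_threshold)) + 1, len(eig))
    scores = U[:, :k] * s[:k]
    one = np.ones((n, 1))
    names = list(covariates)
    full = np.hstack([one] + [covariates[c] for c in names])
    result = {}
    for c in names:
        reduced = np.hstack([one] + [covariates[o] for o in names if o != c])
        pr2 = []
        for y in scores.T:
            rss_red, rss_full = _rss(reduced, y), _rss(full, y)
            # Partial R2 of c on this PC; clip tiny negative round-off to 0.
            pr2.append(max(0.0, (rss_red - rss_full) / rss_red))
        result[c] = float(np.dot(pr2, eig[:k]) / eig[:k].sum())
    return result


def remove_batch_effects(X, batch, biology):
    """Step (iii), simplified STAND-IN for the mixed models: per-metabolite OLS on
    batch indicators plus biological covariates, subtracting only the fitted batch
    component so biological effects are kept. Study-specific residual variances
    are NOT modeled here."""
    B = dummies(batch)
    design = np.hstack([np.ones((X.shape[0], 1)), biology, B])
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    return X - B @ beta[-B.shape[1]:]


# Synthetic demo: 4 batches, one binary biological covariate, 15 metabolites.
rng = np.random.default_rng(0)
n, p = 120, 15
batch = np.repeat(["b1", "b2", "b3", "b4"], n // 4)
sex = rng.integers(0, 2, n).astype(float)[:, None]
log_conc = (rng.normal(size=(n, p)) + 0.5 * sex
            + dummies(batch) @ rng.normal(0, 1.5, size=(3, p)))
conc = np.exp(log_conc)
conc[rng.random((n, p)) < 0.05] = np.nan  # sporadic missing values
conc[:100, 0] = np.nan                    # one mostly-missing metabolite

X, kept = exclude_and_impute(conc)
logX = np.log(X)
covs = {"batch": dummies(batch), "sex": sex}
before = pc_pr2(logX, covs)
after = pc_pr2(remove_batch_effects(logX, batch, sex), covs)
print(f"batch partial R2 before: {before['batch']:.3f}, after: {after['batch']:.3f}")
```

Because the batch indicators are fitted jointly with the biological covariate and only the batch component is subtracted, the simulated sex effect survives normalization, while the batch contribution measured by PC-PR2 falls to approximately zero on this synthetic data. A faithful implementation of step (iii) would instead use a mixed-model library that supports random batch effects and study-specific residual variances.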
Identifiers
pubmed: 34564446
pii: metabo11090631
doi: 10.3390/metabo11090631
pmc: PMC8467830
Publication types
Journal Article
Languages
eng