Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
17 08 2020
History:
received: 03 02 2020
accepted: 28 07 2020
entrez: 19 8 2020
pubmed: 19 8 2020
medline: 10 2 2021
Status: epublish

Abstract

With the growth of metabolomics research, more and more studies are conducted on large numbers of samples. Due to technical limitations of the Liquid Chromatography-Mass Spectrometry (LC/MS) platform, samples often need to be processed in multiple batches. Across different batches, we often observe differences in data characteristics. In this work, we specifically focus on data generated in multiple batches on the same LC/MS instrument. Traditional preprocessing methods treat all samples as a single group. Such practice can result in errors in the alignment of peaks, which cannot be corrected by post hoc application of batch effect correction methods. We developed a new approach that addresses the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification. It can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented in the existing workflow of the apLCMS platform. In analyses of data with multiple batches, both generated from standardized quality control (QC) plasma samples and from real biological studies, the new method produced feature tables with better consistency, as well as better downstream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. It is available as part of the apLCMS package. The download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/ .
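
The alignment problem described in the abstract can be illustrated with a toy sketch. This is not the apLCMS algorithm: the data, the tolerance `TOL`, the greedy grouping rule, and the rank-based estimate of the between-batch shift are all assumptions made for illustration only. It shows how pooling samples from two batches with a retention-time shift can merge peaks from different compounds, while grouping within each batch first and then correcting the shift keeps them apart.

```python
from statistics import median

def group_by_rt(rts, tol):
    """Greedy 1-D grouping: split sorted retention times wherever the gap exceeds tol."""
    groups, current = [], []
    for rt in sorted(rts):
        if current and rt - current[-1] > tol:
            groups.append(current)
            current = []
        current.append(rt)
    if current:
        groups.append(current)
    return groups

def flatten(batch):
    """Collect the peak retention times of every sample in a batch into one list."""
    return [rt for sample in batch for rt in sample]

# Hypothetical data: two compounds (A, B) at one m/z, three samples per batch.
# Batch 2 is shifted by about +6 s, so its compound A lands where batch 1 has compound B.
batch1 = [[100.0, 106.0], [100.2, 106.2], [99.8, 105.8]]
batch2 = [[106.1, 112.1], [105.9, 111.9], [106.0, 112.0]]
TOL = 1.0  # assumed retention-time tolerance in seconds

# Single-group preprocessing: all samples are pooled before alignment.
pooled = group_by_rt(flatten(batch1) + flatten(batch2), TOL)

# Two-stage sketch: group within each batch, estimate the batch-level shift
# from rank-matched features (an illustrative assumption), then align.
feats1 = [median(g) for g in group_by_rt(flatten(batch1), TOL)]
feats2 = [median(g) for g in group_by_rt(flatten(batch2), TOL)]
shift = median([b - a for a, b in zip(feats1, feats2)])
corrected2 = [rt - shift for rt in flatten(batch2)]
two_stage = group_by_rt(flatten(batch1) + corrected2, TOL)

print([len(g) for g in pooled])     # [3, 6, 3]: the middle group mixes A and B
print([len(g) for g in two_stage])  # [6, 6]: one group per compound
```

In the pooled result, the mixed middle group cannot be separated afterwards by adjusting intensities, which is the reason the abstract gives for handling batches during preprocessing rather than post hoc.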

Identifiers

pubmed: 32807888
doi: 10.1038/s41598-020-70850-0
pii: 10.1038/s41598-020-70850-0
pmc: PMC7431853

Publication types

Journal Article; Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

13856

Grants

Agency: NIH HHS
ID: U01CA235493
Country: United States

Authors

Qin Liu (Q)

School of Software Engineering, Tongji University, Shanghai, 201804, China.

Douglas Walker (D)

Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.

Karan Uppal (K)

Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA.

Zihe Liu (Z)

School of Software Engineering, Tongji University, Shanghai, 201804, China.

Chunyu Ma (C)

Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA.

ViLinh Tran (V)

Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA.

Shuzhao Li (S)

The Jackson Laboratory, Farmington, CT, 06032, USA.

Dean P Jones (DP)

Department of Medicine, School of Medicine, Emory University, Atlanta, GA, 30322, USA.

Tianwei Yu (T)

School of Data Science, The Chinese University of Hong Kong - Shenzhen, Shenzhen, 518172, Guangdong Province, China. yutianwei@cuhk.edu.cn.
