Addressing the batch effect issue for LC/MS metabolomics data in data preprocessing.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date: 2020-08-17
History:
received: 2020-02-03
accepted: 2020-07-28
entrez: 2020-08-19
pubmed: 2020-08-19
medline: 2021-02-10
Status: epublish
Abstract
As metabolomics research grows, more studies are conducted on large numbers of samples. Due to technical limitations of the Liquid Chromatography-Mass Spectrometry (LC/MS) platform, samples often need to be processed in multiple batches, and data characteristics often differ across batches. In this work, we focus specifically on data generated in multiple batches on the same LC/MS instrument. Traditional preprocessing methods treat all samples as a single group. This practice can result in peak alignment errors that cannot be corrected by post hoc application of batch effect correction methods. We developed a new approach that addresses the batch effect issue in the preprocessing stage, resulting in better peak detection, alignment and quantification. It can be combined with downstream batch effect correction methods to further correct for between-batch intensity differences. The method is implemented in the existing workflow of the apLCMS platform. When applied to multi-batch data, both from standardized quality control (QC) plasma samples and from real biological studies, the new method produced feature tables with better consistency, as well as better downstream analysis results. The method can be a useful addition to the tools available for large studies involving multiple batches. It is available as part of the apLCMS package; the download link and instructions are at https://mypage.cuhk.edu.cn/academics/yutianwei/apLCMS/.
Identifiers
pubmed: 32807888
doi: 10.1038/s41598-020-70850-0
pii: 10.1038/s41598-020-70850-0
pmc: PMC7431853
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
13856
Grants
Agency: NIH HHS
ID: U01CA235493
Country: United States
References
Aberg, K. M., Torgrip, R. J., Kolmert, J., Schuppe-Koistinen, I. & Lindberg, J. Feature detection and alignment of hyphenated chromatographic-mass spectrometric data. Extraction of pure ion chromatograms using Kalman tracking. J. Chromatogr. A 1192, 139–146. https://doi.org/10.1016/j.chroma.2008.03.033 (2008).
doi: 10.1016/j.chroma.2008.03.033
pubmed: 18378252
Chae, M., Shmookler Reis, R. J. & Thaden, J. J. An iterative block-shifting approach to retention time alignment that preserves the shape and area of gas chromatography-mass spectrometry peaks. BMC Bioinform. 9(Suppl 9), S15. https://doi.org/10.1186/1471-2105-9-S9-S15 (2008).
doi: 10.1186/1471-2105-9-S9-S15
Katajamaa, M., Miettinen, J. & Oresic, M. MZmine: Toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics (Oxford, England) 22, 634–636 (2006).
doi: 10.1093/bioinformatics/btk039
Li, Z. et al. Nonlinear alignment of chromatograms by means of moving window fast Fourier transfrom cross-correlation. J. Sep. Sci. 36, 1677–1684. https://doi.org/10.1002/jssc.201201021 (2013).
doi: 10.1002/jssc.201201021
pubmed: 23436496
Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R. & Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 78, 779–787 (2006).
doi: 10.1021/ac051437y
Stolt, R. et al. Second-order peak detection for multicomponent high-resolution LC/MS data. Anal. Chem. 78, 975–983. https://doi.org/10.1021/ac050980b (2006).
doi: 10.1021/ac050980b
pubmed: 16478086
Takahashi, H., Morimoto, T., Ogasawara, N. & Kanaya, S. AMDORAP: Non-targeted metabolic profiling based on high-resolution LC–MS. BMC Bioinform. 12, 259. https://doi.org/10.1186/1471-2105-12-259 (2011).
doi: 10.1186/1471-2105-12-259
Tautenhahn, R., Bottcher, C. & Neumann, S. Highly sensitive feature detection for high resolution LC/MS. BMC Bioinform. 9, 504. https://doi.org/10.1186/1471-2105-9-504 (2008).
doi: 10.1186/1471-2105-9-504
Trevino, V. et al. GridMass: A fast two-dimensional feature detection method for LC/MS. J. Mass Spectrom. 50, 165–174. https://doi.org/10.1002/jms.3512 (2015).
doi: 10.1002/jms.3512
pubmed: 25601689
Uppal, K. et al. xMSanalyzer: Automated pipeline for improved feature detection and downstream analysis of large-scale, non-targeted metabolomics data. BMC Bioinform. 14, 15. https://doi.org/10.1186/1471-2105-14-15 (2013).
doi: 10.1186/1471-2105-14-15
Yu, T., Park, Y., Johnson, J. M. & Jones, D. P. apLCMS–adaptive processing of high-resolution LC/MS data. Bioinformatics (Oxford, England) 25, 1930–1936. https://doi.org/10.1093/bioinformatics/btp291 (2009).
doi: 10.1093/bioinformatics/btp291
Yu, T., Park, Y., Li, S. & Jones, D. P. Hybrid feature detection and information accumulation using high-resolution LC–MS metabolomics data. J. Proteome Res. 12, 1419–1427. https://doi.org/10.1021/pr301053d (2013).
doi: 10.1021/pr301053d
pubmed: 23362826
pmcid: 3624888
Spicer, R., Salek, R. M., Moreno, P., Canueto, D. & Steinbeck, C. Navigating freely-available software tools for metabolomics analysis. Metabolomics 13, 106. https://doi.org/10.1007/s11306-017-1242-7 (2017).
doi: 10.1007/s11306-017-1242-7
pubmed: 28890673
pmcid: 5550549
Kuhl, C., Tautenhahn, R., Bottcher, C., Larson, T. R. & Neumann, S. CAMERA: An integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal. Chem. 84, 283–289. https://doi.org/10.1021/ac202450g (2012).
doi: 10.1021/ac202450g
pubmed: 22111785
Blazenovic, I., Kind, T., Ji, J. & Fiehn, O. Software tools and approaches for compound identification of LC–MS/MS data in metabolomics. Metabolites https://doi.org/10.3390/metabo8020031 (2018).
doi: 10.3390/metabo8020031
pubmed: 29748461
pmcid: 6027441
Jaeger, C., Meret, M., Schmitt, C. A. & Lisec, J. Compound annotation in liquid chromatography/high-resolution mass spectrometry based metabolomics: Robust adduct ion determination as a prerequisite to structure prediction in electrospray ionization mass spectra. Rapid Commun. Mass Spectrom. 31, 1261–1266. https://doi.org/10.1002/rcm.7905 (2017).
doi: 10.1002/rcm.7905
pubmed: 28499062
Zhang, W. et al. MET-COFEA: A liquid chromatography/mass spectrometry data processing platform for metabolite compound feature extraction and annotation. Anal. Chem. 86, 6245–6253. https://doi.org/10.1021/ac501162k (2014).
doi: 10.1021/ac501162k
pubmed: 24856452
Uppal, K., Walker, D. I. & Jones, D. P. xMSannotator: An R package for network-based annotation of high-resolution metabolomics data. Anal. Chem. 89, 1063–1067. https://doi.org/10.1021/acs.analchem.6b01214 (2017).
doi: 10.1021/acs.analchem.6b01214
pubmed: 27977166
pmcid: 5447360
Smith, C. A. et al. METLIN: A metabolite mass spectral database. Ther. Drug Monit. 27, 747–751 (2005).
doi: 10.1097/01.ftd.0000179845.53213.39
Wishart, D. S. et al. HMDB: A knowledgebase for the human metabolome. Nucleic Acids Res. 37, D603–D610. https://doi.org/10.1093/nar/gkn810 (2009).
doi: 10.1093/nar/gkn810
pubmed: 18953024
Cui, Q. et al. Metabolite identification via the Madison Metabolomics Consortium Database. Nat. Biotechnol. 26, 162–164. https://doi.org/10.1038/nbt0208-162 (2008).
doi: 10.1038/nbt0208-162
pubmed: 18259166
Brunius, C., Shi, L. & Landberg, R. Large-scale untargeted LC–MS metabolomics data correction using between-batch feature alignment and cluster-based within-batch signal intensity drift correction. Metabolomics 12, 173. https://doi.org/10.1007/s11306-016-1124-4 (2016).
doi: 10.1007/s11306-016-1124-4
pubmed: 27746707
pmcid: 5031781
Luan, H., Ji, F., Chen, Y. & Cai, Z. statTarget: A streamlined tool for signal drift correction and interpretations of quantitative mass spectrometry-based omics data. Anal. Chim. Acta 1036, 66–72. https://doi.org/10.1016/j.aca.2018.08.002 (2018).
doi: 10.1016/j.aca.2018.08.002
pubmed: 30253838
Kirwan, J. A., Broadhurst, D. I., Davidson, R. L. & Viant, M. R. Characterising and correcting batch variation in an automated direct infusion mass spectrometry (DIMS) metabolomics workflow. Anal. Bioanal. Chem. 405, 5147–5157. https://doi.org/10.1007/s00216-013-6856-7 (2013).
doi: 10.1007/s00216-013-6856-7
pubmed: 23455646
Kuligowski, J., Sanchez-Illana, A., Sanjuan-Herraez, D., Vento, M. & Quintas, G. Intra-batch effect correction in liquid chromatography-mass spectrometry using quality control samples and support vector regression (QC-SVRC). Analyst 140, 7810–7817. https://doi.org/10.1039/c5an01638j (2015).
doi: 10.1039/c5an01638j
pubmed: 26462549
Sanchez-Illana, A. et al. Evaluation of batch effect elimination using quality control replicates in LC–MS metabolite profiling. Anal. Chim. Acta 1019, 38–48. https://doi.org/10.1016/j.aca.2018.02.053 (2018).
doi: 10.1016/j.aca.2018.02.053
pubmed: 29625683
Fei, T. & Yu, T. scBatch: Batch-effect correction of RNA-seq data through sample distance matrix adjustment. Bioinformatics (Oxford, England) 36, 3115–3123. https://doi.org/10.1093/bioinformatics/btaa097 (2020).
doi: 10.1093/bioinformatics/btaa097
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127. https://doi.org/10.1093/biostatistics/kxj037 (2007).
doi: 10.1093/biostatistics/kxj037
pubmed: 16632515
Deng, K. et al. WaveICA: A novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis. Anal. Chim. Acta 1061, 60–69. https://doi.org/10.1016/j.aca.2019.02.010 (2019).
doi: 10.1016/j.aca.2019.02.010
pubmed: 30926040
Rong, Z. et al. NormAE: Deep adversarial learning model to remove batch effects in liquid chromatography mass spectrometry-based metabolomics data. Anal. Chem. 92, 5082–5090. https://doi.org/10.1021/acs.analchem.9b05460 (2020).
doi: 10.1021/acs.analchem.9b05460
pubmed: 32207605
Salerno, S. Jr. et al. RRmix: A method for simultaneous batch effect correction and analysis of metabolomics data in the absence of internal standards. PLoS ONE 12, e0179530. https://doi.org/10.1371/journal.pone.0179530 (2017).
doi: 10.1371/journal.pone.0179530
pubmed: 28662051
pmcid: 5491020
Dunn, W. B. et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Protoc. 6, 1060–1083. https://doi.org/10.1038/nprot.2011.335 (2011).
doi: 10.1038/nprot.2011.335
pubmed: 21720319
Fan, S. et al. Systematic error removal using random forest for normalizing large-scale untargeted lipidomics data. Anal. Chem. 91, 3590–3596. https://doi.org/10.1021/acs.analchem.8b05592 (2019).
doi: 10.1021/acs.analchem.8b05592
pubmed: 30758187
https://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000868
Sud, M. et al. Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools. Nucleic Acids Res. 44, D463–D470. https://doi.org/10.1093/nar/gkv1042 (2016).
doi: 10.1093/nar/gkv1042
pubmed: 26467476
Tabassum, R. et al. A longitudinal study of health improvement in the Atlanta CHDWB Wellness Cohort. J. Pers. Med. 4, 489–507. https://doi.org/10.3390/jpm4040489 (2014).
doi: 10.3390/jpm4040489
pubmed: 25563459
pmcid: 4282885
Libiseller, G. et al. IPO: A tool for automated optimization of XCMS parameters. BMC Bioinform. 16, 118. https://doi.org/10.1186/s12859-015-0562-8 (2015).
doi: 10.1186/s12859-015-0562-8
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Ho, J. E. et al. Metabolomic profiles of body mass index in the Framingham Heart Study reveal distinct cardiometabolic phenotypes. PLoS ONE 11, e0148361. https://doi.org/10.1371/journal.pone.0148361 (2016).
doi: 10.1371/journal.pone.0148361
pubmed: 26863521
pmcid: 4749349
Li, S. et al. Predicting network activity from high throughput metabolomics. PLoS Comput. Biol. 9, e1003123. https://doi.org/10.1371/journal.pcbi.1003123 (2013).
doi: 10.1371/journal.pcbi.1003123
pubmed: 23861661
pmcid: 3701697
Manna, P. & Jain, S. K. Phosphatidylinositol-3,4,5-triphosphate and cellular signaling: Implications for obesity and diabetes. Cell Physiol. Biochem. 35, 1253–1275. https://doi.org/10.1159/000373949 (2015).
doi: 10.1159/000373949
pubmed: 25721445
pmcid: 4797942