Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method.
Batch effect
Data integration
Differentially expressed
Metrology
Multiomics
Phenomics
Prediction
Quartet family
Ratio
Reference materials
Journal
Genome biology
ISSN: 1474-760X
Abbreviated title: Genome Biol
Country: England
NLM ID: 100960660
Publication information
Publication date: 7 September 2023
History:
received: 8 November 2022
accepted: 18 May 2023
medline: 8 September 2023
pubmed: 7 September 2023
entrez: 6 September 2023
Status: epublish
Abstract
Abstract sections
BACKGROUND
Batch effects are notoriously common technical variations in multiomics data and may lead to misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios.
RESULTS
As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples by donor. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than the others, especially when batch effects are completely confounded with the biological factors of interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies.
CONCLUSIONS
Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
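The ratio-based scaling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `ratio_correct`, the use of a log2 scale, and the averaging over reference replicates within each batch are all assumptions made for illustration, and the toy numbers are invented. The idea shown is only that each study sample is expressed relative to the reference material profiled in the same batch.

```python
import numpy as np


def ratio_correct(values, batches, is_reference):
    """Return log2 ratios of every sample to the reference samples of its batch.

    Hypothetical helper for illustration only.
    values: (n_samples, n_features) positive absolute feature values.
    batches: (n_samples,) batch label of each sample.
    is_reference: (n_samples,) True where the sample is a reference material.
    """
    log_values = np.log2(np.asarray(values, dtype=float))
    batches = np.asarray(batches)
    is_reference = np.asarray(is_reference, dtype=bool)

    corrected = np.empty_like(log_values)
    for batch in np.unique(batches):
        in_batch = batches == batch
        ref_rows = log_values[in_batch & is_reference]
        if ref_rows.shape[0] == 0:
            raise ValueError(f"batch {batch!r} has no reference sample")
        # Assumption: average the reference replicates per feature (log scale).
        ref_mean = ref_rows.mean(axis=0)
        # Subtracting on the log scale equals dividing on the raw scale.
        corrected[in_batch] = log_values[in_batch] - ref_mean
    return corrected


# Toy data (invented): two features, two batches, one reference per batch.
# Batch "b2" carries a multiplicative batch effect of (x3, x0.5), and each
# study sample appears in only one batch (a fully confounded design).
values = np.array([
    [10.0, 20.0],  # reference, batch b1
    [20.0, 20.0],  # study sample A, batch b1
    [30.0, 10.0],  # reference, batch b2 (true values 10, 20)
    [15.0, 20.0],  # study sample B, batch b2 (true values 5, 40)
])
batches = np.array(["b1", "b1", "b2", "b2"])
is_reference = np.array([True, False, True, False])

corrected = ratio_correct(values, batches, is_reference)
```

In this toy case the reference rows become 0 and the study samples become their true log2 ratios to the reference, (1, 0) for sample A and (-1, 1) for sample B, up to floating-point rounding. The multiplicative batch effect cancels because it affects the study sample and the reference of the same batch equally.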
Identifiers
pubmed: 37674217
doi: 10.1186/s13059-023-03047-z
pii: 10.1186/s13059-023-03047-z
pmc: PMC10483871
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
201
Copyright information
© 2023. BioMed Central Ltd., part of Springer Nature.