A community effort to identify and correct mislabeled samples in proteogenomic studies.
DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems
Journal
Patterns (New York, N.Y.)
ISSN: 2666-3899
Abbreviated title: Patterns (N Y)
Country: United States
NLM ID: 101767765
Publication information
Publication date:
14 May 2021
History:
received: 23 Dec 2020
revised: 27 Jan 2021
accepted: 31 Mar 2021
entrez: 26 May 2021
pubmed: 27 May 2021
medline: 27 May 2021
Status:
epublish
Abstract
Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has produced an open-source software tool, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.
Identifiers
pubmed: 34036290
doi: 10.1016/j.patter.2021.100245
pii: S2666-3899(21)00065-9
pmc: PMC8134945
Publication types
Journal Article
Languages
eng
Pagination
100245
Copyright information
© 2021 The Author(s).
Conflict of interest statement
S.Y. and J.Z. are employees of Sema4, a for-profit organization that promotes healthcare through information-driven insights. R.P., H.F., and H.C. are employees of Sentieon Inc. A.C. is an employee of Bionamic AB. The other authors declare no competing interests.