A community effort to identify and correct mislabeled samples in proteogenomic studies.

DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems

Journal

Patterns (New York, N.Y.)
ISSN: 2666-3899
Abbreviated title: Patterns (N Y)
Country: United States
NLM ID: 101767765

Publication information

Publication date:
14 May 2021
History:
received: 23 Dec 2020
revised: 27 Jan 2021
accepted: 31 Mar 2021
entrez: 26 May 2021
pubmed: 27 May 2021
medline: 27 May 2021
Status: epublish

Abstract

Sample mislabeling or misannotation is a long-standing problem in scientific research and is particularly prevalent in large-scale multi-omic studies because of the complexity of multi-omic workflows. There is an urgent need to implement quality controls that automatically screen for and correct sample mislabels or misannotations in such studies. Here, we describe the crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, and performance varied widely across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers produced an open-source software tool, COSMO, which demonstrated high accuracy and robustness in identifying and correcting mislabeling in simulated and real multi-omic datasets.

Identifiers

pubmed: 34036290
doi: 10.1016/j.patter.2021.100245
pii: S2666-3899(21)00065-9
pmc: PMC8134945

Publication types

Journal Article

Languages

eng

Pagination

100245

Copyright information

© 2021 The Author(s).

Conflict of interest statement

S.Y. and J.Z. are employees of Sema4, a for-profit organization that promotes healthcare through information-driven insights. R.P., H.F., and H.C. are employees of Sentieon Inc. A.C. is an employee of Bionamic AB. The other authors declare no competing interests.

Authors

Seungyeul Yoo (S)

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Sema4, a Mount Sinai Venture, Stamford, CT 06902, USA.

Zhiao Shi (Z)

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.

Bo Wen (B)

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.

SoonJye Kho (S)

Wright State University, Dayton, OH 45453, USA.

Renke Pan (R)

Sentieon Inc., San Jose, CA 95134, USA.

Hanying Feng (H)

Sentieon Inc., San Jose, CA 95134, USA.

Hong Chen (H)

Sentieon Inc., San Jose, CA 95134, USA.

Anders Carlsson (A)

Computational Biology & Biological Physics, Lund University, Lund 221-00, Sweden.
Bionamic AB, Lund 221-00, Sweden.

Patrik Edén (P)

Computational Biology & Biological Physics, Lund University, Lund 221-00, Sweden.

Weiping Ma (W)

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Michael Raymer (M)

Wright State University, Dayton, OH 45453, USA.

Ezekiel J Maier (EJ)

Booz Allen Hamilton, McLean, VA 22102, USA.

Zivana Tezak (Z)

Office of In Vitro Diagnostics and Radiological Health, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD 20993, USA.

Elaine Johanson (E)

Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA.

Denise Hinton (D)

Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD 20993, USA.

Henry Rodriguez (H)

Office of Cancer Clinical Proteomics Research, Center for Strategic Scientific Initiatives, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.

Jun Zhu (J)

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Sema4, a Mount Sinai Venture, Stamford, CT 06902, USA.

Emily Boja (E)

Office of Cancer Clinical Proteomics Research, Center for Strategic Scientific Initiatives, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.

Pei Wang (P)

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.

Bing Zhang (B)

Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.

MeSH classifications