Best practices for benchmarking germline small-variant calls in human genomes.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
May 2019
History:
received: 23 May 2018
accepted: 10 January 2019
pubmed: 13 March 2019
medline: 4 July 2019
entrez: 13 March 2019
Status: ppublish

Abstract

Standardized benchmarking approaches are required to assess the accuracy of variants called from sequence data. Although variant-calling tools and the metrics used to assess their performance continue to improve, important challenges remain. Here, as part of the Global Alliance for Genomics and Health (GA4GH), we present a benchmarking framework for variant calling. We provide guidance on how to match variant calls with different representations, define standard performance metrics, and stratify performance by variant type and genome context. We describe limitations of high-confidence calls and regions that can be used as truth sets (for example, single-nucleotide variant concordance of two methods is 99.7% inside versus 76.5% outside high-confidence regions). Our web-based app enables comparison of variant calls against truth sets to obtain a standardized performance report. Our approach has been piloted in the precisionFDA variant-calling challenges to identify the best-in-class variant-calling methods within high-confidence regions. Finally, we recommend a set of best practices for using our tools and evaluating the results.
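The abstract mentions standard performance metrics without listing them. The Python below is a minimal sketch, assuming the conventional definitions recall = TP/(TP+FN), precision = TP/(TP+FP), and F1 as the harmonic mean of the two. The names `Counts` and `summarize` are hypothetical. Truth-side and query-side true positives are kept as separate inputs on the assumption that one variant written differently in the two call sets can match a different number of records on each side. Producing the counts requires the representation-aware matching described in the paper, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical container for benchmarking match counts."""
    truth_tp: int  # truth-set variants matched by a query call
    fn: int        # truth-set variants with no matching query call
    query_tp: int  # query calls matched to a truth-set variant
    fp: int        # query calls with no matching truth-set variant


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a stratum contains no variants.
    return num / den if den else 0.0


def summarize(c: Counts) -> dict:
    """Compute recall, precision and F1 (harmonic mean) from match counts."""
    recall = _ratio(c.truth_tp, c.truth_tp + c.fn)
    precision = _ratio(c.query_tp, c.query_tp + c.fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy counts for illustration only; these are not results from the paper.
    print(summarize(Counts(truth_tp=3, fn=1, query_tp=3, fp=3)))
```

With these toy counts the sketch gives recall 0.75, precision 0.5 and F1 0.6. Calling `summarize` separately on counts restricted to each stratum (for example, inside versus outside high-confidence regions, or by variant type) gives the kind of stratified view the abstract describes.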

Identifiers

pubmed: 30858580
doi: 10.1038/s41587-019-0054-x
pii: 10.1038/s41587-019-0054-x
pmc: PMC6699627
mid: NIHMS1533783

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

555-560

Grants

Agency: Intramural NIST DOC
ID: 9999-NIST
Country: United States

Comments and corrections

Type: ErratumIn

Authors

Peter Krusche (P)

Illumina Cambridge Ltd, Little Chesterford, UK.

Len Trigg (L)

Real Time Genomics, Hamilton, New Zealand.

Paul C Boutros (PC)

Ontario Institute for Cancer Research, Toronto, Ontario, Canada.

Christopher E Mason (CE)

Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA.
The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA.
The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA.
The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA.

Francisco M De La Vega (FM)

Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA.

Benjamin L Moore (BL)

Illumina Cambridge Ltd, Little Chesterford, UK.

Mar Gonzalez-Porta (M)

Illumina Cambridge Ltd, Little Chesterford, UK.

Michael A Eberle (MA)

Illumina Inc., San Diego, CA, USA.

Zivana Tezak (Z)

Center for Devices and Radiological Health, FDA, Silver Spring, MD, USA.

Samir Lababidi (S)

Office of Health Informatics, Office of the Commissioner, FDA, Silver Spring, MD, USA.

Rebecca Truty (R)

Invitae, San Francisco, CA, USA.

George Asimenos (G)

DNAnexus, San Francisco, CA, USA.

Birgit Funke (B)

Veritas Genetics, Danvers, MA, USA.

Mark Fleharty (M)

Broad Institute, Cambridge, MA, USA.

Brad A Chapman (BA)

Bioinformatics Core, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Marc Salit (M)

Joint Initiative for Metrology in Biology, Stanford University, Stanford, CA, USA.

Justin M Zook (JM)

Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA. jzook@nist.gov.
