NPSV: A simulation-driven approach to genotyping structural variants in whole-genome sequencing data.

Genome, Human; Genomic Structural Variation; Genomics; Genotype; Humans; Software; Whole Genome Sequencing

next-generation sequencing; structural variants; whole-genome sequencing

Journal

GigaScience

ISSN: 2047-217X

Abbreviated title: Gigascience

Country: United States

NLM ID: 101596872

Publication information

Publication date:
1 July 2021

History:

received: 21 December 2020

revised: 4 May 2021

accepted: 7 June 2021

entrez: 1 July 2021

pubmed: 2 July 2021

medline: 15 March 2022

Status: ppublish

Abstract

BACKGROUND

Structural variants (SVs) play a causal role in numerous diseases but are difficult to detect and accurately genotype (determine zygosity) in whole-genome next-generation sequencing data. SV genotypers that assume that the aligned sequencing data uniformly reflect the underlying SV or use existing SV call sets as training data can only partially account for variant and sample-specific biases.

RESULTS

We introduce NPSV, a machine learning-based approach for genotyping previously discovered SVs that uses next-generation sequencing simulation to model the combined effects of the genomic region, sequencer, and alignment pipeline on the observed SV evidence. We evaluate NPSV alongside existing SV genotypers on multiple benchmark call sets. We show that NPSV consistently achieves or exceeds state-of-the-art genotyping accuracy across SV call sets, samples, and variant types. NPSV can specifically identify putative de novo SVs in a trio context and is robust to offset SV breakpoints.

CONCLUSIONS

Growing SV databases and the increasing availability of SV calls from long-read sequencing make stand-alone genotyping of previously identified SVs an increasingly important component of genome analyses. By treating potential biases as a "black box" that can be simulated, NPSV provides a framework for accurately genotyping a broad range of SVs in both targeted and genome-scale applications.
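The general idea of simulation-driven genotyping described in the abstract can be illustrated with a toy sketch. This is not NPSV's implementation: the two features (alt-read fraction and relative coverage for a deletion), the `bias` parameter standing in for region, sequencer, and aligner effects, and the nearest-centroid classifier are all assumptions chosen for brevity, and every function name here is hypothetical.

```python
import random
from statistics import mean

GENOTYPES = ("0/0", "0/1", "1/1")
# Expected fraction of reads carrying the deletion allele for each genotype.
ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def simulate_features(genotype, depth=30, bias=0.1, rng=random):
    """Simulate a toy feature vector (alt-read fraction, relative coverage).

    `bias` is a hypothetical stand-in for pipeline effects: it is the
    fraction of alt-supporting reads lost before they can be observed.
    """
    p_alt = ALT_FRACTION[genotype] * (1.0 - bias)
    alt_reads = sum(rng.random() < p_alt for _ in range(depth))
    # For a deletion, coverage falls as more copies are deleted.
    coverage = (1.0 - ALT_FRACTION[genotype]) + rng.gauss(0.0, 0.05)
    return (alt_reads / depth, coverage)


def train_centroids(replicates=100, seed=0):
    """Average simulated feature vectors per genotype (the 'training' step)."""
    rng = random.Random(seed)
    centroids = {}
    for gt in GENOTYPES:
        sims = [simulate_features(gt, rng=rng) for _ in range(replicates)]
        centroids[gt] = tuple(mean(column) for column in zip(*sims))
    return centroids


def call_genotype(observed, centroids):
    """Assign the genotype whose simulated centroid is closest to the data."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda gt: squared_distance(observed, centroids[gt]))


centroids = train_centroids()
# Features that would be extracted from the real sample's alignments.
print(call_genotype((0.45, 0.52), centroids))
```

Because the simulated training data pass through the same hypothetical bias as the observed data, the classifier learns what each genotype looks like after that bias rather than assuming ideal allele fractions of 0, 0.5, and 1.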

Identifiers

DOI: 10.1093/gigascience/giab046 PMID: 34195837 PMC: PMC8246072

pii: 6312216

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.

Languages

English

Grants

Agency: NIGMS NIH HHS

ID: P20 GM103449

Country: United States

Agency: NHLBI NIH HHS

ID: U01 HL153009

Country: United States

Agency: NHLBI NIH HHS

ID: UM1 HL098123

Country: United States

Authors

Michael D Linderman (MD)

Crystal Paudyal (C)

Musab Shakeel (M)

William Kelley (W)

Ali Bashir (A)

Bruce D Gelb (BD)
