SVJedi: genotyping structural variations with long reads.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 11 2020
History:
received: 22 11 2019
revised: 27 03 2020
accepted: 18 05 2020
pubmed: 22 5 2020
medline: 20 2 2021
entrez: 22 5 2020
Status: ppublish

Abstract

Studies on structural variants (SVs) are expanding rapidly. As a result, and thanks to third-generation sequencing technologies, the number of discovered SVs is increasing, especially in the human genome. At the same time, for several applications such as clinical diagnosis, it is important to genotype newly sequenced individuals for well-defined and characterized SVs. Whereas several SV genotypers have been developed for short-read data, there is a lack of dedicated tools to assess whether known SVs are present in a new sample sequenced with long reads, such as those produced by Pacific Biosciences or Oxford Nanopore Technologies. We present a novel method to genotype known SVs from long-read sequencing data. The method is based on generating a set of allele sequences that represent the two alleles of each structural variant. Long reads are aligned to these allele sequences. The alignments are then analyzed and filtered to keep only the informative ones, which are used to quantify the support for each SV allele, estimate allele frequencies and infer the genotype. We provide an implementation of the method, SVJedi, to genotype SVs with long reads. The tool has been applied to both simulated and real human datasets and achieves high genotyping accuracy. We show that SVJedi performs better than other existing long-read genotyping tools, and we also demonstrate that SV genotyping is considerably improved with SVJedi compared to other approaches, namely SV discovery and short-read SV genotyping. SVJedi is available at https://github.com/llecompte/SVJedi.git. Supplementary data are available at Bioinformatics online.
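The abstract outlines the method only at a high level. The Python sketch here illustrates two of its steps for the simplest SV type, a deletion: building the two allele sequences, and turning counts of informative reads into a genotype call. Everything specific is an assumption for illustration and is not taken from the paper: the 0-based half-open coordinates, the flank length, the binomial likelihood with a fixed error rate `err`, and the names `Deletion`, `allele_sequences` and `genotype`, which are hypothetical and not part of SVJedi's code. Read alignment and alignment filtering are omitted.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Deletion:
    """Hypothetical deletion record: 0-based, half-open [start, end) on the reference."""
    chrom: str
    start: int
    end: int


def allele_sequences(ref: str, sv: Deletion, flank: int) -> tuple[str, str]:
    """Return (reference_allele, alternative_allele) for a deletion.

    The reference allele is left flank + deleted segment + right flank;
    the alternative allele joins the two flanks directly (assumed layout).
    """
    left = ref[max(0, sv.start - flank):sv.start]
    deleted = ref[sv.start:sv.end]
    right = ref[sv.end:sv.end + flank]
    return left + deleted + right, left + right


def genotype(n_ref: int, n_alt: int, err: float = 0.05) -> tuple[str, dict[str, float]]:
    """Pick the most likely genotype from informative read counts.

    Assumed binomial model: under each genotype, an informative read supports
    the alternative allele with a fixed probability. The binomial coefficient
    is identical for all genotypes, so it is dropped from the log-likelihoods.
    """
    if n_ref + n_alt == 0:
        return "./.", {}  # no informative reads: genotype left undetermined
    alt_prob = {"0/0": err, "0/1": 0.5, "1/1": 1.0 - err}
    log_lik = {
        g: n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for g, p in alt_prob.items()
    }
    best = max(log_lik, key=log_lik.get)
    return best, log_lik


if __name__ == "__main__":
    ref_allele, alt_allele = allele_sequences("AAAACCCCGGGGTTTT", Deletion("chr1", 4, 8), 4)
    print(ref_allele, alt_allele)  # AAAACCCCGGGG AAAAGGGG
    print(genotype(10, 10)[0])     # 0/1
```

The error term keeps the homozygous log-likelihoods finite when a few reads support the "wrong" allele. The observed allele frequency can be read off directly as `n_alt / (n_ref + n_alt)`.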

Identifiers

pubmed: 32437523
pii: 5841661
doi: 10.1093/bioinformatics/btaa527

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

4568-4575

Comments and corrections

Type: ErratumIn

Copyright information

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Authors

Lolita Lecompte (L)

Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France.

Pierre Peterlongo (P)

Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France.

Dominique Lavenier (D)

Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France.

Claire Lemaitre (C)

Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France.
