Calling structural variants with confidence from short-read data in wild bird populations.

Keywords: curation strategies; high-confidence variants; putative false positives; rapid manual curation; short-reads; structural variation

Journal

Genome biology and evolution
ISSN: 1759-6653
Abbreviated title: Genome Biol Evol
Country: England
NLM ID: 101509707

Publication information

Publication date:
15 Mar 2024
History:
received: 29 Nov 2022
revised: 28 Feb 2024
accepted: 07 Mar 2024
medline: 15 Mar 2024
pubmed: 15 Mar 2024
entrez: 15 Mar 2024
Status: aheadofprint

Abstract

Comprehensive characterisation of structural variation (SV) in natural populations has only become feasible in the last decade. To investigate the population genomic nature of SV, reproducible and high-confidence SV callsets are first required. We created a population-scale reference of the genome-wide landscape of structural variation across 33 Nordic house sparrow (Passer domesticus) individuals. To produce a consensus callset across all samples using short-read data, we compared heuristic-based quality-filtering and visual curation (Samplot/PlotCritic and Samplot-ML) approaches. We demonstrate that curation of SVs is important for reducing putative false positives and that the time invested in this step is outweighed by the potential costs of analysing short-read-discovered SV datasets that include many potential false positives. We find that even a lenient manual curation strategy (e.g. applied by a single curator) can reduce the proportion of putative false positives by up to 80%, thus enriching the proportion of high-confidence variants. Crucially, in applying a lenient manual curation strategy with a single curator, nearly all (>99%) variants rejected as putative false positives were also classified as such by a more stringent curation strategy using three additional curators. Furthermore, variants rejected by manual curation failed to reflect the population structure expected from SNPs, whereas variants passing curation did. Combining heuristic-based quality-filtering with rapid manual curation of structural variants in short-read data can therefore become a time- and cost-effective first step for functional and population genomic studies requiring high-confidence SV callsets.
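The comparison between the lenient (single-curator) and stringent (multi-curator) strategies described in the abstract can be illustrated with a small concordance calculation. The sketch below is not the authors' code: the function names, the "pass"/"reject" labels, the majority-vote rule for the panel, and the toy variant IDs are all assumptions made for illustration, and the abstract does not state how the stringent strategy combined its curators' decisions.

```python
def majority_reject(votes):
    """Return True if strictly more than half of the panel votes are 'reject'.

    Assumption: the panel decision is a simple majority vote; the source
    does not specify the actual rule.
    """
    return sum(1 for v in votes if v == "reject") > len(votes) / 2


def rejection_concordance(lenient, stringent):
    """Fraction of variants rejected by the single curator that the panel also rejects.

    lenient   : dict mapping variant ID -> 'pass' or 'reject' (single curator)
    stringent : dict mapping variant ID -> list of 'pass'/'reject' votes (panel)
    Returns None when the single curator rejected nothing.
    """
    rejected = [sv for sv, call in lenient.items() if call == "reject"]
    if not rejected:
        return None
    agreed = sum(1 for sv in rejected if majority_reject(stringent[sv]))
    return agreed / len(rejected)


# Hypothetical toy data, not taken from the study.
lenient = {"sv1": "reject", "sv2": "pass", "sv3": "reject", "sv4": "pass"}
stringent = {
    "sv1": ["reject", "reject", "pass"],
    "sv2": ["pass", "pass", "pass"],
    "sv3": ["reject", "reject", "reject"],
    "sv4": ["reject", "pass", "pass"],
}

concordance = rejection_concordance(lenient, stringent)
print(f"Concordance of rejections: {concordance:.2f}")
```

A high value of this statistic on real curation scores would correspond to the abstract's observation that nearly all variants rejected by one curator were also rejected under the stricter strategy.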

Identifiers

pubmed: 38489588
pii: 7630036
doi: 10.1093/gbe/evae049

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.

Authors

Gabriel David (G)

Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.

Alicia Bertolotti (A)

School of Biological Sciences, University of Aberdeen, Tillydrone Avenue, Aberdeen, UK.

Ryan Layer (R)

BioFrontiers Institute, University of Colorado, Boulder, CO, USA; Department of Computer Science, University of Colorado, Boulder, CO, USA.

Douglas Scofield (D)

Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.

Alexander Hayward (A)

Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK.

Tobias Baril (T)

Centre for Ecology and Conservation, University of Exeter, Penryn Campus, Penryn, Cornwall, UK.

Hamish A Burnett (HA)

Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.

Erik Gudmunds (E)

Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.

Henrik Jensen (H)

Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.

Arild Husby (A)

Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden.

MeSH classifications