ComHapDet: a spatial community detection algorithm for haplotype assembly.

Algorithms Diploidy Haplotypes High-Throughput Nucleotide Sequencing Humans Polyploidy Sequence Analysis, DNA

Graph clustering Haplotype assembly Spatial random graph

Journal

BMC genomics

ISSN: 1471-2164

Titre abrégé: BMC Genomics

Pays: England

ID NLM: 100965258

Informations de publication

Date de publication:
09 Sep 2020

Historique:

entrez: 9 9 2020

pubmed: 10 9 2020

medline: 15 5 2021

Statut: epublish

Résumé

Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual's susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing lengths of sequencing reads and insert sizes helps improve accuracy of reconstruction, it also exacerbates computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet - a novel assembly algorithm for diploid and ployploid haplotypes which allows both bialleleic and multi-allelic variants. Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum-Tuberosum (Potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques.

Sections du résumé

BACKGROUND BACKGROUND

RESULTS RESULTS

We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose ComHapDet - a novel assembly algorithm for diploid and ployploid haplotypes which allows both bialleleic and multi-allelic variants.

CONCLUSIONS CONCLUSIONS

Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum-Tuberosum (Potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques.

Identifiants

DOI: 10.1186/s12864-020-06935-x PMID: 32900369 PMC: PMC7488034

pubmed: 32900369

doi: 10.1186/s12864-020-06935-x

pii: 10.1186/s12864-020-06935-x

pmc: PMC7488034

doi:

Types de publication

Journal Article

Langues

eng

Sous-ensembles de citation

Pagination

586

Références

BMC Genomics. 2018 Mar 21;19(Suppl 4):191

pubmed: 29589554

IEEE/ACM Trans Comput Biol Bioinform. 2016 May-Jun;13(3):518-30

pubmed: 27295635

J Comput Biol. 2016 Sep;23(9):718-36

pubmed: 27280382

J Comput Biol. 2012 Jun;19(6):577-90

pubmed: 22697235

Bioinformatics. 2008 Aug 15;24(16):i153-9

pubmed: 18689818

Bioinformatics. 2018 Nov 15;34(22):3864-3872

pubmed: 29868858

PLoS Comput Biol. 2014 Mar 27;10(3):e1003502

pubmed: 24675685

Genome Res. 2007 Jul;17(7):1101-10

pubmed: 17567986

Bioinformatics. 2010 Sep 15;26(18):2217-25

pubmed: 20624781

Genet Epidemiol. 2004 Dec;27(4):321-33

pubmed: 15368617

Res Comput Mol Biol. 2017 May;10229:117-133

pubmed: 28808695

PLoS Biol. 2007 Sep 4;5(10):e254

pubmed: 17803354

Genome Res. 2008 Aug;18(8):1336-46

pubmed: 18676820

Nature. 2002 Oct 24;419(6909):832-7

pubmed: 12397357

Bioinformatics. 2016 Dec 15;32(24):3735-3744

pubmed: 27531103

Bioinformatics. 2016 Jun 1;32(11):1610-7

pubmed: 26315913

BMC Genomics. 2015 Apr 03;16:260

pubmed: 25885901

Brief Bioinform. 2002 Mar;3(1):23-31

pubmed: 12002221

Bioinformatics. 2010 Jun 15;26(12):i183-90

pubmed: 20529904

Brief Bioinform. 2018 May 1;19(3):387-403

pubmed: 28065918

Nature. 2011 Jul 10;475(7355):189-95

pubmed: 21743474

Bioinformatics. 2014 Sep 1;30(17):i379-85

pubmed: 25161223

Bioinformatics. 2013 Aug 15;29(16):1938-45

pubmed: 23782612

ComHapDet: a spatial community detection algorithm for haplotype assembly.

Journal

Informations de publication

Résumé

Sections du résumé

Identifiants

Types de publication

Langues

Sous-ensembles de citation

Pagination

Références

Auteurs

Abishek Sankararaman (A)

Haris Vikalo (H)

François Baccelli (F)

Articles similaires

Comprehensive comparative analysis and development of molecular markers for Lasianthus species based on complete chloroplast genome sequences.

[Redispensing of expensive oral anticancer medicines: a practical application].

Smoking Cessation and Incident Cardiovascular Disease.

Evaluation of Low-Value Services Across Major Medicare Advantage Insurers and Traditional Medicare.

Classifications MeSH