DiscoverY: a classifier for identifying Y chromosome sequences in male assemblies.
Genome assembly
Male genome
Y chromosome
Journal
BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258
Publication information
Publication date:
09 Aug 2019
History:
received: 28 Mar 2019
accepted: 26 Jul 2019
entrez: 11 Aug 2019
pubmed: 11 Aug 2019
medline: 4 Jan 2020
Status:
epublish
Abstract
BACKGROUND
Although the Y chromosome plays an important role in male sex determination and fertility, it is currently understudied due to its haploid and repetitive nature. Methods to isolate Y-specific contigs from a whole-genome assembly broadly fall into two categories. The first retrieves Y-contigs using the proportion of sequence they share with a female, but this strategy is prone to false positives in the absence of a high-quality, complete female reference. The second uses the ratio of depth of coverage from male and female reads to select Y-contigs, but this method requires high-depth sequencing of a female and cannot utilize existing female references.
RESULTS
We develop a k-mer based method called DiscoverY, which combines the proportion shared with a female with the depth of coverage from male reads to classify contigs as Y-chromosomal. We evaluate the performance of DiscoverY on human and gorilla genomes, across different sequencing platforms including Illumina, 10X, and PacBio. In cases where the male and female data are of high quality, DiscoverY has high precision and recall and outperforms existing methods. For cases where a high-quality female reference is not available, we quantify the effect of using a draft reference or even just raw sequencing reads from a female.
CONCLUSIONS
DiscoverY is an effective method to isolate Y-specific contigs from a whole-genome assembly. However, regions homologous to the X chromosome remain difficult to detect.
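The two signals named in the abstract, the proportion of a contig shared with a female and the depth of coverage from male reads, can be illustrated with a toy k-mer sketch. This is not DiscoverY's implementation: the function names, the thresholds `max_shared` and `depth_window`, and the rule that a Y-linked contig should sit near half the autosomal depth (because the Y is haploid in males) are all assumptions made for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def contig_features(contig, female_kmers, male_read_counts, k):
    """Return (proportion of contig k-mers seen in the female, median male k-mer count)."""
    ks = list(kmers(contig, k))
    if not ks:
        return 0.0, 0.0
    shared = sum(1 for km in ks if km in female_kmers) / len(ks)
    depth = median(male_read_counts.get(km, 0) for km in ks)
    return shared, depth


def looks_y_linked(shared, depth, autosomal_depth,
                   max_shared=0.2, depth_window=(0.25, 0.75)):
    """Hypothetical decision rule: little sharing with the female and
    roughly haploid depth relative to the autosomal depth."""
    lo, hi = depth_window
    return shared <= max_shared and lo * autosomal_depth <= depth <= hi * autosomal_depth


if __name__ == "__main__":
    k = 3
    female_kmers = set(kmers("AAAAAAAA", k))
    male_counts = {"ACG": 5, "CGT": 5, "GTA": 5, "TAC": 5, "AAA": 10}
    for contig in ("ACGTAC", "AAAAA"):
        shared, depth = contig_features(contig, female_kmers, male_counts, k)
        print(contig, shared, depth, looks_y_linked(shared, depth, autosomal_depth=10))
```

In practice the thresholds would need to be tuned per dataset, and, as the abstract notes, contigs homologous to the X chromosome would share many k-mers with the female and so be missed by a rule of this kind.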
Identifiers
pubmed: 31399045
doi: 10.1186/s12864-019-5996-3
pii: 10.1186/s12864-019-5996-3
pmc: PMC6688218
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
641
Grants
Agency: NIGMS NIH HHS
ID: R01 GM130691
Country: United States
Agency: Directorate for Computer and Information Science and Engineering
ID: CCF-1439057, IIS-1453527, and IIS-1421908
Agency: Directorate for Biological Sciences
ID: DBI-1356529
Agency: NIH HHS
ID: R01GM130691
Country: United States