Assembly and annotation of an Ashkenazi human reference genome.

Genome, Human; Humans; Molecular Sequence Annotation; Reference Values; Translocation, Genetic

Journal

Genome biology

ISSN: 1474-760X

Abbreviated title: Genome Biol

Country: England

NLM ID: 100960660

Publication information

Publication date:
2 June 2020

History:

received: 6 April 2020

accepted: 15 May 2020

entrez: 4 June 2020

pubmed: 4 June 2020

medline: 2 April 2021

Status: epublish

Abstract

Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases. Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes. The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.

Identifiers

DOI: 10.1186/s13059-020-02047-7

PMID: 32487205

PMCID: PMC7265644

PII: 10.1186/s13059-020-02047-7

Publication types

Comparative Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Pagination

129

Grants

Agency: NHGRI NIH HHS

ID: R01 HG006677

Country: United States

Agency: NIGMS NIH HHS

ID: R35 GM130151

Country: United States

Authors

Alaina Shumate (A)

Aleksey V Zimin (AV)

Rachel M Sherman (RM)

Daniela Puiu (D)

Justin M Wagner (JM)

Nathan D Olson (ND)

Mihaela Pertea (M)

Marc L Salit (ML)

Justin M Zook (JM)

Steven L Salzberg (SL)
