Prot2HG: a database of protein domains mapped to the human genome.
Journal
Database: the journal of biological databases and curation
ISSN: 1758-0463
Abbreviated title: Database (Oxford)
Country: England
NLM ID: 101517697
Publication information
Publication date: 01 01 2020
History:
received: 26 08 2019
revised: 19 11 2019
accepted: 31 12 2019
entrez: 16 4 2020
pubmed: 16 4 2020
medline: 2 2 2021
Status:
ppublish
Abstract
Genetic variation within conserved functional protein domains warrants special attention when DNA variation is examined in the context of disease causation. Here we introduce a resource, freely available at www.prot2hg.com, that determines whether a particular variant falls within an annotated protein domain and directly translates chromosomal coordinates into protein residues. The tool supports simple multiple-site queries, and the whole dataset is available for download and is also incorporated into our own accessible pipeline. To create this resource, National Center for Biotechnology Information (NCBI) protein data were retrieved using the Entrez Programming Utilities. After all human protein domains had been processed, residue positions were reverse translated, mapped to the reference genome hg19 and stored in a MySQL database. In total, 760 487 protein domains from 42 371 protein models were mapped to hg19 coordinates and made publicly available for search or download (www.prot2hg.com). In addition, this annotation was implemented in the genomics research platform GENESIS in order to query nearly 8000 exomes and genomes of families with rare Mendelian disorders (tgp-foundation.org). When we applied the annotation to patient genetic data, we found that rare (<1%) variants in the Genome Aggregation Database were significantly more often annotated onto a protein domain than common (>1%) variants. Similarly, variants described as pathogenic or likely pathogenic in ClinVar were more likely to be annotated onto a domain. In addition, we tested a dataset consisting of 60 causal variants in a cohort of patients with epileptic encephalopathy and found that 71% of them (43 variants) were propagated onto protein domains. In summary, we developed a resource that annotates variants in the coding part of the genome onto conserved protein domains in order to increase variant prioritization efficiency. Database URL: www.prot2hg.com.
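The mapping described in the abstract (reverse translating residue positions to genomic coordinates, then asking whether a variant falls within a domain) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the exon coordinates and the 1-based inclusive coordinate convention are all assumptions made for illustration, and the sketch ignores details such as multiple transcripts per gene.

```python
from typing import List, Tuple

# Assumption: a coding segment is a (start, end) pair of 1-based inclusive
# genomic coordinates. All names below are hypothetical, not from Prot2HG.
Segment = Tuple[int, int]


def cds_positions(cds_segments: List[Segment], strand: str) -> List[int]:
    """List every coding base position in transcript (5' to 3') order."""
    ordered = sorted(cds_segments, reverse=(strand == "-"))
    positions: List[int] = []
    for start, end in ordered:
        bases = range(start, end + 1)
        positions.extend(reversed(bases) if strand == "-" else bases)
    return positions


def residue_to_genomic(residue: int, cds_segments: List[Segment],
                       strand: str) -> List[int]:
    """Return the three genomic positions of the codon for a 1-based residue."""
    positions = cds_positions(cds_segments, strand)
    first = (residue - 1) * 3
    if residue < 1 or first + 3 > len(positions):
        raise ValueError("residue lies outside the coding sequence")
    return positions[first:first + 3]


def domain_to_intervals(first_res: int, last_res: int,
                        cds_segments: List[Segment],
                        strand: str) -> List[Segment]:
    """Return the genomic intervals covered by residues first_res..last_res."""
    positions = cds_positions(cds_segments, strand)
    lo, hi = (first_res - 1) * 3, last_res * 3
    if first_res < 1 or first_res > last_res or hi > len(positions):
        raise ValueError("domain lies outside the coding sequence")
    intervals: List[Segment] = []
    # Merge consecutive genomic positions into contiguous intervals.
    for pos in sorted(positions[lo:hi]):
        if intervals and pos == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], pos)
        else:
            intervals.append((pos, pos))
    return intervals


def variant_in_domain(position: int, intervals: List[Segment]) -> bool:
    """Check whether a genomic position falls inside any domain interval."""
    return any(start <= position <= end for start, end in intervals)


# Made-up example: two coding segments encoding five residues.
example_cds = [(100, 105), (200, 208)]
example_domain = domain_to_intervals(2, 4, example_cds, "+")
```

With the made-up coordinates above, a domain spanning residues 2 to 4 is split across both coding segments, so a position in the intron between them is correctly reported as outside the domain. A production resource would more plausibly precompute such intervals once and store them in a database table, as the abstract describes for MySQL.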
Identifiers
pubmed: 32293014
pii: 5820062
doi: 10.1093/database/baz161
pmc: PMC7157182
Chemical substances
Proteins
0
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Grants
Agency: NINDS NIH HHS
ID: R01 NS105755
Country: United States
Agency: NINDS NIH HHS
ID: U54 NS065712
Country: United States
Copyright information
© The Author(s) 2020. Published by Oxford University Press.