PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2021
History:
received: 2021-05-05
accepted: 2021-06-05
entrez: 2021-07-06
pubmed: 2021-07-07
medline: 2021-11-03
Status: epublish
Abstract
The Protein Data Bank (PDB) was established at Brookhaven National Laboratory in 1971 as an archive for biological macromolecular crystal structures. As of mid-2021, the database holds almost 180,000 structures solved by X-ray crystallography, nuclear magnetic resonance, cryo-electron microscopy, and other methods. Many proteins have been studied under different conditions, including with binding partners such as ligands, nucleic acids, or other proteins, with mutations, and with post-translational modifications, thus enabling extensive comparative structure-function studies. However, these studies are made more difficult because the PDB allows authors to number the amino acids in each protein sequence in any manner they wish. As a result, the same protein is numbered differently across the available PDB entries. For instance, some authors include N-terminal signal peptides or the N-terminal methionine in the sequence numbering and others do not. In addition to the coordinates, many fields contain structural and functional information about specific residues, numbered according to the author's scheme. Here we provide a webserver and Python 3 application that fixes the PDB sequence numbering problem by replacing the author numbering with numbering derived from the corresponding UniProt sequences. We obtain this correspondence from the SIFTS database from PDBe. The server and program take a list of PDB entries or a list of UniProt identifiers (e.g., "P04637" or "P53_HUMAN") and provide renumbered files in mmCIF format and the legacy PDB format, for both asymmetric unit files and biological assembly files provided by PDBe.
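The renumbering idea described in the abstract can be illustrated with a small sketch. This is not the PDBrenum implementation: the `Segment` layout, the `renumber` function, and the example values are all hypothetical, and the sketch assumes a simplified, SIFTS-like mapping in which each aligned stretch of a chain is described by its author numbering range and the UniProt position where it starts.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    """One aligned stretch of a chain (hypothetical, SIFTS-like layout)."""
    author_start: int   # first author-assigned residue number in the segment
    author_end: int     # last author-assigned residue number in the segment
    uniprot_start: int  # UniProt position that corresponds to author_start


def renumber(author_num: int, segments: list[Segment]) -> Optional[int]:
    """Return the UniProt number for an author number, or None if unmapped."""
    for seg in segments:
        if seg.author_start <= author_num <= seg.author_end:
            # Residues inside a segment keep their offset from its start.
            return seg.uniprot_start + (author_num - seg.author_start)
    return None  # no UniProt counterpart in this toy mapping


# Made-up example: the authors numbered the mature protein from 1,
# while the UniProt sequence includes a 20-residue signal peptide.
segments = [Segment(author_start=1, author_end=150, uniprot_start=21)]

renumbered = {n: renumber(n, segments) for n in (1, 75, 150, 151)}
print(renumbered)  # {1: 21, 75: 95, 150: 170, 151: None}
```

In this toy case every mapped residue shifts by 20, which is the kind of discrepancy the abstract attributes to authors including or omitting signal peptides. How residues without a UniProt counterpart are handled in practice is a design decision this sketch leaves open by returning `None`.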
Identifiers
pubmed: 34228733
doi: 10.1371/journal.pone.0253411
pii: PONE-D-21-14953
pmc: PMC8259974
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
e0253411
Grants
Agency: NIGMS NIH HHS
ID: R35 GM122517
Country: United States
Conflict of interest statement
The authors have declared that no competing interests exist.