Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel.


Journal

Nature Protocols
ISSN: 1750-2799
Abbreviated title: Nat Protoc
Country: England
NLM ID: 101284307

Publication information

Publication date:
26 Apr 2024
History:
received: 14 Mar 2023
accepted: 12 Jan 2024
medline: 27 Apr 2024
pubmed: 27 Apr 2024
entrez: 26 Apr 2024
Status: ahead of print

Abstract

In temperate and subtropical regions, ancient proteins are reported to survive up to about 2 million years, far beyond the known limits of ancient DNA preservation in the same areas. Accordingly, their amino acid sequences currently represent the only source of genetic information available for phylogenetic inference involving species that went extinct too long ago to be amenable to ancient DNA analysis. Here we present a complete workflow, including sample preparation, mass spectrometric data acquisition and computational analysis, to recover and interpret million-year-old dental enamel protein sequences. During sample preparation, the proteolytic digestion step, usually an integral part of conventional bottom-up proteomics, is omitted to increase the recovery of the randomly degraded peptides spontaneously generated by extensive diagenetic hydrolysis of ancient proteins over geological time. We also describe the solutions we have adopted to (1) authenticate the endogenous origin of the protein traces we identify, (2) detect and validate amino acid variation in the ancient protein sequences and (3) attempt phylogenetic inference. Sample preparation and data acquisition can be completed in 3–4 working days, while subsequent data analysis usually takes 2–5 days. The workflow requires basic expertise in ancient biomolecule analysis, mass spectrometry-based proteomics and molecular phylogeny. Finally, we describe the limits of this approach and its potential for the reconstruction of evolutionary relationships in paleontology and paleoanthropology.
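
As an illustration of why omitting proteolytic digestion matters for the computational step, the following Python sketch compares the candidate peptides a search would have to consider under enzyme-specific cleavage and under unspecific cleavage, where any contiguous stretch of the protein may occur. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the length bounds and the toy sequence are hypothetical, and the tryptic rule is simplified (cleavage after K or R, no missed cleavages, no proline exception).

```python
from typing import List, Set


def unspecific_peptides(protein: str, min_len: int, max_len: int) -> Set[str]:
    """Every contiguous substring within the length bounds.

    Models randomly hydrolysed peptides: any peptide bond may have broken,
    so any start and end position is possible.
    """
    peptides: Set[str] = set()
    for start in range(len(protein)):
        for length in range(min_len, max_len + 1):
            if start + length > len(protein):
                break  # longer peptides from this start would run off the end
            peptides.add(protein[start:start + length])
    return peptides


def tryptic_peptides(protein: str, min_len: int, max_len: int) -> Set[str]:
    """Simplified tryptic digest: cleave after every K or R (assumption:
    no missed cleavages and no proline exception)."""
    pieces: List[str] = []
    current = ""
    for residue in protein:
        current += residue
        if residue in "KR":
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)  # C-terminal piece without a trailing K/R
    return {p for p in pieces if min_len <= len(p) <= max_len}


# Hypothetical toy sequence (the 20 standard residues once each), not a real protein.
TOY_PROTEIN = "ACDEFGHIKLMNPQRSTVWY"

unspecific = unspecific_peptides(TOY_PROTEIN, min_len=5, max_len=10)
tryptic = tryptic_peptides(TOY_PROTEIN, min_len=5, max_len=10)

print(f"tryptic candidates:    {len(tryptic)}")
print(f"unspecific candidates: {len(unspecific)}")
```

Even for this 20-residue toy sequence the unspecific candidate set is many times larger than the tryptic one, which is one reason the validation of identified sequences receives particular attention in workflows of this kind.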

Identifiers

pubmed: 38671208
doi: 10.1038/s41596-024-00975-3
pii: 10.1038/s41596-024-00975-3

Publication types

Journal Article
Review

Languages

eng

Citation subsets

IM

Grants

Agency: Villum Fonden (Villum Foundation)
ID: 17649
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 722606
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 861389
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 101021361
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 948365
Agency: Danmarks Grundforskningsfond (Danish National Research Foundation)
ID: PROTEIOS (DNRF128)
Agency: Novo Nordisk Fonden (Novo Nordisk Foundation)
ID: NNF14CC0001

Copyright information

© 2024. Springer Nature Limited.

Authors

Alberto J Taurozzi (AJ)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Patrick L Rüther (PL)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

Ioannis Patramanis (I)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Claire Koenig (C)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

Ryan Sinclair Paterson (R)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Palesa P Madupe (PP)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Florian Simon Harking (FS)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

Frido Welker (F)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Meaghan Mackie (M)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.
Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

Jazmín Ramos-Madrigal (J)

Globe Institute, University of Copenhagen, Copenhagen, Denmark. jazmin.madrigal@sund.ku.dk.

Jesper V Olsen (JV)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

Enrico Cappellini (E)

Globe Institute, University of Copenhagen, Copenhagen, Denmark. ecappellini@sund.ku.dk.

MeSH classifications