Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel.

Journal

Nature protocols

ISSN: 1750-2799

Abbreviated title: Nat Protoc

Country: England

NLM ID: 101284307

Publication information

Publication date:
26 Apr 2024

History:

received: 14 Mar 2023

accepted: 12 Jan 2024

medline: 27 Apr 2024

pubmed: 27 Apr 2024

entrez: 26 Apr 2024

Status: ahead of print

Abstract

In temperate and subtropical regions, ancient proteins are reported to survive up to about 2 million years, far beyond the known limits of ancient DNA preservation in the same areas. Accordingly, their amino acid sequences currently represent the only source of genetic information available to pursue phylogenetic inference involving species that went extinct too long ago to be amenable to ancient DNA analysis. Here we present a complete workflow, including sample preparation, mass spectrometric data acquisition and computational analysis, to recover and interpret million-year-old dental enamel protein sequences. During sample preparation, the proteolytic digestion step, usually an integral part of conventional bottom-up proteomics, is omitted to increase the recovery of the randomly degraded peptides spontaneously generated by extensive diagenetic hydrolysis of ancient proteins over geological time. Similarly, we describe other solutions we have adopted to (1) authenticate the endogenous origin of the protein traces we identify, (2) detect and validate amino acid variation in the ancient protein sequences and (3) attempt phylogenetic inference. Sample preparation and data acquisition can be completed in 3–4 working days, while subsequent data analysis usually takes 2–5 days. The workflow described requires basic expertise in ancient biomolecule analysis, mass spectrometry-based proteomics and molecular phylogeny. Finally, we describe the limits of this approach and its potential for the reconstruction of evolutionary relationships in paleontology and paleoanthropology.
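
The following sketch is an illustration added to this record, not code from the protocol. It contrasts the candidate peptides expected from an enzymatic digest with those expected when proteins have fragmented at arbitrary positions, which is the situation the abstract describes for diagenetically hydrolysed enamel proteins. The function names, the simplified trypsin rule (cleave after K or R, no missed cleavages, no proline exception), the length bounds and the toy sequence are all assumptions chosen for illustration.

```python
def tryptic_peptides(seq):
    """Split a sequence after every K or R (simplified, hypothetical trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        if residue in "KR":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        # Keep the C-terminal remainder that does not end in K or R.
        peptides.append(seq[start:])
    return peptides


def unspecific_peptides(seq, min_len=7, max_len=25):
    """Return every contiguous substring within the length bounds.

    This mimics a search space with no enzyme specificity, as would be
    needed for peptides produced by random hydrolysis.
    """
    return {
        seq[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(seq) - n + 1)
    }


# Arbitrary toy string of amino acid letters, not a real enamel protein.
toy = "GASPKVTLRNDQE"
print(len(tryptic_peptides(toy)), "tryptic candidates")
print(len(unspecific_peptides(toy, min_len=3, max_len=5)), "unspecific candidates")
```

Even for this short toy string the unspecific candidate set is much larger than the tryptic one, which hints at why a workflow without a digestion step also needs stricter authentication and validation of the identified sequences.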

Identifiers

DOI: 10.1038/s41596-024-00975-3

PMID: 38671208

pii: 10.1038/s41596-024-00975-3

Publication types

Journal Article; Review

Languages

eng

Citation subsets

IM

Grants

Agency: Villum Fonden (Villum Foundation)

ID: 17649

Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)

ID: 722606

Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)

ID: 861389

Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)

ID: 101021361

Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)

ID: 948365

Agency: Danmarks Grundforskningsfond (Danish National Research Foundation)

ID: PROTEIOS (DNRF128)

Agency: Novo Nordisk Fonden (Novo Nordisk Foundation)

ID: NNF14CC0001

Copyright information

© 2024. Springer Nature Limited.

Authors

Alberto J Taurozzi (AJ)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

ORCID: 0000-0003-0378-1626

Patrick L Rüther (PL)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

Ioannis Patramanis (I)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Claire Koenig (C)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

ORCID: 0000-0002-7327-2723

Ryan Sinclair Paterson (R)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Palesa P Madupe (PP)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Florian Simon Harking (FS)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

ORCID: 0000-0001-5553-1421

Frido Welker (F)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

ORCID: 0000-0002-4846-6104

Meaghan Mackie (M)

Globe Institute, University of Copenhagen, Copenhagen, Denmark.

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

ORCID: 0000-0003-0763-7592

Jazmín Ramos-Madrigal (J)

Globe Institute, University of Copenhagen, Copenhagen, Denmark. jazmin.madrigal@sund.ku.dk.

ORCID: 0000-0002-1661-7991

Jesper V Olsen (JV)

Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.

ORCID: 0000-0002-4747-4938

Enrico Cappellini (E)

Globe Institute, University of Copenhagen, Copenhagen, Denmark. ecappellini@sund.ku.dk.

ORCID: 0000-0001-7885-7811

MeSH classifications