Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel.
Journal
Nature protocols
ISSN: 1750-2799
Abbreviated title: Nat Protoc
Country: England
NLM ID: 101284307
Publication information
Publication date:
26 Apr 2024
History:
received: 14 Mar 2023
accepted: 12 Jan 2024
medline: 27 Apr 2024
pubmed: 27 Apr 2024
entrez: 26 Apr 2024
Status:
aheadofprint
Abstract
In temperate and subtropical regions, ancient proteins are reported to survive up to about 2 million years, far beyond the known limits of ancient DNA preservation in the same areas. Accordingly, their amino acid sequences currently represent the only source of genetic information available to pursue phylogenetic inference involving species that went extinct too long ago to be amenable to ancient DNA analysis. Here we present a complete workflow, including sample preparation, mass spectrometric data acquisition and computational analysis, to recover and interpret million-year-old dental enamel protein sequences. During sample preparation, the proteolytic digestion step, usually an integral part of conventional bottom-up proteomics, is omitted to increase the recovery of the randomly degraded peptides spontaneously generated by extensive diagenetic hydrolysis of ancient proteins over geological time. Similarly, we describe other solutions we have adopted to (1) authenticate the endogenous origin of the protein traces we identify, (2) detect and validate amino acid variation in the ancient protein sequences and (3) attempt phylogenetic inference. Sample preparation and data acquisition can be completed in 3-4 working days, while subsequent data analysis usually takes 2-5 days. The workflow described requires basic expertise in ancient biomolecule analysis, mass spectrometry-based proteomics and molecular phylogeny. Finally, we describe the limits of this approach and its potential for the reconstruction of evolutionary relationships in paleontology and paleoanthropology.
Identifiers
pubmed: 38671208
doi: 10.1038/s41596-024-00975-3
pii: 10.1038/s41596-024-00975-3
Publication types
Journal Article
Review
Languages
eng
Citation subsets
IM
Grants
Agency: Villum Fonden (Villum Foundation)
ID: 17649
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 722606
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 861389
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 101021361
Agency: EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
ID: 948365
Agency: Danmarks Grundforskningsfond (Danish National Research Foundation)
ID: PROTEIOS (DNRF128)
Agency: Novo Nordisk Fonden (Novo Nordisk Foundation)
ID: NNF14CC0001
Copyright information
© 2024. Springer Nature Limited.
References
Higuchi, R., Bowman, B., Freiberger, M., Ryder, O. A. & Wilson, A. C. DNA sequences from the quagga, an extinct member of the horse family. Nature 312, 282–284 (1984).
doi: 10.1038/312282a0
pubmed: 6504142
Pääbo, S., Gifford, J. A. & Wilson, A. C. Mitochondrial DNA sequences from a 7000-year old brain. Nucleic Acids Res. 16, 9775–9787 (1988).
pmcid: 338778
doi: 10.1093/nar/16.20.9775
pubmed: 3186445
Hagelberg, E. & Clegg, J. B. Isolation and characterization of DNA from archaeological bone. Proc. Biol. Sci. 244, 45–50 (1991).
doi: 10.1098/rspb.1991.0049
pubmed: 1677195
Poinar, H. N. et al. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311, 392–394 (2006).
doi: 10.1126/science.1123360
pubmed: 16368896
Willerslev, E. et al. Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317, 111–114 (2007).
pmcid: 2694912
doi: 10.1126/science.1141758
pubmed: 17615355
Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762 (2010).
pmcid: 3951495
doi: 10.1038/nature08835
pubmed: 20148029
Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
pmcid: 5100745
doi: 10.1126/science.1188021
pubmed: 20448178
Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
doi: 10.1038/nature12323
pubmed: 23803765
van der Valk, T. et al. Million-year-old DNA sheds light on the genomic history of mammoths. Nature 591, 265–269 (2021).
pmcid: 7116897
doi: 10.1038/s41586-021-03224-9
pubmed: 33597750
Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403–406 (2014).
doi: 10.1038/nature12788
pubmed: 24305051
Lipson, M. et al. Ancient DNA and deep population structure in sub-Saharan African foragers. Nature 603, 290–296 (2022).
pmcid: 8907066
doi: 10.1038/s41586-022-04430-9
pubmed: 35197631
Cappellini, E. et al. Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny. Nature 574, 103–107 (2019).
pmcid: 6894936
doi: 10.1038/s41586-019-1555-y
pubmed: 31511700
Welker, F. et al. Enamel proteome shows that Gigantopithecus was an early diverging pongine. Nature 576, 262–265 (2019).
pmcid: 6908745
doi: 10.1038/s41586-019-1728-8
pubmed: 31723270
Welker, F. et al. The dental proteome of Homo antecessor. Nature 580, 235–238 (2020).
pmcid: 7582224
doi: 10.1038/s41586-020-2153-8
pubmed: 32269345
Warinner, C., Korzow Richter, K. & Collins, M. J. Paleoproteomics. Chem. Rev. 122, 13401–13446 (2022).
pmcid: 9412968
doi: 10.1021/acs.chemrev.1c00703
pubmed: 35839101
Olsen, J. V., Ong, S.-E. & Mann, M. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteom. 3, 608–614 (2004).
doi: 10.1074/mcp.T400003-MCP200
Stewart, N. A., Gerlach, R. F., Gowland, R. L., Gron, K. J. & Montgomery, J. Sex determination of human remains from peptides in tooth enamel. Proc. Natl Acad. Sci. USA 114, 13649–13654 (2017).
pmcid: 5748210
doi: 10.1073/pnas.1714926115
pubmed: 29229823
Cappellini, E. et al. Proteomic analysis of a pleistocene mammoth femur reveals more than one hundred ancient bone proteins. J. Proteome Res. 11, 917–926 (2012).
doi: 10.1021/pr200721u
pubmed: 22103443
Mackie, M. et al. Palaeoproteomic profiling of conservation layers on a 14th century Italian wall painting. Angew. Chem. 57, 7369–7374 (2018).
doi: 10.1002/anie.201713020
Rappsilber, J., Mann, M. & Ishihama, Y. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2, 1896–1906 (2007).
doi: 10.1038/nprot.2007.261
pubmed: 17703201
Parker, G. J. et al. Sex estimation using sexually dimorphic amelogenin protein fragments in human enamel. J. Archaeol. Sci. 101, 169–180 (2019).
doi: 10.1016/j.jas.2018.08.011
Rappsilber, J., Ishihama, Y. & Mann, M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75, 663–670 (2003).
doi: 10.1021/ac026117i
pubmed: 12585499
Peng, W., Pronker, M. F. & Snijder, J. Mass spectrometry-based de novo sequencing of monoclonal antibodies using multiple proteases and a dual fragmentation scheme. J. Proteome Res. 20, 3559–3566 (2021).
pmcid: 8256418
doi: 10.1021/acs.jproteome.1c00169
pubmed: 34121409
Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
doi: 10.1038/nbt.1511
pubmed: 19029910
Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).
doi: 10.1021/pr101065j
pubmed: 21254760
Zhang, J. et al. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteom. 11, M111.010587 (2012).
doi: 10.1074/mcp.M111.010587
Orlando, L. et al. Ancient DNA analysis. Nat. Rev. Methods Prim. 1, 1–26 (2021).
Renaud, G., Schubert, M., Sawyer, S. & Orlando, L. Authentication and assessment of contamination in ancient DNA. Methods Mol. Biol. 1963, 163–194 (2019).
doi: 10.1007/978-1-4939-9176-1_17
pubmed: 30875054
Radzicka, A. & Wolfenden, R. Rates of uncatalyzed peptide bond hydrolysis in neutral solution and the transition state affinities of proteases. J. Am. Chem. Soc. 118, 6105–6109 (1996).
doi: 10.1021/ja954077c
Iwata, T. et al. Processing of ameloblastin by MMP-20. J. Dent. Res. 86, 153–157 (2007).
doi: 10.1177/154405910708600209
pubmed: 17251515
Yamakoshi, Y., Hu, J. C.-C., Fukae, M., Yamakoshi, F. & Simmer, J. P. How do enamelysin and kallikrein 4 process the 32-kDa enamelin? Eur. J. Oral. Sci. 114, 45–51 (2006).
doi: 10.1111/j.1600-0722.2006.00281.x
pubmed: 16674662
van Doorn, N. L., Wilson, J., Hollund, H., Soressi, M. & Collins, M. J. Site-specific deamidation of glutamine: a new marker of bone collagen deterioration. Rapid Commun. Mass Spectrom. 26, 2319–2327 (2012).
doi: 10.1002/rcm.6351
pubmed: 22956324
Schroeter, E. R. & Cleland, T. P. Glutamine deamidation: an indicator of antiquity, or preservational quality? Rapid Commun. Mass Spectrom. 30, 251–255 (2016).
doi: 10.1002/rcm.7445
pubmed: 26689157
Ramsøe, A. et al. DeamiDATE 1.0: site-specific deamidation as a tool to assess authenticity of members of ancient proteomes. J. Archaeol. Sci. 115, 105080 (2020).
doi: 10.1016/j.jas.2020.105080
Tagliabracci, V. S. et al. Secreted kinase phosphorylates extracellular proteins that regulate biomineralization. Science 336, 1150–1153 (2012).
pmcid: 3754843
doi: 10.1126/science.1217817
pubmed: 22582013
Penkman, K. E. H., Kaufman, D. S., Maddy, D. & Collins, M. J. Closed-system behaviour of the intra-crystalline fraction of amino acids in mollusc shells. Quat. Geochronol. 3, 2–25 (2008).
pmcid: 2727006
doi: 10.1016/j.quageo.2007.07.001
pubmed: 19684879
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
pmcid: 390337
doi: 10.1093/nar/gkh340
pubmed: 15034147
Katoh, K., Misawa, K., Kuma, K.-I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
pmcid: 135756
doi: 10.1093/nar/gkf436
pubmed: 12136088
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
pmcid: 3371832
doi: 10.1093/bioinformatics/bts199
pubmed: 22543367
Xiao, Y., Vecchi, M. M. & Wen, D. Distinguishing between leucine and isoleucine by integrated LC–MS analysis using an orbitrap fusion mass spectrometer. Anal. Chem. 88, 10757–10766 (2016).
doi: 10.1021/acs.analchem.6b03409
pubmed: 27704771
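Gabriels, R., Martens, L. & Degroeve, S. Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Res. 47, W295–W299 (2019).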
pmcid: 6602496
doi: 10.1093/nar/gkz299
pubmed: 31028400
Tiwary, S. et al. High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nat. Methods 16, 519–525 (2019).
doi: 10.1038/s41592-019-0427-6
pubmed: 31133761
Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat. Methods 16, 509–518 (2019).
doi: 10.1038/s41592-019-0426-7
pubmed: 31133760
Wilhelm, M. et al. Deep learning boosts sensitivity of mass spectrometry-based immunopeptidomics. Nat. Commun. 12, 3346 (2021).
pmcid: 8184761
doi: 10.1038/s41467-021-23713-9
pubmed: 34099720
Gilbert, C. et al. Species identification of ivory and bone museum objects using minimally invasive proteomics. Sci. Adv. 10, eadi9028 (2024).
pmcid: 10816696
doi: 10.1126/sciadv.adi9028
pubmed: 38277452
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
doi: 10.1093/sysbio/syq010
pubmed: 20525638
Ronquist, F. et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
pmcid: 3329765
doi: 10.1093/sysbio/sys029
pubmed: 22357727
Patramanis, I., Ramos-Madrigal, J., Cappellini, E. & Racimo, F. PaleoProPhyler: a reproducible pipeline for phylogenetic inference using ancient proteins. Peer Community J. 3, e112 (2023).
doi: 10.24072/pcjournal.344
Pamilo, P. & Nei, M. Relationships between gene trees and species trees. Mol. Biol. Evol. https://doi.org/10.1093/oxfordjournals.molbev.a040517 (1988).
Takahata, N. Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics 122, 957–966 (1989).
pmcid: 1203770
doi: 10.1093/genetics/122.4.957
pubmed: 2759432
Maddison, W. P. Gene trees in species trees. Syst. Biol. 46, 523–536 (1997).
doi: 10.1093/sysbio/46.3.523
Nichols, R. Gene trees and species trees are not the same. Trends Ecol. Evol. 16, 358–364 (2001).
doi: 10.1016/S0169-5347(01)02203-0
pubmed: 11403868
Hobolth, A., Dutheil, J. Y., Hawks, J., Schierup, M. H. & Mailund, T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Res. 21, 349–356 (2011).
pmcid: 3044849
doi: 10.1101/gr.114751.110
pubmed: 21270173
Mailund, T., Munch, K. & Schierup, M. H. Lineage sorting in apes. Annu. Rev. Genet. 48, 519–535 (2014).
doi: 10.1146/annurev-genet-120213-092532
pubmed: 25251849
Sousa, F., Bertrand, Y. J. K., Doyle, J. J., Oxelman, B. & Pfeil, B. E. Using genomic location and coalescent simulation to investigate gene tree discordance in Medicago L. Syst. Biol. 66, 934–949 (2017).
doi: 10.1093/sysbio/syx035
pubmed: 28177088
Scally, A. et al. Insights into hominid evolution from the gorilla genome sequence. Nature 483, 169–175 (2012).
pmcid: 3303130
doi: 10.1038/nature10842
pubmed: 22398555
Huerta-Sánchez, E. et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197 (2014).
pmcid: 4134395
doi: 10.1038/nature13408
pubmed: 25043035
Lanier, H. C., Huang, H. & Knowles, L. L. How low can you go? The effects of mutation rate on the accuracy of species-tree estimation. Mol. Phylogenet. Evol. 70, 112–119 (2014).
doi: 10.1016/j.ympev.2013.09.006
pubmed: 24060367
Madupe, P. P. et al. Enamel proteins reveal biological sex and genetic variability within southern African Paranthropus. Preprint at bioRxiv https://doi.org/10.1101/2023.07.03.547326 (2023).
Yu, Y., Yu, Y., Smith, M. & Pieper, R. A spinnable and automatable StageTip for high throughput peptide desalting and proteomics. Protoc. Exch. https://doi.org/10.1038/protex.2014.033 (2014).
doi: 10.1038/protex.2014.033
Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).
doi: 10.1038/nature12886
pubmed: 24352235
UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
doi: 10.1093/nar/gkv1189
pubmed: 26553804
Hall, T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98 (1999).
Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
pmcid: 3261699
doi: 10.1038/msb.2011.75
pubmed: 21988835
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
pmcid: 5967553
doi: 10.1093/molbev/msy096
pubmed: 29722887
Posada, D. & Crandall, K. A. MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818 (1998).
doi: 10.1093/bioinformatics/14.9.817
pubmed: 9918953
Demarchi, B. et al. Protein sequences bound to mineral surfaces persist into deep time. eLife 5, e17092 (2016).
pmcid: 5039028
doi: 10.7554/eLife.17092
pubmed: 27668515
Lacruz, R. S., Habelitz, S., Wright, J. T. & Paine, M. L. Dental enamel formation and implications for oral health and disease. Physiol. Rev. 97, 939–993 (2017).
pmcid: 6151498
doi: 10.1152/physrev.00030.2016
pubmed: 28468833
Blausen.com staff. Medical gallery of Blausen Medical 2014. WikiJournal Med. https://doi.org/10.15347/wjm/2014.010 (2014).
Ahmadi, S. & Winter, D. Identification of poly(ethylene glycol) and poly(ethylene glycol)-based detergents using peptide search engines. Anal. Chem. 90, 6594–6600 (2018).
doi: 10.1021/acs.analchem.8b00365
pubmed: 29726681
Bartlett, J. D. Dental enamel development: proteinases and their enamel matrix substrates. ISRN Dent. 2013, 684607 (2013).
pmcid: 3789414
pubmed: 24159389
Lu, Y. et al. Functions of KLK4 and MMP-20 in dental enamel formation. Biol. Chem. 389, 695–700 (2008).
pmcid: 2688471
doi: 10.1515/BC.2008.080
pubmed: 18627287
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Sayers, E. W. et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 50, D20–D26 (2022).
doi: 10.1093/nar/gkab1112
pubmed: 34850941
Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank. Nucleic Acids Res. 44, D67–D72 (2016).
doi: 10.1093/nar/gkv1276
pubmed: 26590407
The NCBI C++ Toolkit. National Center for Biotechnology Information (2003).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
doi: 10.1016/S0022-2836(05)80360-2
pubmed: 2231712
Prüfer, K. et al. Computational challenges in the analysis of ancient DNA. Genome Biol. 11, R47 (2010).
pmcid: 2898072
doi: 10.1186/gb-2010-11-5-r47
pubmed: 20441577
Hendy, J. et al. A guide to ancient protein studies. Nat. Ecol. Evol. 2, 791–799 (2018).
doi: 10.1038/s41559-018-0510-x
pubmed: 29581591