Rp3: Ribosome profiling-assisted proteogenomics improves coverage and confidence during microprotein discovery.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
09 Aug 2024
History:
received: 10 10 2023
accepted: 08 07 2024
medline: 10 8 2024
pubmed: 10 8 2024
entrez: 9 8 2024
Status: epublish

Abstract

There has been a dramatic increase in the identification of non-canonical translation and a significant expansion of the protein-coding genome. Among the strategies used to identify unannotated small Open Reading Frames (smORFs) that encode microproteins, Ribosome profiling (Ribo-Seq) is the gold standard for the annotation of novel coding sequences by reporting on smORF translation. In Ribo-Seq, ribosome-protected footprints (RPFs) that map to multiple genomic sites are removed since they cannot be unambiguously assigned to a specific genomic location. Furthermore, RPFs necessarily result in short (25-34 nucleotides) reads, increasing the chance of multi-mapping alignments, such that smORFs residing in these regions cannot be identified by Ribo-Seq. Moreover, it has been challenging to identify protein evidence for Ribo-Seq. To solve this, we developed Rp3, a pipeline that integrates proteogenomics and Ribosome profiling to provide unambiguous evidence for a subset of microproteins missed by current Ribo-Seq pipelines. Here, we show that Rp3 maximizes proteomics detection and confidence of microprotein-encoding smORFs.
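
The multi-mapping problem described in the abstract can be illustrated with a minimal sketch. It is not Rp3's implementation: the `Alignment` record, the `split_by_mappability` helper, and the toy reads are all hypothetical, and the sketch assumes each alignment is already available as a simple record. It only shows how footprints that align to more than one genomic location are set aside, which is why smORFs in those regions lose their Ribo-Seq support.

```python
from collections import Counter
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one alignment of one footprint read."""
    read_id: str
    chrom: str
    start: int
    length: int


def split_by_mappability(alignments):
    """Separate uniquely mapped reads from multi-mapped ones.

    A read is multi-mapped when its read_id appears in more than one
    alignment record.
    """
    hits = Counter(a.read_id for a in alignments)
    unique = [a for a in alignments if hits[a.read_id] == 1]
    multi = [a for a in alignments if hits[a.read_id] > 1]
    return unique, multi


# Toy data (invented for illustration): rpf2 aligns to two loci.
alignments = [
    Alignment("rpf1", "chr1", 1000, 29),
    Alignment("rpf2", "chr2", 5000, 30),
    Alignment("rpf2", "chr7", 8200, 30),
    Alignment("rpf3", "chrX", 300, 27),
]

unique, multi = split_by_mappability(alignments)
print("kept:", [a.read_id for a in unique])
print("discarded:", sorted({a.read_id for a in multi}))
```

In this toy case the footprint `rpf2` is discarded because it cannot be assigned to a single locus, so a smORF at either of its two candidate sites would have no supporting reads.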

Identifiers

pubmed: 39122697
doi: 10.1038/s41467-024-50301-4
pii: 10.1038/s41467-024-50301-4

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

6839

Copyright information

© 2024. The Author(s).

Authors

Eduardo Vieira de Souza (E)

Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF) and Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil.
Programa de Pós-Graduação em Biologia Celular e Molecular, Pontifícia Universidade Católica do Rio Grande do Sul, 90616-900, Porto Alegre, Rio Grande do Sul, Brazil.
Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA.

Angie L Bookout (A)

Novo Nordisk Research Center Seattle Inc., Seattle, WA, USA.

Christopher A Barnes (CA)

Novo Nordisk Research Center Seattle Inc., Seattle, WA, USA.

Brendan Miller (B)

Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA.

Pablo Machado (P)

Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF) and Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil.
Programa de Pós-Graduação em Biologia Celular e Molecular, Pontifícia Universidade Católica do Rio Grande do Sul, 90616-900, Porto Alegre, Rio Grande do Sul, Brazil.

Luiz A Basso (LA)

Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF) and Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil.
Programa de Pós-Graduação em Biologia Celular e Molecular, Pontifícia Universidade Católica do Rio Grande do Sul, 90616-900, Porto Alegre, Rio Grande do Sul, Brazil.

Cristiano V Bizarro (CV)

Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF) and Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil. cristiano.bizarro@pucrs.br.
Programa de Pós-Graduação em Biologia Celular e Molecular, Pontifícia Universidade Católica do Rio Grande do Sul, 90616-900, Porto Alegre, Rio Grande do Sul, Brazil. cristiano.bizarro@pucrs.br.

Alan Saghatelian (A)

Clayton Foundation Laboratories for Peptide Biology, Salk Institute for Biological Studies, La Jolla, CA, USA. asaghatelian@salk.edu.
