Noncanonical open reading frames encode functional proteins essential for cancer cell survival.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
06 2021
History:
received: 18 02 2020
accepted: 16 12 2020
pubmed: 30 1 2021
medline: 28 8 2021
entrez: 29 1 2021
Status:
ppublish
Abstract
Although genomic analyses predict many noncanonical open reading frames (ORFs) in the human genome, it is unclear whether they encode biologically active proteins. Here we experimentally interrogated 553 candidates selected from noncanonical ORF datasets. Of these, 57 induced viability defects when knocked out in human cancer cell lines. Following ectopic expression, 257 showed evidence of protein expression and 401 induced gene expression changes. Clustered regularly interspaced short palindromic repeat (CRISPR) tiling and start codon mutagenesis indicated that their biological effects required translation as opposed to RNA-mediated effects. We found that one of these ORFs, G029442, renamed glycine-rich extracellular protein-1 (GREP1), encodes a secreted protein highly expressed in breast cancer, and its knockout in 263 cancer cell lines showed preferential essentiality in breast cancer-derived lines. The secretome of GREP1-expressing cells has an increased abundance of the oncogenic cytokine GDF15, and GDF15 supplementation mitigated the growth-inhibitory effect of GREP1 knockout. Our experiments suggest that noncanonical ORFs can express biologically active proteins that are potential therapeutic targets.
Identifiers
pubmed: 33510483
doi: 10.1038/s41587-020-00806-2
pii: 10.1038/s41587-020-00806-2
pmc: PMC8195866
mid: NIHMS1698134
Chemical substances
Neoplasm Proteins
0
Publication types
Letter
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
697-704
Grants
Agency: NICHD NIH HHS
ID: R01 HD091846
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM138192
Country: United States
Agency: NICHD NIH HHS
ID: R01 HD073104
Country: United States
Agency: NCI NIH HHS
ID: K12 CA090354
Country: United States
Agency: NCI NIH HHS
ID: R00 CA207865
Country: United States
References
Ewing, B. & Green, P. Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet. 25, 232–234 (2000).
pubmed: 10835644
doi: 10.1038/76115
Fields, C., Adams, M. D., White, O. & Venter, J. C. How many genes in the human genome? Nat. Genet. 7, 345–346 (1994).
pubmed: 7920649
doi: 10.1038/ng0794-345
Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239–240 (2000).
pubmed: 10835646
doi: 10.1038/76126
Omenn, G. S. et al. Progress on identifying and characterizing the human proteome: 2018 metrics from the HUPO Human Proteome Project. J. Proteome Res. 17, 4031–4041 (2018).
pubmed: 30099871
pmcid: 6387656
doi: 10.1021/acs.jproteome.8b00441
Ingolia, N. T. et al. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 8, 1365–1379 (2014).
pubmed: 25159147
pmcid: 4216110
doi: 10.1016/j.celrep.2014.07.045
Ji, Z., Song, R., Regev, A. & Struhl, K. Many lncRNAs, 5′UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4, e08890 (2015).
pubmed: 26687005
pmcid: 4739776
doi: 10.7554/eLife.08890
Pertea, M. et al. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol. 19, 208 (2018).
pubmed: 30486838
pmcid: 6260756
doi: 10.1186/s13059-018-1590-2
van Heesch, S. et al. The translational landscape of the human heart. Cell 178, 242–260 (2019).
pubmed: 31155234
doi: 10.1016/j.cell.2019.05.010
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
pubmed: 9149143
doi: 10.1006/jmbi.1997.0951
Dinger, M. E., Pang, K. C., Mercer, T. R. & Mattick, J. S. Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput. Biol. 4, e1000176 (2008).
pubmed: 19043537
pmcid: 2518207
doi: 10.1371/journal.pcbi.1000176
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
pubmed: 11237011
doi: 10.1038/35057062
Mouse Genome Sequencing Consortium Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
doi: 10.1038/nature01262
Mudge, J. M. et al. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Res. 29, 2073–2087 (2019).
pubmed: 31537640
pmcid: 6886504
doi: 10.1101/gr.246462.118
Banfai, B. et al. Long noncoding RNAs are rarely translated in two human cell lines. Genome Res. 22, 1646–1657 (2012).
pubmed: 22955977
pmcid: 3431482
doi: 10.1101/gr.134767.111
Jungreis, I. et al. Nearly all new protein-coding predictions in the CHESS database are not protein-coding. Preprint at bioRxiv https://doi.org/10.1101/360602 (2018).
Bazzini, A. A. et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 33, 981–993 (2014).
pubmed: 24705786
pmcid: 4193932
doi: 10.1002/embj.201488411
Branca, R. M. et al. HiRIEF LC–MS enables deep proteome coverage and unbiased proteogenomics. Nat. Methods 11, 59–62 (2014).
pubmed: 24240322
doi: 10.1038/nmeth.2732
Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011).
pubmed: 21890647
pmcid: 3185964
doi: 10.1101/gad.17446611
Calviello, L. et al. Detecting actively translated open reading frames in ribosome profiling data. Nat. Methods 13, 165–170 (2016).
pubmed: 26657557
doi: 10.1038/nmeth.3688
Gao, X. et al. Quantitative profiling of initiating ribosomes in vivo. Nat. Methods 12, 147–153 (2015).
pubmed: 25486063
doi: 10.1038/nmeth.3208
Gascoigne, D. K. et al. Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes. Bioinformatics 28, 3042–3050 (2012).
pubmed: 23044541
doi: 10.1093/bioinformatics/bts582
Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 47, 199–208 (2015).
pubmed: 25599403
pmcid: 4417758
doi: 10.1038/ng.3192
Kim, M. S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).
pubmed: 24870542
pmcid: 4403737
doi: 10.1038/nature13302
Koch, A. et al. A proteogenomics approach integrating proteomics and ribosome profiling increases the efficiency of protein identification and enables the discovery of alternative translation start sites. Proteomics 14, 2688–2698 (2014).
pubmed: 25156699
pmcid: 4391000
doi: 10.1002/pmic.201400180
Ma, J. et al. Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue. J. Proteome Res. 13, 1757–1765 (2014).
pubmed: 24490786
pmcid: 3993966
doi: 10.1021/pr401280w
Mackowiak, S. D. et al. Extensive identification and analysis of conserved small ORFs in animals. Genome Biol. 16, 179 (2015).
pubmed: 26364619
pmcid: 4568590
doi: 10.1186/s13059-015-0742-x
Ruiz-Orera, J., Messeguer, X., Subirana, J. A. & Alba, M. M. Long non-coding RNAs as a source of new peptides. eLife 3, e03523 (2014).
pubmed: 25233276
pmcid: 4359382
doi: 10.7554/eLife.03523
Schwaid, A. G. et al. Chemoproteomic discovery of cysteine-containing human short open reading frames. J. Am. Chem. Soc. 135, 16750–16753 (2013).
pubmed: 24152191
doi: 10.1021/ja406606j
Slavoff, S. A. et al. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 9, 59–64 (2013).
pubmed: 23160002
doi: 10.1038/nchembio.1120
Sun, H. et al. Integration of mass spectrometry and RNA-seq data to confirm human ab initio predicted genes and lncRNAs. Proteomics 14, 2760–2768 (2014).
pubmed: 25339270
doi: 10.1002/pmic.201400174
Zhang, C. et al. Systematic analysis of missing proteins provides clues to help define all of the protein-coding genes on human chromosome 1. J. Proteome Res. 13, 114–125 (2014).
pubmed: 24256544
doi: 10.1021/pr400900j
Vanderperre, B. et al. Direct detection of alternative open reading frames translation products in human significantly expands the proteome. PLoS ONE 8, e70698 (2013).
pubmed: 23950983
pmcid: 3741303
doi: 10.1371/journal.pone.0070698
Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
pubmed: 24870543
doi: 10.1038/nature13319
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452 (2017).
pubmed: 29195078
pmcid: 5990023
doi: 10.1016/j.cell.2017.10.049
Nassa, M. et al. Analysis of human collagen sequences. Bioinformation 8, 26–33 (2012).
pubmed: 22359431
pmcid: 3282272
doi: 10.6026/97320630008026
Breit, S. N., Tsai, V. W. & Brown, D. A. Targeting obesity and cachexia: Identification of the GFRAL receptor-MIC-1/GDF15 pathway. Trends Mol. Med. 23, 1065–1067 (2017).
pubmed: 29129392
doi: 10.1016/j.molmed.2017.10.005
Mullican, S. E. & Rangwala, S. M. Uniting GDF15 and GFRAL: therapeutic opportunities in obesity and beyond. Trends Endocrinol. Metab. 29, 560–570 (2018).
pubmed: 29866502
doi: 10.1016/j.tem.2018.05.002
Baroni, M. et al. Distinct response to GDF15 knockdown in pediatric and adult glioblastoma cell lines. J. Neurooncol. 139, 51–60 (2018).
pubmed: 29671197
doi: 10.1007/s11060-018-2853-1
Huang, C. Y. et al. Molecular alterations in prostate carcinomas that associate with in vivo exposure to chemotherapy: identification of a cytoprotective mechanism involving growth differentiation factor 15. Clin. Cancer Res. 13, 5825–5833 (2007).
pubmed: 17908975
doi: 10.1158/1078-0432.CCR-07-1037
Ratnam, N. M. et al. NF-kappaB regulates GDF-15 to suppress macrophage surveillance during early tumor development. J. Clin. Invest. 127, 3796–3809 (2017).
pubmed: 28891811
pmcid: 5617672
doi: 10.1172/JCI91561
Corre, J. et al. Bioactivity and prognostic significance of growth differentiation factor GDF15 secreted by bone marrow mesenchymal stem cells in multiple myeloma. Cancer Res. 72, 1395–1406 (2012).
pubmed: 22301101
doi: 10.1158/0008-5472.CAN-11-0188
Peake, B. F., Eze, S. M., Yang, L., Castellino, R. C. & Nahta, R. Growth differentiation factor 15 mediates epithelial mesenchymal transition and invasion of breast cancers through IGF-1R-FoxM1 signaling. Oncotarget 8, 94393–94406 (2017).
pubmed: 29212236
pmcid: 5706882
doi: 10.18632/oncotarget.21765
Martinez, T. F. et al. Accurate annotation of human protein-coding small open reading frames. Nat. Chem. Biol. 16, 458–468 (2020).
pubmed: 31819274
doi: 10.1038/s41589-019-0425-0
Chen, J. et al. Pervasive functional translation of noncanonical human open reading frames. Science 367, 1140–1146 (2020).
pubmed: 32139545
pmcid: 7289059
doi: 10.1126/science.aay0262
Xie, W. et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134–1148 (2013).
pubmed: 23664764
pmcid: 3786220
doi: 10.1016/j.cell.2013.04.022
Chen, J. et al. Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol. 17, 19 (2016).
pubmed: 26838501
pmcid: 4739325
doi: 10.1186/s13059-016-0880-9
Liu, S. J. et al. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science 355, aah7111 (2017).
Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011).
pubmed: 21959131
doi: 10.1038/nmeth.1701
Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015).
pubmed: 25950237
pmcid: 5298202
doi: 10.1038/nprot.2015.053
Domazet-Loso, T., Brajkovic, J. & Tautz, D. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 23, 533–539 (2007).
pubmed: 18029048
doi: 10.1016/j.tig.2007.08.014
Domazet-Loso, T. et al. No evidence for phylostratigraphic bias impacting inferences on patterns of gene emergence and evolution. Mol. Biol. Evol. 34, 843–856 (2017).
pubmed: 28087778
pmcid: 5400388
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
pubmed: 28387841
doi: 10.1093/molbev/msx116
Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat. Methods 8, 659–661 (2011).
pubmed: 21706014
pmcid: 3234135
doi: 10.1038/nmeth.1638
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
pubmed: 16199517
pmcid: 1239896
doi: 10.1073/pnas.0506580102
Ross, Z., Wickham, H. & Robinson, D. Declutter your R workflow with tidy tools. Preprint at PeerJ https://peerj.com/preprints/3180.pdf (2017).
Enache, O. M. et al. The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices. Bioinformatics 35, 1427–1429 (2019).
pubmed: 30203022
doi: 10.1093/bioinformatics/bty784
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
pubmed: 26780180
pmcid: 4744125
doi: 10.1038/nbt.3437
Piccioni, F., Younger, S. T. & Root, D. E. Pooled lentiviral-delivery genetic screens. Curr. Protoc. Mol. Biol. 121, 32.1.1–32.1.21 (2018).
doi: 10.1002/cpmb.52
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
pubmed: 29083409
pmcid: 5709193
doi: 10.1038/ng.3984
Hart, T., Brown, K. R., Sircoulomb, F., Rottapel, R. & Moffat, J. Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Syst. Biol. 10, 733 (2014).
pubmed: 24987113
pmcid: 4299491
doi: 10.15252/msb.20145216
Bae, S., Park, J. & Kim, J. S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
pubmed: 24463181
pmcid: 4016707
doi: 10.1093/bioinformatics/btu048
Yu, C. et al. High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines. Nat. Biotechnol. 34, 419–423 (2016).
pubmed: 26928769
pmcid: 5508574
doi: 10.1038/nbt.3460
Pinello, L. et al. Analyzing CRISPR genome-editing experiments with CRISPResso. Nat. Biotechnol. 34, 695–697 (2016).
pubmed: 27404874
pmcid: 5242601
doi: 10.1038/nbt.3583
Niknafs, Y. S. et al. MiPanda: a resource for analyzing and visualizing next-generation sequencing transcriptomics data. Neoplasia 20, 1144–1149 (2018).
pubmed: 30268942
pmcid: 6171536
doi: 10.1016/j.neo.2018.09.001
Shevchenko, A., Wilm, M., Vorm, O. & Mann, M. Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Anal. Chem. 68, 850–858 (1996).
pubmed: 8779443
doi: 10.1021/ac950914h
Peng, J. & Gygi, S. P. Proteomics: the move to mixtures. J. Mass Spectrom. 36, 1083–1091 (2001).
pubmed: 11747101
doi: 10.1002/jms.229
Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
pubmed: 24226387
doi: 10.1016/1044-0305(94)80016-2
Beausoleil, S. A., Villen, J., Gerber, S. A., Rush, J. & Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 (2006).
pubmed: 16964243
doi: 10.1038/nbt1240
Jones, D. T. & Cozzetto, D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31, 857–863 (2015).
pubmed: 25391399
doi: 10.1093/bioinformatics/btu744