CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
27 Jun 2024
History:
received: 12 Jul 2023
accepted: 10 Jun 2024
medline: 28 Jun 2024
pubmed: 28 Jun 2024
entrez: 27 Jun 2024
Status:
epublish
Abstract
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method that combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5' cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.
Identifiers
pubmed: 38937428
doi: 10.1038/s41467-024-49523-3
pii: 10.1038/s41467-024-49523-3
Chemical substances
RNA
63231-63-0
RNA Caps
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
5278
Copyright information
© 2024. The Author(s).