Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.
Journal: Nature Methods
ISSN: 1548-7105
Abbreviated title: Nat Methods
Country: United States
NLM ID: 101215604
Publication information
Publication date: 07 Jun 2024
History:
received: 02 Aug 2021
accepted: 03 May 2024
medline: 08 Jun 2024
pubmed: 08 Jun 2024
entrez: 07 Jun 2024
Status: aheadofprint
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
Identifiers
pubmed: 38849569
doi: 10.1038/s41592-024-02298-3
pii: 10.1038/s41592-024-02298-3
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
ID: R35GM138122
Agency: U.S. Department of Health & Human Services | NIH | National Institute of General Medical Sciences (NIGMS)
ID: R35GM14264
Agency: U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
ID: U41HG007234
Agency: U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
ID: R01HG008759
Agency: U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
ID: R01HG011469
Agency: U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
ID: R01GM136886
Agency: NHGRI NIH HHS
ID: UM1 HG009443
Country: United States
Agency: U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI)
ID: F31HG010999
Agency: Saint Petersburg State University (St. Petersburg State University)
ID: 73023672
Agency: Wellcome Trust (Wellcome)
ID: WT108749/Z/15/Z
Copyright information
© 2024. The Author(s).