Nucleotide-level distance metrics to quantify alternative splicing implemented in TranD.
Journal
Nucleic acids research
ISSN: 1362-4962
Abbreviated title: Nucleic Acids Res
Country: England
NLM ID: 0411011
Publication information
Publication date:
21 Mar 2024
History:
accepted: 18 Jan 2024
revised: 29 Nov 2023
received: 21 Jul 2023
pubmed: 10 Feb 2024
medline: 10 Feb 2024
entrez: 10 Feb 2024
Status:
ppublish
Abstract
Advances in affordable transcriptome sequencing, combined with better exon and gene prediction, have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes and alternative 5'/3' UTRs, are compared, and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and the output can be used to summarize splicing patterns for a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq versus Ensembl); and closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models, and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.
Identifiers
pubmed: 38340337
pii: 7606259
doi: 10.1093/nar/gkae056
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
e28
Grants
Agency: NIGMS NIH HHS
ID: R01GM128193
Country: United States
Copyright information
© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.