Nucleotide-level distance metrics to quantify alternative splicing implemented in TranD.


Journal

Nucleic acids research
ISSN: 1362-4962
Abbreviated title: Nucleic Acids Res
Country: England
NLM ID: 0411011

Publication information

Publication date:
21 Mar 2024
History:
accepted: 18 01 2024
revised: 29 11 2023
received: 21 07 2023
pubmed: 10 2 2024
medline: 10 2 2024
entrez: 10 2 2024
Status: ppublish

Abstract

Advances in affordable transcriptome sequencing, combined with better exon and gene prediction, have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes, and alternative 5'/3' UTRs, are compared, and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and the output can be used to summarize splicing patterns for a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq versus Ensembl); and closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models, and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.
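
To make the idea of a nucleotide-level distance concrete, the following is a minimal sketch in Python. It is not TranD's implementation or API: the function names, the dictionary keys and the example coordinates are all hypothetical, and the proportion it reports is just one plausible way to summarize the difference between two transcript models given as lists of exon coordinates.

```python
def exon_positions(exons):
    """Return the set of genomic positions covered by a transcript.

    `exons` is a list of (start, end) pairs, assumed here to be
    1-based and inclusive, as in GTF files.
    """
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_distance(exons_a, exons_b):
    """Count shared and transcript-specific exonic nucleotides.

    Illustrative only: this is an assumed summary, not the metric
    definitions used by TranD.
    """
    a = exon_positions(exons_a)
    b = exon_positions(exons_b)
    shared = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    total = shared + only_a + only_b
    return {
        "shared": shared,
        "only_a": only_a,
        "only_b": only_b,
        "prop_different": (only_a + only_b) / total if total else 0.0,
    }


# Hypothetical transcript models on the same strand of one gene.
reference = [(100, 200), (300, 400), (500, 600)]
alt_donor = [(100, 200), (300, 420), (500, 600)]  # exon 2 extended by 20 nt
retained_intron = [(100, 400), (500, 600)]        # first intron retained

alt_donor_result = nucleotide_distance(reference, alt_donor)
retained_intron_result = nucleotide_distance(reference, retained_intron)
```

In this toy example the alternative donor site contributes 20 nucleotides found only in the second model, while the retained intron contributes the 99 intronic positions between the first two reference exons. Working with sets of positions keeps the sketch short; real annotations would call for interval arithmetic instead.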

Identifiers

pubmed: 38340337
pii: 7606259
doi: 10.1093/nar/gkae056

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

e28

Grants

Agency: NIGMS NIH HHS
ID: R01GM128193
Country: United States

Copyright information

© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.

Authors

Adalena Nanni (A)

Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA.
University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA.

James Titus-McQuillan (J)

University of North Carolina at Charlotte, Department of Bioinformatics and Genomics, Charlotte, NC, USA.

Kinfeosioluwa S Bankole (KS)

Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA.
University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA.

Francisco Pardo-Palacios (F)

Institute for Integrative Systems Biology. Spanish National Research Council, Paterna, Spain.

Sarah Signor (S)

Department of Biological Sciences, North Dakota State University, Fargo, ND, USA.

Srna Vlaho (S)

Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA.

Oleksandr Moskalenko (O)

University of Florida Research Computing, University of Florida, Gainesville, FL 32611, USA.

Alison M Morse (AM)

Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA.
University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA.

Rebekah L Rogers (RL)

University of North Carolina at Charlotte, Department of Bioinformatics and Genomics, Charlotte, NC, USA.

Ana Conesa (A)

Institute for Integrative Systems Biology. Spanish National Research Council, Paterna, Spain.

Lauren M McIntyre (LM)

Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA.
University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA.

MeSH classifications