The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity.


Journal

bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187

Publication information

Publication date:
16 May 2023
History:
pubmed: 9 6 2023
medline: 9 6 2023
entrez: 9 6 2023
Status: epublish

Abstract

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
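
The triplet idea in the abstract can be illustrated with a minimal sketch. It assumes each transcript is already labeled with a gene name and three identifiers (transcript start site, exon junction chain, transcript end site), and it places each gene on a simplex by taking the share of distinct features of each kind. The function name, the input layout, and the plain count normalization are all assumptions made for illustration; the preprint's own framework may weight the exon junction chain count differently.

```python
from collections import defaultdict


def simplex_coordinates(transcripts):
    """Return {gene: (tss_share, ejc_share, tes_share)}.

    `transcripts` is an iterable of (gene, tss_id, ejc_id, tes_id) tuples.
    This layout and the normalization below are illustrative assumptions,
    not the preprint's exact method.
    """
    # For each gene, collect the distinct start sites, junction chains, and end sites.
    features = defaultdict(lambda: (set(), set(), set()))
    for gene, tss_id, ejc_id, tes_id in transcripts:
        tss_set, ejc_set, tes_set = features[gene]
        tss_set.add(tss_id)
        ejc_set.add(ejc_id)
        tes_set.add(tes_id)

    coords = {}
    for gene, sets in features.items():
        counts = [len(s) for s in sets]
        total = sum(counts)
        # The three shares sum to 1, so each gene is a point on a 2-simplex.
        coords[gene] = tuple(c / total for c in counts)
    return coords


# Hypothetical gene with four start sites, two junction chains, one end site.
example = [
    ("GENE_A", 1, 1, 1),
    ("GENE_A", 2, 1, 1),
    ("GENE_A", 3, 1, 1),
    ("GENE_A", 4, 2, 1),
]
coords = simplex_coordinates(example)
```

In this toy input the largest share falls on the start-site coordinate, which is the kind of bias toward one diversity mechanism that the abstract describes.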

Identifiers

pubmed: 37292896
doi: 10.1101/2023.05.15.540865
pmc: PMC10245583

Publication types

Preprint

Languages

eng

Grants

Agency: NIA NIH HHS
ID: U01 AG046152
Country: United States
Agency: NHGRI NIH HHS
ID: U01 HG009380
Country: United States
Agency: NIA NIH HHS
ID: U01 AG061356
Country: United States
Agency: NHGRI NIH HHS
ID: R01 HG012367
Country: United States
Agency: NIA NIH HHS
ID: R01 AG017917
Country: United States
Agency: NIA NIH HHS
ID: P30 AG010161
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG009446
Country: United States
Agency: NIA NIH HHS
ID: R01 AG015819
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG009443
Country: United States
Agency: NIA NIH HHS
ID: P30 AG072975
Country: United States
Agency: NHGRI NIH HHS
ID: U24 HG009397
Country: United States
Agency: NHGRI NIH HHS
ID: UM1 HG009382
Country: United States

Authors

Fairlie Reese (F)

Developmental and Cell Biology, University of California, Irvine, Irvine, USA.
Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Brian Williams (B)

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA.

Gabriela Balderrama-Gutierrez (G)

Developmental and Cell Biology, University of California, Irvine, Irvine, USA.
Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Dana Wyman (D)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Muhammed Hasan Çelik (MH)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Elisabeth Rebboah (E)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Narges Rezaie (N)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Diane Trout (D)

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA.

Milad Razavi-Mohseni (M)

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA.
McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA.

Yunzhe Jiang (Y)

Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA.

Beatrice Borsari (B)

Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA.
Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain.

Samuel Morabito (S)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Heidi Yahan Liang (HY)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Cassandra J McGill (CJ)

Developmental and Cell Biology, University of California, Irvine, Irvine, USA.
Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Sorena Rahmanian (S)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Jasmine Sakr (J)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.
Department of Pharmaceutical Sciences, University of California, Irvine, Irvine, USA.

Shan Jiang (S)

Developmental and Cell Biology, University of California, Irvine, Irvine, USA.
Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Weihua Zeng (W)

Developmental and Cell Biology, University of California, Irvine, Irvine, USA.
Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Klebea Carvalho (K)

Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

Annika K Weimer (AK)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Louise A Dionne (LA)

The Jackson Laboratory, Bar Harbor, USA.

Ariel McShane (A)

Cellular and Molecular Biology Program, University of Michigan, Ann Arbor, USA.
Department of Radiation Oncology, University of Michigan, Ann Arbor, USA.

Karan Bedi (K)

Department of Biostatistics, University of Michigan, Ann Arbor, USA.
Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA.

Shaimae I Elhajjajy (SI)

Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA.

Sean Upchurch (S)

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA.

Jennifer Jou (J)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Ingrid Youngworth (I)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Idan Gabdank (I)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Paul Sud (P)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Otto Jolanki (O)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

J Seth Strattan (JS)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Meenakshi S Kagda (MS)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Michael P Snyder (MP)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Ben C Hitz (BC)

Department of Genetics, Stanford University School of Medicine, Palo Alto, USA.

Jill E Moore (JE)

Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA.

Zhiping Weng (Z)

Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, USA.

David Bennett (D)

Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, USA.
Department of Neurological Sciences, Rush University Medical Center, Chicago, USA.

Laura Reinholdt (L)

The Jackson Laboratory, Bar Harbor, USA.

Mats Ljungman (M)

Center for RNA Biomedicine and Rogel Cancer Center, University of Michigan, Ann Arbor, USA.
Departments of Radiation Oncology and Environmental Health Sciences, University of Michigan, Ann Arbor, USA.

Michael A Beer (MA)

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA.
McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University, Baltimore, USA.

Mark B Gerstein (MB)

Program in Computational Biology and Bioinformatics, Yale University, New Haven, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, USA.
Section on Biomedical Informatics and Data Science, Yale University, New Haven, USA.
Department of Statistics and Data Science, Yale University, New Haven, USA.
Department of Computer Science, Yale University, New Haven, USA.

Lior Pachter (L)

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA.
Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA.

Roderic Guigó (R)

Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain.
Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain.

Barbara J Wold (BJ)

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA.

Ali Mortazavi (A)

Developmental and Cell Biology, University of California, Irvine, Irvine, USA.
Center for Complex Biological Systems, University of California, Irvine, Irvine, USA.

MeSH classifications