MOCCASIN: a method for correcting for known and unknown confounders in RNA splicing analysis.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
07 06 2021
History:
received: 23 05 2020
accepted: 07 05 2021
entrez: 8 6 2021
pubmed: 9 6 2021
medline: 29 6 2021
Status: epublish

Abstract

The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and subsequently RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show that quantifications of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variation in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification, and demonstrate MOCCASIN's effectiveness on both synthetic and real data. Code, synthetic and corrected datasets are all made available as resources.

Identifiers

pubmed: 34099673
doi: 10.1038/s41467-021-23608-9
pii: 10.1038/s41467-021-23608-9
pmc: PMC8184769

Publication types

Journal Article
Research Support, N.I.H., Extramural

Languages

eng

Citation subsets

IM

Pagination

3353

Grants

Agency: NIGMS NIH HHS
ID: R01 GM128096
Country: United States
Agency: NIGMS NIH HHS
ID: R35 GM118048
Country: United States
Agency: NIGMS NIH HHS
ID: T32 GM008216
Country: United States
Agency: NCI NIH HHS
ID: U01 CA232563
Country: United States


Authors

Barry Slaff (B)

Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA.

Caleb M Radens (CM)

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Graduate Group in Cell and Molecular Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Paul Jewell (P)

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Anupama Jha (A)

Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA.

Nicholas F Lahens (NF)

Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Gregory R Grant (GR)

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Andrei Thomas-Tikhonenko (A)

Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.

Kristen W Lynch (KW)

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Graduate Group in Cell and Molecular Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Yoseph Barash (Y)

Department of Computer and Information Sciences, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA. yosephb@upenn.edu.
Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. yosephb@upenn.edu.
Graduate Group in Cell and Molecular Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. yosephb@upenn.edu.
