Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments.

Keywords: Gene expression; Gene filter; Multiple testing; Next generation sequencing

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
24 Sep 2022
History:
received: 28 Sep 2020
accepted: 13 Sep 2022
entrez: 24 Sep 2022
pubmed: 25 Sep 2022
medline: 28 Sep 2022
Status: epublish

Abstract

In RNA-sequencing studies, a large number of hypothesis tests are performed to assess differential expression of genes between several conditions. Filtering has been proposed to remove candidate genes with a low expression level, which may not be relevant and have little or no chance of showing a difference between conditions. This step may reduce the multiple testing burden and increase power. We show in a simulation study that filtering can lead to some increase in power for RNA-sequencing data; too aggressive filtering, however, can lead to a decline. No uniformly optimal filter in terms of power exists: depending on the scenario, different filters may be optimal. We propose an adaptive filtering strategy which selects one of several filters to maximise the number of rejections. No additional adjustment for multiplicity has to be included, but a rule has to be considered if the number of rejections is too small. For a large range of simulation scenarios, the adaptive filter maximises the power while the simulated False Discovery Rate is bounded by the pre-defined significance level. Using the adaptive filter, it is not necessary to pre-specify a single individual filtering method optimised for a specific scenario.
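
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the candidate filters are thresholds on a per-gene mean count, the multiple testing procedure is Benjamini-Hochberg, and the rule for too few rejections (the abstract does not specify one) is a hypothetical fallback to the unfiltered analysis. The names `bh_reject`, `adaptive_filter` and `min_rejections` are likewise hypothetical.

```python
import numpy as np


def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(p)
    # Compare the i-th smallest p-value with alpha * i / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    if below.any():
        k = np.nonzero(below)[0].max()  # largest index meeting the bound
        reject[order[: k + 1]] = True
    return reject


def adaptive_filter(mean_counts, pvals, thresholds, alpha=0.05, min_rejections=10):
    """Apply each candidate filter and keep the one with the most rejections.

    Assumed filter: retain genes whose mean count is >= threshold.
    Assumed fallback: if the best filter yields fewer than `min_rejections`
    rejections, return the unfiltered result (threshold reported as None).
    """
    mean_counts = np.asarray(mean_counts, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    best_t, best_rej = None, None
    for t in thresholds:
        keep = mean_counts >= t
        rej = np.zeros(pvals.size, dtype=bool)
        # Multiplicity correction uses only the genes that pass the filter.
        rej[keep] = bh_reject(pvals[keep], alpha)
        if best_rej is None or rej.sum() > best_rej.sum():
            best_t, best_rej = t, rej
    if best_rej is None or best_rej.sum() < min_rejections:
        return None, bh_reject(pvals, alpha)
    return best_t, best_rej
```

A call such as `adaptive_filter(mean_counts, pvals, thresholds=[0, 10, 65])` returns the selected threshold and the rejection mask over all genes. Whether the False Discovery Rate stays bounded under such a selection is what the paper examines by simulation; the sketch itself makes no such guarantee.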

Identifiers

pubmed: 36153479
doi: 10.1186/s12859-022-04928-z
pii: 10.1186/s12859-022-04928-z
pmc: PMC9509565

Chemical substances

RNA 63231-63-0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

388

Copyright information

© 2022. The Author(s).

Authors

Sonja Zehetmayer (S)

Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse, Vienna, Austria. sonja.zehetmayer@meduniwien.ac.at.

Martin Posch (M)

Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse, Vienna, Austria.

Alexandra Graf (A)

Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Spitalgasse, Vienna, Austria.
