Impact of adaptive filtering on power and false discovery rate in RNA-seq experiments.
Gene expression
Gene filter
Multiple testing
Next generation sequencing
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
24 Sep 2022
History:
received: 28 Sep 2020
accepted: 13 Sep 2022
entrez: 24 Sep 2022
pubmed: 25 Sep 2022
medline: 28 Sep 2022
Status:
epublish
Abstract
BACKGROUND
In RNA-sequencing studies, a large number of hypothesis tests are performed to compare the expression of genes between several conditions. Filtering has been proposed to remove candidate genes with a low expression level, which may not be relevant and have little or no chance of showing a difference between conditions. This step may reduce the multiple testing burden and increase power.
RESULTS
We show in a simulation study that filtering can lead to some increase in power for RNA-sequencing data; overly aggressive filtering, however, can lead to a decline. No uniformly optimal filter in terms of power exists: depending on the scenario, different filters may be optimal. We propose an adaptive filtering strategy which selects one of several filters to maximise the number of rejections. No additional adjustment for multiplicity has to be included, but a rule has to be considered if the number of rejections is too small.
CONCLUSIONS
For a large range of simulation scenarios, the adaptive filter maximises the power while the simulated False Discovery Rate is bounded by the pre-defined significance level. Using the adaptive filter, it is not necessary to pre-specify a single individual filtering method optimised for a specific scenario.
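The adaptive selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each candidate filter is a threshold on mean read count, that the Benjamini-Hochberg procedure is the FDR-controlling step, and that the rule for too few rejections is a fallback to the first listed filter. None of these specifics is stated in the abstract, and the names `bh_reject`, `adaptive_filter`, `thresholds` and `min_rejections` are hypothetical.

```python
import numpy as np


def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: boolean mask of rejections."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject
    order = np.argsort(p)
    # Sorted p-value at rank i passes if p_(i) <= alpha * i / m.
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank that passes
        reject[order[:k + 1]] = True
    return reject


def adaptive_filter(pvals, mean_counts, thresholds, alpha=0.05, min_rejections=0):
    """Apply each candidate filter, run BH on the surviving genes,
    and return the (threshold, rejection mask) with the most rejections.

    Assumption: a filter keeps genes with mean_counts >= threshold.
    Assumption: if fewer than `min_rejections` genes are rejected,
    fall back to the first listed threshold (a placeholder rule).
    """
    pvals = np.asarray(pvals, dtype=float)
    mean_counts = np.asarray(mean_counts, dtype=float)
    results = []
    for t in thresholds:
        keep = mean_counts >= t
        reject = np.zeros(pvals.size, dtype=bool)
        reject[keep] = bh_reject(pvals[keep], alpha)
        results.append((t, reject))
    # max() returns the first maximal entry, so ties favour earlier filters.
    best_t, best_reject = max(results, key=lambda r: r[1].sum())
    if best_reject.sum() < min_rejections:
        best_t, best_reject = results[0]
    return best_t, best_reject


# Toy data: three expressed genes with small p-values, three low-count genes.
pvals = [0.001, 0.02, 0.03, 0.9, 0.8, 0.7]
mean_counts = [100, 50, 40, 1, 2, 0.5]
best_t, rejected = adaptive_filter(pvals, mean_counts, thresholds=[0, 5])
print(best_t, int(rejected.sum()))
```

In this toy example, the unfiltered analysis rejects one gene, while removing the three low-count genes lowers the multiplicity burden enough to reject three, so the threshold of 5 is selected.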
Identifiers
pubmed: 36153479
doi: 10.1186/s12859-022-04928-z
pii: 10.1186/s12859-022-04928-z
pmc: PMC9509565
Chemical substances
RNA
63231-63-0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
388
Copyright information
© 2022. The Author(s).