A guide to creating design matrices for gene expression experiments.

Design matrix; contrast matrix; gene expression analysis; model matrix; statistical models

Journal

F1000Research
ISSN: 2046-1402
Abbreviated title: F1000Res
Country: England
NLM ID: 101594320

Publication information

Publication date:
2020
History:
accepted: 2020-11-26
entrez: 2021-02-19
pubmed: 2021-02-20
medline: 2021-05-18
Status: epublish

Abstract

Differential expression analysis of genomic data types, such as RNA-sequencing experiments, uses linear models to determine the size and direction of changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by well-described analysis pipelines. However, two crucial steps in the analysis process can be a stumbling block for many: setting up an appropriate model via design matrices, and setting up comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. Analysts usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a
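
To make the two matrices named in the abstract concrete, here is a minimal sketch of the simplest case it mentions, a model with a single explanatory variable. It is not taken from the article: the sample labels, expression values and variable names are invented for illustration, and NumPy stands in for the dedicated analysis packages the abstract refers to.

```python
import numpy as np

# Hypothetical toy data: one gene measured in six samples, three per group.
groups = ["control", "control", "control", "treated", "treated", "treated"]
expression = np.array([5.0, 5.2, 4.8, 7.1, 6.9, 7.0])  # made-up log-expression values

# Design matrix without an intercept: one indicator column per group,
# so each fitted coefficient estimates that group's mean expression.
levels = sorted(set(groups))  # ["control", "treated"]
design = np.array([[1.0 if g == lvl else 0.0 for lvl in levels] for g in groups])

# Ordinary least squares fit of the linear model expression = design @ coef + error.
coef, *_ = np.linalg.lstsq(design, expression, rcond=None)

# Contrast vector for the comparison of interest: treated minus control.
contrast = np.array([-1.0, 1.0])
log_fold_change = float(contrast @ coef)

print(dict(zip(levels, coef.round(3))))
print(round(log_fold_change, 3))
```

With this parameterisation the design matrix defines what each coefficient means (a group mean), while the contrast vector turns those coefficients into the comparison of interest. Its sign gives the direction of the change and its magnitude gives the size.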

Identifiers

pubmed: 33604029
doi: 10.12688/f1000research.27893.1
pmc: PMC7873980

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1444

Copyright information

Copyright: © 2020 Law CW et al.

Conflict of interest statement

No competing interests were disclosed.

Authors

Charity W Law (CW)

The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, 3010, Australia.

Kathleen Zeglinski (K)

The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.
Research and Development, CSL Limited, Bio21 Institute, Parkville, 3010, Australia.

Xueyi Dong (X)

The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, 3010, Australia.

Monther Alhamdoosh (M)

Research and Development, CSL Limited, Bio21 Institute, Parkville, 3010, Australia.

Gordon K Smyth (GK)

The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.
School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010, Australia.

Matthew E Ritchie (ME)

The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, 3010, Australia.
School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010, Australia.
