Juxtapose: a gene-embedding approach for comparing co-expression networks.
Embedding
Evolution
Gene co-expression networks
Machine learning
Transcriptomics
Word2vec
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
16 Mar 2021
History:
received: 02 Oct 2020
accepted: 01 Mar 2021
entrez: 17 Mar 2021
pubmed: 18 Mar 2021
medline: 13 Apr 2021
Status:
epublish
Abstract
Abstract sections
BACKGROUND
Gene co-expression networks (GCNs) are not easily comparable due to their complex structure. In this paper, we propose a tool, Juxtapose, together with similarity measures that can be utilized for comparative transcriptomics between a set of organisms. While we focus on its application to comparing co-expression networks across species in evolutionary studies, Juxtapose is also generalizable to co-expression network comparisons across tissues or conditions within the same species.
METHODS
A word embedding strategy commonly used in natural language processing was utilized in order to generate gene embeddings based on walks made throughout the GCNs. Juxtapose was evaluated based on its ability to embed the nodes of synthetic structures in the networks consistently while also generating biologically informative results. Evaluation of the techniques proposed in this research utilized RNA-seq datasets from GTEx, a multi-species experiment of prefrontal cortex samples from the Gene Expression Omnibus, as well as synthesized datasets. Biological evaluation was performed using gene set enrichment analysis and known gene relationships in literature.
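To make the walk-based embedding idea concrete, here is a minimal sketch of how walks over a co-expression network could be turned into "sentences" of gene names for a word-embedding model. It is an illustration under stated assumptions, not the Juxtapose implementation: the function name `generate_walks`, its parameters, the weight-biased random walk, and the toy gene names and weights are all hypothetical.

```python
import random


def generate_walks(adjacency, walk_length=10, walks_per_node=5, seed=0):
    """Return walks (lists of gene names) over a weighted co-expression network.

    adjacency maps each gene to a dict {neighbour: edge_weight}.
    Assumption: the next gene is drawn with probability proportional to the
    edge weight; the abstract does not specify the walk strategy.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:  # isolated gene: the walk cannot continue
                    break
                genes = sorted(neighbours)
                weights = [neighbours[g] for g in genes]
                walk.append(rng.choices(genes, weights=weights, k=1)[0])
            walks.append(walk)
    return walks


# Toy network with made-up gene names and co-expression weights.
toy_gcn = {
    "geneA": {"geneB": 0.9, "geneC": 0.4},
    "geneB": {"geneA": 0.9, "geneC": 0.7},
    "geneC": {"geneA": 0.4, "geneB": 0.7},
    "geneD": {},
}
walks = generate_walks(toy_gcn, walk_length=6, walks_per_node=3)
```

Each walk plays the role of a sentence and each gene the role of a word, so the resulting list could be passed to any Word2vec-style trainer to obtain one vector per gene.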
RESULTS
We show that Juxtapose is capable of globally aligning synthesized networks as well as identifying areas that are conserved in real gene co-expression networks without reliance on external biological information. Furthermore, output from a matching algorithm that uses cosine distance between GCN embeddings is shown to be an informative measure of similarity that reflects the amount of topological similarity between networks.
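The following sketch illustrates how cosine distance between gene embeddings can drive a matching step and yield a single network-level similarity score. It assumes the two networks' embeddings already live in a shared vector space, and it uses a simple greedy one-to-one matching as a stand-in; the abstract does not say which matching algorithm Juxtapose uses. All function names and the toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def greedy_match(emb_a, emb_b):
    """Pair genes across two networks, closest pairs first, each gene used once."""
    candidates = sorted(
        (cosine_distance(va, vb), ga, gb)
        for ga, va in emb_a.items()
        for gb, vb in emb_b.items()
    )
    used_a, used_b, matches = set(), set(), []
    for dist, ga, gb in candidates:
        if ga not in used_a and gb not in used_b:
            matches.append((ga, gb, dist))
            used_a.add(ga)
            used_b.add(gb)
    return matches


def mean_match_distance(matches):
    """Average distance over matched pairs; lower means more similar networks."""
    if not matches:
        return float("nan")
    return sum(dist for _, _, dist in matches) / len(matches)


# Toy embeddings (made-up values) for two small networks.
emb_a = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
emb_b = {"b1": [0.0, 2.0], "b2": [3.0, 0.0]}
matches = greedy_match(emb_a, emb_b)
score = mean_match_distance(matches)
```

In this toy case each gene in the first network has a counterpart pointing in exactly the same direction in the second, so the matched pairs have zero distance. Networks with less shared topology would be expected to produce larger average distances.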
CONCLUSIONS
Juxtapose can be used to align GCNs without relying on known biological similarities and enables post-hoc analyses using biological parameters, such as orthology of genes, or conserved or variable pathways.
AVAILABILITY
A development version of the software used in this paper is available at https://github.com/klovens/juxtapose.
Identifiers
pubmed: 33726666
doi: 10.1186/s12859-021-04055-1
pii: 10.1186/s12859-021-04055-1
pmc: PMC7968242
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
125
Grants
Agency: Natural Sciences and Engineering Research Council of Canada
ID: 435655-201
Agency: Natural Sciences and Engineering Research Council of Canada
ID: 2019-05977
Agency: Natural Sciences and Engineering Research Council of Canada
ID: 2016-06172