TopoFun: a machine learning method to improve the functional similarity of gene co-expression modules.


Journal

NAR genomics and bioinformatics
ISSN: 2631-9268
Abbreviated title: NAR Genom Bioinform
Country: England
NLM ID: 101756213

Publication information

Publication date:
Dec 2021
History:
received: 2021-03-16
revised: 2021-09-22
accepted: 2021-10-13
entrez: 2021-11-11
pubmed: 2021-11-12
medline: 2021-11-12
Status: epublish

Abstract

A comprehensive, accurate functional annotation of genes is key to systems-level approaches. As functionally related genes tend to be co-expressed, one possible approach to identify functional modules or supplement existing gene annotations is to analyse gene co-expression. We describe TopoFun, a machine learning method that combines topological and functional information to improve the functional similarity of gene co-expression modules. Using LASSO, we selected topological descriptors that discriminated modules made of functionally related genes and random modules. Using the selected topological descriptors, we performed linear discriminant analysis to construct a topological score that predicted the type of a module, random-like or functional-like. We combined the topological score with a functional similarity score in a fitness function that we used in a genetic algorithm to explore the co-expression network. To illustrate the use of TopoFun, we started from a subset of the Gene Ontology Biological Processes (GO-BPs) and showed that TopoFun efficiently retrieved genes that we omitted, and aggregated a number of novel genes to the initial GO-BP while improving module topology and functional similarity. Using an independent protein-protein interaction database, we confirmed that the novel genes gathered by TopoFun were functionally related to the original gene set.
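
The search step described in the abstract (a fitness function that combines a topological score with a functional similarity score, explored by a genetic algorithm) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy network and annotation labels are invented, edge density stands in for the LASSO/LDA-derived topological score, the share of gene pairs with a common label stands in for the functional similarity score, the two are combined as a weighted sum, and the genetic algorithm is reduced to single-gene mutation with truncation selection.

```python
import random

# Toy co-expression network: gene -> co-expressed neighbours (hypothetical data).
NETWORK = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1", "g3", "g4"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g1", "g2", "g3", "g5"},
    "g5": {"g4", "g6"},
    "g6": {"g5", "g7"},
    "g7": {"g6", "g8"},
    "g8": {"g7"},
}
# Hypothetical annotation: genes sharing a label count as functionally similar.
ANNOTATION = {"g1": "A", "g2": "A", "g3": "A", "g4": "A",
              "g5": "B", "g6": "B", "g7": "C", "g8": "C"}
GENES = sorted(NETWORK)


def topological_score(module):
    """Stand-in topological score: edge density of the induced subgraph."""
    n = len(module)
    if n < 2:
        return 0.0
    edges = sum(len(NETWORK[g] & module) for g in module) / 2
    return edges / (n * (n - 1) / 2)


def functional_score(module):
    """Stand-in functional similarity: fraction of gene pairs sharing a label."""
    genes = sorted(module)
    pairs = [(a, b) for i, a in enumerate(genes) for b in genes[i + 1:]]
    if not pairs:
        return 0.0
    return sum(ANNOTATION[a] == ANNOTATION[b] for a, b in pairs) / len(pairs)


def fitness(module, weight=0.5):
    """Assumed combination: weighted sum of the two scores."""
    return weight * topological_score(module) + (1 - weight) * functional_score(module)


def mutate(module, rng):
    """Add or remove one randomly chosen gene, keeping at least two genes."""
    gene = rng.choice(GENES)
    child = set(module)
    if gene in child and len(child) > 2:
        child.remove(gene)
    else:
        child.add(gene)
    return frozenset(child)


def evolve(seed_module, generations=50, population_size=20, rng_seed=0):
    """Mutation-only genetic search with elitist truncation selection."""
    rng = random.Random(rng_seed)
    population = [frozenset(seed_module)] * population_size
    for _ in range(generations):
        children = [mutate(m, rng) for m in population]
        ranked = sorted(set(population) | set(children), key=fitness, reverse=True)
        # Keep the fittest modules; repeat them if too few distinct ones exist.
        population = (ranked * population_size)[:population_size]
    return max(population, key=fitness)
```

Calling `evolve({"g1", "g5"})` starts from a poorly connected seed and returns a module whose fitness is never lower than that of the seed, because the best module of each generation is always retained. A realistic setting would replace the two stand-in scores with a trained classifier score and an ontology-based similarity measure, and would add crossover between modules.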

Identifiers

pubmed: 34761220
doi: 10.1093/nargab/lqab103
pii: lqab103
pmc: PMC8573820

Publication types

Journal Article

Languages

eng

Pagination

lqab103

Copyright information

© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

Authors

Ali Janbain (A)

IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France.

Christelle Reynès (C)

IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France.

Zainab Assaghir (Z)

Applied Mathematics Department, Lebanese University, Beirut 1003, Lebanon.

Hassan Zeineddine (H)

Applied Mathematics Department, Lebanese University, Beirut 1003, Lebanon.

Robert Sabatier (R)

IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France.

Laurent Journot (L)

IGF, Univ Montpellier, CNRS, INSERM, Montpellier 34094, France.

MeSH classifications