Development of a knowledge graph framework to ease and empower translational approaches in plant research: a use-case on grain legumes.

Keywords: OrthoLegKB; Ortho_KB; comparative omics; gene expression; graph database; ontology; orthology; quantitative genetics

Journal

Frontiers in Artificial Intelligence
ISSN: 2624-8212
Abbreviated title: Front Artif Intell
Country: Switzerland
NLM ID: 101770551

Publication information

Publication date:
2023
History:
received: 2023-03-21
accepted: 2023-07-10
medline: 2023-08-21
pubmed: 2023-08-21
entrez: 2023-08-21
Status: epublish

Abstract

While the continuing decline in genotyping and sequencing costs has largely benefited plant research, some species that are key to meeting the challenges of agriculture remain mostly understudied. As a result, heterogeneous datasets covering different traits are available for a significant number of these species. Because gene structures and functions are, to some extent, conserved through evolution, comparative genomics can be used to transfer available knowledge from one species to another. However, such a translational research approach is complex because of the multiplicity of data sources and the lack of harmonization in how the data are described. Here, we provide two pipelines, referred to as the structural and functional pipelines, to create a framework for a NoSQL graph database (Neo4j) that integrates and queries heterogeneous data from multiple species. We call this framework the Orthology-driven knowledge base framework for translational research (Ortho_KB). The structural pipeline builds bridges across species based on orthology. The functional pipeline integrates biological information, including QTL and RNA-sequencing datasets, and uses the backbone from the structural pipeline to connect orthologs in the database. Queries can be written in the Neo4j Cypher language and can, for instance, help identify genes controlling a common trait across species. To explore the possibilities offered by such a framework, we populated Ortho_KB to obtain OrthoLegKB, an instance dedicated to legumes. The proposed model was evaluated by studying the conservation of a flowering-promoting gene. Through a series of queries, we demonstrated that our knowledge graph provides an intuitive and powerful platform to support research and development programmes.
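As an illustration of the kind of cross-species query described in the abstract, the following is a minimal sketch in plain Python of an orthology-driven graph: a structural layer of undirected orthology edges and a functional layer of QTL intervals, queried for orthologous genes from different species that each fall inside a QTL for the same trait. It does not use Neo4j or Cypher and is not the Ortho_KB implementation; the classes Gene, QTL and OrthoGraph, the method conserved_candidates, and the interval-overlap rule for assigning genes to QTLs are all hypothetical assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    species: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class QTL:
    qtl_id: str
    species: str
    trait: str
    chrom: str
    start: int
    end: int


class OrthoGraph:
    """Toy in-memory stand-in for an orthology-driven knowledge graph (hypothetical schema)."""

    def __init__(self):
        self.genes = {}                    # gene_id -> Gene
        self.orthologs = defaultdict(set)  # structural layer: gene_id -> orthologous gene_ids
        self.qtls = []                     # functional layer: QTL intervals

    def add_gene(self, gene):
        self.genes[gene.gene_id] = gene

    def add_orthology(self, a, b):
        # Orthology is treated as an undirected edge, so store it in both directions.
        self.orthologs[a].add(b)
        self.orthologs[b].add(a)

    def add_qtl(self, qtl):
        self.qtls.append(qtl)

    def genes_in_qtl(self, qtl):
        # Assumed rule: a gene belongs to a QTL if it lies on the same chromosome
        # of the same species and its interval overlaps the QTL interval.
        return [
            g for g in self.genes.values()
            if g.species == qtl.species and g.chrom == qtl.chrom
            and g.start <= qtl.end and g.end >= qtl.start
        ]

    def conserved_candidates(self, trait):
        """Sorted ortholog pairs from different species that each lie in a QTL for `trait`."""
        in_trait_qtl = {
            g.gene_id
            for q in self.qtls if q.trait == trait
            for g in self.genes_in_qtl(q)
        }
        pairs = set()
        for gid in in_trait_qtl:
            # .get avoids creating empty entries in the defaultdict for genes without orthologs.
            for oid in self.orthologs.get(gid, ()):
                if oid in in_trait_qtl and self.genes[oid].species != self.genes[gid].species:
                    pairs.add(tuple(sorted((gid, oid))))
        return sorted(pairs)
```

For example, with a hypothetical pea gene psat_g1 inside a flowering-date QTL and an orthologous Medicago truncatula gene mtr_g1 inside another flowering-date QTL, conserved_candidates("flowering date") returns [("mtr_g1", "psat_g1")]; pairs within a single species are ignored. In a Neo4j instance such as OrthoLegKB, the same traversal would presumably be written as a single Cypher pattern match rather than as Python loops.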

Identifiers

pubmed: 37601035
doi: 10.3389/frai.2023.1191122
pmc: PMC10435283

Publication types

Journal Article

Languages

eng

Pagination

1191122

Copyright information

Copyright © 2023 Imbert, Kreplak, Flores, Aubert, Burstin and Tayeh.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors

Baptiste Imbert

Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France.

Jonathan Kreplak

Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France.

Raphaël-Gauthier Flores

Université Paris-Saclay, INRAE, URGI, Versailles, France.
Université Paris-Saclay, INRAE, BioinfOmics, Plant Bioinformatics Facility, Versailles, France.

Grégoire Aubert

Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France.

Judith Burstin

Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France.

Nadim Tayeh

Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France.

MeSH classifications