Spliceator: multi-species splice site prediction using convolutional neural networks.

Keywords: Convolutional neural network; Data quality; Deep learning; Genome annotation; Splice site prediction

Journal

BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194

Publication information

Publication date:
23 Nov 2021
History:
received: 04 07 2021
accepted: 09 11 2021
entrez: 24 11 2021
pubmed: 25 11 2021
medline: 26 11 2021
Status: epublish

Abstract

Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89-92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.

Identifiers

pubmed: 34814826
doi: 10.1186/s12859-021-04471-3
pii: 10.1186/s12859-021-04471-3
pmc: PMC8609763

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

561

Grants

Agency: Agence Nationale de la Recherche
ID: ANR-11-INBS-0013
Agency: Agence Nationale de la Recherche
ID: GA-676559
Agency: Agence Nationale de la Recherche
ID: ANR-18-RAR3-0006-02

Copyright information

© 2021. The Author(s).

Authors

Nicolas Scalzitti (N)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Arnaud Kress (A)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.
BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Romain Orhand (R)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Thomas Weber (T)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Luc Moulinier (L)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.
BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Anne Jeannin-Girardon (A)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Pierre Collet (P)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Olivier Poch (O)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.

Julie D Thompson (JD)

Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France. thompson@unistra.fr.
