ORFanID: A web-based search engine for the discovery and identification of orphan and taxonomically restricted genes.


Journal

PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081

Publication information

Publication date:
2023
History:
received: 29 03 2023
accepted: 24 08 2023
medline: 27 10 2023
pubmed: 25 10 2023
entrez: 25 10 2023
Status: epublish

Abstract

The large number of genomes now sequenced has revealed that a noteworthy percentage of the genes in any given taxon have no orthologous sequences in other taxa. These sequences are commonly referred to as "orphans" or "ORFans" if they occur only in a single species, or as "taxonomically restricted genes" (TRGs) if they are found at higher taxonomic levels. Quantitative and collective studies of these genes are necessary for understanding their biological origins. However, current software for identifying orphan genes is limited in functionality and database search range, and is algorithmically very complex. As a result, researchers studying orphan genes must gather their data from many disparate sources. ORFanID is a graphical web-based search engine that facilitates the efficient identification of both orphan genes and TRGs at all taxonomic levels, from DNA or amino acid sequences in the NCBI database cluster and other large bioinformatics repositories. The software allows users to identify genes that are unique to any taxonomic rank, from species to domain, using NCBI systematic classifiers. It provides control over NCBI database search parameters, and the results are presented both in a spreadsheet and in a graphical display. The tables in the software are sortable, and results can be filtered using fuzzy search. The graphical display follows the taxonomic tree, and its branches can be expanded and collapsed. Example results from searches on five species, together with gene expression data for specific orphan genes, are provided in the Supplementary Information.
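
To make the idea of a gene being "unique to a taxonomic rank" concrete, here is a minimal Python sketch. It is not ORFanID's implementation: the rank list, the function name `restriction_level`, and the hand-written lineages are all assumptions made for illustration. A real search would take the lineages from the NCBI taxonomy records of the sequence-similarity hits. The sketch walks from the broadest rank to the most specific and reports the last rank at which every hit still belongs to the query's own taxon.

```python
# Hypothetical rank order, broadest first (an assumption for this sketch).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lineage(*names):
    """Build a rank -> taxon name mapping from names given in RANKS order."""
    return dict(zip(RANKS, names))


def restriction_level(query_lineage, hit_lineages):
    """Return the most specific rank at which every hit shares the query's taxon.

    "species" means the gene has no hits outside the query species (an orphan);
    a broader rank means a taxonomically restricted gene at that rank;
    None means at least one hit lies outside the query's domain.
    """
    level = None
    for rank in RANKS:
        query_taxon = query_lineage[rank]
        if all(hit.get(rank) == query_taxon for hit in hit_lineages):
            level = rank   # every hit is still inside the query's taxon
        else:
            break          # some hit falls outside; narrower ranks cannot match
    return level


# Illustrative, hand-written lineages (not fetched from NCBI).
HUMAN = lineage("Eukaryota", "Chordata", "Mammalia", "Primates",
                "Hominidae", "Homo", "Homo sapiens")
CHIMP = lineage("Eukaryota", "Chordata", "Mammalia", "Primates",
                "Hominidae", "Pan", "Pan troglodytes")
MOUSE = lineage("Eukaryota", "Chordata", "Mammalia", "Rodentia",
                "Muridae", "Mus", "Mus musculus")
ECOLI = lineage("Bacteria", "Pseudomonadota", "Gammaproteobacteria",
                "Enterobacterales", "Enterobacteriaceae", "Escherichia",
                "Escherichia coli")

if __name__ == "__main__":
    print(restriction_level(HUMAN, [HUMAN]))         # hits only in the query species
    print(restriction_level(HUMAN, [HUMAN, CHIMP]))  # hits shared within one family
    print(restriction_level(HUMAN, [HUMAN, MOUSE]))  # hits shared within one class
    print(restriction_level(HUMAN, [HUMAN, ECOLI]))  # hits across domains
```

In this sketch a query with no hits at all is reported as restricted to its own species, which matches the "single occurrence in a single species" reading of an orphan. A real tool would also have to decide which hits count, for example by thresholds on the search parameters mentioned above.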

Identifiers

pubmed: 37879070
doi: 10.1371/journal.pone.0291260
pii: PONE-D-23-08315
pmc: PMC10599687

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

e0291260

Copyright information

Copyright: © 2023 Gunasekera et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Richard S Gunasekera (RS)

Department of Chemistry, Physics and Engineering, School of Science, Technology & Health, Biola University, La Mirada, CA, United States of America.

Komal K B Raja (KKB)

Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, United States of America.

Suresh Hewapathirana (S)

European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom.

Emanuel Tundrea (E)

Griffiths School of Management and IT, Emanuel University of Oradea, Oradea, Romania.

Vinodh Gunasekera (V)

Bioinformatics, Chesalon USA, Inc., Houston, TX, United States of America.

Thushara Galbadage (T)

Department of Kinesiology and Public Health, School of Science, Technology & Health, Biola University, La Mirada, CA, United States of America.

Paul A Nelson (PA)

Biola University, La Mirada, CA, United States of America.
