Alignment-free comparison of metagenomics sequences via approximate string matching.
Journal
Bioinformatics advances
ISSN: 2635-0041
Abbreviated title: Bioinform Adv
Country: England
NLM ID: 9918282081306676
Publication information
Publication date: 2022
History:
received: 2022-07-04
revised: 2022-09-16
accepted: 2022-10-19
entrez: 2022-11-17
pubmed: 2022-11-18
medline: 2022-11-18
Status:
epublish
Abstract
Quantifying pairwise sequence similarities is a key step in metagenomics studies. Alignment-free methods provide a computationally efficient alternative to alignment-based methods for large-scale sequence analysis. Several neural network-based methods have recently been developed for this purpose. However, existing methods do not perform well on sequences of varying lengths and are sensitive to the presence of insertions and deletions. In this article, we describe the development of a new method, referred to as AsMac, that addresses these issues. We propose a novel neural network structure for approximate string matching that extracts pertinent information from biological sequences, and we develop an efficient gradient computation algorithm for training the constructed neural network. A large-scale benchmark study using real-world data demonstrates the effectiveness and potential utility of the proposed method. The open-source software for the proposed method and trained neural network models for some commonly used metagenomics marker genes are freely available at www.acsu.buffalo.edu/~yijunsun/lab/AsMac.html. Supplementary data are available at
Identifiers
pubmed: 36388153
doi: 10.1093/bioadv/vbac077
pii: vbac077
pmc: PMC9645238
Publication types
Journal Article
Languages
eng
Pagination
vbac077
Grants
Agency: NIAID NIH HHS
ID: R01 AI125982
Country: United States
Agency: NCI NIH HHS
ID: R01 CA241123
Country: United States
Copyright information
© The Author(s) 2022. Published by Oxford University Press.