The parallelism motifs of genomic data analysis.
bioinformatics
high-performance data analytics
parallel computing
Journal
Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
ISSN: 1471-2962
Abbreviated title: Philos Trans A Math Phys Eng Sci
Country: England
NLM ID: 101133385
Publication information
Publication date:
06 Mar 2020
History:
entrez: 21 Jan 2020
pubmed: 21 Jan 2020
medline: 21 Jan 2020
Status:
ppublish
Abstract
Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
Identifiers
pubmed: 31955674
doi: 10.1098/rsta.2019.0394
pmc: PMC7015300
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
20190394