The parallelism motifs of genomic data analysis.

bioinformatics, high-performance data analytics, parallel computing

Journal

Philosophical transactions. Series A, Mathematical, physical, and engineering sciences
ISSN: 1471-2962
Abbreviated title: Philos Trans A Math Phys Eng Sci
Country: England
NLM ID: 101133385

Publication information

Publication date:
06 Mar 2020
History:
entrez: 21 Jan 2020
pubmed: 21 Jan 2020
medline: 21 Jan 2020
Status: ppublish

Abstract

Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or 'motifs' that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
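The hashing motif named in the abstract can be illustrated with k-mer counting, a common step in assembly and profiling pipelines. The sketch below is an assumption-laden illustration, not the authors' implementation: it simulates a partitioned hash table in a single process, where each k-mer is routed to an "owner" partition by a deterministic hash, much as a distributed implementation might route asynchronous updates to the rank that owns a key. The function names (`kmers`, `owner`, `count_kmers_partitioned`, `merge`), the use of CRC32 as the hash, and the sample reads are all hypothetical choices made for this example.

```python
from collections import Counter
import zlib


def kmers(read, k):
    """Yield every length-k substring (k-mer) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner(kmer, num_parts):
    """Map a k-mer to the partition that owns it.

    CRC32 is an illustrative choice: it is deterministic across runs,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(kmer.encode("ascii")) % num_parts


def count_kmers_partitioned(reads, k, num_parts):
    """Count k-mers, sending each update to the partition that owns the key.

    In a real parallel code each Counter would live on a different
    process; here they are plain in-memory tables (an assumption made
    to keep the sketch self-contained).
    """
    tables = [Counter() for _ in range(num_parts)]
    for read in reads:
        for kmer in kmers(read, k):
            tables[owner(kmer, num_parts)][kmer] += 1
    return tables


def merge(tables):
    """Combine the per-partition tables into one global count."""
    total = Counter()
    for table in tables:
        total.update(table)
    return total


# Hypothetical sample reads, for illustration only.
reads = ["ACGTAC", "CGTACG"]
tables = count_kmers_partitioned(reads, k=3, num_parts=4)
counts = merge(tables)
```

Because the owner of a k-mer depends only on its hash, every occurrence of the same k-mer lands in the same partition, so the per-partition tables never need to reconcile conflicting counts. The cost is that updates are scattered across partitions in an irregular, data-dependent pattern.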

Identifiers

pubmed: 31955674
doi: 10.1098/rsta.2019.0394
pmc: PMC7015300

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

20190394

Authors

Katherine Yelick (K)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.

Aydın Buluç (A)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.

Muaaz Awan (M)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Ariful Azad (A)

School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA.

Benjamin Brock (B)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.

Rob Egan (R)

DOE Joint Genome Institute, Walnut Creek, CA, USA.

Saliya Ekanayake (S)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Marquita Ellis (M)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.

Evangelos Georganas (E)

Intel Labs, Santa Clara, CA, USA.

Giulia Guidi (G)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.

Steven Hofmeyr (S)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Oguz Selvitopi (O)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Cristina Teodoropol (C)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA.

Leonid Oliker (L)

Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

MeSH classifications