Many-core algorithms for high-dimensional gradients on phylogenetic trees.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
18 Jan 2024
History:
received: 24 03 2023
revised: 20 12 2023
accepted: 15 01 2024
medline: 20 1 2024
pubmed: 20 1 2024
entrez: 20 1 2024
Status: ahead of print

Abstract

Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters, which traditionally takes O(N^2) operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in O(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for models with larger state spaces, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in manyfold speedups over previous CPU implementations. We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a greater than 128-fold speedup over the CPU implementation for codon-based models and greater than 8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.
We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ this BEAGLE implementation through the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc).
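
The linear-time gradient mentioned in the abstract can be illustrated with a minimal single-site sketch. It is not the BEAGLE or BEAST implementation: it assumes a JC69 nucleotide model, a binary tree given as plain dictionaries, and a hypothetical function name `log_likelihood_and_gradient`. One post-order pass computes the usual pruning partials, and one pre-order pass combines them with the information from the rest of the tree, so each branch derivative costs a constant number of small matrix-vector products rather than a full re-evaluation of the likelihood.

```python
import numpy as np

# Assumed model for illustration: JC69, scaled to one expected substitution per unit time.
Q = np.full((4, 4), 1.0 / 3.0) - (4.0 / 3.0) * np.eye(4)
PI = np.full(4, 0.25)  # stationary (and root) distribution under JC69


def transition(t):
    """Closed-form JC69 transition matrix P(t) = exp(Q t)."""
    e = np.exp(-4.0 * t / 3.0)
    return np.full((4, 4), 0.25 * (1.0 - e)) + e * np.eye(4)


def log_likelihood_and_gradient(children, tips, t, root):
    """Single-site log-likelihood and its derivative wrt every branch length.

    children: dict internal node -> list of child nodes
    tips:     dict tip node -> observed state index (0..3)
    t:        dict node -> length of the branch above that node
    """
    post, msg = {}, {}

    def down(i):  # post-order pass (standard pruning)
        if i in tips:
            post[i] = np.eye(4)[tips[i]]
            return
        for c in children[i]:
            down(c)
            msg[c] = transition(t[c]) @ post[c]  # child's contribution at node i
        post[i] = np.prod([msg[c] for c in children[i]], axis=0)

    down(root)
    lik = PI @ post[root]

    grad = {}

    def up(i, pre_node):  # pre-order pass
        for c in children.get(i, []):
            siblings = [msg[s] for s in children[i] if s != c]
            # Everything outside the subtree of c, evaluated at the top of c's branch.
            pre_top = pre_node * np.prod(siblings, axis=0) if siblings else pre_node
            # dP/dt = Q P(t), so dL/dt_c = pre_top . (Q P(t_c) post[c]).
            grad[c] = pre_top @ (Q @ msg[c]) / lik
            up(c, transition(t[c]).T @ pre_top)

    up(root, PI)
    return np.log(lik), grad


# Toy example: three tips (0, 1, 2), internal node 3, root 4.
children = {4: [3, 2], 3: [0, 1]}
tips = {0: 0, 1: 1, 2: 0}
branch_lengths = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.05}
log_lik, gradient = log_likelihood_and_gradient(children, tips, branch_lengths, 4)
```

For a full alignment, the per-site log-likelihoods and gradients would be summed over sites. With larger state spaces such as codon models, the matrix-vector products dominate the cost, which is the part the abstract describes as being parallelized on GPUs.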

Identifiers

pubmed: 38243701
pii: 7577857
doi: 10.1093/bioinformatics/btae030

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© The Author(s) 2024. Published by Oxford University Press.

Authors

Karthik Gangavarapu (K)

Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States.

Xiang Ji (X)

Department of Mathematics, School of Science & Engineering, Tulane University, New Orleans, United States.

Guy Baele (G)

Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium.

Mathieu Fourment (M)

Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia.

Philippe Lemey (P)

Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium.

Frederick A Matsen (FA)

Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
Department of Statistics, University of Washington, Seattle, USA.
Department of Genome Sciences, University of Washington, Seattle, USA.
Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.

Marc A Suchard (MA)

Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, United States.
Department of Biostatistics, Jonathan and Karin Fielding School of Public Health, University of California, Los Angeles, United States.
Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, United States.

MeSH classifications