A deep learning approach to real-time HIV outbreak detection using genetic data.
Journal
PLoS computational biology
ISSN: 1553-7358
Abbreviated title: PLoS Comput Biol
Country: United States
NLM ID: 101238922
Publication information
Publication date:
10 2022
History:
received: 2021-12-17
accepted: 2022-09-23
revised: 2022-10-26
pubmed: 2022-10-15
medline: 2022-10-29
entrez: 2022-10-14
Status:
epublish
Abstract
Pathogen genomic sequence data are increasingly made available for epidemiological monitoring. A main aim is to identify infectious disease outbreaks and assess their potential. While popular methods to analyze sequence data often involve phylogenetic tree inference, they are vulnerable to errors from recombination and impose a high computational cost, making it difficult to obtain real-time results when the number of sequences reaches the thousands or more. Here, we propose an alternative strategy for outbreak detection using genomic data, based on deep learning methods developed for image classification. The key idea is to treat a pairwise genetic distance matrix calculated from viral sequences as an image, and to develop convolutional neural network (CNN) models that classify areas of the image showing signatures of an active outbreak, leading to identification of the subsets of sequences taken from that outbreak. We showed that our method is efficient in finding HIV-1 outbreaks with R0 ≥ 2.5, with overall specificity exceeding 98% and sensitivity above 92%. We validated our approach using data from HIV-1 CRF01 in Europe, containing both endemic sequences and a well-known dual outbreak in intravenous drug users. Our model accurately identified known outbreak sequences against the background of slower-spreading HIV. Importantly, we detected both outbreaks early on, before they were over, implying that had this method been applied in real time as data became available, one would have been able to intervene and possibly limit the extent of these outbreaks. This approach scales to hundreds of thousands of sequences, making it useful for current and future real-time epidemiological investigations, including public health monitoring using large databases and especially rapid outbreak identification.
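The first step described in the abstract, turning aligned viral sequences into a pairwise distance matrix that can be handled like an image, can be sketched as follows. This is only an illustrative sketch: the abstract does not state which genetic distance the authors used, so the raw proportion of mismatched sites (p-distance) is an assumption, and the function names `pairwise_p_distance` and `to_image` and the toy sequences are hypothetical. The CNN classification step is not shown.

```python
import numpy as np

def pairwise_p_distance(seqs):
    """Return an n x n matrix of mismatch proportions for aligned sequences.

    Assumption: sequences are already aligned and of equal length, and the
    raw p-distance stands in for whatever distance the paper actually uses.
    """
    arr = np.array([list(s) for s in seqs])  # shape (n, L), one character per site
    # Broadcast to compare every sequence with every other one, site by site.
    mismatches = arr[:, None, :] != arr[None, :, :]
    return mismatches.mean(axis=2)

def to_image(dist):
    """Scale a distance matrix to [0, 1] so it can be read as a grayscale image."""
    peak = dist.max()
    return dist / peak if peak > 0 else dist.copy()

# Hypothetical toy alignment, for illustration only.
seqs = ["ACGTACGT", "ACGTACGA", "TCGTACGA", "ACGTTCGT"]
dist = pairwise_p_distance(seqs)
image = to_image(dist)
print(image.round(2))
```

Under these assumptions, a group of recently and rapidly transmitted sequences would appear as a block of small values in `image`, which is the kind of local pattern a convolutional model could be trained to flag.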
Identifiers
pubmed: 36240224
doi: 10.1371/journal.pcbi.1010598
pii: PCOMPBIOL-D-21-02277
pmc: PMC9604978
Publication types
Journal Article
Research Support, N.I.H., Extramural
Languages
eng
Citation subsets
IM
Pagination
e1010598
Grants
Agency: NIAID NIH HHS
ID: R01 AI087520
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI135946
Country: United States
Agency: NIAID NIH HHS
ID: R01 AI152703
Country: United States
Conflict of interest statement
The authors have declared that no competing interests exist.