Beyond multidrug resistance: Leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis resistance prediction.
Antitubercular Agents / pharmacology
Cluster Analysis
Computational Biology / methods
Databases, Genetic
Evolution, Molecular
Extensively Drug-Resistant Tuberculosis / diagnosis
Genetic Variation
Genome, Bacterial
Genomics / methods
Humans
Machine Learning
Microbial Sensitivity Tests
Models, Statistical
Mycobacterium tuberculosis / drug effects
Prognosis
ROC Curve
Reproducibility of Results
Tuberculosis, Multidrug-Resistant / diagnosis
Extensively drug-resistant tuberculosis
Genome sequencing
Machine learning
Multidrug-resistance
Mycobacterium tuberculosis
Journal
EBioMedicine
ISSN: 2352-3964
Abbreviated title: EBioMedicine
Country: Netherlands
NLM ID: 101647039
Publication information
Publication date: May 2019
History:
received: 9 January 2019
revised: 21 February 2019
accepted: 5 April 2019
pubmed: 3 May 2019
medline: 26 November 2019
entrez: 4 May 2019
Status: ppublish
Abstract
BACKGROUND
The diagnosis of multidrug-resistant and extensively drug-resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic antimicrobial susceptibility testing, but gaps remain in accurately predicting phenotype from genotypic data, especially for certain drugs. Our primary aim was to explore statistical learning algorithms and genetic predictor sets, using a rich dataset, to build a high-performing and fast-predicting model for detecting anti-tuberculosis drug resistance.
METHODS
We collected targeted or whole genome sequencing data and conventional drug resistance phenotyping data from 3601 Mycobacterium tuberculosis strains enriched for resistance to first- and second-line drugs, including 1228 multidrug-resistant strains. We investigated the utility of (1) rare variants and variants known to be determinants of resistance for at least one drug and (2) machine and statistical learning architectures in predicting phenotypic resistance to 10 anti-tuberculosis drugs. Specifically, we investigated multitask and single-task wide and deep neural networks, a multilayer perceptron, regularized logistic regression, and random forest classifiers.
FINDINGS
The highest-performing machine and statistical learning methods included both rare variants and variants known to cause resistance to at least one drug. Both the simpler L2-penalized regression and the more complex machine learning models had high predictive performance. During repeated cross-validation, the average AUCs for our highest-performing model were 0.979 for first-line drugs and 0.936 for second-line drugs. On an independent validation set, the highest-performing model showed average AUCs, sensitivities, and specificities, respectively, of 0.937, 87.9%, and 92.7% for first-line drugs and 0.891, 82.0%, and 90.1% for second-line drugs. Our method outperforms existing approaches based on direct association, increasing the sum of sensitivity and specificity by 11.7% for first-line drugs and 3.2% for second-line drugs. Our method also has higher predictive performance than previously reported machine learning models during cross-validation, with higher AUCs for 8 of 10 drugs.
INTERPRETATION
Statistical models, especially those trained using both frequent and less frequent variants, significantly improve the accuracy of resistance prediction and hold promise for bringing sequencing technologies closer to the bedside.
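The abstract reports performance as AUC, sensitivity, specificity, and the sum of sensitivity and specificity. As a minimal sketch of how these quantities are defined for a single drug, the following pure-Python example may help. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; none of them come from the study, and this is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); label 1 = resistant, 0 = susceptible.

    Assumes both classes are present, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, called resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, called susceptible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, false alarm
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a resistant isolate receives a higher
    score than a susceptible one, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 resistant and 4 susceptible isolates.
y_true: List[int] = [1, 1, 1, 1, 0, 0, 0, 0]
scores: List[float] = [0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1]

# Assumed threshold of 0.5 to turn predicted probabilities into calls.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print("sensitivity:", sens)
print("specificity:", spec)
print("sensitivity + specificity:", sens + spec)
print("AUC:", auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason an abstract like this one reports both.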
Identifiers
pubmed: 31047860
pii: S2352-3964(19)30250-6
doi: 10.1016/j.ebiom.2019.04.016
pmc: PMC6557804
Chemical substances
Antitubercular Agents
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
356-369
Grants
Agency: NIEHS NIH HHS
ID: K01 ES026835
Country: United States
Comments and corrections
Type: CommentIn
Copyright information
Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.