Beyond multidrug resistance: Leveraging rare variants with machine and statistical learning models in Mycobacterium tuberculosis resistance prediction.

Antitubercular Agents / pharmacology; Cluster Analysis; Computational Biology / methods; Databases, Genetic; Evolution, Molecular; Extensively Drug-Resistant Tuberculosis / diagnosis; Genetic Variation; Genome, Bacterial; Genomics / methods; Humans; Machine Learning; Microbial Sensitivity Tests; Models, Statistical; Mycobacterium tuberculosis / drug effects; Prognosis; ROC Curve; Reproducibility of Results; Tuberculosis, Multidrug-Resistant / diagnosis

Extensively drug-resistant tuberculosis; Genome sequencing; Machine learning; Multidrug-resistance; Mycobacterium tuberculosis

Journal

EBioMedicine

ISSN: 2352-3964

Abbreviated title: EBioMedicine

Country: Netherlands

NLM ID: 101647039

Publication information

Publication date:
May 2019

History:

received: 9 Jan 2019

revised: 21 Feb 2019

accepted: 5 Apr 2019

pubmed: 3 May 2019

medline: 26 Nov 2019

entrez: 4 May 2019

Status: ppublish

Abstract

BACKGROUND

The diagnosis of multidrug-resistant and extensively drug-resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic antimicrobial susceptibility testing, but gaps remain in predicting phenotype accurately from genotypic data, especially for certain drugs. Our primary aim was to explore statistical learning algorithms and genetic predictor sets, using a rich dataset, to build a high-performing and fast-predicting model for detecting anti-tuberculosis drug resistance.

METHODS

We collected targeted or whole genome sequencing data and conventional drug resistance phenotyping data from 3601 Mycobacterium tuberculosis strains enriched for resistance to first- and second-line drugs, including 1228 multidrug-resistant strains. We investigated the utility of (1) rare variants and variants known to be determinants of resistance for at least one drug and (2) machine and statistical learning architectures in predicting phenotypic drug resistance to 10 anti-tuberculosis drugs. Specifically, we investigated multitask and single-task wide and deep neural networks, a multilayer perceptron, regularized logistic regression, and random forest classifiers.

FINDINGS

The highest performing machine and statistical learning methods included both rare variants and variants known to cause resistance to at least one drug. Both simpler L2-penalized regression and complex machine learning models had high predictive performance. The average AUC for our highest performing model was 0.979 for first-line drugs and 0.936 for second-line drugs during repeated cross-validation. On an independent validation set, the highest performing model showed average AUCs, sensitivities, and specificities, respectively, of 0.937, 87.9%, and 92.7% for first-line drugs and 0.891, 82.0%, and 90.1% for second-line drugs. Our method outperforms existing approaches based on direct association, increasing the sum of sensitivity and specificity by 11.7% for first-line drugs and 3.2% for second-line drugs. Our method also has higher predictive performance than previously reported machine learning models during cross-validation, with higher AUCs for 8 of 10 drugs.

INTERPRETATION

Statistical models, especially those trained using both frequent and less frequent variants, significantly improve the accuracy of resistance prediction and hold promise in bringing sequencing technologies closer to the bedside.
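The abstract reports performance as AUC, sensitivity, specificity, and the sum of sensitivity and specificity. As a rough illustration of how these quantities relate, the following is a minimal pure-Python sketch. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only; they are not taken from the paper and do not reproduce its pipeline or results.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (resistant, susceptible) pairs ranked correctly.

    A tie between a resistant and a susceptible score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]  # resistant isolates
    neg = [s for y, s in zip(labels, scores) if y == 0]  # susceptible isolates
    if not pos or not neg:
        raise ValueError("need at least one resistant and one susceptible isolate")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called resistant."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_resistant = s >= threshold
        if y == 1:
            if predicted_resistant:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_resistant:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = phenotypically resistant, 0 = susceptible.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical predicted resistance probabilities from some classifier.
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f} sum={sens + spec:.2f}")
# prints: AUC=0.8125 sensitivity=0.75 specificity=0.75 sum=1.50
```

AUC is independent of any threshold, whereas sensitivity and specificity depend on the cut-off chosen, which is why a comparison based on their sum needs the threshold to be fixed for each method being compared.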

Identifiers

DOI: 10.1016/j.ebiom.2019.04.016

PMID: 31047860

PMC: PMC6557804

PII: S2352-3964(19)30250-6

Chemical substances

Antitubercular Agents 0

Publication types

Journal Article

Languages

eng

Pagination

356-369

Grants

Agency: NIEHS NIH HHS

ID: K01 ES026835

Country: United States

Comments and corrections

Type: CommentIn

Authors

Michael L Chen (ML)

Akshith Doddi (A)

Jimmy Royer (J)

Luca Freschi (L)

Marco Schito (M)

Matthew Ezewudo (M)

Isaac S Kohane (IS)

Andrew Beam (A)

Maha Farhat (M)
