Top-down machine learning approach for high-throughput single-molecule analysis.

MeSH terms: Algorithms; Fluorescent Dyes; High-Throughput Screening Assays / methods; Single Molecule Imaging / methods; Software; Unsupervised Machine Learning

Keywords: HCN channels; cooperativity; divisive segmentation; human; molecular biophysics; structural biology; unsupervised analysis; zero mode waveguides

Journal

eLife

ISSN: 2050-084X

Abbreviated title: Elife

Country: England

NLM ID: 101579614

Publication information

Publication date:
8 April 2020

History:

received: 6 November 2019

accepted: 8 April 2020

pubmed: 9 April 2020

medline: 23 February 2021

entrez: 9 April 2020

Status: epublish

Abstract

Single-molecule approaches provide enormous insight into the dynamics of biomolecules, but adequately sampling distributions of states and events often requires extensive sampling. Although emerging experimental techniques can generate such large datasets, existing analysis tools are not suitable to process the large volume of data obtained in high-throughput paradigms. Here, we present a new analysis platform (DISC) that accelerates unsupervised analysis of single-molecule trajectories. By merging model-free statistical learning with the Viterbi algorithm, DISC idealizes single-molecule trajectories up to three orders of magnitude faster with improved accuracy compared to other commonly used algorithms. Further, we demonstrate the utility of the DISC algorithm to probe cooperativity between multiple binding events in the cyclic nucleotide binding domains of HCN pacemaker channels. Given the flexible and efficient nature of DISC, we anticipate it will be a powerful tool for unsupervised processing of high-throughput data across a range of single-molecule experiments.
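The abstract mentions the Viterbi algorithm as one ingredient of DISC. As a rough illustration of what Viterbi idealization of a noisy trajectory means, here is a minimal sketch. It is not the DISC implementation: the function name `viterbi_idealize`, the shared Gaussian noise level `sigma`, the fixed `p_stay` transition probability, and the assumption that the state levels are already known are all hypothetical simplifications (in DISC the states are learned from the data rather than supplied).

```python
import math


def viterbi_idealize(trace, levels, sigma, p_stay=0.95):
    """Return the most likely sequence of state levels for a noisy trace.

    Illustrative assumptions: Gaussian noise with one shared standard
    deviation `sigma`, a uniform initial state distribution, and a
    probability `p_stay` of remaining in the current state at each step.
    """
    if not trace:
        return []
    n_states = len(levels)
    if n_states == 1:
        return [levels[0]] * len(trace)

    log_stay = math.log(p_stay)
    log_switch = math.log((1.0 - p_stay) / (n_states - 1))

    def log_emission(x, mu):
        # Gaussian log-density, dropping the constant shared by all states.
        return -0.5 * ((x - mu) / sigma) ** 2

    # score[j]: best log-probability of any state path ending in state j.
    score = [-math.log(n_states) + log_emission(trace[0], mu) for mu in levels]
    backpointers = []
    for x in trace[1:]:
        new_score, pointers = [], []
        for j, mu in enumerate(levels):
            candidates = [
                score[i] + (log_stay if i == j else log_switch)
                for i in range(n_states)
            ]
            best_i = max(range(n_states), key=candidates.__getitem__)
            pointers.append(best_i)
            new_score.append(candidates[best_i] + log_emission(x, mu))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [levels[s] for s in path]
```

For example, `viterbi_idealize([0.1, -0.2, 0.05, 1.1, 0.9, 1.2, 0.0, 0.1], [0.0, 1.0], sigma=0.3)` assigns each noisy sample to one of the two assumed levels, so that a sustained excursion is idealized as a dwell in the upper state while an isolated noisy sample is not.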

Other abstracts

Type: plain-language-summary (eng)

During a chemical or biological process, a molecule may transition through a series of states, many of which are rare or short-lived. Advances in technology have made it easier to detect these states by gathering large amounts of data on individual molecules. However, the increasing size of these datasets has put a strain on the algorithms and software used to identify different molecular states. Now, White et al. have developed a new algorithm called DISC which overcomes this technical limitation. Unlike most other algorithms, DISC requires minimal input from the user and uses a new method to group the data into categories that represent distinct molecular states. Although this new approach produces a similar end result, it reaches this conclusion much faster than more commonly used algorithms. To test the effectiveness of the algorithm, White et al. studied how individual molecules of a chemical known as cAMP bind to parts of proteins called cyclic nucleotide binding domains (or CNBDs for short). A fluorescent tag was attached to single molecules of cAMP and data were collected on the behavior of each molecule. Previous evidence suggested that when four CNBDs join together to form a so-called tetramer complex, this affects the binding of cAMP. Using the DISC system, White et al. showed that individual cAMP molecules interact with all four domains in a similar way, suggesting that the binding of cAMP is not impacted by the formation of a tetramer complex. Analyzing these data took DISC less than 20 minutes, whereas existing algorithms took anywhere between four hours and two weeks to complete. The enhanced speed of the DISC algorithm could make it easier to analyze much larger datasets from other techniques in addition to fluorescence. This means that a greater number of states can be sampled, providing a deeper insight into the inner workings of biological and chemical processes.

Identifiers

DOI: 10.7554/eLife.53357 PMID: 32267232 PMC: PMC7205464

pii: 53357

Chemical substances

Fluorescent Dyes 0

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

Grants

Agency: National Science Foundation

ID: CHE-1856518

Country: International

Agency: NINDS NIH HHS

ID: NS-081293

Country: United States

Agency: NINDS NIH HHS

ID: R35 NS116850

Country: United States

Agency: NINDS NIH HHS

ID: NS-101723

Country: United States

Agency: NIGMS NIH HHS

ID: R21 GM127957

Country: United States

Agency: NIGMS NIH HHS

ID: GM007507

Country: United States

Agency: NIGMS NIH HHS

ID: GM127957

Country: United States

Agency: NINDS NIH HHS

ID: NS-081320

Country: United States

Copyright information

Conflict of interest statement

DW, MG, RG: No competing interests declared. BC: Reviewing editor, eLife.

Authors

David S White (DS)

Marcel P Goldschen-Ohm (MP)

Randall H Goldsmith (RH)

Baron Chanda (B)
