Application of deep convolutional neural networks in classification of protein subcellular localization with microscopy images.


Journal

Genetic epidemiology
ISSN: 1098-2272
Abbreviated title: Genet Epidemiol
Country: United States
NLM ID: 8411723

Publication information

Publication date:
04 2019
History:
received: 20 08 2018
revised: 21 10 2018
accepted: 26 11 2018
pubmed: 8 1 2019
medline: 11 5 2019
entrez: 8 1 2019
Status: ppublish

Abstract

Single-cell microscopy image analysis has proved invaluable in protein subcellular localization for inferring gene/protein function. Fluorescent-tagged proteins across cellular compartments are tracked and imaged in response to genetic or environmental perturbations. Because high-content microscopy generates a large number of images and manual labeling is both labor-intensive and error-prone, machine learning offers a viable alternative for automatic labeling of subcellular localizations. Meanwhile, in recent years, applications of deep learning methods to large datasets of natural images and in other domains have become quite successful. An appeal of deep learning methods is that they can learn salient features from complicated data with little data preprocessing. To this end, we applied several representative types of deep convolutional neural networks (CNNs) and two popular ensemble methods, random forests and gradient boosting, to predict protein subcellular localization with a moderately large cell image data set. We show a consistently better predictive performance of CNNs over the two ensemble methods. We also demonstrate the use of CNNs for feature extraction. Finally, we share our computer code and pretrained models to facilitate applications of CNNs in genetics and computational biology.
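
As a rough illustration of what "CNNs for feature extraction" means, the sketch below applies one convolutional layer, a ReLU, and global max pooling to a tiny image, producing one feature per filter. It is not the authors' code or any of their models: the toy image, the two hand-written edge filters, and all function names are hypothetical, and a real CNN learns its filters from data and stacks many such layers.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """2D cross-correlation without padding, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map: Image) -> Image:
    """Element-wise max(0, x) non-linearity."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max_pool(feature_map: Image) -> float:
    """Collapse a feature map to its single strongest response."""
    return max(max(row) for row in feature_map)


def extract_features(image: Image, kernels: List[Image]) -> List[float]:
    """One scalar feature per filter: conv -> ReLU -> global max pool."""
    return [global_max_pool(relu(conv2d_valid(image, k))) for k in kernels]


# Hypothetical 4x4 image: dark left half, bright right half.
toy_image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hand-written filters (assumed for illustration; a trained CNN learns these).
vertical_edge = [[-1.0, 1.0], [-1.0, 1.0]]
horizontal_edge = [[-1.0, -1.0], [1.0, 1.0]]

features = extract_features(toy_image, [vertical_edge, horizontal_edge])
print(features)
```

The resulting feature vector could then be passed to any downstream classifier, such as a random forest or gradient boosting model, which is the general idea behind using a CNN as a feature extractor.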

Identifiers

pubmed: 30614068
doi: 10.1002/gepi.22182
pmc: PMC6416075
mid: NIHMS1003940

Chemical substances

Saccharomyces cerevisiae Proteins 0

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

330-341

Grants

Agency: NHLBI NIH HHS
ID: R01 HL105397
Country: United States
Agency: NIA NIH HHS
ID: R21 AG057038
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM126002
Country: United States
Agency: NIGMS NIH HHS
ID: R01 GM113250
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL116720
Country: United States

Copyright information

© 2019 Wiley Periodicals, Inc.

Authors

Mengli Xiao (M)

Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota.

Xiaotong Shen (X)

School of Statistics, University of Minnesota, Minneapolis, Minnesota.

Wei Pan (W)

Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota.
