Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
2021-09-29
History:
received: 2020-11-03
revised: 2021-03-03
accepted: 2021-03-19
pubmed: 2021-04-08
medline: 2023-02-02
entrez: 2021-04-07
Status: ppublish

Abstract

Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The problem consists of separating the expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional Gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. The data are freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. Supplementary data are available at Bioinformatics online.
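
To make the deconvolution problem concrete, the following is a minimal sketch of the Gaussian-mixture style of approach mentioned in the abstract. It is not the competition's code or any competitor's algorithm. It assumes (this is not stated in the abstract) that the two genes of a pair contribute to one pooled signal in unequal proportions, here 2:1, so that the more populated peak can be assigned to the more abundant gene. The function names, the synthetic data and the peak positions are all hypothetical.

```python
import numpy as np


def fit_two_gaussians(x, n_iter=200, tol=1e-8):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, stds), each an array of length 2.
    """
    x = np.asarray(x, dtype=float)
    # Initialise the two means at the lower and upper quartiles.
    mu = np.percentile(x, [25.0, 75.0])
    sigma = np.full(2, x.std() + 1e-6)
    w = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each measurement.
        z = (x[:, None] - mu) / sigma
        dens = w * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        total = dens.sum(axis=1, keepdims=True) + 1e-300
        resp = dens / total
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(var) + 1e-6
        # Stop when the log-likelihood no longer changes.
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return w, mu, sigma


def deconvolute_pair(x):
    """Assign the heavier mixture component to the more abundant gene.

    Assumption: the two genes are pooled in unequal proportions, so the
    component with the larger weight belongs to the more abundant gene.
    """
    w, mu, _ = fit_two_gaussians(x)
    hi = int(np.argmax(w))
    return {
        "high_abundance_gene": float(mu[hi]),
        "low_abundance_gene": float(mu[1 - hi]),
    }


# Synthetic example (hypothetical values): 200 measurements from one gene
# and 100 from its partner, pooled into a single raw signal.
rng = np.random.default_rng(0)
signal = np.concatenate([
    rng.normal(10.0, 0.3, 200),  # more abundant gene, true level 10
    rng.normal(6.0, 0.3, 100),   # less abundant gene, true level 6
])
estimates = deconvolute_pair(signal)
print(estimates)
```

In this toy setting the two peaks are well separated, so the mixture recovers both levels easily. Real measurements can have overlapping or missing peaks, which is the situation where the abstract reports that alternative methods such as random forest regression did better.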

Identifiers

pubmed: 33824954
pii: 6180069
doi: 10.1093/bioinformatics/btab192
pmc: PMC8479655

Publication types

Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

2889-2895

Grants

Agency: NHGRI NIH HHS
ID: U01 HG008699
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG006093
Country: United States
Agency: Wendy Schmidt Foundation
Agency: NIH HHS
ID: 5U01HG008699
Country: United States

Copyright information

© The Author(s) 2021. Published by Oxford University Press.

Authors

Andrea Blasco (A)

Harvard Business School, Harvard University, Boston, MA 02163, USA.
Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Ted Natoli (T)

Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Michael G Endres (MG)

Harvard Business School, Harvard University, Boston, MA 02163, USA.

Rinat A Sergeev (RA)

Harvard Business School, Harvard University, Boston, MA 02163, USA.

Steven Randazzo (S)

Harvard Business School, Harvard University, Boston, MA 02163, USA.

Jin H Paik (JH)

Harvard Business School, Harvard University, Boston, MA 02163, USA.

N J Maximilian Macaluso (NJM)

Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Rajiv Narayan (R)

Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Xiaodong Lu (X)

Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

David Peck (D)

Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

Karim R Lakhani (KR)

Harvard Business School, Harvard University, Boston, MA 02163, USA.
National Bureau of Economic Research (NBER), Cambridge, MA 02138, USA.

Aravind Subramanian (A)

Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.

MeSH classifications