Learning curves for drug response prediction in cancer cell lines.

Cell Line; Learning Curve; Machine Learning; Neoplasms / drug therapy; Pharmaceutical Preparations; Prospective Studies

Cell line; Deep learning; Drug response prediction; Learning curve; Machine learning; Power law

Journal

BMC bioinformatics

ISSN: 1471-2105

Abbreviated title: BMC Bioinformatics

Country: England

NLM ID: 100965194

Publication information

Publication date:
17 May 2021

History:

received: 29 Nov 2020

accepted: 4 May 2021

entrez: 18 May 2021

pubmed: 19 May 2021

medline: 20 May 2021

Status: epublish

Abstract

BACKGROUND

Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data.

METHODS

We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models.

RESULTS

The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics.

CONCLUSIONS

A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
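
The idea of fitting a power law to a learning curve and reading it forward can be illustrated with a minimal sketch. The abstract does not state the exact functional form or fitting procedure, so the code below assumes a simple two-parameter form, error(m) = a * m^(-b), fitted by ordinary least squares in log-log space. The function names (`fit_power_law`, `predict_error`) and the data points are hypothetical and synthetic; they are not results from the study.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error = a * size**(-b) by least squares on log-transformed data.

    Assumed form only: the record does not specify the model the authors used.
    """
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -b in the assumed form
    intercept = y_mean - slope * x_mean    # equals log(a)
    return math.exp(intercept), -slope


def predict_error(a, b, size):
    """Evaluate the fitted curve at a (possibly larger) training set size."""
    return a * size ** (-b)


# Synthetic learning curve generated from a known power law (a=2.0, b=0.3).
sizes = [1000, 2000, 4000, 8000, 16000, 32000]
errors = [2.0 * m ** (-0.3) for m in sizes]

a, b = fit_power_law(sizes, errors)
# Forward-looking estimate: expected error if the training set were doubled.
extrapolated = predict_error(a, b, 64000)
```

Under these assumptions, comparing `extrapolated` with the last measured error gives a rough estimate of how much a model might gain from a larger screen. A real learning curve is noisy and may flatten toward an irreducible error, so a form with an additive offset term could be more appropriate in practice.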


Identifiers

DOI: 10.1186/s12859-021-04163-y

PMID: 34001007

PMC: PMC8130157

PII: 10.1186/s12859-021-04163-y

Chemical substances

Pharmaceutical Preparations 0

Publication types

Journal Article

Languages

eng

Citation subsets

Pagination

252



Authors

Alexander Partin (A)

Thomas Brettin (T)

Yvonne A Evrard (YA)

Yitan Zhu (Y)

Hyunseung Yoo (H)

Fangfang Xia (F)

Songhao Jiang (S)

Austin Clyde (A)

Maulik Shukla (M)

Michael Fonstein (M)

James H Doroshow (JH)

Rick L Stevens (RL)
