Learning curves for drug response prediction in cancer cell lines.
Cell line
Deep learning
Drug response prediction
Learning curve
Machine learning
Power law
Journal
BMC bioinformatics
ISSN: 1471-2105
Abbreviated title: BMC Bioinformatics
Country: England
NLM ID: 100965194
Publication information
Publication date:
17 May 2021
History:
received: 29 Nov 2020
accepted: 4 May 2021
entrez: 18 May 2021
pubmed: 19 May 2021
medline: 20 May 2021
Status:
epublish
Abstract
Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating drug response data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four cell line drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these models. The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, thus suggesting that the actual shape of these curves depends on the unique pair of an ML model and a dataset. The multi-input NN (mNN), in which gene expressions of cancer cells and molecular drug descriptors are input into separate subnetworks, outperforms a single-input NN (sNN), where the cell and drug features are concatenated for the input layer. In contrast, a GBDT with hyperparameter tuning exhibits superior performance as compared with both NNs at the lower range of training set sizes for two of the tested datasets, whereas the mNN consistently performs better at the higher range of training sizes. Moreover, the trajectory of the curves suggests that increasing the sample size is expected to further improve prediction scores of both NNs. These observations demonstrate the benefit of using learning curves to evaluate prediction models, providing a broader perspective on the overall data scaling characteristics. 
A fitted power law learning curve provides a forward-looking metric for analyzing prediction performance and can serve as a co-design tool to guide experimental biologists and computational scientists in the design of future experiments in prospective research studies.
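As a rough illustration of the idea in the abstract, the sketch below fits a power law to a handful of learning-curve points and extrapolates to a larger training set. It is not the authors' implementation: the abstract does not give the functional form, so the two-parameter form error ≈ a * m^(-b) (with no irreducible-error offset) is an assumption, as are the function names `fit_power_law` and `predict_error` and the synthetic data points.

```python
import numpy as np


def fit_power_law(train_sizes, errors):
    """Fit errors ~ a * m**(-b) by least squares in log-log space.

    Assumed two-parameter form; the study's exact parameterization
    is not stated in the abstract.
    """
    log_m = np.log(np.asarray(train_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # A power law is a straight line in log-log space:
    # log(e) = log(a) - b * log(m)
    slope, intercept = np.polyfit(log_m, log_e, 1)
    return float(np.exp(intercept)), float(-slope)


def predict_error(a, b, m):
    """Evaluate the fitted power law at training size(s) m."""
    return a * np.asarray(m, dtype=float) ** (-b)


# Hypothetical, synthetic learning-curve points (not data from the study).
sizes = np.array([1_000, 2_000, 4_000, 8_000, 16_000, 32_000], dtype=float)
true_a, true_b = 2.0, 0.3
errors = true_a * sizes ** (-true_b)

a, b = fit_power_law(sizes, errors)
extrapolated = float(predict_error(a, b, 64_000))
print(f"a={a:.3f}, b={b:.3f}, predicted error at 64k samples={extrapolated:.4f}")
```

Fitting in log-log space keeps the example dependency-light; a form with an additive offset for irreducible error would need a nonlinear least-squares routine instead.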
Identifiers
pubmed: 34001007
doi: 10.1186/s12859-021-04163-y
pii: 10.1186/s12859-021-04163-y
pmc: PMC8130157
Chemical substances
Pharmaceutical Preparations
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
252