Representation transfer for differentially private drug sensitivity prediction.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date: 15 July 2019
History:
entrez: 13 September 2019
pubmed: 13 September 2019
medline: 13 June 2020
Status:
ppublish
Abstract
Human genomic datasets often contain sensitive information that limits use and sharing of the data. In particular, simple anonymization strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has achieved promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because privacy is more difficult to guarantee in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) must also avoid leaking private information. We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, principal component analysis and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification and drug sensitivity prediction. The experiments demonstrate a significant benefit from all representation learning methods, with variational autoencoders providing the most accurate predictions most often. Our results significantly improve over the previous state of the art in the accuracy of differentially private drug sensitivity prediction. Code used in the experiments is available at https://github.com/DPBayes/dp-representation-transfer.
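The representation-transfer idea in the abstract (learn a compact representation on public data, then train a differentially private model on the projected private data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, which lives in the linked repository. All function names are hypothetical. The representation is either PCA or a Gaussian random projection; the variational autoencoder variant is not shown. The private learner is a generic ridge regression on sufficient statistics perturbed with the Gaussian mechanism, swapped in for illustration, because the abstract does not say which private learner was used. The data in the demo are synthetic.

```python
import numpy as np


def fit_pca_on_public(X_public, d):
    """Learn a d-dimensional PCA basis from PUBLIC data only (hypothetical helper).

    Only public data is touched, so this step consumes no privacy budget.
    Returns the public mean (p,) and a (p, d) matrix with orthonormal columns.
    """
    mean = X_public.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_public - mean, full_matrices=False)
    return mean, Vt[:d].T


def gaussian_random_projection(p, d, rng):
    """Data-independent alternative: a random Gaussian (p, d) projection matrix."""
    return rng.normal(size=(p, d)) / np.sqrt(d)


def clip_rows(Z, bound):
    """Rescale each row so its L2 norm is at most `bound` (bounds sensitivity)."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z * np.minimum(1.0, bound / np.maximum(norms, 1e-12))


def dp_linear_regression(Z, y, epsilon, delta, x_bound, y_bound, ridge, rng):
    """Ridge regression on privatized sufficient statistics (assumed learner).

    Neighbouring datasets differ by adding or removing one row. After clipping,
    that row changes Z^T Z by at most x_bound**2 (Frobenius norm) and Z^T y by
    at most x_bound * y_bound (L2 norm), so the joint L2 sensitivity is
    sqrt(x_bound**4 + (x_bound * y_bound)**2). The classic Gaussian-mechanism
    calibration used here is stated for 0 < epsilon < 1.
    """
    Z = clip_rows(Z, x_bound)
    y = np.clip(y, -y_bound, y_bound)
    d = Z.shape[1]
    sensitivity = np.sqrt(x_bound**4 + (x_bound * y_bound) ** 2)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    # Add i.i.d. Gaussian noise to every released statistic.
    noisy_ZtZ = Z.T @ Z + rng.normal(scale=sigma, size=(d, d))
    noisy_ZtZ = (noisy_ZtZ + noisy_ZtZ.T) / 2.0  # post-processing: symmetrize
    noisy_Zty = Z.T @ y + rng.normal(scale=sigma, size=d)
    # Solving with the noisy statistics is post-processing: no extra privacy cost.
    return np.linalg.solve(noisy_ZtZ + ridge * np.eye(d), noisy_Zty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, d = 100, 8
    X_public = rng.normal(size=(2000, p))     # synthetic stand-in for public data
    X_private = rng.normal(size=(300, p))     # synthetic stand-in for private data
    y_private = X_private[:, :d].sum(axis=1)  # synthetic target
    mean, B = fit_pca_on_public(X_public, d)
    Z = (X_private - mean) @ B                # project private data, no privacy cost
    w = dp_linear_regression(Z, y_private, epsilon=0.9, delta=1e-5,
                             x_bound=5.0, y_bound=10.0, ridge=1.0, rng=rng)
    print("private weights:", np.round(w, 3))
```

In this sketch only dp_linear_regression reads the private data, so it is the only step that spends privacy budget. The PCA basis is fitted on public data, and the random projection uses no data at all. Choosing d, x_bound or y_bound by inspecting the private data would itself leak information, which is the hyperparameter issue the abstract raises.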
Identifiers
pubmed: 31510659
pii: 5529143
doi: 10.1093/bioinformatics/btz373
pmc: PMC6612875
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
i218-i224
Copyright information
© The Author(s) 2019. Published by Oxford University Press.