Privacy-preserving and robust watermarking on sequential genome data using belief propagation and local differential privacy.
Journal
Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944
Publication information
Publication date:
09 Sep 2021
History:
received: 30 Sep 2020
revised: 09 Feb 2021
accepted: 23 Feb 2021
medline: 26 Feb 2021
pubmed: 26 Feb 2021
entrez: 25 Feb 2021
Status:
ppublish
Abstract
Genome data has been a subject of study in both biology and computer science since the start of the Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes has become increasingly available and affordable. Genome data can be shared on public websites or with service providers (SPs). However, this sharing compromises the privacy of donors, even under partial sharing conditions. We mainly focus on the liability issues arising from the unauthorized sharing of such genome data. One technique for addressing liability issues in data sharing is watermarking. To detect malicious correspondents and SPs, whose aim is to share genome data without individuals' consent and without being detected, we propose a novel watermarking method for sequential genome data using the belief propagation algorithm. Our method has two criteria to satisfy: (i) embedding robust watermarks so that malicious adversaries cannot tamper with the watermark by modification and are identified with high probability; and (ii) achieving ϵ-local differential privacy in all data sharing with SPs. To preserve system robustness against single-SP and collusion attacks, we consider publicly available genomic information such as minor allele frequency, linkage disequilibrium, phenotype information and familial information. Our proposed scheme achieves a 100% detection rate against single-SP attacks with a watermark length of only 3%. In the worst-case scenario for collusion attacks (50% of SPs are malicious), 80% detection is achieved with a 5% watermark length and 90% detection with a 10% watermark length. In all cases, the impact of ϵ on precision remained negligible and high privacy is ensured. Availability: https://github.com/acoksuz/PPRW_SGD_BPLDP. Supplementary data are available at Bioinformatics online.
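As an illustration of criterion (ii), the following is a minimal sketch of one standard way to obtain ϵ-local differential privacy for a single categorical value, namely k-ary randomized response. It is not taken from the paper or its repository, and the authors' mechanism may differ. The encoding of each SNP as 0, 1 or 2 minor alleles and the names `SNP_STATES`, `keep_probability`, `randomized_response` and `privatize` are assumptions made for this example.

```python
import math
import random

# Assumed encoding: number of minor alleles at a SNP position.
SNP_STATES = (0, 1, 2)


def keep_probability(epsilon, k=len(SNP_STATES)):
    """Probability of reporting the true state under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(value, epsilon, rng=random):
    """Report `value` with probability p, otherwise a uniformly chosen other state.

    For any output y and inputs x, x', P(y | x) / P(y | x') <= exp(epsilon),
    which is the epsilon-LDP condition for a single reported value.
    """
    if value not in SNP_STATES:
        raise ValueError(f"unexpected SNP state: {value}")
    if rng.random() < keep_probability(epsilon):
        return value
    others = [s for s in SNP_STATES if s != value]
    return rng.choice(others)


def privatize(sequence, epsilon, seed=None):
    """Apply randomized response independently to every position (hypothetical helper)."""
    rng = random.Random(seed)
    return [randomized_response(v, epsilon, rng) for v in sequence]
```

In this sketch the guarantee holds per reported position. Smaller values of ϵ flip more positions and give stronger privacy, while larger values leave the shared sequence closer to the original.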
Identifiers
pubmed: 33630065
pii: 6149476
doi: 10.1093/bioinformatics/btab128
pmc: PMC11025661
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
2668-2674
Grants
Agency: NLM NIH HHS
ID: R01 LM013429
Country: United States
Copyright information
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.