Scalable watermarking for identifying large language model outputs.


Journal

Nature
ISSN: 1476-4687
Abbreviated title: Nature
Country: England
NLM ID: 0410462

Publication information

Publication date:
October 2024
History:
received: 8 April 2024
accepted: 5 September 2024
medline: 24 October 2024
pubmed: 24 October 2024
entrez: 24 October 2024
Status: ppublish

Abstract

Large language models (LLMs) have enabled the generation of high-quality synthetic text, often indistinguishable from human-written content, at a scale that can markedly affect the nature of the information ecosystem.

Identifiers

pubmed: 39443777
doi: 10.1038/s41586-024-08025-4
pii: 10.1038/s41586-024-08025-4

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

818-823

Copyright information

© 2024. The Author(s).

References

Köbis, N. & Mossink, L. D. Artificial intelligence versus Maya Angelou: experimental evidence that people cannot differentiate AI-generated from human-written poetry. Comput. Hum. Behav. 114, 106553 (2021).
doi: 10.1016/j.chb.2020.106553
Clark, E. et al. All that’s ‘human’ is not gold: evaluating human evaluation of generated text. In Proc. 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (eds. Zong, C. et al.) 7282–7296 (Association for Computational Linguistics, 2021).
Jakesch, M., Hancock, J. T. & Naaman, M. Human heuristics for AI-generated language are flawed. Proc. Natl Acad. Sci. USA 120, e2208839120 (2023).
doi: 10.1073/pnas.2208839120
Wu, J. et al. A survey on LLM-generated text detection: necessity, methods, and future directions. Preprint at https://arxiv.org/abs/2310.14724 (2024).
Chen, C. et al. Accelerating large language model decoding with speculative sampling. Preprint at https://arxiv.org/abs/2302.01318 (2023).
Gemini Team et al. Gemini: a family of highly capable multimodal models. Preprint at https://arxiv.org/abs/2312.11805 (2023).
SynthID-Team. Code and data. GitHub https://github.com/google-deepmind/synthid-text (2024).
Shumailov, I. et al. AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024).
doi: 10.1038/s41586-024-07566-y pubmed: 39048682 pmcid: 11269175
Alemohammad, S. et al. Self-consuming generative models go MAD. In Proc. Twelfth International Conference on Learning Representations (ICLR, 2024).
Taori, R. & Hashimoto, T. Data feedback loops: model-driven amplification of dataset biases. In Proc. 40th International Conference on Machine Learning 33883–33920 (JMLR, 2023).
Wyllie, S., Shumailov, I. & Papernot, N. Fairness feedback loops: training on synthetic data amplifies bias. In Proc. 2024 ACM Conference on Fairness, Accountability, and Transparency 2113–2147 (Association for Computing Machinery, 2024).
Krishna, K., Song, Y., Karpinska, M., Wieting, J. F. & Iyyer, M. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. In Proc. Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS, 2023).
Mitchell, E., Lee, Y., Khazatsky, A., Manning, C. D. & Finn, C. DetectGPT: zero-shot machine-generated text detection using probability curvature. In Proc. 40th International Conference on Machine Learning 24950–24962 (JMLR, 2023).
Verma, V., Fleisig, E., Tomlin, N. & Klein, D. Ghostbuster: detecting text ghostwritten by large language models. In Proc. 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) 1702–1717 (Association for Computational Linguistics, 2024).
Hans, A. et al. Spotting LLMs with binoculars: zero-shot detection of machine-generated text. In Proc. 41st International Conference on Machine Learning 17519–17537 (PMLR, 2024).
Elkhatat, A. M., Elsaid, K. & Almeer, S. Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. Int. J. Educ. Integrity 19, 17 (2023).
doi: 10.1007/s40979-023-00140-5
Liang, W., Yuksekgonul, M., Mao, Y., Wu, E. & Zou, J. GPT detectors are biased against non-native English writers. Patterns 4, 100779 (2023).
doi: 10.1016/j.patter.2023.100779 pubmed: 37521038 pmcid: 10382961
Kamaruddin, N. S., Kamsin, A., Por, L. Y. & Rahman, H. A review of text watermarking: theory, methods, and applications. IEEE Access 6, 8011–8028 (2018).
doi: 10.1109/ACCESS.2018.2796585
Gu, C., Huang, C., Zheng, X., Chang, K.-W. & Hsieh, C.-J. Watermarking pre-trained language models with backdooring. Preprint at https://arxiv.org/abs/2210.07543 (2022).
SynthID-Team. Watermarking AI-generated text and video with SynthID. Google DeepMind Blog https://deepmind.google/discover/blog/watermarking-ai-generated-text-and-video-with-synthid (2024).
Piet, J., Sitawarin, C., Fang, V., Mu, N. & Wagner, D. Mark my words: analyzing and evaluating language model watermarks. Preprint at https://arxiv.org/abs/2312.00273 (2023).
Aaronson, S. & Kirchner, H. Watermarking of large language models. Scott Aaronson https://www.scottaaronson.com/talks/watermark.ppt (2022).
Kirchenbauer, J. et al. A watermark for large language models. In Proc. 40th International Conference on Machine Learning 17061–17084 (PMLR, 2023).
Kuditipudi, R., Thickstun, J., Hashimoto, T. & Liang, P. Robust distortion-free watermarks for language models. Trans. Mach. Learn. Res. https://openreview.net/pdf?id=FpaCL1MO2C (2024).
Christ, M., Gunn, S. & Zamir, O. Undetectable watermarks for language models. In Proc. Thirty Seventh Conference on Learning Theory 1125–1139 (PMLR, 2024).
Casper, S. et al. Open problems and fundamental limitations of reinforcement learning from human feedback. Trans. Mach. Learn. Res. https://openreview.net/pdf?id=bx24KpJ4Eb (2023).
Hu, Z. et al. Unbiased watermark for large language models. In Proc. Twelfth International Conference on Learning Representations (ICLR, 2024).
Gemma Team et al. Gemma: open models based on Gemini research and technology. Preprint at https://arxiv.org/abs/2403.08295 (2024).
Jiang, A. Q. et al. Mistral 7B. Preprint at https://arxiv.org/abs/2310.06825 (2023).
Fan, A. et al. ELI5: long form question answering. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A. et al.) 3558–3567 (Association for Computational Linguistics, 2019).
Google Cloud. TPU v5e. https://cloud.google.com/tpu/docs/v5e-inference (2024).
Jovanović, N., Staab, R. & Vechev, M. Watermark stealing in large language models. In Proc. 41st International Conference on Machine Learning 22570–22593 (PMLR, 2024).
Zhang, H. et al. Watermarks in the sand: impossibility of strong watermarking for language models. In Proc. 41st International Conference on Machine Learning 58851–58880 (PMLR, 2024).
Holtzman, A., Buys, J., Du, L., Forbes, M. & Choi, Y. The curious case of neural text degeneration. In Proc. Eighth International Conference on Learning Representations (ICLR, 2020).
Ackley, D. H., Hinton, G. E. & Sejnowski, T. J. A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147–169 (1985).
Fan, A., Lewis, M. & Dauphin, Y. Hierarchical neural story generation. In Proc. 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds Gurevych, I. & Miyao, Y.) 889–898 (Association for Computational Linguistics, 2018).

Authors

Sumanth Dathathri (S)

Google DeepMind, London, UK. sdathath@google.com.

Abigail See (A)

Google DeepMind, London, UK.

Sumedh Ghaisas (S)

Google DeepMind, London, UK.

Po-Sen Huang (PS)

Google DeepMind, London, UK.

Rob McAdam (R)

Google, Mountain View, CA, USA.

Johannes Welbl (J)

Google DeepMind, London, UK.

Vandana Bachani (V)

Google DeepMind, London, UK.

Alex Kaskasoli (A)

Google DeepMind, London, UK.

Robert Stanforth (R)

Google DeepMind, London, UK.

Tatiana Matejovicova (T)

Google DeepMind, London, UK.

Jamie Hayes (J)

Google DeepMind, London, UK.

Nidhi Vyas (N)

Google, Mountain View, CA, USA.

Majd Al Merey (MA)

Google, Mountain View, CA, USA.

Jonah Brown-Cohen (J)

Google DeepMind, London, UK.

Rudy Bunel (R)

Google DeepMind, London, UK.

Borja Balle (B)

Google DeepMind, London, UK.

Taylan Cemgil (T)

Google DeepMind, London, UK.

Zahra Ahmed (Z)

Google DeepMind, London, UK.

Kitty Stacpoole (K)

Google DeepMind, London, UK.

Ilia Shumailov (I)

Google DeepMind, London, UK.

Ciprian Baetu (C)

Google, Mountain View, CA, USA.

Sven Gowal (S)

Google DeepMind, London, UK.

Pushmeet Kohli (P)

Google DeepMind, London, UK. pushmeet@google.com.

MeSH classifications