Divergences in color perception between deep neural networks and humans.

Color perception; Computer vision; Deep learning; Embodied cognition; Wavelet decomposition

Journal

Cognition
ISSN: 1873-7838
Abbreviated title: Cognition
Country: Netherlands
NLM ID: 0367541

Publication information

Publication date:
Dec 2023
History:
received: 25 02 2023
revised: 23 06 2023
accepted: 09 09 2023
pubmed: 17 09 2023
medline: 17 09 2023
entrez: 16 09 2023
Status: ppublish

Abstract

Deep neural networks (DNNs) are increasingly proposed as models of human vision, bolstered by their impressive performance on image classification and object recognition tasks. Yet, the extent to which DNNs capture fundamental aspects of human vision such as color perception remains unclear. Here, we develop novel experiments for evaluating the perceptual coherence of color embeddings in DNNs, and we assess how well these algorithms predict human color similarity judgments collected via an online survey. We find that state-of-the-art DNN architectures, including convolutional neural networks and vision transformers, provide color similarity judgments that strikingly diverge from human color judgments of (i) images with controlled color properties, (ii) images generated from online searches, and (iii) real-world images from the canonical CIFAR-10 dataset. We compare DNN performance against an interpretable and cognitively plausible model of color perception based on wavelet decomposition, inspired by foundational theories in computational neuroscience. While one deep learning model, a convolutional DNN trained on a style transfer task, captures some aspects of human color perception, our wavelet algorithm provides more coherent color embeddings that better predict human color judgments than all of the DNNs we examine. These results hold when altering the high-level visual task used to train similar DNN architectures (e.g., image classification versus image segmentation), as well as when examining the color embeddings of different layers in a given DNN architecture. These findings break new ground in the effort to analyze the perceptual representations of machine learning algorithms and to improve their ability to serve as cognitively plausible models of human vision. Implications for machine learning, human perception, and embodied cognition are discussed.
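The abstract describes two ingredients: a wavelet-based color embedding, and an evaluation of how well model similarities track human similarity judgments. The sketch below illustrates both under explicit assumptions and is not the authors' algorithm. The function names (`haar_level`, `wavelet_color_embedding`, `cosine_similarity`, `spearman`), the unnormalized averaging Haar convention, the choice of pooling each detail band into its mean energy, the use of cosine similarity, and the example "human" ratings are all hypothetical choices made for illustration.

```python
import numpy as np

def haar_level(channel):
    """One level of a 2D Haar transform (averaging convention, not orthonormal).
    Assumes an even-sized 2D array; returns (approx, (horiz, vert, diag))."""
    a = channel[0::2, 0::2]
    b = channel[0::2, 1::2]
    c = channel[1::2, 0::2]
    d = channel[1::2, 1::2]
    approx = (a + b + c + d) / 4.0
    horiz = (a + b - c - d) / 4.0
    vert = (a - b + c - d) / 4.0
    diag = (a - b - c + d) / 4.0
    return approx, (horiz, vert, diag)

def wavelet_color_embedding(image, levels=3):
    """Hypothetical color embedding of an H x W x 3 RGB image: for each channel,
    the mean energy of the three detail bands at every level, followed by the
    mean of the coarsest approximation. Length is 3 * (3 * levels + 1)."""
    image = np.asarray(image, dtype=float)
    h, w, _ = image.shape
    if h % (2 ** levels) or w % (2 ** levels):
        raise ValueError("height and width must be divisible by 2**levels")
    features = []
    for ch in range(3):
        approx = image[:, :, ch]
        for _ in range(levels):
            approx, details = haar_level(approx)
            features.extend(float(np.mean(band ** 2)) for band in details)
        features.append(float(approx.mean()))
    return np.array(features)

def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity; eps avoids division by zero for all-black images."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def spearman(x, y):
    """Spearman rank correlation between two score lists (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

def solid(rgb, size=8):
    """Uniform test image of a single RGB color."""
    return np.ones((size, size, 3)) * np.array(rgb, dtype=float)

# Usage: compare model similarities for three image pairs with made-up ratings.
red, dark_red = solid([1, 0, 0]), solid([0.5, 0, 0])
purple, blue = solid([0.5, 0, 0.5]), solid([0, 0, 1])
pairs = [(red, dark_red), (red, purple), (red, blue)]
model_sims = [cosine_similarity(wavelet_color_embedding(x),
                                wavelet_color_embedding(y)) for x, y in pairs]
human_ratings = [0.9, 0.6, 0.1]  # hypothetical survey ratings, not real data
agreement = spearman(model_sims, human_ratings)
```

Rank correlation is used here because survey ratings are often only ordinally meaningful; with real data one would aggregate ratings across participants and handle ties, which this sketch does not do.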

Identifiers

pubmed: 37716312
pii: S0010-0277(23)00255-X
doi: 10.1016/j.cognition.2023.105621

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

105621

Copyright information

Copyright © 2023 The Authors. Published by Elsevier B.V. All rights reserved.

Authors

Ethan O Nadler (EO)

Carnegie Observatories, USA; Department of Physics, University of Southern California, USA. Electronic address: enadler@carnegiescience.edu.

Elise Darragh-Ford (E)

Kavli Institute for Particle Astrophysics and Cosmology and Department of Physics, Stanford University, USA.

Bhargav Srinivasa Desikan (BS)

School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Switzerland; Knowledge Lab, University of Chicago, USA.

Christian Conaway (C)

University of California, San Diego, USA.

Mark Chu (M)

School of the Arts, Columbia University, USA.

Tasker Hull (T)

Psiphon Inc., Toronto, Canada.

Douglas Guilbeault (D)

Haas School of Business, University of California, Berkeley, USA. Electronic address: douglas.guilbeault@haas.berkeley.edu.

MeSH classifications