Divergences in color perception between deep neural networks and humans.
Color perception
Computer vision
Deep learning
Embodied cognition
Wavelet decomposition
Journal
Cognition
ISSN: 1873-7838
Abbreviated title: Cognition
Country: Netherlands
NLM ID: 0367541
Publication information
Publication date: Dec 2023
History:
received: 2023-02-25
revised: 2023-06-23
accepted: 2023-09-09
pubmed: 2023-09-17
medline: 2023-09-17
entrez: 2023-09-16
Status: ppublish
Abstract
Deep neural networks (DNNs) are increasingly proposed as models of human vision, bolstered by their impressive performance on image classification and object recognition tasks. Yet, the extent to which DNNs capture fundamental aspects of human vision such as color perception remains unclear. Here, we develop novel experiments for evaluating the perceptual coherence of color embeddings in DNNs, and we assess how well these algorithms predict human color similarity judgments collected via an online survey. We find that state-of-the-art DNN architectures - including convolutional neural networks and vision transformers - provide color similarity judgments that strikingly diverge from human color judgments of (i) images with controlled color properties, (ii) images generated from online searches, and (iii) real-world images from the canonical CIFAR-10 dataset. We compare DNN performance against an interpretable and cognitively plausible model of color perception based on wavelet decomposition, inspired by foundational theories in computational neuroscience. While one deep learning model - a convolutional DNN trained on a style transfer task - captures some aspects of human color perception, our wavelet algorithm provides more coherent color embeddings that better predict human color judgments compared to all DNNs we examine. These results hold when altering the high-level visual task used to train similar DNN architectures (e.g., image classification versus image segmentation), as well as when examining the color embeddings of different layers in a given DNN architecture. These findings break new ground in the effort to analyze the perceptual representations of machine learning algorithms and to improve their ability to serve as cognitively plausible models of human vision. Implications for machine learning, human perception, and embodied cognition are discussed.
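The abstract does not specify the authors' wavelet algorithm, so the following is only an illustrative sketch of the general idea: derive a color embedding from a wavelet decomposition of each color channel, then score the similarity of two images by comparing their embeddings. It assumes a one-level Haar wavelet, RGB input, and cosine similarity; all function names (`haar_level1`, `color_embedding`, `cosine_similarity`, `solid_image`) are hypothetical and are not taken from the paper.

```python
import math


def haar_level1(channel):
    """One-level 2D Haar transform of a 2D list with even height and width.

    Returns four flat lists of coefficients: the local average
    (approximation) and three detail bands (row, column, diagonal).
    """
    approx, row_diff, col_diff, diag = [], [], [], []
    for i in range(0, len(channel), 2):
        for j in range(0, len(channel[0]), 2):
            p, q = channel[i][j], channel[i][j + 1]
            r, s = channel[i + 1][j], channel[i + 1][j + 1]
            approx.append((p + q + r + s) / 2)
            row_diff.append((p + q - r - s) / 2)
            col_diff.append((p - q + r - s) / 2)
            diag.append((p - q - r + s) / 2)
    return approx, row_diff, col_diff, diag


def color_embedding(image):
    """Embed an image given as a 2D list of (r, g, b) tuples in [0, 1].

    Assumption for illustration: the embedding is the mean absolute
    coefficient of each Haar sub-band, per channel (12 numbers in total).
    """
    features = []
    for c in range(3):
        channel = [[pixel[c] for pixel in row] for row in image]
        for band in haar_level1(channel):
            features.append(sum(abs(x) for x in band) / len(band))
    return features


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def solid_image(color, size=4):
    """Build a size x size image filled with a single (r, g, b) color."""
    return [[color for _ in range(size)] for _ in range(size)]


# Toy stimuli with controlled color properties.
red = color_embedding(solid_image((1.0, 0.0, 0.0)))
orange = color_embedding(solid_image((1.0, 0.5, 0.0)))
blue = color_embedding(solid_image((0.0, 0.0, 1.0)))

print("red vs orange:", cosine_similarity(red, orange))
print("red vs blue:", cosine_similarity(red, blue))
```

On these toy inputs the sketch rates solid red as more similar to solid orange than to solid blue. In a study of the kind described, such model similarities would then be compared against human similarity judgments, for example with a rank correlation; that comparison step is not shown here.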
Identifiers
pubmed: 37716312
pii: S0010-0277(23)00255-X
doi: 10.1016/j.cognition.2023.105621
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
105621
Copyright information
Copyright © 2023 The Authors. Published by Elsevier B.V. All rights reserved.