Meaning maps and saliency models based on deep convolutional neural networks are insensitive to image meaning when predicting human fixations.
Keywords
Deep neural networks
Eye movements
Meaning maps
Natural scenes
Saliency
Journal
Cognition
ISSN: 1873-7838
Abbreviated title: Cognition
Country: Netherlands
NLM ID: 0367541
Publication information
Publication date: 01 2021
History:
received: 21 01 2020
revised: 04 09 2020
accepted: 08 09 2020
pubmed: 24 10 2020
medline: 24 06 2021
entrez: 23 10 2020
Status: ppublish
Abstract
Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic information across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations with that of saliency models, showing that DeepGaze II, a deep neural network trained to predict fixations based on high-level features rather than meaning, outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
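The comparison described in the abstract hinges on scoring how well a candidate map (a meaning map or DeepGaze II output) predicts where people fixate. The record does not specify the evaluation metric used in the study; as a minimal sketch, one widely used fixation-prediction score is Normalized Scanpath Saliency (NSS), the mean z-scored map value at fixated pixels. The function name, map sizes, and fixation coordinates below are hypothetical placeholders, not the paper's data or pipeline.

```python
import numpy as np

def normalized_scanpath_saliency(pred_map, fixations_xy):
    """NSS: z-score the predicted map, then average its values at fixated pixels.
    pred_map: 2D array (H, W); fixations_xy: iterable of (x, y) pixel coordinates."""
    z = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-8)
    xs = np.asarray([x for x, _ in fixations_xy], dtype=int)
    ys = np.asarray([y for _, y in fixations_xy], dtype=int)
    return float(z[ys, xs].mean())

# Hypothetical usage: score two candidate maps against the same fixations.
rng = np.random.default_rng(0)
meaning_map = rng.random((600, 800))    # placeholder for a real meaning map
deepgaze_map = rng.random((600, 800))   # placeholder for DeepGaze II output
fixations = [(120, 300), (450, 210), (700, 550)]  # (x, y) pixel coordinates

print("MM NSS: ", normalized_scanpath_saliency(meaning_map, fixations))
print("DG2 NSS:", normalized_scanpath_saliency(deepgaze_map, fixations))
```

A higher NSS means the map assigns relatively higher values to locations people actually fixated; other common metrics (e.g., AUC variants, information gain) follow the same pattern of scoring a predicted density against observed fixations.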
Identifiers
pubmed: 33096374
pii: S0010-0277(20)30284-5
doi: 10.1016/j.cognition.2020.104465
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
104465
Comments and corrections
Type: CommentIn
Type: CommentIn
Copyright information
Copyright © 2020. Published by Elsevier B.V.