Face detection based on a human attention guided multi-scale model.

Attention-guided model; Facial visual attention; Multiscale face detection; Multiscale face model

Journal

Biological cybernetics

ISSN: 1432-0770

Abbreviated title: Biol Cybern

Country: Germany

NLM ID: 7502533

Publication information

Publication date:
01 Dec 2023

History:

received: 05 12 2022

accepted: 02 11 2023

medline: 1 12 2023

pubmed: 1 12 2023

entrez: 1 12 2023

Status: ahead of print

Abstract

Multiscale models are among the cutting-edge technologies used for face detection and recognition. One example is deformable part-based models (DPMs), which encode a face as a set of local areas (parts) at different resolution scales, together with their hierarchical and spatial relationships. Although these models have proven successful and highly efficient in practical applications, the mutual position and spatial resolution of the parts are defined arbitrarily by a human specialist, and the final choice of the optimal scales and parts is based on heuristics. This work investigates whether a multiscale model can take inspiration from human fixations to select specific areas and spatial scales. In more detail, it shows that a multiscale pyramid representation can be adopted to extract interest points, and that human attention can be used to select the points at the scales that lead to the best face detection performance. Human fixations can therefore provide a valid methodological basis on which to build a multiscale model, by selecting the spatial scales and areas of interest that are most relevant to humans.
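The idea summarized in the abstract, extracting interest points from a pyramid and keeping those that humans fixate, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 2x2 averaging pyramid, the Laplacian-based stand-in detector, the distance rule, and all function and parameter names (`build_pyramid`, `interest_points`, `select_by_fixations`, `radius`, `threshold`) are hypothetical.

```python
from math import hypot

# Illustrative sketch only: the detector, the pyramid, and the selection rule
# are assumptions, not the method used in the paper.

def downsample(img):
    """Halve the resolution by averaging each 2x2 block (one pyramid step)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img, levels):
    """Return [level 0 (full resolution), level 1, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def interest_points(img, threshold):
    """Stand-in detector: local maxima of the absolute 4-neighbour Laplacian."""
    h, w = len(img), len(img[0])
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            resp[y][x] = abs(4 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                             - img[y][x - 1] - img[y][x + 1])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            if r > threshold and all(r >= resp[y + dy][x + dx]
                                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                points.append((x, y))
    return points

def select_by_fixations(pyramid, fixations, radius, threshold):
    """Per pyramid level, keep the points within `radius` of any fixation.

    Fixations and `radius` are expressed in full-resolution pixels; points
    found at coarser levels are mapped back to those coordinates first.
    """
    selected = {}
    for level, img in enumerate(pyramid):
        scale = 2 ** level
        kept = []
        for x, y in interest_points(img, threshold):
            # Centre of the block of original pixels that this pixel covers.
            ox = (x + 0.5) * scale - 0.5
            oy = (y + 0.5) * scale - 0.5
            if any(hypot(ox - fx, oy - fy) <= radius for fx, fy in fixations):
                kept.append((ox, oy))
        selected[level] = kept
    return selected
```

Under these assumptions, the number of retained points per level gives a rough indication of which scales carry structure near the fixated regions. A real pipeline would replace the stand-in detector with a proper scale-space detector and would choose scales by measured detection performance, as the abstract describes.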

Identifiers

DOI: 10.1007/s00422-023-00978-5 PMID: 38038793

pubmed: 38038793

doi: 10.1007/s00422-023-00978-5

pii: 10.1007/s00422-023-00978-5

Publication types

Journal Article

Languages

eng

References

Zafeiriou S, Zhang C, Zhang Z (2015) A survey on face detection in the wild: past, present and future. Comput Vis Image Underst 138:1–24

doi: 10.1016/j.cviu.2015.03.015

Craw I, Ellis H, Lishman JR (1987) Automatic extraction of face-features. Pattern Recogn Lett 5(2):183–187

doi: 10.1016/0167-8655(87)90039-0

Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720

doi: 10.1109/34.598228

Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154

doi: 10.1023/B:VISI.0000013087.49260.fb

Li J, Wang T, Zhang Y (2011) Face detection using surf cascade. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), pp 2183–2190. https://doi.org/10.1109/ICCVW.2011.6130518

Yang B, Yan J, Lei Z, Li SZ (2014) Aggregate channel features for multi-view face detection. In: IEEE international joint conference on biometrics. IEEE, pp 1–8

Zhang Z, Luo P, Loy CC, Tang X (2014) Facial landmark detection by deep multi-task learning. In: European conference on computer vision. Springer, pp 94–108

Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386

doi: 10.1145/3065386

Yang W, Jiachun Z (2018) Real-time face detection based on yolo. In: 2018 1st IEEE international conference on knowledge innovation and invention (ICKII). IEEE, pp 221–224

Garg D, Goel P, Pandya S, Ganatra A, Kotecha K (2018) A deep learning approach for face detection using yolo. In: 2018 IEEE Punecon. IEEE, pp 1–4

Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503

doi: 10.1109/LSP.2016.2603342

Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp 516–520

Deng J, Guo J, Ververas E, Kotsia I, Zafeiriou S (2020) Retinaface: Single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5203–5212

Felzenszwalb P, McAllester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–8

Lin T, Dollár P, Girshick RB, He K, Hariharan B, Belongie SJ (2016) Feature pyramid networks for object detection. CoRR arXiv:1612.03144 [cs.CV]

Ranjan R, Patel VM, Chellappa R (2015) A deep pyramid deformable part model for face detection. In: 2015 IEEE 7th international conference on biometrics theory, applications and systems (BTAS). IEEE, pp 1–8

Zhu X, Ramanan D (2012) Face detection, pose estimation, and landmark localization in the wild. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 2879–2886

Mathias M, Benenson R, Pedersoli M, Gool LV (2014) Face detection without bells and whistles. In: European conference on computer vision. Springer, pp 720–735

O’Toole AJ, Castillo CD, Parde CJ, Hill MQ, Chellappa R (2018) Face space representations in deep convolutional neural networks. Trends Cogn Sci 22(9):794–809

doi: 10.1016/j.tics.2018.06.006 pubmed: 30097304

Han Y, Roig G, Geiger G, Poggio T (2020) Scale and translation-invariance for novel objects in human vision. Sci Rep. https://doi.org/10.1038/s41598-019-57261-6

doi: 10.1038/s41598-019-57261-6

Cadoni M, Lagorio A, Khellat Kihel S, Grosso E (2021) On the correlation between human fixations, handcrafted and CNN features. Neural Comput Appl. https://doi.org/10.1007/s00521-021-05863-5

doi: 10.1007/s00521-021-05863-5

Cadoni MI, Lagorio A, Grosso E, Huei TJ, Seng CC (2021) From early biological models to CNNs: do they look where humans look? In: 2020 25th international conference on pattern recognition (ICPR), pp 6313–6320. https://doi.org/10.1109/ICPR48806.2021.9412717

Baek S, Song M, Jang J, Kim G, Paik S-B (2021) Face detection in untrained deep neural networks. Nat Commun 12(1):7328

doi: 10.1038/s41467-021-27606-9 pubmed: 34916514 pmcid: 8677765

Qarooni R, Prunty J, Bindemann M, Jenkins R (2022) Capacity limits in face detection. Cognition 228:105227. https://doi.org/10.1016/j.cognition.2022.105227

doi: 10.1016/j.cognition.2022.105227 pubmed: 35872362

’t Hart BM, Abresch TGJ, Einhäuser W (2011) Faces in places: humans and machines make similar face detection errors. PLoS ONE 6(10):1–7. https://doi.org/10.1371/journal.pone.0025373

doi: 10.1371/journal.pone.0025373

Lindeberg T (2013) Image matching using generalized scale-space interest points. In: Scale space and variational methods in computer vision. Springer, Berlin, pp 355–367

Peterson MF, Eckstein MP (2013) Individual differences in eye movements during face identification reflect observer-specific optimal points of fixation. Psychol Sci 24(7):1216–1225

doi: 10.1177/0956797612471684 pubmed: 23740552

Godwin HJ, Reichle ED, Menneer T (2014) Coarse-to-fine eye movement behavior during visual search. Psychon Bull Rev 21:1244–1249

doi: 10.3758/s13423-014-0613-6 pubmed: 24696389

Lundqvist D, Flykt A, Öhman A (1998) Karolinska directed emotional faces (KDEF). Database records

Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. In: Workshop on faces in ‘Real-Life’ images: detection, alignment, and recognition

Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) Coco-text: dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140

Cadoni M, Nixon S, Lagorio A, Fadda M (2022) Exploring attention on faces: similarities between humans and transformers. In: 2022 18th IEEE international conference on advanced video and signal based surveillance (AVSS), pp 1–8

AB TT (2010) White paper—Tobii eye tracking: An introduction to eye tracking and Tobii eye trackers. White Paper Tobii, 1–12

Goeleven E, Raedt RD, Leyman L, Verschuere B (2008) The Karolinska directed emotional faces: a validation study. Cogn Emotion 22(6):1094–1118

Schütt HH, Rothkegel LO, Trukenbrod HA, Engbert R, Wichmann FA (2019) Disentangling bottom-up versus top-down and low-level versus high-level influences on eye movements over time. J Vision 19(3):1–1

Jain V, Learned-Miller E (2010) Fddb: a benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst

Yang S, Luo P, Loy CC, Tang X (2016) Wider face: a face detection benchmark. In: IEEE conference on computer vision and pattern recognition (CVPR)

Authors

Marinella Cadoni (M)

Andrea Lagorio (A)

Enrico Grosso (E)