Face detection based on a human attention guided multi-scale model.
Attention-guided model
Facial visual attention
Multiscale face detection
Multiscale face model
Journal
Biological cybernetics
ISSN: 1432-0770
Titre abrégé: Biol Cybern
Pays: Germany
ID NLM: 7502533
Informations de publication
Date de publication:
01 Dec 2023
01 Dec 2023
Historique:
received:
05
12
2022
accepted:
02
11
2023
medline:
1
12
2023
pubmed:
1
12
2023
entrez:
1
12
2023
Statut:
aheadofprint
Résumé
Multiscale models are among the cutting-edge technologies used for face detection and recognition. An example is Deformable part-based models (DPMs), which encode a face as a multiplicity of local areas (parts) at different resolution scales and their hierarchical and spatial relationship. Although these models have proven successful and incredibly efficient in practical applications, the mutual position and spatial resolution of the parts involved are arbitrarily defined by a human specialist and the final choice of the optimal scales and parts is based on heuristics. This work seeks to understand whether a multi-scale model can take inspiration from human fixations to select specific areas and spatial scales. In more detail, it shows that a multi-scale pyramid representation can be adopted to extract interesting points, and that human attention can be used to select the points at the scales that lead to the best face detection performance. Human fixations can therefore provide a valid methodological basis on which to build a multiscale model, by selecting the spatial scales and areas of interest that are most relevant to humans.
Identifiants
pubmed: 38038793
doi: 10.1007/s00422-023-00978-5
pii: 10.1007/s00422-023-00978-5
doi:
Types de publication
Journal Article
Langues
eng
Sous-ensembles de citation
IM
Informations de copyright
© 2023. The Author(s).
Références
Zafeiriou S, Zhang C, Zhang Z (2015) A survey on face detection in the wild: past, present and future. Comput Vis Image Underst 138:1–24
doi: 10.1016/j.cviu.2015.03.015
Craw I, Ellis H, Lishman JR (1987) Automatic extraction of face-features. Pattern Recogn Lett 5(2):183–187
doi: 10.1016/0167-8655(87)90039-0
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
doi: 10.1109/34.598228
Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154
doi: 10.1023/B:VISI.0000013087.49260.fb
Li J, Wang T, Zhang Y (2011) Face detection using surf cascade. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), pp 2183–2190. https://doi.org/10.1109/ICCVW.2011.6130518
Yang B, Yan J, Lei Z, Li SZ (2014) Aggregate channel features for multi-view face detection. In: IEEE international joint conference on biometrics. IEEE, pp 1–8
Zhang Z, Luo P, Loy CC, Tang X (2014) Facial landmark detection by deep multi-task learning. In: European conference on computer vision. Springer, pp 94–108
Krizhevsky A, Sutskever I, Hinton GE (2017) Imagenet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386
doi: 10.1145/3065386
Yang W, Jiachun Z (2018) Real-time face detection based on yolo. In: 2018 1st IEEE international conference on knowledge innovation and invention (ICKII). IEEE, pp 221–224
Garg D, Goel P, Pandya S, Ganatra A, Kotecha K (2018) A deep learning approach for face detection using yolo. In: 2018 IEEE Punecon. IEEE, pp 1–4
Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503
doi: 10.1109/LSP.2016.2603342
Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: Proceedings of the 24th ACM international conference on multimedia, pp 516–520
Deng J, Guo J, Ververas E, Kotsia I, Zafeiriou S (2020) Retinaface: Single-shot multi-level face localisation in the wild. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5203–5212
Zhang K, Zhang Z, Li Z, Qiao Y (2016) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process Lett 23(10):1499–1503. https://doi.org/10.1109/LSP.2016.2603342
doi: 10.1109/LSP.2016.2603342
Felzenszwalb P, McAllester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–8
Lin T, Dollár P, Girshick RB, He K, Hariharan B, Belongie SJ (2016) Feature pyramid networks for object detection. CoRR arXiv:1612.03144 [cs.CV]
Ranjan R, Patel VM, Chellappa R (2015) A deep pyramid deformable part model for face detection. In: 2015 IEEE 7th international conference on biometrics theory, applications and systems (BTAS). IEEE, pp 1–8
Zhu X, Ramanan D (2012) Face detection, pose estimation, and landmark localization in the wild. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 2879–2886
Mathias M, Benenson R, Pedersoli M, Gool LV (2014) Face detection without bells and whistles. In: European conference on computer vision. Springer, pp 720–735
O’Toole AJ, Castillo CD, Parde CJ, Hill MQ, Chellappa R (2018) Face space representations in deep convolutional neural networks. Trends Cogn Sci 22(9):794–809
doi: 10.1016/j.tics.2018.06.006
pubmed: 30097304
Han Y, Roig G, Geiger G, Poggio T (2020) Scale and translation-invariance for novel objects in human vision. Scie Rep. https://doi.org/10.1038/s41598-019-57261-6
doi: 10.1038/s41598-019-57261-6
Cadoni M, Lagorio A, Khellat Kihel S, Grosso E (2021) On the correlation between human fixations, handcrafted and CNN features. Neural Comput Appl. https://doi.org/10.1007/s00521-021-05863-5
doi: 10.1007/s00521-021-05863-5
Cadoni MI, Lagorio A, Grosso E, Huei TJ, Seng CC (2021) From early biological models to CNNs: do they look where humans look? In: 2020 25th international conference on pattern recognition (ICPR), pp 6313–6320. https://doi.org/10.1109/ICPR48806.2021.9412717
Baek S, Song M, Jang J, Kim G, Paik S-B (2021) Face detection in untrained deep neural networks. Nat Commun 12(1):7328
doi: 10.1038/s41467-021-27606-9
pubmed: 34916514
pmcid: 8677765
Qarooni R, Prunty J, Bindemann M, Jenkins R (2022) Capacity limits in face detection. Cognition 228:105227. https://doi.org/10.1016/j.cognition.2022.105227
doi: 10.1016/j.cognition.2022.105227
pubmed: 35872362
’t Hart BM, Abresch TGJ, Einhaüser W (2011) Faces in places: humans and machines make similar face detection errors. PLoS ONE 6(10):1–7. https://doi.org/10.1371/journal.pone.0025373
doi: 10.1371/journal.pone.0025373
Lindeberg T (2013) Image matching using generalized scale-space interest points. In: Scale space and variational methods in computer vision. Springer, Berlin, pp 355–367
Peterson MF, Eckstein MP (2013) Individual differences in eye movements during face identification reflect observer-specific optimal points of fixation. Psychol Sci 24(7):1216–1225
doi: 10.1177/0956797612471684
pubmed: 23740552
Godwin HJ, Reichle ED, Menneer T (2014) Coarse-to-fine eye movement behavior during visual search. Psychon Bull Rev 21:1244–1249
doi: 10.3758/s13423-014-0613-6
pubmed: 24696389
Lundqvist D, Flykt A, Öhman A (1998) Karolinska directed emotional faces (KDEF). Database records
Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: A database forstudying face recognition in unconstrained environments. In: Workshop on faces in ‘Real-Life’ images: detection, alignment, and recognition
Huang GB, Mattar M, Berg T, Learned-Miller E (2008) Labeled faces in the wild: a database forstudying face recognition in unconstrained environments. In: Workshop on faces in ‘Real-Life’ images: detection, alignment, and recognition
Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) Coco-text: dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140
Cadoni M, Nixon S, Lagorio A, Fadda M (2022) Exploring attention on faces: similarities between humans and transformers. In: 2022 18th IEEE international conference on advanced video and signal based surveillance (AVSS), pp 1–8
AB TT (2010) White paper—Tobii eye tracking: An introduction to eye tracking and Tobii eye trackers. White Paper Tobii, 1–12
Goeleven E, Raedt RD, Leyman L, Verschuere B (2008) The Karolinska directed emotional faces: a validation study. Cogn Emotion 22(6):1094–1118
Schütt HH, Rothkegel LO, Trukenbrod HA, Engbert R, Wichmann FA (2019) Disentangling bottom-up versus top-down and low-level versus high-level influences on eye movements over time. J Vision 19(3):1–1
Jain V, Learned-Miller E (2010) Fddb: a benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst
Yang S, Luo P, Loy CC, Tang X (2016) Wider face: a face detection benchmark. In: IEEE conference on computer vision and pattern recognition (CVPR)