3D face-model reconstruction from a single image: A feature aggregation approach using hierarchical transformer with weak supervision.

Face reconstruction; Feature fusion; Hierarchical transformer; Swin Transformer; ViT

Journal

Neural networks : the official journal of the International Neural Network Society
ISSN: 1879-2782
Abbreviated title: Neural Netw
Country: United States
NLM ID: 8805018

Publication information

Publication date:
Dec 2022
History:
received: 04 06 2022
revised: 24 08 2022
accepted: 19 09 2022
pubmed: 19 10 2022
medline: 16 11 2022
entrez: 18 10 2022
Status: ppublish

Abstract

Convolutional Neural Networks (CNNs) have become the de facto model for computer vision tasks. However, CNNs have a drawback: they struggle to capture long-range dependencies in images. Because they can capture such dependencies, transformer networks have been adopted in computer vision applications, where they show state-of-the-art (SOTA) results in popular tasks such as image classification, instance segmentation, and object detection. Although they have gained ample attention, transformers have not been applied to 3D face reconstruction tasks. In this work, we propose a novel hierarchical transformer model, combined with a feature pyramid aggregation structure, to extract the 3D face parameters from a single 2D image. More specifically, we use pre-trained Swin Transformer backbone networks in a hierarchical manner and add a feature fusion module to aggregate the features from multiple stages. We use a semi-supervised training approach: the model is trained in a supervised way with the 3D Morphable Model (3DMM) parameters from a publicly available dataset, and in an unsupervised way, through a differentiable renderer, on other parameters such as facial keypoints and facial features. We also train our network on a hybrid unsupervised loss and compare the results with other SOTA approaches. When evaluated on two public datasets for face reconstruction and dense 3D face alignment, our method achieves results comparable to current SOTA performance and in some instances outperforms the SOTA methods. A detailed subjective evaluation also shows that our method performs better than previous works in realism and occlusion resistance.
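
The semi-supervised objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss terms or their weights, so the function names, the mean-squared-error terms, and the weights `w_param` and `w_landmark` below are all assumptions made for illustration. The sketch also assumes the predicted keypoints have already been produced by a renderer, which is not implemented here. It shows only the general idea of adding a supervised 3DMM parameter term when labels exist and relying on a keypoint term otherwise.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length parameter vectors."""
    if len(pred) != len(target):
        raise ValueError("vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def landmark_error(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared 2D distance between predicted and reference keypoints."""
    if len(pred) != len(target):
        raise ValueError("landmark sets must have the same length")
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(pred)


def semi_supervised_loss(
    pred_params: Sequence[float],
    pred_landmarks: Sequence[Point],
    target_landmarks: Sequence[Point],
    gt_params: Optional[Sequence[float]] = None,
    w_param: float = 1.0,      # hypothetical weight, not from the paper
    w_landmark: float = 0.5,   # hypothetical weight, not from the paper
) -> float:
    """Keypoint term always; 3DMM parameter term only when labels exist."""
    loss = w_landmark * landmark_error(pred_landmarks, target_landmarks)
    if gt_params is not None:
        loss += w_param * mean_squared_error(pred_params, gt_params)
    return loss


if __name__ == "__main__":
    params, labels = [1.0, 2.0], [1.0, 4.0]
    predicted = [(0.0, 0.0), (1.0, 1.0)]
    detected = [(0.0, 0.0), (1.0, 3.0)]
    print(semi_supervised_loss(params, predicted, detected, gt_params=labels))
    print(semi_supervised_loss(params, predicted, detected))
```

In this sketch a labelled sample contributes both terms, while an unlabelled sample contributes only the keypoint term, so both kinds of data can be mixed in one training batch.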

Identifiers

pubmed: 36257068
pii: S0893-6080(22)00369-0
doi: 10.1016/j.neunet.2022.09.019

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

108-122

Copyright information

Copyright © 2022 Elsevier Ltd. All rights reserved.

Conflict of interest statement

Declaration of Competing Interest: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Shubhajit Basak reports financial support was provided by Science Foundation Ireland. Funding details: Science Foundation Ireland Centre for Research Training in Digitally-Enhanced Reality (d-real) under Grant 18/CRT/6224.

Authors

Shubhajit Basak (S)

School of Computer Science, National University of Ireland Galway, Galway H91 TK33, Ireland. Electronic address: s.basak1@nuigalway.ie.

Peter Corcoran (P)

Department of Electronic Engineering, College of Science and Engineering, National University of Ireland Galway, Galway H91 TK33, Ireland.

Rachel McDonnell (R)

School of Computer Science and Statistics, Trinity College Dublin, Dublin 2, Ireland.

Michael Schukat (M)

School of Computer Science, National University of Ireland Galway, Galway H91 TK33, Ireland.

MeSH classifications