End-to-end 3D face reconstruction with deep neural networks

Pengfei Dou; Shishir K. Shah; Ioannis A. Kakadiaris

doi:10.1109/CVPR.2017.164

End-to-end 3D face reconstruction with deep neural networks

Pengfei Dou, Shishir K. Shah, Ioannis A. Kakadiaris

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

187 Scopus citations

Abstract

Monocular 3D facial shape reconstruction from a single 2D facial image has been an active research area due to its wide applications. Inspired by the success of deep neural networks (DNN), we propose a DNN-based approach for End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different from recent works that reconstruct and refine the 3D face in an iterative manner using both an RGB image and an initial 3D facial shape rendering, our DNN model is end-to-end, and thus the complicated 3D rendering process can be avoided. Moreover, we integrate in the DNN architecture two components, namely a multi-task loss function and a fusion convolutional neural network (CNN) to improve facial expression reconstruction. With the multi-task loss function, 3D face reconstruction is divided into neutral 3D facial shape reconstruction and expressive 3D facial shape reconstruction. The neutral 3D facial shape is class-specific. Therefore, higher layer features are useful. In comparison, the expressive 3D facial shape favors lower or intermediate layer features. With the fusion-CNN, features from different intermediate layers are fused and transformed for predicting the 3D expressive facial shape. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction.

Original language	English (US)
Title of host publication	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1503-1512
Number of pages	10
ISBN (Electronic)	9781538604571
DOIs	https://doi.org/10.1109/CVPR.2017.164
State	Published - Nov 6 2017
Event	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States Duration: Jul 21 2017 → Jul 26 2017

Publication series

Name	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Volume	2017-January

Conference

Conference	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country/Territory	United States
City	Honolulu
Period	7/21/17 → 7/26/17

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition

Access to Document

10.1109/CVPR.2017.164

Cite this

Dou, P., Shah, S. K., & Kakadiaris, I. A. (2017). End-to-end 3D face reconstruction with deep neural networks. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (pp. 1503-1512). (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017; Vol. 2017-January). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/CVPR.2017.164

End-to-end 3D face reconstruction with deep neural networks. / Dou, Pengfei; Shah, Shishir K.; Kakadiaris, Ioannis A.
Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 1503-1512 (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017; Vol. 2017-January).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Dou, P, Shah, SK & Kakadiaris, IA 2017, End-to-end 3D face reconstruction with deep neural networks. in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, Institute of Electrical and Electronics Engineers Inc., pp. 1503-1512, 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, United States, 7/21/17. https://doi.org/10.1109/CVPR.2017.164

Dou P, Shah SK, Kakadiaris IA. End-to-end 3D face reconstruction with deep neural networks. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 1503-1512. (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017). doi: 10.1109/CVPR.2017.164

Dou, Pengfei ; Shah, Shishir K. ; Kakadiaris, Ioannis A. / End-to-end 3D face reconstruction with deep neural networks. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 1503-1512 (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017).

@inproceedings{22274a917bf34b1bb8c729853eb94174,

title = "End-to-end 3D face reconstruction with deep neural networks",

abstract = "Monocular 3D facial shape reconstruction from a single 2D facial image has been an active research area due to its wide applications. Inspired by the success of deep neural networks (DNN), we propose a DNN-based approach for End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different from recent works that reconstruct and refine the 3D face in an iterative manner using both an RGB image and an initial 3D facial shape rendering, our DNN model is end-to-end, and thus the complicated 3D rendering process can be avoided. Moreover, we integrate in the DNN architecture two components, namely a multi-task loss function and a fusion convolutional neural network (CNN) to improve facial expression reconstruction. With the multi-task loss function, 3D face reconstruction is divided into neutral 3D facial shape reconstruction and expressive 3D facial shape reconstruction. The neutral 3D facial shape is class-specific. Therefore, higher layer features are useful. In comparison, the expressive 3D facial shape favors lower or intermediate layer features. With the fusion-CNN, features from different intermediate layers are fused and transformed for predicting the 3D expressive facial shape. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction.",

author = "Pengfei Dou and Shah, {Shishir K.} and Kakadiaris, {Ioannis A.}",

note = "Funding Information: This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 2015-ST-061-BSH001. This grant is awarded to the Borders, Trade, and Immigration (BTI) Institute: A DHS Center of Excellence led by the University of Houston, and includes support for the project {"}Image and Video Person Identification in an Operational Environment: Phase I{"} awarded to the University of Houston. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security. Publisher Copyright: {\textcopyright} 2017 IEEE.; 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 ; Conference date: 21-07-2017 Through 26-07-2017",

year = "2017",

month = nov,

day = "6",

doi = "10.1109/CVPR.2017.164",

language = "English (US)",

series = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1503--1512",

booktitle = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017",

address = "United States",

}

TY - GEN

T1 - End-to-end 3D face reconstruction with deep neural networks

AU - Dou, Pengfei

AU - Shah, Shishir K.

AU - Kakadiaris, Ioannis A.

N1 - Funding Information: This material is based upon work supported by the U.S. Department of Homeland Security under Grant Award Number 2015-ST-061-BSH001. This grant is awarded to the Borders, Trade, and Immigration (BTI) Institute: A DHS Center of Excellence led by the University of Houston, and includes support for the project "Image and Video Person Identification in an Operational Environment: Phase I" awarded to the University of Houston. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security. Publisher Copyright: © 2017 IEEE.

PY - 2017/11/6

Y1 - 2017/11/6

N2 - Monocular 3D facial shape reconstruction from a single 2D facial image has been an active research area due to its wide applications. Inspired by the success of deep neural networks (DNN), we propose a DNN-based approach for End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different from recent works that reconstruct and refine the 3D face in an iterative manner using both an RGB image and an initial 3D facial shape rendering, our DNN model is end-to-end, and thus the complicated 3D rendering process can be avoided. Moreover, we integrate in the DNN architecture two components, namely a multi-task loss function and a fusion convolutional neural network (CNN) to improve facial expression reconstruction. With the multi-task loss function, 3D face reconstruction is divided into neutral 3D facial shape reconstruction and expressive 3D facial shape reconstruction. The neutral 3D facial shape is class-specific. Therefore, higher layer features are useful. In comparison, the expressive 3D facial shape favors lower or intermediate layer features. With the fusion-CNN, features from different intermediate layers are fused and transformed for predicting the 3D expressive facial shape. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction.

AB - Monocular 3D facial shape reconstruction from a single 2D facial image has been an active research area due to its wide applications. Inspired by the success of deep neural networks (DNN), we propose a DNN-based approach for End-to-End 3D FAce Reconstruction (UH-E2FAR) from a single 2D image. Different from recent works that reconstruct and refine the 3D face in an iterative manner using both an RGB image and an initial 3D facial shape rendering, our DNN model is end-to-end, and thus the complicated 3D rendering process can be avoided. Moreover, we integrate in the DNN architecture two components, namely a multi-task loss function and a fusion convolutional neural network (CNN) to improve facial expression reconstruction. With the multi-task loss function, 3D face reconstruction is divided into neutral 3D facial shape reconstruction and expressive 3D facial shape reconstruction. The neutral 3D facial shape is class-specific. Therefore, higher layer features are useful. In comparison, the expressive 3D facial shape favors lower or intermediate layer features. With the fusion-CNN, features from different intermediate layers are fused and transformed for predicting the 3D expressive facial shape. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction.

UR - http://www.scopus.com/inward/record.url?scp=85044314381&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044314381&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2017.164

DO - 10.1109/CVPR.2017.164

M3 - Conference contribution

AN - SCOPUS:85044314381

T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

SP - 1503

EP - 1512

BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Y2 - 21 July 2017 through 26 July 2017

ER -

End-to-end 3D face reconstruction with deep neural networks

Abstract

Publication series

Conference

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this