On the Fusion of RGB and Depth Information for Hand Pose Estimation

Evangelos Kazakos; Christophoros Nikou; Ioannis A. Kakadiaris

doi:10.1109/ICIP.2018.8451022

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Recent advances in deep learning have spurred 3D hand pose estimation, as convolutional network (ConvNet) based methods have outperformed random forests. However, state-of-the-art ConvNet-based methods employ only depth images of the hand, without leveraging color and texture information from the RGB domain. In this paper, we investigate whether ConvNets can learn richer and more discriminative embeddings by combining RGB and depth information. To answer this question, we propose the fusion of RGB and depth information in a double-stream architecture. More specifically, RGB and depth images are fed into two separate networks that extract features, which are subsequently fused at an intermediate layer of the ConvNet; we implement input-level fusion, feature-level fusion and score-level fusion. The double-stream scheme is coupled with a deep ConvNet, in contrast to the shallow networks that are mostly proposed in the literature. Experimental results show that, while the depth of the network is crucial for hand pose estimation, the double-stream nets perform very similarly to the net trained only with depth images. This may suggest that training double-stream architectures purely with supervision may be insufficient for hand pose estimation with RGB-D fusion.
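The three fusion levels named in the abstract differ only in where the RGB and depth paths are merged. The following is a minimal sketch of that difference, not the authors' implementation: single dense layers stand in for the deep ConvNet streams, and the joint count, layer sizes, random weights, and the use of averaging for score-level fusion are all assumptions made for illustration.

```python
import random

NUM_JOINTS = 14            # hypothetical number of hand joints
OUT_DIM = 3 * NUM_JOINTS   # one (x, y, z) triple per joint
HIDDEN = 8                 # hypothetical feature size per stream


def make_layer(n_in, n_out, rng):
    """Create a toy dense layer as (weights, bias) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply W x + b, where W is stored as a list of rows."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def features(x, layer):
    """Stand-in for a feature-extracting stream: dense layer followed by ReLU."""
    return [max(0.0, v) for v in linear(x, layer)]


def input_level_fusion(rgb, depth, stream, head):
    """Concatenate the raw inputs, then run a single stream and regressor."""
    return linear(features(rgb + depth, stream), head)


def feature_level_fusion(rgb, depth, rgb_stream, depth_stream, head):
    """Run one stream per modality and concatenate the extracted features."""
    fused = features(rgb, rgb_stream) + features(depth, depth_stream)
    return linear(fused, head)


def score_level_fusion(rgb, depth, rgb_stream, depth_stream, rgb_head, depth_head):
    """Regress a pose from each modality separately, then average the outputs."""
    pose_rgb = linear(features(rgb, rgb_stream), rgb_head)
    pose_depth = linear(features(depth, depth_stream), depth_head)
    return [(a + b) / 2.0 for a, b in zip(pose_rgb, pose_depth)]


rng = random.Random(0)
rgb = [rng.random() for _ in range(12)]    # toy flattened RGB patch (4 pixels x 3 channels)
depth = [rng.random() for _ in range(4)]   # toy flattened depth patch (4 pixels)

rgb_stream = make_layer(len(rgb), HIDDEN, rng)
depth_stream = make_layer(len(depth), HIDDEN, rng)
rgb_head = make_layer(HIDDEN, OUT_DIM, rng)
depth_head = make_layer(HIDDEN, OUT_DIM, rng)

out_input = input_level_fusion(
    rgb, depth,
    make_layer(len(rgb) + len(depth), HIDDEN, rng),
    make_layer(HIDDEN, OUT_DIM, rng),
)
out_feature = feature_level_fusion(
    rgb, depth, rgb_stream, depth_stream, make_layer(2 * HIDDEN, OUT_DIM, rng)
)
out_score = score_level_fusion(rgb, depth, rgb_stream, depth_stream, rgb_head, depth_head)
```

Each variant returns a flat vector of 3 × `NUM_JOINTS` coordinates, so the three outputs are directly comparable; only the merge point moves from the input, to the intermediate features, to the final predictions.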

Original language	English (US)
Title of host publication	2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	868-872
Number of pages	5
ISBN (Electronic)	9781479970612
DOIs	https://doi.org/10.1109/ICIP.2018.8451022
State	Published - Aug 29 2018
Event	25th IEEE International Conference on Image Processing, ICIP 2018 - Athens, Greece (Oct 7 2018 → Oct 10 2018)

Publication series

Name	Proceedings - International Conference on Image Processing, ICIP
ISSN (Print)	1522-4880

Conference

Conference	25th IEEE International Conference on Image Processing, ICIP 2018
Country/Territory	Greece
City	Athens
Period	10/7/18 → 10/10/18

Keywords

Deep learning
Double-stream networks
Fusion
Hand pose estimation
RGB-D

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition
Signal Processing

Access to Document

10.1109/ICIP.2018.8451022

Cite this

Kazakos, E., Nikou, C., & Kakadiaris, I. A. (2018). On the Fusion of RGB and Depth Information for Hand Pose Estimation. In 2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings (pp. 868-872). Article 8451022 (Proceedings - International Conference on Image Processing, ICIP). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICIP.2018.8451022

On the Fusion of RGB and Depth Information for Hand Pose Estimation. / Kazakos, Evangelos; Nikou, Christophoros; Kakadiaris, Ioannis A.
2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2018. p. 868-872 8451022 (Proceedings - International Conference on Image Processing, ICIP).

Kazakos, E, Nikou, C & Kakadiaris, IA 2018, On the Fusion of RGB and Depth Information for Hand Pose Estimation. in 2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings., 8451022, Proceedings - International Conference on Image Processing, ICIP, Institute of Electrical and Electronics Engineers Inc., pp. 868-872, 25th IEEE International Conference on Image Processing, ICIP 2018, Athens, Greece, 10/7/18. https://doi.org/10.1109/ICIP.2018.8451022

@inproceedings{f31da842954f48bb8d8fbd93bf50feb0,

title = "On the Fusion of RGB and Depth Information for Hand Pose Estimation",

abstract = "Recent advances in deep learning have spurred 3D hand pose estimation, as convolutional network (ConvNet) based methods have outperformed random forests. However, state-of-the-art ConvNet-based methods employ only depth images of the hand, without leveraging color and texture information from the RGB domain. In this paper, we investigate whether ConvNets can learn richer and more discriminative embeddings by combining RGB and depth information. To answer this question, we propose the fusion of RGB and depth information in a double-stream architecture. More specifically, RGB and depth images are fed into two separate networks that extract features, which are subsequently fused at an intermediate layer of the ConvNet; we implement input-level fusion, feature-level fusion and score-level fusion. The double-stream scheme is coupled with a deep ConvNet, in contrast to the shallow networks that are mostly proposed in the literature. Experimental results show that, while the depth of the network is crucial for hand pose estimation, the double-stream nets perform very similarly to the net trained only with depth images. This may suggest that training double-stream architectures purely with supervision may be insufficient for hand pose estimation with RGB-D fusion.",

keywords = "Deep learning, Double-stream networks, Fusion, Hand pose estimation, RGB-D",

author = "Evangelos Kazakos and Christophoros Nikou and Kakadiaris, {Ioannis A.}",

note = "Publisher Copyright: {\textcopyright} 2018 IEEE.; 25th IEEE International Conference on Image Processing, ICIP 2018 ; Conference date: 07-10-2018 Through 10-10-2018",

year = "2018",

month = aug,

day = "29",

doi = "10.1109/ICIP.2018.8451022",

language = "English (US)",

series = "Proceedings - International Conference on Image Processing, ICIP",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "868--872",

booktitle = "2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings",

address = "United States",

}

TY - GEN

T1 - On the Fusion of RGB and Depth Information for Hand Pose Estimation

AU - Kazakos, Evangelos

AU - Nikou, Christophoros

AU - Kakadiaris, Ioannis A.

PY - 2018/8/29

Y1 - 2018/8/29

AB - Recent advances in deep learning have spurred 3D hand pose estimation, as convolutional network (ConvNet) based methods have outperformed random forests. However, state-of-the-art ConvNet-based methods employ only depth images of the hand, without leveraging color and texture information from the RGB domain. In this paper, we investigate whether ConvNets can learn richer and more discriminative embeddings by combining RGB and depth information. To answer this question, we propose the fusion of RGB and depth information in a double-stream architecture. More specifically, RGB and depth images are fed into two separate networks that extract features, which are subsequently fused at an intermediate layer of the ConvNet; we implement input-level fusion, feature-level fusion and score-level fusion. The double-stream scheme is coupled with a deep ConvNet, in contrast to the shallow networks that are mostly proposed in the literature. Experimental results show that, while the depth of the network is crucial for hand pose estimation, the double-stream nets perform very similarly to the net trained only with depth images. This may suggest that training double-stream architectures purely with supervision may be insufficient for hand pose estimation with RGB-D fusion.

KW - Deep learning

KW - Double-stream networks

KW - Fusion

KW - Hand pose estimation

KW - RGB-D

UR - http://www.scopus.com/inward/record.url?scp=85062903677&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062903677&partnerID=8YFLogxK

U2 - 10.1109/ICIP.2018.8451022

DO - 10.1109/ICIP.2018.8451022

M3 - Conference contribution

AN - SCOPUS:85062903677

T3 - Proceedings - International Conference on Image Processing, ICIP

SP - 868

EP - 872

BT - 2018 IEEE International Conference on Image Processing, ICIP 2018 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 25th IEEE International Conference on Image Processing, ICIP 2018

Y2 - 7 October 2018 through 10 October 2018

ER -
