TY - JOUR
T1 - DDaNet
T2 - Dual-Path Depth-Aware Attention Network for Fingerspelling Recognition Using RGB-D Images
AU - Yang, Shih Hung
AU - Chen, Wei Ren
AU - Huang, Wun Jhu
AU - Chen, Yon Ping
N1 - Funding Information:
This work was supported in part by the Ministry of Science and Technology of Taiwan (Young Scholar Fellowship Program) under Contract MOST 108-2636-E-006-010 and Contract 109-2636-E-006-010 and in part by the Headquarters of University Advancement at the National Cheng Kung University, the Ministry of Education, Taiwan.
Publisher Copyright:
© 2013 IEEE.
PY - 2021
Y1 - 2021
N2 - Automatic fingerspelling recognition aims to overcome communication barriers between people who are deaf and those who can hear. RGB-D cameras are widely used to handle finger occlusion, which usually hinders fingerspelling recognition. However, color-depth misalignment, which is an intrinsic property of RGB-D cameras, hinders the simultaneous processing of color and depth images in the absence of intrinsic parameters of the camera. Furthermore, fine-grained hand gestures performed by various persons and captured from multiple views render the discriminative feature extraction difficult, due to intra-class variability and inter-class similarity. Inspired by the human visual mechanism, we propose a network to learn discriminative features related to fine-grained hand gestures while suppressing the effect of color-depth misalignment. Unlike existing approaches that independently process RGB-D images, a dual-path depth-aware attention network that learns a fingerspelling representation in separate RGB and depth paths, and progressively fuses the features learned from the two paths is proposed. As the hand is usually the closest object to the camera, depth information can contribute to emphasize the key fingers related to a letter sign. Thus, we develop a depth-aware attention module (DAM) to exploit spatial relations in the depth feature maps, refining the RGB and depth feature maps across a bottleneck structure. The module establishes a lateral connection of the RGB and depth paths and provides a depth-aware salient map to both paths. The experimental results demonstrated that the proposed network improved the accuracy (+0.83%) and F score (+1.55%) compared to state-of-the-art methods on a publicly available fingerspelling dataset. The visualization of the network processes demonstrates that the DAM facilitates the selection of representative hand regions from the RGB-D images. Furthermore, the number of parameters and computational overhead of the DAM are negligible in the network. The code is available at https://github.com/cweizen/cweizen-DDaNet_model_master.
AB - Automatic fingerspelling recognition aims to overcome communication barriers between people who are deaf and those who can hear. RGB-D cameras are widely used to handle finger occlusion, which usually hinders fingerspelling recognition. However, color-depth misalignment, which is an intrinsic property of RGB-D cameras, hinders the simultaneous processing of color and depth images in the absence of intrinsic parameters of the camera. Furthermore, fine-grained hand gestures performed by various persons and captured from multiple views render the discriminative feature extraction difficult, due to intra-class variability and inter-class similarity. Inspired by the human visual mechanism, we propose a network to learn discriminative features related to fine-grained hand gestures while suppressing the effect of color-depth misalignment. Unlike existing approaches that independently process RGB-D images, a dual-path depth-aware attention network that learns a fingerspelling representation in separate RGB and depth paths, and progressively fuses the features learned from the two paths is proposed. As the hand is usually the closest object to the camera, depth information can contribute to emphasize the key fingers related to a letter sign. Thus, we develop a depth-aware attention module (DAM) to exploit spatial relations in the depth feature maps, refining the RGB and depth feature maps across a bottleneck structure. The module establishes a lateral connection of the RGB and depth paths and provides a depth-aware salient map to both paths. The experimental results demonstrated that the proposed network improved the accuracy (+0.83%) and F score (+1.55%) compared to state-of-the-art methods on a publicly available fingerspelling dataset. The visualization of the network processes demonstrates that the DAM facilitates the selection of representative hand regions from the RGB-D images. Furthermore, the number of parameters and computational overhead of the DAM are negligible in the network. The code is available at https://github.com/cweizen/cweizen-DDaNet_model_master.
UR - http://www.scopus.com/inward/record.url?scp=85098767565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098767565&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3046667
DO - 10.1109/ACCESS.2020.3046667
M3 - Article
AN - SCOPUS:85098767565
SN - 2169-3536
VL - 9
SP - 7306
EP - 7322
JO - IEEE Access
JF - IEEE Access
M1 - 9302573
ER -