Dynamic hand gesture recognition using 3D-CNN and LSTM networks

Muneeb Ur Rehman; Fawad Ahmed; Muhammad Attique Khan; Usman Tariq; Faisal Abdulaziz Alfouzan; Nouf M. Alzahrani; Jawad Ahmad

doi:10.32604/cmc.2022.019586

Management Information Systems

Research output: Contribution to journal › Article › peer-review

49 Scopus citations

Abstract

Recognition of dynamic hand gestures in real time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from input video sequences while avoiding extensive computation. The 3D-CNN is used for the extraction of spectral and spatial features, which are then given to the LSTM network, through which classification is carried out. The proposed model is a lightweight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the publicly available 20BN-jester dataset. The model was trained on 2000 video clips per class, which were separated into 80% training and 20% validation sets. An accuracy of 99% and 97% was achieved on training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results compared to MobileNetv2 + LSTM.
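
The data split described in the abstract (15 classes, 2000 clips per class, 80% training and 20% validation) can be illustrated with a minimal sketch. This is not the authors' code: the function name `stratified_split`, the clip identifiers, the fixed seed, and the choice to split each class separately are all assumptions made for illustration, since the abstract does not say how the split was drawn.

```python
import random


def stratified_split(clips_by_class, train_fraction=0.8, seed=0):
    """Split each class's clips into training and validation lists.

    Hypothetical helper: splitting per class keeps the 80/20 ratio
    identical in every class, which the abstract does not confirm.
    """
    rng = random.Random(seed)
    train, val = {}, {}
    for label, clips in clips_by_class.items():
        shuffled = list(clips)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train[label] = shuffled[:cut]
        val[label] = shuffled[cut:]
    return train, val


# Placeholder clip identifiers (assumed naming): 15 classes x 2000 clips.
clips_by_class = {
    f"class_{c:02d}": [f"class_{c:02d}/clip_{i:04d}" for i in range(2000)]
    for c in range(15)
}

train, val = stratified_split(clips_by_class)
n_train = sum(len(v) for v in train.values())
n_val = sum(len(v) for v in val.values())
print(n_train, n_val)  # 24000 6000
```

Under these assumptions the 30,000 selected clips divide into 24,000 training clips (1600 per class) and 6000 validation clips (400 per class).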

Original language	English
Pages (from-to)	4675-4690
Number of pages	16
Journal	Computers, Materials and Continua
Volume	70
Issue number	3
DOIs	https://doi.org/10.32604/cmc.2022.019586
State	Published - 2022

Keywords

3D-CNN
Convolutional neural networks
Jester
LSTM
Real-time hand gesture recognition
Spatiotemporal

Cite this

@article{289bf578c09240f0a82af1f7b0829d5d,

title = "Dynamic hand gesture recognition using 3D-CNN and LSTM networks",

abstract = "Recognition of dynamic hand gestures in real-time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from video sequences input while avoiding extensive computation. The 3D-CNN is used for the extraction of spectral and spatial features which are then given to the LSTM network through which classification is carried out. The proposed model is a light-weight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the 20BN-jester dataset available publicly. The model was trained on 2000 video-clips per class which were separated into 80\% training and 20\% validation sets. An accuracy of 99\% and 97\% was achieved on training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results as compared to MobileNetv2 + LSTM.",

keywords = "3D-CNN, Convolutional neural networks, Jester, LSTM, Real-time hand gesture recognition, Spatiotemporal",

author = "{Ur Rehman}, Muneeb and Fawad Ahmed and Khan, {Muhammad Attique} and Usman Tariq and Alfouzan, {Faisal Abdulaziz} and Alzahrani, {Nouf M.} and Jawad Ahmad",

year = "2022",

doi = "10.32604/cmc.2022.019586",

language = "English",

volume = "70",

pages = "4675--4690",

journal = "Computers, Materials and Continua",

issn = "1546-2218",

publisher = "Tech Science Press",

number = "3",

}

TY - JOUR

T1 - Dynamic hand gesture recognition using 3D-CNN and LSTM networks

AU - Ur Rehman, Muneeb

AU - Ahmed, Fawad

AU - Khan, Muhammad Attique

AU - Tariq, Usman

AU - Alfouzan, Faisal Abdulaziz

AU - Alzahrani, Nouf M.

AU - Ahmad, Jawad

PY - 2022

Y1 - 2022

N2 - Recognition of dynamic hand gestures in real-time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from video sequences input while avoiding extensive computation. The 3D-CNN is used for the extraction of spectral and spatial features which are then given to the LSTM network through which classification is carried out. The proposed model is a light-weight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the 20BN-jester dataset available publicly. The model was trained on 2000 video-clips per class which were separated into 80% training and 20% validation sets. An accuracy of 99% and 97% was achieved on training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results as compared to MobileNetv2 + LSTM.

AB - Recognition of dynamic hand gestures in real-time is a difficult task because the system can never know when or from where the gesture starts and ends in a video stream. Many researchers have been working on vision-based gesture recognition due to its various applications. This paper proposes a deep learning architecture based on the combination of a 3D Convolutional Neural Network (3D-CNN) and a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatial-temporal information from video sequences input while avoiding extensive computation. The 3D-CNN is used for the extraction of spectral and spatial features which are then given to the LSTM network through which classification is carried out. The proposed model is a light-weight architecture with only 3.7 million training parameters. The model has been evaluated on 15 classes from the 20BN-jester dataset available publicly. The model was trained on 2000 video-clips per class which were separated into 80% training and 20% validation sets. An accuracy of 99% and 97% was achieved on training and testing data, respectively. We further show that the combination of 3D-CNN with LSTM gives superior results as compared to MobileNetv2 + LSTM.

KW - 3D-CNN

KW - Convolutional neural networks

KW - Jester

KW - LSTM

KW - Real-time hand gesture recognition

KW - Spatiotemporal

UR - http://www.scopus.com/inward/record.url?scp=85116975570&partnerID=8YFLogxK

U2 - 10.32604/cmc.2022.019586

DO - 10.32604/cmc.2022.019586

M3 - Article

AN - SCOPUS:85116975570

SN - 1546-2218

VL - 70

SP - 4675

EP - 4690

JO - Computers, Materials and Continua

JF - Computers, Materials and Continua

IS - 3

ER -
