Dynamic hand gesture recognition using 3D-CNN and LSTM networks

Muneeb Ur Rehman, Fawad Ahmed, Muhammad Attique Khan, Usman Tariq, Faisal Abdulaziz Alfouzan, Nouf M. Alzahrani, Jawad Ahmad

Research output: Contribution to journal › Article › peer-review

49 Scopus citations

Abstract

Recognizing dynamic hand gestures in real time is difficult because the system cannot know in advance when or where a gesture starts and ends in a video stream. Many researchers have worked on vision-based gesture recognition because of its wide range of applications. This paper proposes a deep learning architecture that combines a 3D Convolutional Neural Network (3D-CNN) with a Long Short-Term Memory (LSTM) network. The proposed architecture extracts spatiotemporal information from input video sequences while avoiding extensive computation. The 3D-CNN extracts spectral and spatial features, which are then passed to the LSTM network, where classification is carried out. The proposed model is a lightweight architecture with only 3.7 million trainable parameters. The model was evaluated on 15 classes from the publicly available 20BN-jester dataset. It was trained on 2000 video clips per class, split into 80% training and 20% validation sets. Accuracies of 99% and 97% were achieved on the training and testing data, respectively. We further show that combining a 3D-CNN with an LSTM gives superior results compared to MobileNetv2 + LSTM.
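
To make the dataset split and the parameter budget more concrete, the short Python sketch below works through the arithmetic. The clip counts follow directly from the figures in the abstract (15 classes, 2000 clips per class, 80/20 split). The layer sizes and the helper names `split_counts`, `conv3d_params`, and `lstm_params` are assumptions made for illustration only; they are not the configuration used in the paper and are not meant to reproduce the 3.7 million figure.

```python
# Illustrative arithmetic only: the layer sizes below are hypothetical,
# not the configuration reported in the paper.

def split_counts(num_classes, clips_per_class, train_fraction):
    """Return (train, validation) clip counts for the whole dataset."""
    total = num_classes * clips_per_class
    train = round(total * train_fraction)
    return train, total - train

def conv3d_params(in_channels, out_channels, kernel):
    """Weights plus biases of a single 3D convolution layer."""
    kt, kh, kw = kernel
    return out_channels * (in_channels * kt * kh * kw) + out_channels

def lstm_params(input_size, hidden_size):
    """Four gates, each with input weights, recurrent weights, and one bias vector."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

# Figures stated in the abstract: 15 classes, 2000 clips per class, 80/20 split.
train_clips, val_clips = split_counts(15, 2000, 0.8)

# Hypothetical layers, chosen only to show how parameter counts scale.
conv_count = conv3d_params(in_channels=3, out_channels=32, kernel=(3, 3, 3))
lstm_count = lstm_params(input_size=256, hidden_size=128)

print(train_clips, val_clips)   # 24000 6000
print(conv_count, lstm_count)   # 2624 197120
```

Under these assumptions, a single LSTM layer contributes far more parameters than a small 3D convolution, which suggests why the size of the feature vector handed from the 3D-CNN to the LSTM matters for keeping a model of this kind lightweight.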

Original language: English
Pages (from-to): 4675-4690
Number of pages: 16
Journal: Computers, Materials and Continua
Volume: 70
Issue number: 3
State: Published - 2022

Keywords

  • 3D-CNN
  • Convolutional neural networks
  • Jester
  • LSTM
  • Real-time hand gesture recognition
  • Spatiotemporal
