TY - JOUR
T1 - Multi-Layered Deep Learning Features Fusion for Human Action Recognition
AU - Kiran, Sadia
AU - Khan, Muhammad Attique
AU - Javed, Muhammad Younus
AU - Alhaisoni, Majed
AU - Tariq, Usman
AU - Nam, Yunyoung
AU - Damaševičius, Robertas
AU - Sharif, Muhammad
N1 - Publisher Copyright: © 2021 Tech Science Press. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Human Action Recognition (HAR) has been an active research topic in machine learning for the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications of action recognition. Computer vision researchers have introduced many HAR techniques, but they still face challenges such as redundant features and high computational cost. In this article, we propose a new deep learning-based method for HAR. In the proposed method, video frames are first pre-processed using a global contrast approach and then used to train a deep learning model through domain transfer learning. The pre-trained ResNet-50 model is used as the deep learning model in this work. Features are extracted from two layers: Global Average Pool (GAP) and Fully Connected (FC). The features of both layers are fused using Canonical Correlation Analysis (CCA). Features are then selected using a Shannon entropy-based threshold function. The selected features are finally passed to multiple classifiers for final classification. Experiments are conducted on five publicly available datasets: IXMAS, UCF Sports, YouTube, UT-Interaction, and KTH. The accuracies achieved on these datasets were 89.6%, 99.7%, 100%, 96.7%, and 96.6%, respectively. Comparison with existing techniques shows that the proposed method provides improved accuracy for HAR. The proposed method is also computationally fast in terms of execution time.
AB - Human Action Recognition (HAR) has been an active research topic in machine learning for the last few decades. Visual surveillance, robotics, and pedestrian detection are the main applications of action recognition. Computer vision researchers have introduced many HAR techniques, but they still face challenges such as redundant features and high computational cost. In this article, we propose a new deep learning-based method for HAR. In the proposed method, video frames are first pre-processed using a global contrast approach and then used to train a deep learning model through domain transfer learning. The pre-trained ResNet-50 model is used as the deep learning model in this work. Features are extracted from two layers: Global Average Pool (GAP) and Fully Connected (FC). The features of both layers are fused using Canonical Correlation Analysis (CCA). Features are then selected using a Shannon entropy-based threshold function. The selected features are finally passed to multiple classifiers for final classification. Experiments are conducted on five publicly available datasets: IXMAS, UCF Sports, YouTube, UT-Interaction, and KTH. The accuracies achieved on these datasets were 89.6%, 99.7%, 100%, 96.7%, and 96.6%, respectively. Comparison with existing techniques shows that the proposed method provides improved accuracy for HAR. The proposed method is also computationally fast in terms of execution time.
KW - Action recognition
KW - Classification
KW - Features fusion
KW - Features selection
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85115907695&partnerID=8YFLogxK
U2 - 10.32604/cmc.2021.017800
DO - 10.32604/cmc.2021.017800
M3 - Article
AN - SCOPUS:85115907695
SN - 1546-2218
VL - 69
SP - 4061
EP - 4075
JO - Computers, Materials and Continua
JF - Computers, Materials and Continua
IS - 3
ER -