Two-Stream Deep Learning Architecture-Based Human Action Recognition

Faheem Shehzad; Muhammad Attique Khan; Muhammad Asfand E. Yar; Muhammad Sharif; Majed Alhaisoni; Usman Tariq; Arnab Majumdar; Orawit Thinnukool

doi:10.32604/cmc.2023.028743

Management Information Systems

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Human action recognition (HAR) based on artificial intelligence reasoning is one of the most important research areas in computer vision. Major breakthroughs in this field have been observed in the last few years, and research interest continues to grow in topics such as action and scene understanding, the study of human joints, and human posture recognition. Many HAR techniques have been introduced in the literature. Nonetheless, redundant and irrelevant features reduce recognition accuracy. Existing techniques also face other challenges, such as differing perspectives, environmental conditions, and temporal variations. In this work, a framework based on deep learning and an improved whale optimization algorithm is proposed for HAR. The proposed framework consists of a few core stages, i.e., initial preprocessing of frames, fine-tuning of pre-trained deep learning models through transfer learning (TL), feature fusion using a modified serial-based approach, and selection of the best features using improved whale optimization for final classification. Two pre-trained deep learning models, InceptionV3 and ResNet101, are fine-tuned, and TL is employed to train them on action recognition datasets. The fusion process increases the length of the feature vectors; therefore, an improved whale optimization algorithm is proposed to select the best features. The selected features are finally classified using machine learning (ML) classifiers. Four publicly accessible datasets, namely UT-Interaction, Hollywood, Free Viewpoint Action Recognition using Motion History Volumes (IXMAS), and centre of computer vision (UCF) Sports, are employed, on which the framework achieved testing accuracies of 100%, 99.9%, 99.1%, and 100%, respectively. Compared with state-of-the-art (SOTA) techniques, the proposed method showed improved accuracy.
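
The fusion and selection stages named in the abstract can be illustrated with a minimal sketch. It assumes plain serial fusion (column-wise concatenation of the two streams' feature matrices) and a boolean mask standing in for the output of a feature-selection optimizer. The abstract does not specify the authors' modified fusion or their improved whale optimization, so the function names, the 2048-dimensional feature size, and the random mask below are hypothetical placeholders, not the published method.

```python
import numpy as np


def serial_fusion(feats_a, feats_b):
    """Concatenate two (n_samples, d) feature matrices along the feature axis."""
    if feats_a.shape[0] != feats_b.shape[0]:
        raise ValueError("both streams must describe the same samples")
    return np.concatenate([feats_a, feats_b], axis=1)


def select_features(fused, mask):
    """Keep only the feature columns where the boolean mask is True."""
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != (fused.shape[1],):
        raise ValueError("mask needs one entry per fused feature")
    return fused[:, mask]


rng = np.random.default_rng(0)

# Stand-ins for deep features from the two fine-tuned networks
# (assumed here: 4 samples, 2048 features per stream).
stream_a = rng.normal(size=(4, 2048))
stream_b = rng.normal(size=(4, 2048))

fused = serial_fusion(stream_a, stream_b)

# Placeholder for the optimizer's result: a random mask, NOT whale optimization.
mask = rng.random(fused.shape[1]) < 0.5
selected = select_features(fused, mask)

print(fused.shape, selected.shape)
```

In a full pipeline, the mask would come from the feature-selection stage, and the selected columns would be passed to an ML classifier.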

Original language	English
Pages (from-to)	5931-5949
Number of pages	19
Journal	Computers, Materials and Continua
Volume	74
Issue number	3
DOIs	https://doi.org/10.32604/cmc.2023.028743
State	Published - 2023

Keywords

Human action recognition
deep learning
features optimization
fusion of multiple features
transfer learning

Cite this

@article{f039a8d27412414aa2693f8191510caa,

title = "Two-Stream Deep Learning Architecture-Based Human Action Recognition",

abstract = "Human action recognition (HAR) based on artificial intelligence reasoning is one of the most important research areas in computer vision. Major breakthroughs in this field have been observed in the last few years, and research interest continues to grow in topics such as action and scene understanding, the study of human joints, and human posture recognition. Many HAR techniques have been introduced in the literature. Nonetheless, redundant and irrelevant features reduce recognition accuracy. Existing techniques also face other challenges, such as differing perspectives, environmental conditions, and temporal variations. In this work, a framework based on deep learning and an improved whale optimization algorithm is proposed for HAR. The proposed framework consists of a few core stages, i.e., initial preprocessing of frames, fine-tuning of pre-trained deep learning models through transfer learning (TL), feature fusion using a modified serial-based approach, and selection of the best features using improved whale optimization for final classification. Two pre-trained deep learning models, InceptionV3 and ResNet101, are fine-tuned, and TL is employed to train them on action recognition datasets. The fusion process increases the length of the feature vectors; therefore, an improved whale optimization algorithm is proposed to select the best features. The selected features are finally classified using machine learning (ML) classifiers. Four publicly accessible datasets, namely UT-Interaction, Hollywood, Free Viewpoint Action Recognition using Motion History Volumes (IXMAS), and centre of computer vision (UCF) Sports, are employed, on which the framework achieved testing accuracies of 100\%, 99.9\%, 99.1\%, and 100\%, respectively. Compared with state-of-the-art (SOTA) techniques, the proposed method showed improved accuracy.",

keywords = "Human action recognition, deep learning, features optimization, fusion of multiple features, transfer learning",

author = "Faheem Shehzad and Khan, {Muhammad Attique} and Yar, {Muhammad Asfand E.} and Muhammad Sharif and Majed Alhaisoni and Usman Tariq and Arnab Majumdar and Orawit Thinnukool",

year = "2023",

doi = "10.32604/cmc.2023.028743",

language = "English",

volume = "74",

pages = "5931--5949",

journal = "Computers, Materials and Continua",

issn = "1546-2218",

publisher = "Tech Science Press",

number = "3",

}

TY - JOUR

T1 - Two-Stream Deep Learning Architecture-Based Human Action Recognition

AU - Shehzad, Faheem

AU - Khan, Muhammad Attique

AU - Yar, Muhammad Asfand E.

AU - Sharif, Muhammad

AU - Alhaisoni, Majed

AU - Tariq, Usman

AU - Majumdar, Arnab

AU - Thinnukool, Orawit

PY - 2023

Y1 - 2023

N2 - Human action recognition (HAR) based on artificial intelligence reasoning is one of the most important research areas in computer vision. Major breakthroughs in this field have been observed in the last few years, and research interest continues to grow in topics such as action and scene understanding, the study of human joints, and human posture recognition. Many HAR techniques have been introduced in the literature. Nonetheless, redundant and irrelevant features reduce recognition accuracy. Existing techniques also face other challenges, such as differing perspectives, environmental conditions, and temporal variations. In this work, a framework based on deep learning and an improved whale optimization algorithm is proposed for HAR. The proposed framework consists of a few core stages, i.e., initial preprocessing of frames, fine-tuning of pre-trained deep learning models through transfer learning (TL), feature fusion using a modified serial-based approach, and selection of the best features using improved whale optimization for final classification. Two pre-trained deep learning models, InceptionV3 and ResNet101, are fine-tuned, and TL is employed to train them on action recognition datasets. The fusion process increases the length of the feature vectors; therefore, an improved whale optimization algorithm is proposed to select the best features. The selected features are finally classified using machine learning (ML) classifiers. Four publicly accessible datasets, namely UT-Interaction, Hollywood, Free Viewpoint Action Recognition using Motion History Volumes (IXMAS), and centre of computer vision (UCF) Sports, are employed, on which the framework achieved testing accuracies of 100%, 99.9%, 99.1%, and 100%, respectively. Compared with state-of-the-art (SOTA) techniques, the proposed method showed improved accuracy.

AB - Human action recognition (HAR) based on artificial intelligence reasoning is one of the most important research areas in computer vision. Major breakthroughs in this field have been observed in the last few years, and research interest continues to grow in topics such as action and scene understanding, the study of human joints, and human posture recognition. Many HAR techniques have been introduced in the literature. Nonetheless, redundant and irrelevant features reduce recognition accuracy. Existing techniques also face other challenges, such as differing perspectives, environmental conditions, and temporal variations. In this work, a framework based on deep learning and an improved whale optimization algorithm is proposed for HAR. The proposed framework consists of a few core stages, i.e., initial preprocessing of frames, fine-tuning of pre-trained deep learning models through transfer learning (TL), feature fusion using a modified serial-based approach, and selection of the best features using improved whale optimization for final classification. Two pre-trained deep learning models, InceptionV3 and ResNet101, are fine-tuned, and TL is employed to train them on action recognition datasets. The fusion process increases the length of the feature vectors; therefore, an improved whale optimization algorithm is proposed to select the best features. The selected features are finally classified using machine learning (ML) classifiers. Four publicly accessible datasets, namely UT-Interaction, Hollywood, Free Viewpoint Action Recognition using Motion History Volumes (IXMAS), and centre of computer vision (UCF) Sports, are employed, on which the framework achieved testing accuracies of 100%, 99.9%, 99.1%, and 100%, respectively. Compared with state-of-the-art (SOTA) techniques, the proposed method showed improved accuracy.

KW - Human action recognition

KW - deep learning

KW - features optimization

KW - fusion of multiple features

KW - transfer learning

UR - https://www.scopus.com/pages/publications/85145356115

U2 - 10.32604/cmc.2023.028743

DO - 10.32604/cmc.2023.028743

M3 - Article

AN - SCOPUS:85145356115

SN - 1546-2218

VL - 74

SP - 5931

EP - 5949

JO - Computers, Materials and Continua

JF - Computers, Materials and Continua

IS - 3

ER -
