Vision Transformer with Scale-Invariant Features for Human Activity Recognition in Smart Surveillance-Based Fall Detection

  • Ebtisam Abdullah Alabdulqader
  • Asma Aldrees
  • Ghada Atteia
  • Arwa Allinjawi
  • Shtwai Alsubai
  • Amjad Qashlan
  • Younhyun Jung

Research output: Contribution to journal › Article › peer-review

Abstract

Human activity recognition (HAR) plays a vital role in smart surveillance, healthcare monitoring, and human–computer interaction. Among its applications, fall detection for elderly people is particularly important because timely intervention can reduce severe health risks. However, accurate HAR in real-world surveillance scenarios is challenging because of noise, occlusions, and complex environments. In this study, we combine the Vision Transformer (ViT) with scale-invariant feature transform (SIFT) descriptors for robust fall detection and HAR. The framework is evaluated on a dataset of video recordings that simulate surveillance conditions, focusing on daily human activities such as walking and falling. In comparative experiments against baseline models, including multilayer perceptrons, convolutional neural networks, long short-term memory networks, EfficientNetB4, Inception, ResNet, and Xception, the proposed approach performs best, achieving 99.69% accuracy. The results highlight the potential of ViT-based vision backbones as a building block for multisensor smart surveillance pipelines, where integration with inertial, audio, or radar sensors can further enhance robustness. The work contributes to the development of smart surveillance systems for elderly assistance, real-time monitoring, and public safety applications.

Original language: English
Article number: 2540021
Journal: International Journal of Humanoid Robotics
State: Accepted/In press - 2026

Keywords

  • computer vision
  • deep learning for action recognition
  • multisensor fusion
  • scale-invariant feature transform (SIFT)
  • smart surveillance
  • vision transformer model
