Abstract
Human activity recognition (HAR) plays a vital role in smart surveillance, healthcare monitoring, and human–computer interaction. Among its applications, fall detection for elderly people is particularly important because timely intervention can reduce severe health risks. However, accurate HAR in real-world surveillance scenarios is challenging because of noise, occlusions, and complex environments. In this study, we combine the Vision Transformer (ViT) with scale-invariant feature transform (SIFT) descriptors for robust fall detection and HAR. The framework is evaluated on a dataset of video recordings that simulate surveillance conditions and focus on daily human activities such as walking and falling. In comparative experiments against baseline models, including multilayer perceptrons, convolutional neural networks, long short-term memory (LSTM) networks, EfficientNetB4, Inception, ResNet, and Xception, the proposed approach performs best, achieving 99.69% accuracy. The results highlight the potential of ViT-based vision backbones as building blocks for multisensor smart surveillance pipelines, where integration with inertial, audio, or radar sensors can further enhance robustness. This research contributes to the development of smart surveillance systems for elderly assistance, real-time monitoring, and public safety applications.
| Original language | English |
|---|---|
| Article number | 2540021 |
| Journal | International Journal of Humanoid Robotics |
| DOIs | |
| State | Accepted/In press - 2026 |
Keywords
- computer vision
- deep learning for action recognition
- multisensor fusion
- scale-invariant feature transform (SIFT)
- smart surveillance
- vision transformer model
Fingerprint
Dive into the research topics of 'Vision Transformer with Scale-Invariant Features for Human Activity Recognition in Smart Surveillance-Based Fall Detection'. Together they form a unique fingerprint.