An optimized multi-scale convolutional autoencoder for efficient abnormal event detection using rgb, depth and optical flow data

Abdullah Alqahtani

doi:10.1007/s11042-025-20608-5

Computer Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

In this study, we propose a novel framework for detecting abnormal events in surveillance videos, a critical yet challenging task in security applications. This research introduces a robust and efficient solution for video anomaly detection, offering substantial improvements in surveillance systems' ability to detect abnormal events, thereby contributing to enhanced security measures in public spaces. The proposed framework utilizes a Multiscale Convolutional Autoencoder (MSCAE) that processes inputs from RGB, depth, and optical flow video clips, enhancing the detection accuracy in complex scenes characterized by varying object scales, aspect ratios, and occlusions. To address the challenge of noise and preserve edges in video data, we implement a two-pass bilateral smooth filtering method, which is effective for noise-invariant, edge-preserving image smoothing. For object detection within these complex scenes, an enhanced Faster R-CNN model is employed. This model's performance is further refined through transfer learning on a dataset specifically composed of abnormal event videos. We also introduce significant improvements to the region proposal network (RPN) of the Faster R-CNN, particularly in non-maximum suppression (NMS) and anchor generation techniques, to better detect anomalies in diverse and complex environments. Furthermore, the MSCAE is integrated with Long Short-Term Memory (LSTM) neural networks to classify the detected anomalies, creating an end-to-end solution for video anomaly detection. Hyperparameter optimization for our deep learning models is performed using the Chameleon Swarm Algorithm, ensuring optimal model performance. Our framework was rigorously tested on the CUHK Avenue dataset, where it achieved a remarkable 99.5% accuracy, significantly outperforming existing methods and demonstrating the effectiveness of our approach.
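The abstract names non-maximum suppression (NMS) in the region proposal network as one of the steps it modifies, but it does not describe the modification. As background only, here is a minimal sketch of the standard greedy NMS baseline that such changes usually start from. It is not the author's improved variant. The box layout `(x1, y1, x2, y2)`, the function names `iou` and `greedy_nms`, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) with x2 > x1, y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[int]:
    """Return indices of the boxes that survive, highest score first."""
    # Visit proposals from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Called with a list of proposal boxes and their objectness scores, `greedy_nms` returns the indices of the proposals to pass on to the next stage. Lowering `iou_threshold` suppresses more overlapping proposals.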

Original language	English
Pages (from-to)	34401-34435
Number of pages	35
Journal	Multimedia Tools and Applications
Volume	84
Issue number	28
DOIs	https://doi.org/10.1007/s11042-025-20608-5
State	Published - Aug 2025

Keywords

Abnormal event detection
Deep learning
Feature fusion
Key frame extraction
Object detection
Optimization algorithm
Video anomaly detection

Access to Document

10.1007/s11042-025-20608-5

Cite this

@article{36d9859aeaba403fa9f4b47d326a5e24,
title = "An optimized multi-scale convolutional autoencoder for efficient abnormal event detection using rgb, depth and optical flow data",
abstract = "In this study, we propose a novel framework for detecting abnormal events in surveillance videos, a critical yet challenging task in security applications. This research introduces a robust and efficient solution for video anomaly detection, offering substantial improvements in surveillance systems' ability to detect abnormal events, thereby contributing to enhanced security measures in public spaces. The proposed framework utilizes a Multiscale Convolutional Autoencoder (MSCAE) that processes inputs from RGB, depth, and optical flow video clips, enhancing the detection accuracy in complex scenes characterized by varying object scales, aspect ratios, and occlusions. To address the challenge of noise and preserve edges in video data, we implement a two-pass bilateral smooth filtering method, which is effective for noise-invariant, edge-preserving image smoothing. For object detection within these complex scenes, an enhanced Faster R-CNN model is employed. This model's performance is further refined through transfer learning on a dataset specifically composed of abnormal event videos. We also introduce significant improvements to the region proposal network (RPN) of the Faster R-CNN, particularly in non-maximum suppression (NMS) and anchor generation techniques, to better detect anomalies in diverse and complex environments. Furthermore, the MSCAE is integrated with Long Short-Term Memory (LSTM) neural networks to classify the detected anomalies, creating an end-to-end solution for video anomaly detection. Hyperparameter optimization for our deep learning models is performed using the Chameleon Swarm Algorithm, ensuring optimal model performance. Our framework was rigorously tested on the CUHK Avenue dataset, where it achieved a remarkable 99.5\% accuracy, significantly outperforming existing methods and demonstrating the effectiveness of our approach.",
keywords = "Abnormal event detection, Deep learning, Feature fusion, Key frame extraction, Object detection, Optimization algorithm, Video anomaly detection",
author = "Abdullah Alqahtani",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.",
year = "2025",
month = aug,
doi = "10.1007/s11042-025-20608-5",
language = "English",
volume = "84",
pages = "34401--34435",
journal = "Multimedia Tools and Applications",
issn = "1380-7501",
publisher = "Springer",
number = "28",
}

TY - JOUR
T1 - An optimized multi-scale convolutional autoencoder for efficient abnormal event detection using rgb, depth and optical flow data
AU - Alqahtani, Abdullah
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/8
Y1 - 2025/8
N2 - In this study, we propose a novel framework for detecting abnormal events in surveillance videos, a critical yet challenging task in security applications. This research introduces a robust and efficient solution for video anomaly detection, offering substantial improvements in surveillance systems' ability to detect abnormal events, thereby contributing to enhanced security measures in public spaces. The proposed framework utilizes a Multiscale Convolutional Autoencoder (MSCAE) that processes inputs from RGB, depth, and optical flow video clips, enhancing the detection accuracy in complex scenes characterized by varying object scales, aspect ratios, and occlusions. To address the challenge of noise and preserve edges in video data, we implement a two-pass bilateral smooth filtering method, which is effective for noise-invariant, edge-preserving image smoothing. For object detection within these complex scenes, an enhanced Faster R-CNN model is employed. This model's performance is further refined through transfer learning on a dataset specifically composed of abnormal event videos. We also introduce significant improvements to the region proposal network (RPN) of the Faster R-CNN, particularly in non-maximum suppression (NMS) and anchor generation techniques, to better detect anomalies in diverse and complex environments. Furthermore, the MSCAE is integrated with Long Short-Term Memory (LSTM) neural networks to classify the detected anomalies, creating an end-to-end solution for video anomaly detection. Hyperparameter optimization for our deep learning models is performed using the Chameleon Swarm Algorithm, ensuring optimal model performance. Our framework was rigorously tested on the CUHK Avenue dataset, where it achieved a remarkable 99.5% accuracy, significantly outperforming existing methods and demonstrating the effectiveness of our approach.
AB - In this study, we propose a novel framework for detecting abnormal events in surveillance videos, a critical yet challenging task in security applications. This research introduces a robust and efficient solution for video anomaly detection, offering substantial improvements in surveillance systems' ability to detect abnormal events, thereby contributing to enhanced security measures in public spaces. The proposed framework utilizes a Multiscale Convolutional Autoencoder (MSCAE) that processes inputs from RGB, depth, and optical flow video clips, enhancing the detection accuracy in complex scenes characterized by varying object scales, aspect ratios, and occlusions. To address the challenge of noise and preserve edges in video data, we implement a two-pass bilateral smooth filtering method, which is effective for noise-invariant, edge-preserving image smoothing. For object detection within these complex scenes, an enhanced Faster R-CNN model is employed. This model's performance is further refined through transfer learning on a dataset specifically composed of abnormal event videos. We also introduce significant improvements to the region proposal network (RPN) of the Faster R-CNN, particularly in non-maximum suppression (NMS) and anchor generation techniques, to better detect anomalies in diverse and complex environments. Furthermore, the MSCAE is integrated with Long Short-Term Memory (LSTM) neural networks to classify the detected anomalies, creating an end-to-end solution for video anomaly detection. Hyperparameter optimization for our deep learning models is performed using the Chameleon Swarm Algorithm, ensuring optimal model performance. Our framework was rigorously tested on the CUHK Avenue dataset, where it achieved a remarkable 99.5% accuracy, significantly outperforming existing methods and demonstrating the effectiveness of our approach.
KW - Abnormal event detection
KW - Deep learning
KW - Feature fusion
KW - Key frame extraction
KW - Object detection
KW - Optimization algorithm
KW - Video anomaly detection
UR - http://www.scopus.com/inward/record.url?scp=85217164149&partnerID=8YFLogxK
U2 - 10.1007/s11042-025-20608-5
DO - 10.1007/s11042-025-20608-5
M3 - Article
AN - SCOPUS:85217164149
SN - 1380-7501
VL - 84
SP - 34401
EP - 34435
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 28
ER -
