TY - JOUR
T1 - Remote Sensing Surveillance Using Multilevel Feature Fusion and Deep Neural Network
AU - Zahoor, Laiba
AU - Alhasson, Haifa F.
AU - Alnusayri, Mohammed
AU - Alatiyyah, Mohammed
AU - Alhammadi, Dina Abdulaziz
AU - Jalal, Ahmad
AU - Liu, Hui
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Human action recognition from aerial imagery poses significant challenges due to the dynamic nature of the scenes and the complexity of human movements. In this paper, we present an enhanced system that combines YOLO for human detection with a comprehensive multilevel feature fusion approach to improve recognition of human actions in drone-captured images. Our system provides reliable drone-based human action recognition through the integration of state-of-the-art methods for multilevel feature extraction and object detection. Initially, frames are extracted individually from drone footage sequences. Preprocessing techniques, including Gaussian blur, grayscale conversion, and background removal, are applied to every frame to improve image quality and feature reliability. For object detection, we effectively locate and recognize human subjects in these aerial frames using the YOLO approach. Afterward, the framework extracts 14 body landmarks that represent the shape of the human body via keypoint extraction. Four significant features are employed to capture the complexity of human movement effectively: the incorporation of 3D point cloud data adds depth to the image and makes it feasible to construct a more detailed three-dimensional representation; measuring the angles between keypoints provides significant details on joint orientations, which are essential for posture analysis; and geodesic distances measure the shortest paths along the surface of the body to provide useful insight into the spatial relationships between keypoints. The extracted features are optimized using quadratic discriminant analysis. Finally, a deep neural network is trained to perform the action classification. Three benchmark datasets, the UAV Gesture, UAV Human, and UCF-ARG datasets, were used for our experiments and system testing. Our model achieved action recognition accuracies of 90.15%, 72.37%, and 76.50% on these datasets, respectively.
AB - Human action recognition from aerial imagery poses significant challenges due to the dynamic nature of the scenes and the complexity of human movements. In this paper, we present an enhanced system that combines YOLO for human detection with a comprehensive multilevel feature fusion approach to improve recognition of human actions in drone-captured images. Our system provides reliable drone-based human action recognition through the integration of state-of-the-art methods for multilevel feature extraction and object detection. Initially, frames are extracted individually from drone footage sequences. Preprocessing techniques, including Gaussian blur, grayscale conversion, and background removal, are applied to every frame to improve image quality and feature reliability. For object detection, we effectively locate and recognize human subjects in these aerial frames using the YOLO approach. Afterward, the framework extracts 14 body landmarks that represent the shape of the human body via keypoint extraction. Four significant features are employed to capture the complexity of human movement effectively: the incorporation of 3D point cloud data adds depth to the image and makes it feasible to construct a more detailed three-dimensional representation; measuring the angles between keypoints provides significant details on joint orientations, which are essential for posture analysis; and geodesic distances measure the shortest paths along the surface of the body to provide useful insight into the spatial relationships between keypoints. The extracted features are optimized using quadratic discriminant analysis. Finally, a deep neural network is trained to perform the action classification. Three benchmark datasets, the UAV Gesture, UAV Human, and UCF-ARG datasets, were used for our experiments and system testing. Our model achieved action recognition accuracies of 90.15%, 72.37%, and 76.50% on these datasets, respectively.
KW - Human action recognition
KW - aerial imaging
KW - body pose
KW - deep learning
KW - image analysis
KW - multilevel feature fusion
KW - object detectors
UR - http://www.scopus.com/inward/record.url?scp=105001059056&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3542435
DO - 10.1109/ACCESS.2025.3542435
M3 - Article
AN - SCOPUS:105001059056
SN - 2169-3536
VL - 13
SP - 38282
EP - 38300
JO - IEEE Access
JF - IEEE Access
ER -