TY - JOUR
T1 - Advanced Biosignal-RGB Fusion With Adaptive Neurofuzzy Classification for High-Precision Action Recognition
AU - Abro, Iqra Aijaz
AU - Alhasson, Haifa F.
AU - Alharbi, Shuaa S.
AU - Alatiyyah, Mohammed
AU - AlHammadi, Dina Abdulaziz
AU - Jalal, Ahmad
AU - Liu, Hui
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - In action recognition from multisensory data, integrating RGB and signal-based modalities is a promising way to improve classification accuracy. Our system was developed through experimentation on three benchmark datasets: UTD-MHAD (University of Texas at Dallas Multimodal Human Action Dataset), HWU-USP, and LaRa. The data is first preprocessed: a Gaussian filter is applied to the RGB data and a Butterworth filter to the signal data. Windowing/segmentation is then applied to both the signal and RGB data. Next, features are extracted from the signal data, including auto-regression, MFCC (Mel-frequency cepstral coefficients), and the transient detection principle, while the RGB (red, green, blue) data is processed as a combined input to extract features such as angles, velocity, full-body elliptical modeling, fiducial points, and a 2.5D point cloud of the entire body. These features are fused, and the Yeo-Johnson power optimizer is then applied to refine the data. The optimized data is classified with a Neurofuzzy classifier to recognize different actions. This classifier was chosen for its ability to adapt to the heterogeneous nature of multimodal data, where features are spread across different domains and traditional classifiers are less effective. The Neurofuzzy model uses cross-validation for training and testing to ensure reliable results. The results suggest that the proposed model achieves higher accuracy than existing models: a mean accuracy of 89% on the HWU-USP dataset, a mean accuracy of 91% on LaRa, and 88% on UTD-MHAD. The system effectively distinguishes related actions, but its efficiency is hindered by the complexity of individual actions and by increased noise in the dataset.
KW - Video sensors
KW - artificial intelligence
KW - body pose
KW - computer vision
KW - feature fusion
KW - inertial sensing
KW - machine learning
KW - multimodel system
KW - neurofuzzy
UR - http://www.scopus.com/inward/record.url?scp=105003036515&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2025.3553196
DO - 10.1109/ACCESS.2025.3553196
M3 - Article
AN - SCOPUS:105003036515
SN - 2169-3536
VL - 13
SP - 57287
EP - 57310
JO - IEEE Access
JF - IEEE Access
ER -