TY - JOUR
T1 - Integrating end-to-end multimodal deep learning and domain adaptation for robust facial expression recognition
AU - Hassaballah, Mahmoud
AU - Pero, Chiara
AU - Rout, Ranjeet Kumar
AU - Umer, Saiyed
N1 - Publisher Copyright: © 2025
PY - 2025/6
Y1 - 2025/6
N2 - This paper presents an advanced approach to a facial expression recognition (FER) system designed for robust performance across diverse imaging environments. The proposed method consists of four primary components: image preprocessing, feature representation and classification, cross-domain feature analysis, and domain adaptation. The process begins with facial region extraction from input images, including those captured in unconstrained imaging conditions, where variations in lighting, background, and image quality significantly impact recognition performance. The extracted facial region undergoes feature extraction using an ensemble of multimodal deep learning techniques, including end-to-end CNNs, BilinearCNN, TrilinearCNN, and pretrained CNN models, which capture both local and global facial features with high precision. The ensemble approach enriches feature representation by integrating information from multiple models, enhancing the system's ability to generalize across different subjects and expressions. These deep features are then passed to a classifier trained to recognize facial expressions effectively in real-time scenarios. Since images captured in real-world conditions often contain noise and artifacts that can compromise accuracy, cross-domain analysis is performed to evaluate the discriminative power and robustness of the extracted deep features. FER systems typically experience performance degradation when applied to domains that differ from the original training environment. To mitigate this issue, domain adaptation techniques are incorporated, enabling the system to effectively adjust to new imaging conditions and improving recognition accuracy even in challenging real-time acquisition environments. The proposed FER system is validated using four well-established benchmark datasets: CK+, KDEF, IMFDB and AffectNet. Experimental results demonstrate that the proposed system achieves high performance within original domains and exhibits superior cross-domain recognition compared to existing state-of-the-art methods. These findings indicate that the system is highly reliable for applications requiring robust and adaptive FER capabilities across varying imaging conditions and domains.
KW - Domain adaptation
KW - Ensemble learning
KW - Facial expressions
KW - Multimodal deep learning
UR - http://www.scopus.com/inward/record.url?scp=105003992470&partnerID=8YFLogxK
U2 - 10.1016/j.imavis.2025.105548
DO - 10.1016/j.imavis.2025.105548
M3 - Article
AN - SCOPUS:105003992470
SN - 0262-8856
VL - 159
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 105548
ER -