MULTICAUSENET temporal attention for multimodal emotion cause pair extraction

Ma Junchi, Hassan Nazeer Chaudhry, Farzana Kulsoom, Yang Guihua, Sajid Ullah Khan, Sujit Biswas, Zahid Ullah Khan, Faheem Khan

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

In the realm of emotion recognition, understanding the intricate relationships between emotions and their underlying causes remains a significant challenge. This paper presents MultiCauseNet, a novel framework for extracting emotion-cause pairs from multimodal data comprising text, audio, and video. The approach integrates multimodal feature extraction with attention mechanisms to enhance the understanding of emotional context. Text, audio, and video features are extracted with BERT, Wav2Vec, and Vision Transformers (ViTs), respectively, and used to construct a comprehensive multimodal graph that encodes the relationships between emotions and their potential causes. Graph Attention Networks (GATs) weigh and prioritize relevant features across the modalities. To further improve performance, Transformers model intra-modal and inter-modal dependencies through self-attention and cross-attention, enabling robust multimodal fusion that captures the global context of emotional interactions. This dynamic attention mechanism allows MultiCauseNet to capture complex interactions between emotional triggers and causes, improving extraction accuracy. On the benchmark emotion datasets IEMOCAP and MELD, MultiCauseNet achieves weighted F1 (WF1) scores of 73.02 and 53.67, respectively. For cause-pair analysis, it is evaluated on ECF and ConvECPE, reaching cause-recognition F1 scores of 65.12 and 84.51 and pair-extraction F1 scores of 55.12 and 51.34, respectively.
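As a rough illustration of the graph-attention step described above, the sketch below implements a minimal single-head GAT layer in plain PyTorch. This is not the authors' code: the utterance-node features, chain-shaped adjacency, and all dimensions are illustrative assumptions of this sketch.

```python
# Minimal single-head graph attention layer (a sketch of the GAT idea,
# not the paper's implementation). Nodes stand in for utterance features;
# `adj` marks which emotion/cause candidate nodes are connected.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scorer

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency matrix.
        h = self.W(x)                                     # (N, out_dim)
        N = h.size(0)
        # Score every (i, j) pair from the concatenated projected features.
        hi = h.unsqueeze(1).expand(N, N, -1)
        hj = h.unsqueeze(0).expand(N, N, -1)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        # Mask non-edges so attention only flows along graph edges.
        e = e.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(e, dim=-1)                  # attention weights
        return alpha @ h                                  # weighted aggregation

# Toy graph: 4 utterance nodes, chain-connected, with self-loops.
x = torch.randn(4, 256)
adj = torch.eye(4) + torch.diag(torch.ones(3), 1) + torch.diag(torch.ones(3), -1)
out = GATLayer(256, 128)(x, adj)
print(out.shape)  # torch.Size([4, 128])
```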
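The self- and cross-attention fusion can likewise be sketched with PyTorch's nn.MultiheadAttention. In this hedged sketch, text queries attend to the audio and video streams; the shared 256-dimensional feature space and the residual combination are assumptions of the sketch, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of intra-modal self-attention
# followed by inter-modal cross-attention fusion, as the abstract describes.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Self-attention within the text stream, then cross-attention
    from text queries onto audio and video keys/values."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, audio, video):
        # Intra-modal dependencies via self-attention on the text stream.
        t, _ = self.self_attn(text, text, text)
        # Inter-modal dependencies: text attends to audio and video.
        a, _ = self.cross_attn_audio(t, audio, audio)
        v, _ = self.cross_attn_video(t, video, video)
        return self.norm(t + a + v)  # fused utterance representations

# Stand-ins for encoder outputs (BERT / Wav2Vec / ViT in the paper),
# assumed here to be projected to a shared 256-d space:
# batch of 2 dialogues, 10 utterances each.
text  = torch.randn(2, 10, 256)
audio = torch.randn(2, 10, 256)
video = torch.randn(2, 10, 256)
fused = CrossModalFusion()(text, audio, video)
print(fused.shape)  # torch.Size([2, 10, 256])
```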

Original language: English
Article number: 19372
Journal: Scientific Reports
Volume: 15
Issue number: 1
State: Published - Dec 2025

Keywords

  • Emotion triggers
  • Emotion–cause pair extraction
  • Feature fusion
  • Graph attention networks (GATs)
  • Multimodal emotion recognition
  • Multimodal graphs
  • Self- and cross-attention
  • Transformers and attention mechanisms
  • Vision transformers (ViTs)
