TY - JOUR
T1 - AraEventCoref
T2 - An Arabic Event Coreference Dataset and LLM Benchmarks
AU - Aldawsari, Mohammed
AU - Dawood, Omer
N1 - Publisher Copyright: © 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/7/10
Y1 - 2025/7/10
AB - Event coreference resolution is a critical task in Natural Language Processing (NLP), enabling applications such as information extraction, text summarization, and question answering. However, resolving event coreference in Arabic presents unique challenges due to the language's rich morphology, complex syntax, and lack of annotated resources. This article introduces AraEventCoref, the first publicly available Arabic event coreference dataset, comprising 50 annotated news articles with 1,381 events and 159 coreference chains. The dataset's annotation agreement achieved a CoNLL score of 75.8%, ensuring high reliability across B3, MUC, and CEAFe metrics. Additionally, event triggers were annotated with an inter-annotator agreement of 96% using Cohen's Kappa, further validating dataset quality. To establish benchmarks, we developed a fine-tuned CamelBERT-msa model as a strong baseline and evaluated state-of-the-art Arabic large language models (LLMs) using both bilingual and Arabic-only prompts. Results demonstrate the effectiveness of fine-tuning for domain-specific adaptation and reveal the impact of bilingual prompting on LLM performance. By providing a high-quality dataset and benchmarking results, this work lays a foundation for advancing Arabic event coreference research and supports future developments in event relation extraction.
KW - Arabic event
KW - Arabic event coreference
KW - Arabic event relation extraction
UR - http://www.scopus.com/inward/record.url?scp=105012404196&partnerID=8YFLogxK
U2 - 10.1145/3743047
DO - 10.1145/3743047
M3 - Article
AN - SCOPUS:105012404196
SN - 2375-4699
VL - 24
JO - ACM Transactions on Asian and Low-Resource Language Information Processing
JF - ACM Transactions on Asian and Low-Resource Language Information Processing
IS - 7
M1 - 67
ER -