AraEventCoref: An Arabic Event Coreference Dataset and LLM Benchmarks

Mohammed Aldawsari; Omer Dawood

doi:10.1145/3743047

Computer Engineering

Prince Sattam Bin Abdulaziz University

Research output: Contribution to journal › Article › peer-review

Abstract

Event coreference resolution is a critical task in Natural Language Processing (NLP), enabling applications such as information extraction, text summarization, and question answering. However, resolving event coreference in Arabic presents unique challenges due to the language's rich morphology, complex syntax, and lack of annotated resources. This article introduces AraEventCoref, the first publicly available Arabic event coreference dataset, comprising 50 annotated news articles with 1,381 events and 159 coreference chains. The dataset's annotation agreement achieved a CoNLL score of 75.8%, ensuring high reliability across B³, MUC, and CEAF_e metrics. Additionally, event triggers were annotated with an inter-annotator agreement of 96% using Cohen's Kappa, further validating dataset quality. To establish benchmarks, we developed a fine-tuned CamelBERT-msa model as a strong baseline and evaluated state-of-the-art Arabic large language models (LLMs) using both bilingual and Arabic-only prompts. Results demonstrate the effectiveness of fine-tuning for domain-specific adaptation and reveal the impact of bilingual prompting on LLM performance. By providing a high-quality dataset and benchmarking results, this work lays a foundation for advancing Arabic event coreference research and supports future developments in event relation extraction.
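The agreement figures quoted in the abstract rest on standard measures: Cohen's Kappa for event-trigger labels, and the CoNLL score, which is conventionally the unweighted mean of the MUC, B³, and CEAF_e F1 scores. The following is a minimal sketch of how such numbers could be computed, not the authors' evaluation code. The function names and the toy inputs are hypothetical, only B³ is implemented among the three coreference metrics, and both annotators are assumed to cover the same set of event mentions (singletons included).

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def b_cubed(key_chains, response_chains):
    """B-cubed precision, recall, and F1 over two partitions of the same mentions."""
    key_of = {m: frozenset(chain) for chain in key_chains for m in chain}
    resp_of = {m: frozenset(chain) for chain in response_chains for m in chain}
    if key_of.keys() != resp_of.keys():
        raise ValueError("both partitions must cover the same mentions")
    n = len(key_of)
    # Per-mention overlap between its key chain and its response chain.
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in key_of) / n
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in key_of) / n
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def conll_score(muc_f1, b3_f1, ceafe_f1):
    """CoNLL score: unweighted mean of the MUC, B-cubed, and CEAF_e F1 values."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3.0


# Toy data (invented for illustration, not taken from AraEventCoref).
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
p, r, f1 = b_cubed(
    [{"e1", "e2", "e3"}, {"e4"}],
    [{"e1", "e2"}, {"e3", "e4"}],
)
print(f"kappa={kappa:.3f}  B3 P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

When two annotators are compared this way, one annotator's chains play the role of the key and the other's the role of the response. The MUC and CEAF_e F1 values passed to `conll_score` would have to come from a separate scorer.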

Original language	English
Article number	67
Journal	ACM Transactions on Asian and Low-Resource Language Information Processing
Volume	24
Issue number	7
DOIs	https://doi.org/10.1145/3743047
State	Published - 10 Jul 2025

Keywords

Arabic event
Arabic event coreference
Arabic event relation extraction

Cite this

@article{2cc0f8e9513042a181c7d4d6013a8478,
  title = "AraEventCoref: An Arabic Event Coreference Dataset and LLM Benchmarks",
  abstract = "Event coreference resolution is a critical task in Natural Language Processing (NLP), enabling applications such as information extraction, text summarization, and question answering. However, resolving event coreference in Arabic presents unique challenges due to the language's rich morphology, complex syntax, and lack of annotated resources. This article introduces AraEventCoref, the first publicly available Arabic event coreference dataset, comprising 50 annotated news articles with 1,381 events and 159 coreference chains. The dataset's annotation agreement achieved a CoNLL score of 75.8\%, ensuring high reliability across B$^3$, MUC, and CEAF$_e$ metrics. Additionally, event triggers were annotated with an inter-annotator agreement of 96\% using Cohen's Kappa, further validating dataset quality. To establish benchmarks, we developed a fine-tuned CamelBERT-msa model as a strong baseline and evaluated state-of-the-art Arabic large language models (LLMs) using both bilingual and Arabic-only prompts. Results demonstrate the effectiveness of fine-tuning for domain-specific adaptation and reveal the impact of bilingual prompting on LLM performance. By providing a high-quality dataset and benchmarking results, this work lays a foundation for advancing Arabic event coreference research and supports future developments in event relation extraction.",
  keywords = "Arabic event, Arabic event coreference, Arabic event relation extraction",
  author = "Mohammed Aldawsari and Omer Dawood",
  note = "Publisher Copyright: {\textcopyright} 2025 Copyright held by the owner/author(s). Publication rights licensed to ACM.",
  year = "2025",
  month = jul,
  day = "10",
  doi = "10.1145/3743047",
  language = "English",
  volume = "24",
  journal = "ACM Transactions on Asian and Low-Resource Language Information Processing",
  issn = "2375-4699",
  publisher = "Association for Computing Machinery (ACM)",
  number = "7",
}

TY - JOUR

T1 - AraEventCoref

T2 - An Arabic Event Coreference Dataset and LLM Benchmarks

AU - Aldawsari, Mohammed

AU - Dawood, Omer

PY - 2025/7/10

Y1 - 2025/7/10

N2 - Event coreference resolution is a critical task in Natural Language Processing (NLP), enabling applications such as information extraction, text summarization, and question answering. However, resolving event coreference in Arabic presents unique challenges due to the language's rich morphology, complex syntax, and lack of annotated resources. This article introduces AraEventCoref, the first publicly available Arabic event coreference dataset, comprising 50 annotated news articles with 1,381 events and 159 coreference chains. The dataset's annotation agreement achieved a CoNLL score of 75.8%, ensuring high reliability across B³, MUC, and CEAF_e metrics. Additionally, event triggers were annotated with an inter-annotator agreement of 96% using Cohen's Kappa, further validating dataset quality. To establish benchmarks, we developed a fine-tuned CamelBERT-msa model as a strong baseline and evaluated state-of-the-art Arabic large language models (LLMs) using both bilingual and Arabic-only prompts. Results demonstrate the effectiveness of fine-tuning for domain-specific adaptation and reveal the impact of bilingual prompting on LLM performance. By providing a high-quality dataset and benchmarking results, this work lays a foundation for advancing Arabic event coreference research and supports future developments in event relation extraction.

KW - Arabic event

KW - Arabic event coreference

KW - Arabic event relation extraction

UR - http://www.scopus.com/inward/record.url?scp=105012404196&partnerID=8YFLogxK

U2 - 10.1145/3743047

DO - 10.1145/3743047

M3 - Article

AN - SCOPUS:105012404196

SN - 2375-4699

VL - 24

JO - ACM Transactions on Asian and Low-Resource Language Information Processing

JF - ACM Transactions on Asian and Low-Resource Language Information Processing

IS - 7

M1 - 67

ER -
