Comparing Open Arabic Named Entity Recognition Tools

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

3 Scopus citations

Abstract

This paper compares and evaluates the performance of three open Arabic Named Entity Recognition (NER) tools: CAMeL, Hatmi, and Stanza. We collected a corpus of 30 articles written in Modern Standard Arabic (MSA) and manually annotated all entities of the person, organization, and location types at the article (document) level. Our results suggest that Stanza and Hatmi perform similarly, with Hatmi receiving the highest F1 score among the three tools for all three entity types. CAMeL, however, achieved the highest precision for names of people and organizations. We then implemented a 'merge' method that combined the results from the three tools and a 'vote' method that tagged a named entity only when two of the three tools identified it as an entity. Merging achieved the highest overall F1 scores. Moreover, merging had the highest recall while voting had the highest precision for all three entity types. This indicates that merging is more suitable when recall is the priority, while voting is preferable when precision is required. Finally, we collected a corpus of 21,635 articles related to COVID-19 and applied the merge and vote methods to it. Our analysis demonstrates the tradeoff between precision and recall for the two methods.
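The 'merge' and 'vote' methods described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each tool's document-level output is reduced to a set of (entity text, entity type) pairs, that 'merge' is a plain set union, and that 'vote' keeps an entity when at least two tools report it. The function names, the `min_votes` parameter, and the toy entities are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from collections import Counter


def merge(*tool_outputs):
    """Union of all entities reported by any tool (assumed reading of 'merge')."""
    merged = set()
    for entities in tool_outputs:
        merged |= set(entities)
    return merged


def vote(*tool_outputs, min_votes=2):
    """Keep entities reported by at least `min_votes` tools (assumed reading of 'vote')."""
    counts = Counter()
    for entities in tool_outputs:
        counts.update(set(entities))  # each tool contributes at most one vote per entity
    return {entity for entity, n in counts.items() if n >= min_votes}


def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall, and F1 against a gold annotation."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented toy data for one document: (entity text, entity type) pairs.
tool_a = {("Cairo", "LOC"), ("WHO", "ORG")}
tool_b = {("Cairo", "LOC"), ("Ahmed", "PER")}
tool_c = {("Cairo", "LOC"), ("WHO", "ORG"), ("Nile", "PER")}  # last pair is a false positive
gold = {("Cairo", "LOC"), ("WHO", "ORG"), ("Ahmed", "PER")}

merged = merge(tool_a, tool_b, tool_c)
voted = vote(tool_a, tool_b, tool_c)

merge_scores = precision_recall_f1(merged, gold)
vote_scores = precision_recall_f1(voted, gold)
```

On this toy input, merging recovers every gold entity but also keeps the one spurious entity, while voting drops the spurious entity along with a correct one that only a single tool found. That is the same direction of tradeoff the abstract reports: higher recall for merging, higher precision for voting.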

Original language: English
Title of host publication: Proceedings - 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science, IRI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 46-51
Number of pages: 6
ISBN (Electronic): 9798350334586
State: Published - 2023
Event: 24th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2023 - Bellevue, United States
Duration: 4 Aug 2023 to 6 Aug 2023

Publication series

Name: Proceedings - 2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science, IRI 2023

Conference

Conference: 24th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2023
Country/Territory: United States
City: Bellevue
Period: 4/08/23 to 6/08/23

Keywords

  • Named Entity Recognition
  • Natural Language Processing
  • Platforms and Tools
  • Software and Systems Reuse and Reusability
