Pre-indexing techniques in Arabic information retrieval

  • Souheila Ben Guirat
  • Ibrahim Bounhas
  • Yahia Slimani

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

2 Scopus citations

Abstract

Arabic document indexing remains challenging given the morphological specificities of the language. Although much effort has been devoted to the field, there is a growing demand for more efficient indexing approaches. One of the most important issues concerns the choice of indexing units (e.g. stems, roots, lemmas), which should both enhance retrieval efficiency and optimize the indexing process. The question is how to process Arabic texts to retrieve the basic forms that best reflect the meaning of words and documents. In the literature, several indexing units have been compared, and combining multiple indexes seems promising. In our previous work, we showed that hybrid indexes based on stems, patterns and roots enhance results. However, the optimal weight of each indexing unit still needs to be determined. This paper therefore aims to contribute to optimizing hybrid indexing: we compare and evaluate four pre-indexing methods.

Original language: English
Title of host publication: ICAART 2019 - Proceedings of the 11th International Conference on Agents and Artificial Intelligence
Editors: Ana Rocha, Luc Steels, Jaap van den Herik
Publisher: SciTePress
Pages: 237-246
Number of pages: 10
ISBN (Electronic): 9789897583506
State: Published - 2019
Externally published: Yes
Event: 11th International Conference on Agents and Artificial Intelligence, ICAART 2019 - Prague, Czech Republic
Duration: 19 Feb 2019 to 21 Feb 2019

Publication series

Name: ICAART 2019 - Proceedings of the 11th International Conference on Agents and Artificial Intelligence
Volume: 2

Conference

Conference: 11th International Conference on Agents and Artificial Intelligence, ICAART 2019
Country/Territory: Czech Republic
City: Prague
Period: 19/02/19 to 21/02/19

Keywords

  • Arabic Information Retrieval
  • Hybrid Index
  • Smoothing
  • Statistical Modeling
