TY - GEN
T1 - Pre-indexing techniques in Arabic information retrieval
AU - Guirat, Souheila Ben
AU - Bounhas, Ibrahim
AU - Slimani, Yahia
N1 - Publisher Copyright: Copyright © 2019 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved
PY - 2019
Y1 - 2019
N2 - Arabic document indexing remains challenging given the morphological specificities of the language. Although much effort has been devoted to the field, the need for more efficient indexing approaches keeps growing. One of the most important issues is the choice of indexing units (e.g. stems, roots, lemmas) that both enhance retrieval efficiency and optimize the indexing process. The question is how to process Arabic texts to retrieve the basic forms that best reflect the meaning of words and documents. Several indexing units have been compared in the literature, and combining multiple indexes seems promising. In our previous work, we showed that hybrid indexes based on stems, patterns and roots enhance results. However, the optimal weight of each indexing unit still needs to be determined. This paper therefore contributes to optimizing hybrid indexing. We compare and evaluate four pre-indexing methods.
AB - Arabic document indexing remains challenging given the morphological specificities of the language. Although much effort has been devoted to the field, the need for more efficient indexing approaches keeps growing. One of the most important issues is the choice of indexing units (e.g. stems, roots, lemmas) that both enhance retrieval efficiency and optimize the indexing process. The question is how to process Arabic texts to retrieve the basic forms that best reflect the meaning of words and documents. Several indexing units have been compared in the literature, and combining multiple indexes seems promising. In our previous work, we showed that hybrid indexes based on stems, patterns and roots enhance results. However, the optimal weight of each indexing unit still needs to be determined. This paper therefore contributes to optimizing hybrid indexing. We compare and evaluate four pre-indexing methods.
KW - Arabic Information Retrieval
KW - Hybrid Index
KW - Smoothing
KW - Statistical Modeling
UR - https://www.scopus.com/pages/publications/85064806665
U2 - 10.5220/0007393402370246
DO - 10.5220/0007393402370246
M3 - Conference contribution
AN - SCOPUS:85064806665
T3 - ICAART 2019 - Proceedings of the 11th International Conference on Agents and Artificial Intelligence
SP - 237
EP - 246
BT - ICAART 2019 - Proceedings of the 11th International Conference on Agents and Artificial Intelligence
A2 - Rocha, Ana
A2 - Steels, Luc
A2 - van den Herik, Jaap
PB - SciTePress
T2 - 11th International Conference on Agents and Artificial Intelligence, ICAART 2019
Y2 - 19 February 2019 through 21 February 2019
ER -