TY - JOUR
T1 - Similarities between Arabic dialects
T2 - Investigating geographical proximity
AU - Alsudais, Abdulkareem
AU - Alotaibi, Wafa
AU - Alomary, Faye
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2022/1
Y1 - 2022/1
AB - The automatic classification of Arabic dialects is an ongoing research challenge, which has been explored in recent work that defines dialects based on increasingly limited geographic areas like cities and provinces. This paper focuses on a related, yet relatively unexplored topic: the effects of the geographical proximity of cities located in Arab countries on their dialectal similarity. Our work is twofold, reliant on: (1) comparing the textual similarities between dialects using cosine similarity and (2) measuring the geographical distance between locations. We study MADAR and NADI, two established datasets with Arabic dialects from many cities and provinces. Our results indicate that cities located in different countries may in fact have more dialectal similarity than cities within the same country, depending on their geographical proximity. The correlation between dialectal similarity and city proximity suggests that cities that are closer together are more likely to share dialectal attributes, regardless of country borders. This nuance provides the potential for important advancements in Arabic dialect research because it indicates that a more granular approach to dialect classification is essential to understanding how to frame the problem of Arabic dialect identification.
KW - Arabic dialects
KW - Arabic natural language processing
KW - Geolocation
KW - Textual similarity
UR - https://www.scopus.com/pages/publications/85116067593
U2 - 10.1016/j.ipm.2021.102770
DO - 10.1016/j.ipm.2021.102770
M3 - Article
AN - SCOPUS:85116067593
SN - 0306-4573
VL - 59
JO - Information Processing and Management
JF - Information Processing and Management
IS - 1
M1 - 102770
ER -