TY - GEN
T1 - A Survey of Natural Language Processing for Classification of Saudi Arabic Dialect
T2 - 7th EAI International Conference on Emerging Technologies in Computing, iCETiC 2024
AU - Aftan, Sulaiman
AU - Zhuang, Yu
AU - Aseeri, Ahmad O.
AU - Shah, Habib
N1 - Publisher Copyright:
© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2026.
PY - 2026
Y1 - 2026
AB - Several areas of artificial intelligence, including machine learning, deep neural networks, and large language models (LLMs), have greatly influenced human communication through natural language processing (NLP) technologies such as text generation, translation, text analysis, and sentiment analysis, across many languages, including English and Arabic. Arabic is particularly significant among these languages, with approximately 300 million speakers worldwide, and this has given rise to the field of Arabic Natural Language Processing (ANLP). ANLP has become a productive area of NLP, particularly for dialect classification, generation, and translation, with the Saudi Dialect (SD) receiving notable attention owing to its importance in the Middle East. Researchers have applied various NLP architectures to SD in domains ranging from everyday use to social and business platforms, addressing the challenges and applications associated with the dialect. This survey reviews and summarizes five years of research in this field, from 2020 to 2024, highlighting the successes achieved and identifying research opportunities to improve the understanding and use of NLP in diverse SD scenarios. The survey also sheds light on the challenges of acquiring SD datasets for efficient analysis with different NLP methodologies.
KW - DL
KW - NLP Survey
KW - Saudi and Arabic Dialect Classification
UR - http://www.scopus.com/inward/record.url?scp=105011987015&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-92625-9_8
DO - 10.1007/978-3-031-92625-9_8
M3 - Conference contribution
AN - SCOPUS:105011987015
SN - 9783031926242
T3 - Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
SP - 105
EP - 124
BT - Emerging Technologies in Computing - 7th EAI International Conference, iCETiC 2024, Proceedings
A2 - Miraz, Mahdi H.
A2 - Ware, Andrew
A2 - Southall, Garfield
A2 - Ali, Maaruf
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 15 August 2024 through 16 August 2024
ER -