TY - GEN
T1 - Phishing Detection in Arabic SMS Messages using Natural Language Processing
AU - Ibrahim, Alya
AU - Alyousef, Sarah
AU - Alajmi, Hayfa
AU - Aldossari, Rana
AU - Masmoudi, Fatma
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Mobile phone integration into daily life has elevated Short Message Service (SMS) to a crucial tool for communication. Users receive text messages from banks, electronic government services, businesses, and payment services to verify their identities, which makes these messages a vector for manipulation aimed at gaining access to personal data. This study proposes a technique that detects Arabic phishing messages using natural language processing and a random forest classifier. The performance of the random forest classifier is compared with that of other machine learning algorithms, namely K-Nearest Neighbors (KNN), AdaBoost, and Logistic Regression. On all evaluation metrics, the random forest classifier outperformed the other classifiers. The model was trained with 638 phishing messages and 4844 legitimate ones. The experimental results indicate that the proposed approach achieved 98.66% accuracy, 99.10% precision, 98.23% recall, and a 98.67% F1 score.
AB - Mobile phone integration into daily life has elevated Short Message Service (SMS) to a crucial tool for communication. Users receive text messages from banks, electronic government services, businesses, and payment services to verify their identities, which makes these messages a vector for manipulation aimed at gaining access to personal data. This study proposes a technique that detects Arabic phishing messages using natural language processing and a random forest classifier. The performance of the random forest classifier is compared with that of other machine learning algorithms, namely K-Nearest Neighbors (KNN), AdaBoost, and Logistic Regression. On all evaluation metrics, the random forest classifier outperformed the other classifiers. The model was trained with 638 phishing messages and 4844 legitimate ones. The experimental results indicate that the proposed approach achieved 98.66% accuracy, 99.10% precision, 98.23% recall, and a 98.67% F1 score.
KW - cyber security
KW - machine learning
KW - natural language processing
KW - random forest classifier
KW - SMS phishing
UR - http://www.scopus.com/inward/record.url?scp=85198643051&partnerID=8YFLogxK
U2 - 10.1109/WiDS-PSU61003.2024.00040
DO - 10.1109/WiDS-PSU61003.2024.00040
M3 - Conference contribution
AN - SCOPUS:85198643051
T3 - Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024
SP - 141
EP - 146
BT - Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024
A2 - Rehm, Amjad
A2 - Azar, Ahmad Taher
A2 - Saba, Tanzila
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024
Y2 - 3 March 2024 through 4 March 2024
ER -