Phishing Detection in Arabic SMS Messages using Natural Language Processing

Alya Ibrahim; Sarah Alyousef; Hayfa Alajmi; Rana Aldossari; Fatma Masmoudi

doi:10.1109/WiDS-PSU61003.2024.00040

Prince Sattam Bin Abdulaziz University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

The integration of mobile phones into daily life has made the Short Message Service (SMS) a crucial communication tool. Users receive text messages from banks, electronic government services, businesses, and payment services to verify their identities, which makes these messages a vector for manipulation aimed at gaining access to personal data. This study proposes a technique that detects Arabic phishing messages using natural language processing and a random forest classifier. The performance of the random forest classifier is compared with that of other machine learning algorithms, namely K-Nearest Neighbors (KNN), AdaBoost, and Logistic Regression. Across all evaluation metrics, the random forest classifier outperformed the other classifiers. The model was trained with 638 phishing messages and 4844 legitimate ones. The experimental results indicate that the proposed approach achieves 98.66% accuracy, 99.10% precision, 98.23% recall, and a 98.67% F1 score.
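
As a reading aid for the reported figures, the sketch below shows how the four metrics are conventionally derived from binary confusion-matrix counts, with phishing treated as the positive class. It is not the authors' evaluation code. The class name ConfusionCounts, the helper f1_from, and the example counts are hypothetical and chosen only for illustration. The majority-class baseline assumes the evaluation data has the same class proportions as the 638 and 4844 messages quoted above, which the abstract does not state; it also does not say whether the classes were rebalanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with phishing as the positive class."""

    tp: int  # phishing messages flagged as phishing
    fp: int  # legitimate messages flagged as phishing
    fn: int  # phishing messages that were missed
    tn: int  # legitimate messages correctly passed through

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        return f1_from(self.precision(), self.recall())


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check using only the precision and recall quoted in the abstract.
reported_f1 = f1_from(0.9910, 0.9823)

# Accuracy of always predicting "legitimate", ASSUMING the evaluation data
# keeps the 638 : 4844 class proportions (not stated in the abstract).
majority_baseline = 4844 / (638 + 4844)

# Hypothetical counts, for illustration only (not taken from the paper).
example = ConfusionCounts(tp=90, fp=10, fn=30, tn=870)

if __name__ == "__main__":
    print(f"F1 implied by reported precision/recall: {reported_f1:.4f}")
    print(f"Majority-class baseline accuracy:        {majority_baseline:.4f}")
    print(f"Example accuracy:  {example.accuracy():.4f}")
    print(f"Example precision: {example.precision():.4f}")
    print(f"Example recall:    {example.recall():.4f}")
    print(f"Example F1:        {example.f1():.4f}")
```

The F1 implied by the quoted precision and recall comes out near 0.9866, which agrees with the reported 98.67% to within rounding of the inputs. Under the stated assumption, a classifier that always answers "legitimate" would already reach roughly 88.36% accuracy, which is why precision, recall, and F1 are more informative than accuracy alone on data with this class balance.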

Original language	English
Title of host publication	Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024
Editors	Amjad Rehm, Ahmad Taher Azar, Tanzila Saba
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	141-146
Number of pages	6
ISBN (Electronic)	9798350395839
DOIs	https://doi.org/10.1109/WiDS-PSU61003.2024.00040
State	Published - 2024
Event	7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024, Riyadh, Saudi Arabia, 3 Mar 2024 → 4 Mar 2024

Keywords

cyber security
machine learning
natural language processing
random forest classifier
SMS phishing

Cite this

Ibrahim, A., Alyousef, S., Alajmi, H., Aldossari, R., & Masmoudi, F. (2024). Phishing Detection in Arabic SMS Messages using Natural Language Processing. In A. Rehm, A. T. Azar, & T. Saba (Eds.), Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024 (pp. 141-146). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WiDS-PSU61003.2024.00040

@inproceedings{e93b79d8a5a249aa99acc734b03ddf45,

title = "Phishing Detection in Arabic SMS Messages using Natural Language Processing",

keywords = "cyber security, Machine learning, natural language processing, random forest classifier, SMS phishing",

author = "Alya Ibrahim and Sarah Alyousef and Hayfa Alajmi and Rana Aldossari and Fatma Masmoudi",

note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024 ; Conference date: 03-03-2024 Through 04-03-2024",

year = "2024",

doi = "10.1109/WiDS-PSU61003.2024.00040",

language = "English",

series = "Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "141--146",

editor = "Amjad Rehm and Azar, {Ahmad Taher} and Tanzila Saba",

booktitle = "Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024",

address = "United States",

}

TY - GEN

T1 - Phishing Detection in Arabic SMS Messages using Natural Language Processing

AU - Ibrahim, Alya

AU - Alyousef, Sarah

AU - Alajmi, Hayfa

AU - Aldossari, Rana

AU - Masmoudi, Fatma

PY - 2024

Y1 - 2024

KW - cyber security

KW - Machine learning

KW - natural language processing

KW - random forest classifier

KW - SMS phishing

UR - https://www.scopus.com/pages/publications/85198643051

U2 - 10.1109/WiDS-PSU61003.2024.00040

DO - 10.1109/WiDS-PSU61003.2024.00040

M3 - Conference contribution

AN - SCOPUS:85198643051

T3 - Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024

SP - 141

EP - 146

BT - Proceedings - 2024 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024

A2 - Rehm, Amjad

A2 - Azar, Ahmad Taher

A2 - Saba, Tanzila

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 7th International Women in Data Science Conference at Prince Sultan University, WiDS-PSU 2024

Y2 - 3 March 2024 through 4 March 2024

ER -
