Rumor detection in Arabic tweets using semi-supervised and unsupervised expectation–maximization

Samah M. Alzanin; Aqil M. Azmi

doi:10.1016/j.knosys.2019.104945

King Saud University

Research output: Contribution to journal › Article › peer-review

72 Scopus citations

Abstract

With the continued development of social networks, the spreading of information has become faster than ever. Consequently, this has resulted in a problem with the reliability of the information, where any user can publish whatever he/she wants. Automated systems capable of detecting fake contents with similar striking speed as the information being disseminated are urgently required. Detecting rumors in Arabic language social networks has lagged behind the work on other languages, particularly in English. In this paper, we address the problem of detecting rumors in Arabic tweets. We used a set of features extracted from the user and the content. These features were analyzed to determine their significance. Semi-supervised expectation–maximization (E–M) was used to train the proposed system with topics of newsworthy tweets. A comparison with supervised Gaussian Naïve Bayes (NB) showed that our semi-supervised system, using a small base of labeled data, outperforms Gaussian NB achieving an accuracy of 78.6%. The performance of the unsupervised E–M depends on the initial values, and we achieved an F₁ score of 80% in one of our experiments.
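
As an illustration of the kind of method the abstract names, the following is a minimal sketch of semi-supervised expectation-maximization built on a Gaussian Naive Bayes model, in plain Python. It is not the authors' implementation. The function names, the two toy features, the variance floor, and the iteration count are all assumptions made for this example, since this record does not list the paper's actual features or settings. The sketch fits the model on a small labeled set, then alternates between soft-labeling the unlabeled points (E-step) and re-estimating the parameters from all points (M-step).

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def m_step(X, R, n_classes, var_floor=1e-3):
    """Re-estimate class priors and per-feature means/variances from soft labels R."""
    n_features = len(X[0])
    params = []
    for c in range(n_classes):
        weight = sum(r[c] for r in R)
        means, variances = [], []
        for j in range(n_features):
            mean = sum(r[c] * x[j] for x, r in zip(X, R)) / weight
            var = sum(r[c] * (x[j] - mean) ** 2 for x, r in zip(X, R)) / weight
            means.append(mean)
            variances.append(max(var, var_floor))  # floor is an assumption, avoids zero variance
        params.append((weight / len(X), means, variances))
    return params

def e_step(X, params):
    """Posterior class probabilities for each row of X under the current parameters."""
    R = []
    for x in X:
        logs = [
            math.log(prior)
            + sum(gaussian_logpdf(xj, m, v) for xj, m, v in zip(x, means, variances))
            for prior, means, variances in params
        ]
        top = max(logs)  # subtract the maximum for numerical stability
        exps = [math.exp(l - top) for l in logs]
        z = sum(exps)
        R.append([e / z for e in exps])
    return R

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes=2, n_iter=20):
    """Start from the labeled data, then refine with soft-labeled unlabeled data.

    Assumes every class appears at least once in y_lab.
    """
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in y_lab]
    params = m_step(X_lab, hard, n_classes)
    for _ in range(n_iter):
        soft = e_step(X_unl, params)  # labeled rows keep their hard labels
        params = m_step(X_lab + X_unl, hard + soft, n_classes)
    return params

def predict(X, params):
    """Most probable class index for each row of X."""
    return [max(range(len(r)), key=r.__getitem__) for r in e_step(X, params)]

# Hypothetical toy data: two numeric features per tweet, class 0 vs class 1.
X_lab = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.3, 3.9]]
y_lab = [0, 0, 1, 1]
X_unl = [[1.1, 0.9], [0.9, 1.3], [1.3, 1.1], [3.8, 4.1], [4.1, 4.4], [4.4, 4.0]]

params = semi_supervised_em(X_lab, y_lab, X_unl)
print(predict([[1.0, 1.0], [4.0, 4.0]], params))  # [0, 1]
```

Dropping the labeled set and starting from arbitrary parameters would give a purely unsupervised variant, whose result depends on that starting point, consistent with the abstract's remark about initial values.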

Original language	English
Article number	104945
Journal	Knowledge-Based Systems
Volume	185
DOIs	https://doi.org/10.1016/j.knosys.2019.104945
State	Published - 1 Dec 2019
Externally published	Yes

Keywords

Arabic
Expectation–maximization
Rumor detection
Semi-supervised
Twitter
Unsupervised

Cite this

@article{5317ce13f3ac4f37ad4c88c8e7af820f,
  title = "Rumor detection in Arabic tweets using semi-supervised and unsupervised expectation–maximization",
  abstract = "With the continued development of social networks, the spreading of information has become faster than ever. Consequently, this has resulted in a problem with the reliability of the information, where any user can publish whatever he/she wants. Automated systems capable of detecting fake contents with similar striking speed as the information being disseminated are urgently required. Detecting rumors in Arabic language social networks has lagged behind the work on other languages, particularly in English. In this paper, we address the problem of detecting rumors in Arabic tweets. We used a set of features extracted from the user and the content. These features were analyzed to determine their significance. Semi-supervised expectation–maximization (E–M) was used to train the proposed system with topics of newsworthy tweets. A comparison with supervised Gaussian Na{\"i}ve Bayes (NB) showed that our semi-supervised system, using a small base of labeled data, outperforms Gaussian NB achieving an accuracy of 78.6\%. The performance of the unsupervised E–M depends on the initial values, and we achieved an F1 score of 80\% in one of our experiments.",
  keywords = "Arabic, Expectation–maximization, Rumor detection, Semi-supervised, Twitter, Unsupervised",
  author = "Alzanin, Samah M. and Azmi, Aqil M.",
  note = "Publisher Copyright: {\textcopyright} 2019 Elsevier B.V.",
  year = "2019",
  month = dec,
  day = "1",
  doi = "10.1016/j.knosys.2019.104945",
  language = "English",
  volume = "185",
  journal = "Knowledge-Based Systems",
  issn = "0950-7051",
  publisher = "Elsevier B.V.",
}

TY  - JOUR
T1  - Rumor detection in Arabic tweets using semi-supervised and unsupervised expectation–maximization
AU  - Alzanin, Samah M.
AU  - Azmi, Aqil M.
PY  - 2019/12/1
Y1  - 2019/12/1
N2  - With the continued development of social networks, the spreading of information has become faster than ever. Consequently, this has resulted in a problem with the reliability of the information, where any user can publish whatever he/she wants. Automated systems capable of detecting fake contents with similar striking speed as the information being disseminated are urgently required. Detecting rumors in Arabic language social networks has lagged behind the work on other languages, particularly in English. In this paper, we address the problem of detecting rumors in Arabic tweets. We used a set of features extracted from the user and the content. These features were analyzed to determine their significance. Semi-supervised expectation–maximization (E–M) was used to train the proposed system with topics of newsworthy tweets. A comparison with supervised Gaussian Naïve Bayes (NB) showed that our semi-supervised system, using a small base of labeled data, outperforms Gaussian NB achieving an accuracy of 78.6%. The performance of the unsupervised E–M depends on the initial values, and we achieved an F1 score of 80% in one of our experiments.
KW  - Arabic
KW  - Expectation–maximization
KW  - Rumor detection
KW  - Semi-supervised
KW  - Twitter
KW  - Unsupervised
UR  - https://www.scopus.com/pages/publications/85070707601
U2  - 10.1016/j.knosys.2019.104945
DO  - 10.1016/j.knosys.2019.104945
M3  - Article
AN  - SCOPUS:85070707601
SN  - 0950-7051
VL  - 185
JO  - Knowledge-Based Systems
JF  - Knowledge-Based Systems
M1  - 104945
ER  -
