Deep Learning Driven Arabic Text to Speech Synthesizer for Visually Challenged People

Mrim M. Alnfiai; Nabil Almalki; Fahd N. Al-Wesabi; Mesfer Alduhayyem; Anwer Mustafa Hilal; Manar Ahmed Hamza

doi:10.32604/iasc.2023.034069

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Text-To-Speech (TTS) is a speech processing tool that is highly helpful for visually challenged people: it transforms text into human-like speech. However, achieving good TTS results for non-diacritized Arabic text is challenging because the language has many unique features and rules. Special characters such as gemination and diacritic signs, which indicate consonant doubling and short vowels respectively, strongly affect the correct pronunciation of Arabic. Yet these signs are rarely written in Arabic text, since speakers and readers can infer them from context. Against this background, this article introduces an Optimal Deep Learning-driven Arab Text-to-Speech Synthesizer (ODLD-ATSS) model to help visually challenged people in the Kingdom of Saudi Arabia. The primary aim of the ODLD-ATSS model is to convert text into speech signals. To this end, the model first builds a Gated Recurrent Unit (GRU)-based prediction model for diacritic and gemination signs. In addition, the Buckwalter code is used to capture, store and display Arabic text. To improve the TTS performance of the GRU method, the Aquila Optimization Algorithm (AOA) is used, which constitutes the novelty of the work. Experimental analyses were conducted to assess the proposed ODLD-ATSS model. It achieved a maximum accuracy of 96.35%, and the experimental outcomes indicate improved performance over other DL-based TTS models.
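As a rough illustration of the Buckwalter step mentioned in the abstract, the sketch below maps Arabic code points to their standard Buckwalter ASCII symbols and separates base letters from their diacritic and gemination marks. It is not the authors' implementation: the function names `to_buckwalter` and `split_labels` are hypothetical, the letter table is the standard Buckwalter scheme, and treating the marks as per-letter labels is only one assumed way a diacritic predictor's input and target could be framed.

```python
# Illustrative sketch only; names and framing are assumptions, not the paper's code.

# Standard Buckwalter symbols for the basic Arabic letters.
LETTERS = {
    "\u0621": "'", "\u0622": "|", "\u0623": ">", "\u0624": "&",
    "\u0625": "<", "\u0626": "}", "\u0627": "A", "\u0628": "b",
    "\u0629": "p", "\u062a": "t", "\u062b": "v", "\u062c": "j",
    "\u062d": "H", "\u062e": "x", "\u062f": "d", "\u0630": "*",
    "\u0631": "r", "\u0632": "z", "\u0633": "s", "\u0634": "$",
    "\u0635": "S", "\u0636": "D", "\u0637": "T", "\u0638": "Z",
    "\u0639": "E", "\u063a": "g", "\u0641": "f", "\u0642": "q",
    "\u0643": "k", "\u0644": "l", "\u0645": "m", "\u0646": "n",
    "\u0647": "h", "\u0648": "w", "\u0649": "Y", "\u064a": "y",
}

# Short-vowel, nunation, gemination (shadda) and sukun marks.
DIACRITICS = {
    "\u064b": "F", "\u064c": "N", "\u064d": "K",
    "\u064e": "a", "\u064f": "u", "\u0650": "i",
    "\u0651": "~", "\u0652": "o",
}

TABLE = {**LETTERS, **DIACRITICS}


def to_buckwalter(text):
    """Transliterate Arabic text; characters outside the table pass through."""
    return "".join(TABLE.get(ch, ch) for ch in text)


def split_labels(text):
    """Split diacritized text into bare letters and per-letter mark labels.

    labels[i] holds the Buckwalter codes of the marks that follow letters[i]
    (an empty string when the letter carries no mark).
    """
    letters, labels = [], []
    for ch in text:
        if ch in DIACRITICS:
            if labels:  # a mark with no preceding letter is ignored
                labels[-1] += DIACRITICS[ch]
        else:
            letters.append(ch)
            labels.append("")
    return "".join(letters), labels


if __name__ == "__main__":
    word = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kaf, teh, beh, each with fatha
    bare, marks = split_labels(word)
    print(to_buckwalter(word), to_buckwalter(bare), marks)
```

Under this framing, the bare letter string stands in for the non-diacritized input and the label list for the signs a sequence model such as a GRU would be asked to predict, one label per letter.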

Original language	English
Pages (from-to)	2639-2652
Number of pages	14
Journal	Intelligent Automation and Soft Computing
Volume	36
Issue number	3
DOIs	https://doi.org/10.32604/iasc.2023.034069
State	Published - 2023
Externally published	Yes

Keywords

Aquila optimizer
deep learning
gated recurrent unit
Saudi Arabia
visually challenged people

Access to Document

10.32604/iasc.2023.034069

Cite this

@article{3a93a49c8c404ce787fb9851fdff289e,

title = "Deep Learning Driven Arabic Text to Speech Synthesizer for Visually Challenged People",

abstract = "Text-To-Speech (TTS) is a speech processing tool that is highly helpful for visually-challenged people. The TTS tool is applied to transform the texts into human-like sounds. However, it is highly challenging to accomplish the TTS outcomes for the non-diacritized text of the Arabic language since it has multiple unique features and rules. Some special characters like gemination and diacritic signs that correspondingly indicate consonant doubling and short vowels greatly impact the precise pronunciation of the Arabic language. But, such signs are not frequently used in the texts written in the Arabic language since its speakers and readers can guess them from the context itself. In this background, the current research article introduces an Optimal Deep Learning-driven Arab Text-to-Speech Synthesizer (ODLD-ATSS) model to help the visually-challenged people in the Kingdom of Saudi Arabia. The prime aim of the presented ODLD-ATSS model is to convert the text into speech signals for visually-challenged people. To attain this, the presented ODLD-ATSS model initially designs a Gated Recurrent Unit (GRU)-based prediction model for diacritic and gemination signs. Besides, the Buckwalter code is utilized to capture, store and display the Arabic texts. To improve the TTS performance of the GRU method, the Aquila Optimization Algorithm (AOA) is used, which shows the novelty of the work. To illustrate the enhanced performance of the proposed ODLD-ATSS model, further experimental analyses were conducted. The proposed model achieved a maximum accuracy of 96.35\%, and the experimental outcomes indicate the improved performance of the proposed ODLD-ATSS model over other DL-based TTS models.",

keywords = "Aquila optimizer, deep learning, gated recurrent unit, Saudi Arabia, visually challenged people",

author = "Alnfiai, \{Mrim M.\} and Nabil Almalki and Al-Wesabi, \{Fahd N.\} and Mesfer Alduhayyem and Hilal, \{Anwer Mustafa\} and Hamza, \{Manar Ahmed\}",

year = "2023",

doi = "10.32604/iasc.2023.034069",

language = "English",

volume = "36",

pages = "2639--2652",

journal = "Intelligent Automation and Soft Computing",

issn = "1079-8587",

publisher = "Tech Science Press",

number = "3",

}

TY - JOUR

T1 - Deep Learning Driven Arabic Text to Speech Synthesizer for Visually Challenged People

AU - Alnfiai, Mrim M.

AU - Almalki, Nabil

AU - Al-Wesabi, Fahd N.

AU - Alduhayyem, Mesfer

AU - Hilal, Anwer Mustafa

AU - Hamza, Manar Ahmed

PY - 2023

Y1 - 2023

N2 - Text-To-Speech (TTS) is a speech processing tool that is highly helpful for visually-challenged people. The TTS tool is applied to transform the texts into human-like sounds. However, it is highly challenging to accomplish the TTS outcomes for the non-diacritized text of the Arabic language since it has multiple unique features and rules. Some special characters like gemination and diacritic signs that correspondingly indicate consonant doubling and short vowels greatly impact the precise pronunciation of the Arabic language. But, such signs are not frequently used in the texts written in the Arabic language since its speakers and readers can guess them from the context itself. In this background, the current research article introduces an Optimal Deep Learning-driven Arab Text-to-Speech Synthesizer (ODLD-ATSS) model to help the visually-challenged people in the Kingdom of Saudi Arabia. The prime aim of the presented ODLD-ATSS model is to convert the text into speech signals for visually-challenged people. To attain this, the presented ODLD-ATSS model initially designs a Gated Recurrent Unit (GRU)-based prediction model for diacritic and gemination signs. Besides, the Buckwalter code is utilized to capture, store and display the Arabic texts. To improve the TTS performance of the GRU method, the Aquila Optimization Algorithm (AOA) is used, which shows the novelty of the work. To illustrate the enhanced performance of the proposed ODLD-ATSS model, further experimental analyses were conducted. The proposed model achieved a maximum accuracy of 96.35%, and the experimental outcomes indicate the improved performance of the proposed ODLD-ATSS model over other DL-based TTS models.

KW - Aquila optimizer

KW - deep learning

KW - gated recurrent unit

KW - Saudi Arabia

KW - visually challenged people

UR - https://www.scopus.com/pages/publications/85150883456

U2 - 10.32604/iasc.2023.034069

DO - 10.32604/iasc.2023.034069

M3 - Article

AN - SCOPUS:85150883456

SN - 1079-8587

VL - 36

SP - 2639

EP - 2652

JO - Intelligent Automation and Soft Computing

JF - Intelligent Automation and Soft Computing

IS - 3

ER -
