MULTI-CLASS SPOKEN LANGUAGE DETECTION USING ARTIFICIAL INTELLIGENCE with FRACTAL AL-BIRUNI EARTH RADIUS OPTIMIZATION ALGORITHM

Najla I. Al-Shathry; Majdy M. Eltahir; Somia A. Asklany; Sami A. Al Ghamdi; Abdullah Almuhaimeed; Fuhid Alanazi; Abdelmoneim Ali Mohamed; Mohammed Rizwanullah

doi:10.1142/S0218348X25400547

Research output: Contribution to journal › Article › peer-review

Abstract

Spoken Language Identification (SLID) is the problem of identifying the language spoken by a speaker in an audio clip. SLID is valuable in multi-language speech recognition systems, personalized voice assistants, and automated speech translation systems, and in call centers, where it can automatically route calls to the appropriate language operator. A primary challenge is detecting the language accurately and with a short delay from audio with different noise levels and sampling rates. A further problem is differentiating between languages in short-duration utterances. Previous research has applied lexical, phonetic, phonotactic, and prosodic features to SLID. Spoken language detection using deep learning (DL) usually involves training RNN or CNN models on audio features such as spectrograms or MFCCs to classify the language spoken in audio samples. More recent methodologies, such as transformers or CNN-RNN hybrids, can capture both spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique. The MCSLD-AIBER technique mainly aims to identify the various classes of spoken languages. In the MCSLD-AIBER technique, the Constant-Q Transform (CQT) is applied to transform the speech signals. Additionally, the technique employs an Inception with Residual Network model for feature extraction. The hyperparameters are tuned using the Al-Biruni Earth Radius (BER) approach, and a long short-term memory (LSTM) network is used to identify multiple spoken languages. A set of experiments was conducted to evaluate the performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method outperforms other models.

Original language	English
Article number	2540054
Journal	Fractals
Volume	32
Issue number	9-10
DOIs	https://doi.org/10.1142/S0218348X25400547
State	Published - 2024
Externally published	Yes

Keywords

Artificial Intelligence
Complex Systems
Constant-Q Transform
Feature Extraction
Fractal Optimization
Hyperparameter Selection
Spoken Language Detection

Cite this

@article{f5bb7378bb5f463ea327a2298109ce68,

title = "MULTI-CLASS SPOKEN LANGUAGE DETECTION USING ARTIFICIAL INTELLIGENCE with FRACTAL AL-BIRUNI EARTH RADIUS OPTIMIZATION ALGORITHM",

abstract = "Spoken Language Identification (SLID) is the problem of categorizing the language spoken by a speaker in the audio clips. SLID is valuable in multi-language speech recognition systems, personalized voice assistants, and automated speech translation systems in call centers to automatically route calls to the language operator. A primary challenge is the language detection from audio with different noise levels and sampling rates, accurately and with a short delay. A further problem is to differentiate between short-duration languages. Previous research works have applied SLID's lexical, phonetic, phonotactic, and prosodic features. Spoken language detection using deep learning (DL) usually includes training RNN or CNN approaches on audio features such as spectrograms or MFCCs to categorize the language spoken in audio samples. Pioneering methodologies, such as CNN-RNN transformers or hybrids, can capture the spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique. The MCSLD-AIBER technique mainly aims to identify the various classes of spoken languages. In the MCSLD-AIBER technique, the Constant-Q Transform (CQT) approach is applied to transform the speech signals. Additionally, the MCSLD-AIBER technique employs Inception with a Residual Network model for the feature extraction process. Moreover, the hyperparameters can be adjusted using the BER approach. A long short-term memory (LSTM) network can be utilized to identify multiple spoken languages. A set of experiments were involved to illustrate the efficient performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method performs optimally over other models.",

keywords = "Artificial Intelligence, Complex Systems, Constant-Q Transform, Feature Extraction, Fractal Optimization, Hyperparameter Selection, Spoken Language Detection",

author = "Al-Shathry, {Najla I.} and Eltahir, {Majdy M.} and Asklany, {Somia A.} and {Al Ghamdi}, {Sami A.} and Abdullah Almuhaimeed and Fuhid Alanazi and Mohamed, {Abdelmoneim Ali} and Mohammed Rizwanullah",

note = "Publisher Copyright: {\textcopyright} 2024 The Author(s).",

year = "2024",

doi = "10.1142/S0218348X25400547",

language = "English",

volume = "32",

journal = "Fractals",

issn = "0218-348X",

publisher = "World Scientific",

number = "9-10",

}

TY - JOUR

T1 - MULTI-CLASS SPOKEN LANGUAGE DETECTION USING ARTIFICIAL INTELLIGENCE with FRACTAL AL-BIRUNI EARTH RADIUS OPTIMIZATION ALGORITHM

AU - Al-Shathry, Najla I.

AU - Eltahir, Majdy M.

AU - Asklany, Somia A.

AU - Al Ghamdi, Sami A.

AU - Almuhaimeed, Abdullah

AU - Alanazi, Fuhid

AU - Mohamed, Abdelmoneim Ali

AU - Rizwanullah, Mohammed

PY - 2024

Y1 - 2024

N2 - Spoken Language Identification (SLID) is the problem of categorizing the language spoken by a speaker in the audio clips. SLID is valuable in multi-language speech recognition systems, personalized voice assistants, and automated speech translation systems in call centers to automatically route calls to the language operator. A primary challenge is the language detection from audio with different noise levels and sampling rates, accurately and with a short delay. A further problem is to differentiate between short-duration languages. Previous research works have applied SLID's lexical, phonetic, phonotactic, and prosodic features. Spoken language detection using deep learning (DL) usually includes training RNN or CNN approaches on audio features such as spectrograms or MFCCs to categorize the language spoken in audio samples. Pioneering methodologies, such as CNN-RNN transformers or hybrids, can capture the spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique. The MCSLD-AIBER technique mainly aims to identify the various classes of spoken languages. In the MCSLD-AIBER technique, the Constant-Q Transform (CQT) approach is applied to transform the speech signals. Additionally, the MCSLD-AIBER technique employs Inception with a Residual Network model for the feature extraction process. Moreover, the hyperparameters can be adjusted using the BER approach. A long short-term memory (LSTM) network can be utilized to identify multiple spoken languages. A set of experiments were involved to illustrate the efficient performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method performs optimally over other models.

AB - Spoken Language Identification (SLID) is the problem of categorizing the language spoken by a speaker in the audio clips. SLID is valuable in multi-language speech recognition systems, personalized voice assistants, and automated speech translation systems in call centers to automatically route calls to the language operator. A primary challenge is the language detection from audio with different noise levels and sampling rates, accurately and with a short delay. A further problem is to differentiate between short-duration languages. Previous research works have applied SLID's lexical, phonetic, phonotactic, and prosodic features. Spoken language detection using deep learning (DL) usually includes training RNN or CNN approaches on audio features such as spectrograms or MFCCs to categorize the language spoken in audio samples. Pioneering methodologies, such as CNN-RNN transformers or hybrids, can capture the spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique. The MCSLD-AIBER technique mainly aims to identify the various classes of spoken languages. In the MCSLD-AIBER technique, the Constant-Q Transform (CQT) approach is applied to transform the speech signals. Additionally, the MCSLD-AIBER technique employs Inception with a Residual Network model for the feature extraction process. Moreover, the hyperparameters can be adjusted using the BER approach. A long short-term memory (LSTM) network can be utilized to identify multiple spoken languages. A set of experiments were involved to illustrate the efficient performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method performs optimally over other models.

KW - Artificial Intelligence

KW - Complex Systems

KW - Constant-Q Transform

KW - Feature Extraction

KW - Fractal Optimization

KW - Hyperparameter Selection

KW - Spoken Language Detection

UR - https://www.scopus.com/pages/publications/85212565681

U2 - 10.1142/S0218348X25400547

DO - 10.1142/S0218348X25400547

M3 - Article

AN - SCOPUS:85212565681

SN - 0218-348X

VL - 32

JO - Fractals

JF - Fractals

IS - 9-10

M1 - 2540054

ER -
