TY - JOUR
T1 - MULTI-CLASS SPOKEN LANGUAGE DETECTION USING ARTIFICIAL INTELLIGENCE WITH FRACTAL AL-BIRUNI EARTH RADIUS OPTIMIZATION ALGORITHM
AU - Al-Shathry, Najla I.
AU - Eltahir, Majdy M.
AU - Asklany, Somia A.
AU - Al Ghamdi, Sami A.
AU - Almuhaimeed, Abdullah
AU - Alanazi, Fuhid
AU - Mohamed, Abdelmoneim Ali
AU - Rizwanullah, Mohammed
N1 - Publisher Copyright: © 2024 The Author(s).
PY - 2024
Y1 - 2024
AB - Spoken Language Identification (SLID) is the task of identifying the language spoken in an audio clip. SLID is valuable in multilingual speech recognition systems, personalized voice assistants, and automated speech translation systems, as well as in call centers to automatically route calls to the appropriate language operator. A primary challenge is detecting the language accurately and with a short delay from audio with varying noise levels and sampling rates. A further challenge is differentiating between languages in short-duration clips. Previous research has applied lexical, phonetic, phonotactic, and prosodic features to SLID. Spoken language detection using deep learning (DL) usually involves training RNN or CNN models on audio features such as spectrograms or MFCCs to classify the language spoken in audio samples. More advanced methodologies, such as transformers or CNN-RNN hybrids, can capture both spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique. The MCSLD-AIBER technique aims to identify multiple classes of spoken languages. It applies the Constant-Q Transform (CQT) to transform the speech signals and employs an Inception with Residual Network model for feature extraction. The hyperparameters are tuned using the Al-Biruni Earth Radius (BER) approach, and a long short-term memory (LSTM) network identifies the spoken language. A set of experiments was conducted to evaluate the performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method outperforms other models.
KW - Artificial Intelligence
KW - Complex Systems
KW - Constant-Q Transform
KW - Feature Extraction
KW - Fractal Optimization
KW - Hyperparameter Selection
KW - Spoken Language Detection
UR - http://www.scopus.com/inward/record.url?scp=85212565681&partnerID=8YFLogxK
U2 - 10.1142/S0218348X25400547
DO - 10.1142/S0218348X25400547
M3 - Article
AN - SCOPUS:85212565681
SN - 0218-348X
VL - 32
JO - Fractals
JF - Fractals
IS - 9-10
M1 - 2540054
ER -