Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System

Mohammed Hasan Ali; Mustafa Musa Jaber; Sura Khalil Abd; Amjad Rehman; Mazhar Javed Awan; Daiva Vitkutė-Adžgauskienė; Robertas Damaševičius; Saeed Ali Bahaj

doi:10.3390/app12031091

Management Information Systems

Research output: Contribution to journal › Article › peer-review

35 Scopus citations

Abstract

Automatic speech recognition (ASR) converts human speech into text or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems, and they combine signal processing with machine learning techniques to recognize speech. However, traditional systems perform poorly in noisy environments, and accents and regional differences further degrade performance when speech signals are analyzed. To overcome these issues, a precise speech recognition system was developed. This paper uses speech data from the jim-schwoebel voice datasets, processed with Mel-frequency cepstral coefficients (MFCCs); the MFCC algorithm extracts the features used to recognize speech. A sparse auto-encoder (SAE) neural network performs the classification, and a hidden Markov model (HMM) makes the final recognition decision. The network performance is optimized by applying the Harris Hawks optimization (HHO) algorithm to fine-tune the network parameters. The fine-tuned network can effectively recognize speech in a noisy environment.
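
As a rough illustration of two building blocks named in the abstract, the sketch below shows the mel-scale mapping that underlies MFCC filter banks and a KL-divergence sparsity penalty of the kind often used in sparse auto-encoders. The abstract does not state which mel formula or sparsity penalty the authors use, so both choices, and all function and parameter names (`hz_to_mel`, `mel_filter_centres`, `kl_sparsity_penalty`, `rho`), are assumptions made for illustration only.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centres(n_filters: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def kl_sparsity_penalty(mean_activations: list[float], rho: float = 0.05) -> float:
    """Sum over hidden units of KL(rho || rho_hat_j).

    rho is the target mean activation (assumed value); each rho_hat_j is a
    hidden unit's observed mean activation and must lie strictly in (0, 1).
    """
    total = 0.0
    for rho_hat in mean_activations:
        total += rho * math.log(rho / rho_hat)
        total += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return total


if __name__ == "__main__":
    print(mel_filter_centres(5, 0.0, 8000.0))
    print(kl_sparsity_penalty([0.05, 0.2, 0.5]))
```

The penalty is zero when every hidden unit's mean activation equals the target `rho` and grows as units become more active, which is what pushes an auto-encoder toward sparse hidden representations. How such a penalty would be weighted, and how HHO would tune the network parameters, is not specified in this record.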

Original language	English
Article number	1091
Journal	Applied Sciences (Switzerland)
Volume	12
Issue number	3
DOIs	https://doi.org/10.3390/app12031091
State	Published - 1 Feb 2022

Keywords

Automatic speech recognition
Hidden Markov model
Mel-frequency cepstral coefficients
Natural language processing
Sparse auto-encoder neural network
Speech recognition

Cite this

@article{8a1d69a1dde640739ab51800dae81060,

title = "Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System",

abstract = "Automatic speech recognition (ASR) is an effective technique that can convert human speech into text format or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems. Signal processing and machine learning techniques are incorporated to recognize speech. However, traditional systems have low performance due to a noisy environment. In addition to this, accents and local differences negatively affect the ASR system{\textquoteright}s performance while analyzing speech signals. A precise speech recognition system was developed to improve the system performance to overcome these issues. This paper uses speech information from jim-schwoebel voice datasets processed by Mel-frequency cepstral coefficients (MFCCs). The MFCC algorithm extracts the valuable features that are used to recognize speech. Here, a sparse auto-encoder (SAE) neural network is used to classify the model, and the hidden Markov model (HMM) is used to decide on the speech recognition. The network performance is optimized by applying the Harris Hawks optimization (HHO) algorithm to fine-tune the network parameter. The fine-tuned network can effectively recognize speech in a noisy environment.",

keywords = "Automatic speech recognition, Hidden Markov model, Mel-frequency cepstral coefficients, Natural language processing, Sparse auto-encoder neural network, Speech recognition",

author = "Ali, Mohammed Hasan and Jaber, Mustafa Musa and Abd, Sura Khalil and Rehman, Amjad and Awan, Mazhar Javed and Vitkutė-Ad{\v z}gauskienė, Daiva and Dama{\v s}evi{\v c}ius, Robertas and Bahaj, Saeed Ali",

note = "Publisher Copyright: {\textcopyright} 2022 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2022",

month = feb,

day = "1",

doi = "10.3390/app12031091",

language = "English",

volume = "12",

journal = "Applied Sciences (Switzerland)",

issn = "2076-3417",

publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",

number = "3",

}

TY - JOUR

T1 - Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System

AU - Ali, Mohammed Hasan

AU - Jaber, Mustafa Musa

AU - Abd, Sura Khalil

AU - Rehman, Amjad

AU - Awan, Mazhar Javed

AU - Vitkutė-Adžgauskienė, Daiva

AU - Damaševičius, Robertas

AU - Bahaj, Saeed Ali

PY - 2022/2/1

Y1 - 2022/2/1

N2 - Automatic speech recognition (ASR) is an effective technique that can convert human speech into text format or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems. Signal processing and machine learning techniques are incorporated to recognize speech. However, traditional systems have low performance due to a noisy environment. In addition to this, accents and local differences negatively affect the ASR system’s performance while analyzing speech signals. A precise speech recognition system was developed to improve the system performance to overcome these issues. This paper uses speech information from jim-schwoebel voice datasets processed by Mel-frequency cepstral coefficients (MFCCs). The MFCC algorithm extracts the valuable features that are used to recognize speech. Here, a sparse auto-encoder (SAE) neural network is used to classify the model, and the hidden Markov model (HMM) is used to decide on the speech recognition. The network performance is optimized by applying the Harris Hawks optimization (HHO) algorithm to fine-tune the network parameter. The fine-tuned network can effectively recognize speech in a noisy environment.

AB - Automatic speech recognition (ASR) is an effective technique that can convert human speech into text format or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems. Signal processing and machine learning techniques are incorporated to recognize speech. However, traditional systems have low performance due to a noisy environment. In addition to this, accents and local differences negatively affect the ASR system’s performance while analyzing speech signals. A precise speech recognition system was developed to improve the system performance to overcome these issues. This paper uses speech information from jim-schwoebel voice datasets processed by Mel-frequency cepstral coefficients (MFCCs). The MFCC algorithm extracts the valuable features that are used to recognize speech. Here, a sparse auto-encoder (SAE) neural network is used to classify the model, and the hidden Markov model (HMM) is used to decide on the speech recognition. The network performance is optimized by applying the Harris Hawks optimization (HHO) algorithm to fine-tune the network parameter. The fine-tuned network can effectively recognize speech in a noisy environment.

KW - Automatic speech recognition

KW - Hidden Markov model

KW - Mel-frequency cepstral coefficients

KW - Natural language processing

KW - Sparse auto-encoder neural network

KW - Speech recognition

UR - https://www.scopus.com/pages/publications/85123062713

U2 - 10.3390/app12031091

DO - 10.3390/app12031091

M3 - Article

AN - SCOPUS:85123062713

SN - 2076-3417

VL - 12

JO - Applied Sciences (Switzerland)

JF - Applied Sciences (Switzerland)

IS - 3

M1 - 1091

ER -
