Abstract
Automatic speech recognition (ASR) converts human speech into text or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems, and they combine signal processing with machine learning techniques to recognize speech. However, traditional systems perform poorly in noisy environments. In addition, accents and regional differences in speech degrade an ASR system's performance when it analyzes speech signals. To overcome these issues, a more precise speech recognition system was developed. This paper uses speech data from the jim-schwoebel voice datasets, processed with Mel-frequency cepstral coefficients (MFCCs). The MFCC algorithm extracts the features that are used to recognize speech. A sparse auto-encoder (SAE) neural network is used for classification, and a hidden Markov model (HMM) is used to make the speech recognition decision. The network performance is optimized by applying the Harris Hawks optimization (HHO) algorithm to fine-tune the network parameters. The fine-tuned network can effectively recognize speech in a noisy environment.
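As an illustration of the MFCC feature-extraction step named in the abstract, the following is a minimal NumPy sketch of a textbook MFCC computation: pre-emphasis, framing, Hamming windowing, power spectrum, triangular mel filterbank, logarithm, and DCT-II. It is not the authors' implementation. The function names (`mel_filterbank`, `mfcc`) and every parameter value (16 kHz sampling rate, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) are assumptions chosen for illustration, since the abstract does not state the configuration used in the paper.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # All default values are illustrative assumptions, not the paper's settings.
    # 1. Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 2. Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # 3. Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 4. Mel filterbank energies, then log compression.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # 5. Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

# Usage: one second of a synthetic 440 Hz tone stands in for a speech clip.
t = np.arange(16000) / 16000
tone = np.sin(2 * np.pi * 440 * t)
features = mfcc(tone)
print(features.shape)  # (98, 13): 98 frames, 13 coefficients per frame
```

In a pipeline like the one described, each row of `features` would be the per-frame input to the downstream classifier. The SAE, HMM, and HHO stages are not sketched here because the abstract gives no details about their configuration.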
Original language | English |
---|---|
Article number | 1091 |
Journal | Applied Sciences (Switzerland) |
Volume | 12 |
Issue number | 3 |
State | Published - 1 Feb 2022 |
Keywords
- Automatic speech recognition
- Hidden Markov model
- Mel-frequency cepstral coefficients
- Natural language processing
- Sparse auto-encoder neural network
- Speech recognition