TY - JOUR
T1 - Automated Image Captioning Using Sparrow Search Algorithm With Improved Deep Learning Model
AU - Arasi, Munya A.
AU - Alshahrani, Haya Mesfer
AU - Alruwais, Nuha
AU - Motwakel, Abdelwahed
AU - Ahmed, Noura Abdelaziz
AU - Mohamed, Abdullah
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2023
Y1 - 2023
AB - Image captioning is a deep learning technique that generates textual descriptions, or captions, for images. It integrates computer vision and natural language processing (NLP) to comprehend the visual content of an image and produce human-like descriptions. Deep learning (DL) based image captioning models can be trained on large-scale datasets, allowing them to generalize across various types of images and generate captions that apply to a wide range of visual scenarios. Because they combine computer vision and NLP, DL-enabled image captioning models can process both visual and textual information, which enables them to generate captions that not only describe the visual content but also incorporate contextual and semantic information. This study develops an Automated Image Captioning using Sparrow Search Algorithm with Improved Deep Learning (AIC-SSAIDL) technique, whose main aim is the automated generation of textual captions for input images. To accomplish this, the AIC-SSAIDL technique uses the MobileNetV2 model to generate feature descriptors of the input images, with its hyperparameters tuned by the sparrow search algorithm (SSA). For caption generation, the technique uses a long short-term memory network with an attention mechanism (AM-LSTM). Finally, the hyperparameters of the AM-LSTM model are selected by the fruit fly optimization (FFO) algorithm. A wide range of experiments was conducted on benchmark datasets to evaluate the performance of the AIC-SSAIDL method. The comprehensive result analysis highlighted the enhanced captioning results of the AIC-SSAIDL method, with maximum CIDEr scores of 46.12, 61.89, and 137.45 on the Flickr8k, Flickr30k, and MSCOCO datasets, respectively.
KW - Image captioning
KW - computer vision
KW - deep learning
KW - natural language processing
KW - sparrow search algorithm
UR - http://www.scopus.com/inward/record.url?scp=85173024444&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3317276
DO - 10.1109/ACCESS.2023.3317276
M3 - Article
AN - SCOPUS:85173024444
SN - 2169-3536
VL - 11
SP - 104633
EP - 104642
JO - IEEE Access
JF - IEEE Access
ER -