Automated Image Captioning Using Sparrow Search Algorithm With Improved Deep Learning Model

Munya A. Arasi; Haya Mesfer Alshahrani; Nuha Alruwais; Abdelwahed Motwakel; Noura Abdelaziz Ahmed; Abdullah Mohamed

doi:10.1109/ACCESS.2023.3317276

Information Systems

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

Image captioning is a deep learning technique that intends to create and generate textual descriptions or captions for images. It integrates computer vision and natural language processing (NLP) to comprehend the visual content of an image and generate human-like descriptions. Deep learning (DL) based image captioning models can be trained on large-scale datasets, allowing them to generalize various types of images and generate captions that apply to a wide range of visual scenarios. By combining computer vision and natural language processing, DL-enabled image captioning models can understand both visual and textual information, which enables them to generate captions that not only describe the visual content but also incorporate contextual and semantic information. This study develops an Automated Image Captioning using Sparrow Search Algorithm with Improved Deep Learning (AIC-SSAIDL) technique. The major intention of the AIC-SSAIDL technique lies in the automated generation of textual captions for the input images. To accomplish this, the AIC-SSAIDL technique utilizes the MobileNetv2 model to generate feature descriptors of the input images and its hyperparameter tuning process takes place using SSA. For the image captioning process, the AIC-SSAIDL technique utilizes an attention mechanism with long short-term memory (AM-LSTM) network. Finally, the hyperparameter selection of the AM-LSTM model is performed by the fruit fly optimization (FFO) algorithm. A wide range of experiments has been conducted on benchmark data to depict the better performance of the AIC-SSAIDL method. The comprehensive result analysis highlighted the enhanced captioning results of the AIC-SSAIDL method with maximum CIDEr of 46.12, 61.89, and 137.45 on Flickr8k, Flickr30k, and MSCOCO datasets, respectively.

Original language	English
Pages (from-to)	104633-104642
Number of pages	10
Journal	IEEE Access
Volume	11
DOIs	https://doi.org/10.1109/ACCESS.2023.3317276
State	Published - 2023

Keywords

Image captioning
computer vision
deep learning
natural language processing
sparrow search algorithm

Cite this

@article{5ea4a3e1ecc04ef182eec2a5b9b50e21,

title = "Automated Image Captioning Using Sparrow Search Algorithm With Improved Deep Learning Model",

abstract = "Image captioning is a deep learning technique that intends to create and generate textual descriptions or captions for images. It integrates computer vision and natural language processing (NLP) to comprehend the visual content of an image and generate human-like descriptions. Deep learning (DL) based image captioning models can be trained on large-scale datasets, allowing them to generalize various types of images and generate captions that apply to a wide range of visual scenarios. By combining computer vision and natural language processing, DL-enabled image captioning models can understand both visual and textual information, which enables them to generate captions that not only describe the visual content but also incorporate contextual and semantic information. This study develops an Automated Image Captioning using Sparrow Search Algorithm with Improved Deep Learning (AIC-SSAIDL) technique. The major intention of the AIC-SSAIDL technique lies in the automated generation of textual captions for the input images. To accomplish this, the AIC-SSAIDL technique utilizes the MobileNetv2 model to generate feature descriptors of the input images and its hyperparameter tuning process takes place using SSA. For the image captioning process, the AIC-SSAIDL technique utilizes an attention mechanism with long short-term memory (AM-LSTM) network. Finally, the hyperparameter selection of the AM-LSTM model is performed by the fruit fly optimization (FFO) algorithm. A wide range of experiments has been conducted on benchmark data to depict the better performance of the AIC-SSAIDL method. The comprehensive result analysis highlighted the enhanced captioning results of the AIC-SSAIDL method with maximum CIDEr of 46.12, 61.89, and 137.45 on Flickr8k, Flickr30k, and MSCOCO datasets, respectively.",

keywords = "Image captioning, computer vision, deep learning, natural language processing, sparrow search algorithm",

author = "Arasi, Munya A. and Alshahrani, Haya Mesfer and Alruwais, Nuha and Motwakel, Abdelwahed and Ahmed, Noura Abdelaziz and Mohamed, Abdullah",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2023",

doi = "10.1109/ACCESS.2023.3317276",

language = "English",

volume = "11",

pages = "104633--104642",

journal = "IEEE Access",

issn = "2169-3536",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Automated Image Captioning Using Sparrow Search Algorithm With Improved Deep Learning Model

AU - Arasi, Munya A.

AU - Alshahrani, Haya Mesfer

AU - Alruwais, Nuha

AU - Motwakel, Abdelwahed

AU - Ahmed, Noura Abdelaziz

AU - Mohamed, Abdullah

PY - 2023

Y1 - 2023

AB - Image captioning is a deep learning technique that intends to create and generate textual descriptions or captions for images. It integrates computer vision and natural language processing (NLP) to comprehend the visual content of an image and generate human-like descriptions. Deep learning (DL) based image captioning models can be trained on large-scale datasets, allowing them to generalize various types of images and generate captions that apply to a wide range of visual scenarios. By combining computer vision and natural language processing, DL-enabled image captioning models can understand both visual and textual information, which enables them to generate captions that not only describe the visual content but also incorporate contextual and semantic information. This study develops an Automated Image Captioning using Sparrow Search Algorithm with Improved Deep Learning (AIC-SSAIDL) technique. The major intention of the AIC-SSAIDL technique lies in the automated generation of textual captions for the input images. To accomplish this, the AIC-SSAIDL technique utilizes the MobileNetv2 model to generate feature descriptors of the input images and its hyperparameter tuning process takes place using SSA. For the image captioning process, the AIC-SSAIDL technique utilizes an attention mechanism with long short-term memory (AM-LSTM) network. Finally, the hyperparameter selection of the AM-LSTM model is performed by the fruit fly optimization (FFO) algorithm. A wide range of experiments has been conducted on benchmark data to depict the better performance of the AIC-SSAIDL method. The comprehensive result analysis highlighted the enhanced captioning results of the AIC-SSAIDL method with maximum CIDEr of 46.12, 61.89, and 137.45 on Flickr8k, Flickr30k, and MSCOCO datasets, respectively.

KW - Image captioning

KW - computer vision

KW - deep learning

KW - natural language processing

KW - sparrow search algorithm

UR - https://www.scopus.com/pages/publications/85173024444

U2 - 10.1109/ACCESS.2023.3317276

DO - 10.1109/ACCESS.2023.3317276

M3 - Article

AN - SCOPUS:85173024444

SN - 2169-3536

VL - 11

SP - 104633

EP - 104642

JO - IEEE Access

JF - IEEE Access

ER -
