Image Caption Generation and Comprehensive Comparison of Image Encoders

Shitiz Gupta, Shubham Agnihotri, Deepasha Birla, Puneet Singh Lamb, Achin Jain, Thavavel Vaiyapuri

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

Image caption generation is a challenging multimodal task. Substantial advances have been made in deep learning, notably in computer vision and natural language processing; yet human-written captions are still considered superior, which makes captioning a demanding application for interactive machine learning. In this paper, we compare different transfer learning techniques and develop a novel architecture to improve image captioning accuracy. We compute image feature vectors using several state-of-the-art transfer learning models, which are fed, along with embedded text, into an Encoder-Decoder network based on stacked LSTMs with soft attention to generate high-accuracy captions. We compare these models on several benchmark datasets using evaluation metrics such as BLEU and METEOR.
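The soft-attention step the abstract refers to can be sketched in a few lines. The following is a minimal, illustrative NumPy implementation of additive (Bahdanau-style) soft attention over pretrained-CNN region features; the function name, parameter shapes, and random weights are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def soft_attention(features, hidden, Wf, Wh, v):
    """Additive soft attention over image regions (illustrative sketch).

    features: (R, Df) region feature vectors from a pretrained CNN encoder
    hidden:   (Dh,)   current decoder LSTM hidden state
    Wf, Wh, v: projection parameters (random here; learned in practice)
    Returns the context vector (Df,) and attention weights (R,).
    """
    scores = np.tanh(features @ Wf + hidden @ Wh) @ v  # (R,) alignment scores
    weights = np.exp(scores - scores.max())            # numerically stable softmax
    weights /= weights.sum()
    context = weights @ features                       # weighted sum of regions
    return context, weights

# Hypothetical shapes: 49 regions (7x7 grid) of 512-d features, 256-d LSTM state
rng = np.random.default_rng(0)
R, Df, Dh, Da = 49, 512, 256, 128
features = rng.standard_normal((R, Df))
hidden = rng.standard_normal(Dh)
Wf = rng.standard_normal((Df, Da)) * 0.01
Wh = rng.standard_normal((Dh, Da)) * 0.01
v = rng.standard_normal(Da)

context, weights = soft_attention(features, hidden, Wf, Wh, v)
print(context.shape, weights.shape)
```

At each decoding step, the resulting context vector would be concatenated with the embedded previous word and fed to the stacked LSTM decoder, so the network can attend to different image regions for each generated word.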

Original language: English
Pages (from-to): 42-55
Number of pages: 14
Journal: Fusion: Practice and Applications
Volume: 4
Issue number: 2
DOIs
State: Published - 2021

Keywords

  • CNN (Convolutional Neural Network)
  • Image Captioning
  • RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory)
  • Transfer Learning
