Image Caption Generation Using Scoring Based on Object Detection and Word2Vec

Tadanobu Misawa*, Nozomi Morizumi, Kazuya Yamashita

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Generating descriptive text from images, known as caption generation, is a noteworthy research field with potential applications such as aiding the visually impaired. Recently, numerous methods based on deep learning have been proposed. Previous methods learn the relationship between image features and captions from a large dataset of image-caption pairs. However, it is difficult to correctly learn all objects, object attributes, and relationships between objects, so incorrect captions are occasionally generated; for instance, a caption may describe objects that are not present in the image. In this study, we propose a scoring method that uses object detection and Word2Vec to output a caption that correctly describes the objects in the image. First, multiple candidate captions are generated. Next, object detection is performed, and a score is calculated for each caption from the detected object labels and the nouns extracted from that caption. Finally, the caption with the highest score is output. Experimental evaluation of the proposed method on the Microsoft Common Objects in Context (MSCOCO) dataset demonstrates that it is effective in improving the accuracy of caption generation.
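The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny 2-D word vectors stand in for real Word2Vec embeddings, and the scoring formula (mean over caption nouns of the best cosine similarity to any detected label) is a plausible assumption, since the paper's exact formula is not given in the abstract.

```python
import numpy as np

# Toy 2-D embeddings standing in for pretrained Word2Vec vectors (illustrative only).
VECTORS = {
    "dog":     np.array([0.90, 0.10]),
    "puppy":   np.array([0.85, 0.15]),
    "cat":     np.array([0.10, 0.90]),
    "frisbee": np.array([0.50, 0.50]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def caption_score(caption_nouns, detected_labels):
    """Hypothetical score: for each noun in the caption, take its best
    similarity to any detected object label, then average over nouns."""
    if not caption_nouns:
        return 0.0
    best_sims = [
        max(cosine(VECTORS[noun], VECTORS[label]) for label in detected_labels)
        for noun in caption_nouns
    ]
    return sum(best_sims) / len(best_sims)

def select_caption(candidates, detected_labels):
    """candidates: list of (caption_text, extracted_nouns) pairs.
    Returns the caption whose nouns best match the detected labels."""
    return max(candidates, key=lambda c: caption_score(c[1], detected_labels))[0]

# Example: the detector found a dog and a frisbee in the image.
detected = ["dog", "frisbee"]
candidates = [
    ("a cat sitting on a couch", ["cat"]),
    ("a puppy catching a frisbee", ["puppy", "frisbee"]),
]
print(select_caption(candidates, detected))  # → a puppy catching a frisbee
```

In practice the embeddings would come from a pretrained Word2Vec model (e.g. via gensim's `KeyedVectors`) and the nouns from a part-of-speech tagger, but the selection logic is the same: the caption mentioning objects absent from the image scores low and is rejected.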

Original language: English
Pages (from-to): 2195-2204
Number of pages: 10
Journal: Sensors and Materials
Volume: 35
Issue number: 7
DOIs
State: Published - 2023

Keywords

  • Word2Vec
  • deep learning
  • image caption generation
  • object detection
  • scoring

ASJC Scopus subject areas

  • Instrumentation
  • General Materials Science
