A position-aware transformer for image captioning

Deng, Zelin and Zhou, Bo and He, Pei and Huang, Jianfeng and Alfarraj, Osama and Tolba, Amr (2021) A position-aware transformer for image captioning. Computers, Materials and Continua, 70 (1). pp. 2005-2021. ISSN 1546-2218 (https://doi.org/10.32604/cmc.2022.019328)

Text. Filename: Deng_etal_CMC_2021_A_position_aware_transformer_for_image_captioning.pdf
Final Published Version

Abstract

Image captioning aims to generate a corresponding description of an image. In recent years, neural encoder-decoder models have been the dominant approaches, in which a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are used to translate an image into a natural language description. Among these approaches, visual attention mechanisms are widely used to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. However, most conventional visual attention mechanisms are based on high-level image features, ignoring the effects of other image features and giving insufficient consideration to the relative positions between image features. In this work, we propose a Position-Aware Transformer model with image-feature attention and position-aware attention mechanisms to address these problems. The image-feature attention first extracts multi-level features using a Feature Pyramid Network (FPN), then uses the scaled dot-product to fuse these features, which enables our model to detect objects of different scales in the image more effectively without increasing parameters. In the position-aware attention mechanism, the relative positions between image features are first obtained and then incorporated into the original image features to generate captions more accurately. Experiments are carried out on the MSCOCO dataset, and our approach achieves competitive BLEU-4, METEOR, ROUGE-L, and CIDEr scores compared with some state-of-the-art approaches, demonstrating the effectiveness of our approach.
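
The abstract names two building blocks: scaled dot-product fusion of image features, and the use of relative positions between those features. The sketch below is a generic illustration of both ideas, not the authors' implementation. It assumes plain Python lists as feature vectors, and it injects position through an additive attention bias, which is one common choice and may differ from how the paper incorporates positions into the features. The function names, the `scale` parameter, and the sample region centers are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, bias=None):
    """Generic scaled dot-product attention over lists of vectors.

    bias is an optional len(queries) x len(keys) matrix that is added
    to the scaled scores before the softmax.
    """
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score each key against the query, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        if bias is not None:
            scores = [s + bias[i][j] for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def relative_position_bias(centers, scale=1.0):
    """Hypothetical bias: negative Euclidean distance between region centers.

    Nearby regions receive a larger (less negative) bias than distant ones.
    This is an assumption for illustration, not the paper's formulation.
    """
    return [
        [-scale * math.hypot(a[0] - b[0], a[1] - b[1]) for b in centers]
        for a in centers
    ]


# Three made-up region features with made-up (x, y) centers.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
centers = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]

bias = relative_position_bias(centers)
fused = scaled_dot_product_attention(features, features, features, bias)
```

Each row of `fused` is a weighted average of the input features, so the fusion step itself adds no learned parameters. A trained model would normally apply learned projections to the queries, keys, and values before this step.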

ORCID iDs

Deng, Zelin; Zhou, Bo; He, Pei; Huang, Jianfeng; Alfarraj, Osama and Tolba, Amr

Item metadata

Item type:	Article
ID code:	78274
Dates:	7 September 2021 (Published); 16 June 2021 (Accepted)
Subjects:	Science > Mathematics > Electronic computers. Computer science
Department:	Faculty of Engineering > Design, Manufacture and Engineering Management > National Manufacturing Institute Scotland
Depositing user:	Pure Administrator
Date deposited:	27 Oct 2021 09:19
Last modified:	28 Feb 2025 01:35
Related URLs:	Scopus publication
URI:	https://strathprints.strath.ac.uk/id/eprint/78274
