Image-text dual model for small-sample image classification

Fangyi Zhu, Xiaoxu Li, Zhanyu Ma*, Guang Chen, Pai Peng, Xiaowei Guo, Jen-Tzung Chien, Jun Guo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations


Small-sample classification is a challenging problem in computer vision with many applications. In this paper, we propose an image-text dual model to improve classification performance on small-sample datasets. The proposed dual model consists of two sub-models: an image classification model and a text classification model. After training the two sub-models separately, we fuse them with a novel method rather than simply combining their results. The dual model uses text information to overcome the difficulty of training deep models on small-sample datasets. To demonstrate its effectiveness, we conduct extensive experiments on LabelMe and UIUC-Sports. The experimental results show that the proposed model achieves the highest image classification accuracy among all compared models on both datasets.

Original language: English
Title of host publication: Computer Vision - 2nd CCF Chinese Conference, CCCV 2017, Proceedings
Editors: Liang Wang, Xiang Bai, Jinfeng Yang, Qingshan Liu, Deyu Meng, Qinghua Hu, Ming-Ming Cheng
Publisher: Springer Verlag
Number of pages: 10
ISBN (Print): 9789811073014
State: Published - 1 Jan 2017
Event: 2nd Chinese Conference on Computer Vision, CCCV 2017 - Tianjin, China
Duration: 11 Oct 2017 to 14 Oct 2017

Publication series

Name: Communications in Computer and Information Science
ISSN (Print): 1865-0929


Conference: 2nd Chinese Conference on Computer Vision, CCCV 2017


  • Deep convolutional neural network
  • Ensemble learning
  • Small-sample image classification
