Multi-label Image Ranking based on Deep Convolutional Features
Guanghui Song (1,2), Xiaogang Jin (†,1), Genlang Chen (2)
(1) College of Computer Science, Zhejiang University, Hangzhou, China
(2) Ningbo Institute of Technology, Zhejiang University, Ningbo, China
xiaogangj@cise.zju.edu.cn
Yan Nie
College of Science and Technology, Ningbo University, Ningbo, China
Abstract
Multi-label image ranking has many important real-world applications, and it involves two core issues: the image feature extraction approach and the multi-label ranking algorithm. Existing work mainly focuses on improving multi-label ranking algorithms built on conventional visual features. Recently, image features extracted from deep convolutional neural networks have achieved impressive performance on a variety of vision tasks, and using these deep features as image representations for the multi-label ranking problem has attracted growing attention. In this study, we evaluate the performance of deep features with two baseline multi-label ranking algorithms. First, a deep convolutional neural network model pre-trained on ImageNet is fine-tuned on the target dataset. Second, global deep features of the raw images are extracted from the fine-tuned model and serve as the input data of the ranking algorithms. Finally, experiments on the Tasmania Coral Point Count dataset demonstrate that deep features are more expressive than conventional visual features and can effectively improve multi-label ranking performance.
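As a rough illustration of how a convolutional network can yield a fixed-length "global" feature for an image of any size, the sketch below applies global average pooling to a feature map of shape channels x height x width. This is an assumption made for illustration only: the abstract does not say which layer of the fine-tuned model supplies the global deep features (fully connected layer activations are another common choice), and the helper name `global_average_pool` is hypothetical.

```python
from typing import List

# A convolutional feature map stored as channels x height x width nested lists.
FeatureMap = List[List[List[float]]]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Hypothetical helper: average each channel over all spatial positions.

    The output length equals the number of channels, so feature maps with
    different spatial sizes still produce vectors of the same length.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]  # flatten H x W
        pooled.append(sum(values) / len(values))
    return pooled


# Two toy feature maps with 2 channels but different spatial sizes.
small = [[[1.0, 3.0]], [[0.0, 2.0]]]                          # 2 x 1 x 2
large = [[[1.0, 1.0], [1.0, 5.0]], [[2.0, 2.0], [2.0, 2.0]]]  # 2 x 2 x 2
print(global_average_pool(small))  # [2.0, 1.0]
print(global_average_pool(large))  # [2.0, 2.0]
```

Both inputs map to 2-dimensional vectors, which is the property that lets such features serve directly as input to a ranking algorithm.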
1. Introduction
Multi-label images are widely used in many applications, such as image retrieval and semantic annotation, because of their practical significance [9]. Most real-world images contain several objects from different categories, so annotating an image with multiple labels describes its content more completely than a single label does. On this basis, label ranking further reflects the semantic information of multi-label images [3]. Multi-label image ranking is a challenging task that has recently received considerable attention in computer vision. The problem consists of two parts: first, the relevant labels are automatically assigned to each image, namely multi-label classification; second, a proper ranking of these relevant labels is predicted, namely label ranking [2]. The goal of multi-label ranking is to learn a mapping from multiple instances of each image to the ranking of the corresponding labels. Figure 1 shows single-label and multi-label images from different datasets. It illustrates that a single label describes the image content incompletely, whereas a multi-label ranking method also captures the relative importance of the multiple objects in an image.

† Corresponding author
* Project supported by the National Natural Science Foundation of China (Grant No. 61379074), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ12F02003), and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY15F020035)
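The two sub-problems of multi-label image ranking can be sketched as follows, assuming a model that already outputs one real-valued relevance score per label for an image's feature vector. The linear scorer, the threshold of 0, the label names, and the function names are all hypothetical choices for illustration; they are not the baseline ranking algorithms evaluated in this paper, and the sketch uses a single global feature vector per image rather than multiple instances.

```python
from typing import Dict, List


def linear_scores(features: List[float],
                  weights: Dict[str, List[float]],
                  biases: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical linear scorer: score_j = w_j . x + b_j for every label j."""
    return {
        label: sum(w * x for w, x in zip(w_j, features)) + biases[label]
        for label, w_j in weights.items()
    }


def rank_relevant_labels(scores: Dict[str, float],
                         threshold: float = 0.0) -> List[str]:
    """Return the relevant labels ordered from most to least important."""
    # Label ranking: order all labels by decreasing score.
    ranking = sorted(scores, key=lambda label: scores[label], reverse=True)
    # Multi-label classification: keep only labels scoring above the threshold.
    return [label for label in ranking if scores[label] > threshold]


# Toy example with a 2-dimensional feature vector and three hypothetical labels.
x = [1.0, 0.5]
weights = {"person": [2.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, 0.0]}
biases = {"person": 0.0, "dog": 0.0, "car": 0.0}
scores = linear_scores(x, weights, biases)
print(scores)                        # {'person': 2.0, 'dog': 0.5, 'car': -1.0}
print(rank_relevant_labels(scores))  # ['person', 'dog']
```

Changing the threshold only changes which labels count as relevant; the relative order among them is fixed by the scores, which is one simple way to tie classification and ranking to the same scoring function.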
Solving the multi-label image ranking problem involves two important steps: image feature extraction and multi-label ranking. Both have a strong influence on multi-label ranking performance [1]. Previous studies have proposed many methods that address this challenging task from these two aspects, but most of them focus on improving the multi-label learning algorithm while using conventional visual features as the image representation [6,7]. Recently, image features extracted from deep convolutional neural networks (CNNs), also known as deep features, have achieved impressive performance on single-label image classification [12]. Deep features provide a rich representation of a raw image by embedding it into a fixed-length vector, so this representation can be used for a variety of vision tasks [10,11]. In particular, some applications that generate image descriptions adopt deep features based on object bounding boxes and a multiple instance learning approach, and they have