Zero-shot Metric Learning
Xinyi Xu, Huanhuan Cao, Yanhua Yang, Erkun Yang and Cheng Deng∗
School of Electronic Engineering, Xidian University, Xi'an 710071, China
xyxu.xd@gmail.com, hhcao@stu.xidian.edu.cn, yanhyang@xidian.edu.cn,
{erkunyang, chdeng.xd}@gmail.com
∗Corresponding author.
Abstract
In this work, we tackle the zero-shot metric learning problem and propose a novel method, abbreviated as ZSML, whose purpose is to learn a distance metric that measures the similarity of unseen categories (and even unseen datasets). ZSML achieves strong transferability by capturing multi-nonlinear yet continuous relations among data. It is motivated by two facts: 1) relations can essentially be described from various perspectives; and 2) traditional binary supervision is insufficient to represent continuous visual similarity. Specifically, we first reformulate a collection of specific-shaped convolutional kernels to combine data pairs and generate multiple relation vectors. Furthermore, we design a new cross-update regression loss to discover continuous similarity. Extensive experiments, including intra-dataset and inter-dataset transfer on four benchmark datasets, demonstrate that ZSML achieves state-of-the-art performance.
1 Introduction
Metric learning aims to find appropriate similarity measurements of data points, whose core intuition is to preserve the distances between data points in an embedding space. This topic is of great practical importance due to its wide applications in many related areas, such as face recognition [Guillaumin et al., 2009], clustering [Davis et al., 2007; Xing et al., 2003], and retrieval [Zhou et al., 2004].
The Euclidean distance is one of the most common similarity metrics since it requires neither prior information nor a training process. However, it may yield unsatisfactory results because it treats all feature dimensions equally and independently, and thus fails to capture the idiosyncrasies of the data. In contrast, the parametric Mahalanobis distance, which can model the importance of different dimensions, has been adopted in many works. Some representative Mahalanobis approaches [Hoi et al., 2006; Xing et al., 2003] project data linearly and minimize the Euclidean distance between positive pairs while maximizing it between negative pairs. Alternatively, one may also directly optimize the Mahalanobis metric for nearest neighbor classification; representative works include, but are not limited to, Neighborhood Component Analysis (NCA) [Roweis et al., 2004], Large Margin Nearest Neighbor (LMNN) [Weinberger and Saul, 2009], and Nearest Class Mean (NCM) [Mensink et al., 2013]. Prior information plays a pivotal role in the success of these metric learning schemes; therefore, unsatisfactory results can be produced when such a prior is not available.
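For reference, the two distances discussed above take the following standard forms (the notation here is generic rather than specific to this paper):
$$
d_{\mathrm{E}}(\mathbf{x}_i,\mathbf{x}_j)=\lVert\mathbf{x}_i-\mathbf{x}_j\rVert_2,
\qquad
d_{\mathbf{M}}(\mathbf{x}_i,\mathbf{x}_j)=\sqrt{(\mathbf{x}_i-\mathbf{x}_j)^{\top}\mathbf{M}\,(\mathbf{x}_i-\mathbf{x}_j)},
$$
where $\mathbf{M}\succeq 0$ is a learned positive semi-definite matrix. Setting $\mathbf{M}=\mathbf{I}$ recovers the Euclidean case, which is why the latter can weight feature dimensions differently while the former cannot.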
In this paper, we are committed to a more challenging task: zero-shot metric learning, whose ambition is to learn an effective metric for unseen categories and datasets. That is, the learned metric must measure similarity without access to the target data. Powerful transferability can be obtained by capturing the multi-nonlinear and continuous relations, which is consistent with the innate character of the data. Particularly, we first reformulate a set of specific-shaped convolutional kernels to discover various kinds of relations. It is well known that convolutional neural networks (CNNs) have great power in feature embedding [Lecun et al., 1998; Donahue et al., 2013; Toshev and Szegedy, 2014], while in this paper they are employed to reveal the correlations among data.
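To make this idea concrete, the sketch below stacks the two embeddings of a pair into a two-row map and applies several differently shaped kernels, each producing one relation vector. The kernel widths, dimensions, and pooling used here are illustrative placeholders rather than the exact configuration of ZSML.

```python
import torch
import torch.nn as nn

class PairRelationSketch(nn.Module):
    """Illustrative sketch: combine a data pair with a bank of
    differently shaped convolutional kernels, each yielding one
    relation vector. Shapes and sizes are placeholders."""

    def __init__(self, feat_dim=512, out_dim=128, widths=(1, 3, 5, 7)):
        super().__init__()
        # Each branch spans both rows of the stacked pair (height 2)
        # but uses a different width, relating the pair at several scales.
        self.branches = nn.ModuleList([
            nn.Conv2d(1, out_dim, kernel_size=(2, w), padding=(0, w // 2))
            for w in widths
        ])

    def forward(self, x_i, x_j):
        # x_i, x_j: (batch, feat_dim) embeddings of the two samples in a pair
        pair = torch.stack([x_i, x_j], dim=1).unsqueeze(1)  # (B, 1, 2, D)
        relations = []
        for conv in self.branches:
            r = conv(pair)          # (B, out_dim, 1, D)
            r = r.mean(dim=(2, 3))  # pool to one relation vector per branch
            relations.append(r)
        return relations            # list of (B, out_dim) relation vectors
```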
Then, we design a cross-update regression loss, which relaxes the binary supervision imposed on positive pairs (PPs) and negative pairs (NPs) to extend generalization capability. Specifically, we initialize a coarse continuous label as weak supervision of the predicted similarity, and update the coarse label and the predicted similarity alternately until convergence. By doing so, we can learn the similarity order and improve transferability.
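A minimal sketch of one such alternating step is shown below; the mean-squared regression objective, the momentum-style label update, and the clamping that keeps PPs above NPs are our own illustrative assumptions, not the exact update rules of ZSML.

```python
import torch
import torch.nn.functional as F

def cross_update_step(pred_sim, coarse_label, is_positive, momentum=0.9):
    """One illustrative cross-update iteration (assumed rules).

    pred_sim:     (N,) similarities predicted by the relation model
    coarse_label: (N,) current continuous targets in [0, 1]
    is_positive:  (N,) boolean mask marking positive pairs
    """
    # Regression loss: the prediction chases the continuous target ...
    loss = F.mse_loss(pred_sim, coarse_label)

    # ... while the target drifts toward the prediction, but stays
    # consistent with the binary supervision (positives >= negatives).
    with torch.no_grad():
        new_label = momentum * coarse_label + (1 - momentum) * pred_sim
        new_label = torch.where(is_positive,
                                new_label.clamp(min=0.5),
                                new_label.clamp(max=0.5))
    return loss, new_label
```

In training, such a step would alternate with gradient updates of the model until the predicted similarities and the continuous labels agree.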
To better demonstrate the superiority of ZSML, we present multi-level transfer tasks, which cover transferring to unseen categories within one dataset (intra-dataset ZSML) and to unseen datasets (inter-dataset ZSML). In a nutshell, the main contributions of our work can be summarized as follows:
• Departing from the traditional single and linear relation representation, we reformulate a family of specific-shaped convolutional kernels which can capture the multi-nonlinear relations among data points.
• We devise a cross-update regression loss for learning continuous similarity to improve generalization capability, which is verified in our empirical study.
• Extensive transfer experiments demonstrate that our model can better measure the similarity of unseen categories and even unseen datasets.