Asymmetric Co-Teaching for Unsupervised Cross-Domain
Person Re-Identification
Fengxiang Yang,1∗ Ke Li,3 Zhun Zhong,1 Zhiming Luo,2† Xing Sun,3† Hao Cheng,3 Xiaowei Guo,3 Feiyue Huang,3 Rongrong Ji,1 Shaozi Li1
1 Artificial Intelligence Department, Xiamen University, China
2 Post Doctoral Mobile Station of Information and Communication Engineering, Xiamen University, China
3 Tencent Youtu Lab, Shanghai, China
Abstract
Person re-identification (re-ID) is a challenging task due
to the high variance within identity samples and imaging
conditions. Although recent advances in deep learning have
achieved remarkable accuracy in settled scenes, i.e., the source
domain, few works generalize well to the unseen target
domain. One popular solution is to assign pseudo labels to
unlabeled target images by clustering and then retrain
the model. However, clustering methods tend to introduce
noisy labels and discard low-confidence samples as outliers,
which may hinder the retraining process and thus limit the
generalization ability. In this study, we argue that by explicitly
adding a sample filtering procedure after clustering,
the mined examples can be used much more efficiently. To
this end, we design an asymmetric co-teaching framework,
which resists noisy labels by having two models cooperate to
select data with possibly clean labels for each other. Meanwhile,
one of the models receives samples that are as pure as
possible, while the other takes in samples that are as diverse as
possible. This procedure encourages the selected training
samples to be both clean and diverse, and allows the
two models to promote each other iteratively. Extensive
experiments show that the proposed framework can consistently
benefit most clustering based methods and boost the
state-of-the-art adaptation accuracy. Our code is available at
https://github.com/FlyingRoastDuck/ACT_AAAI20.
1 Introduction
Person re-identification (re-ID) (Sun et al. 2018; Zheng,
Yang, and Hauptmann 2016; Li, Zhu, and Gong 2018b) aims
to locate the target person in surveillance videos with a given
probe image. With the rapid evolution of deep learning models,
the accuracy of person re-ID has been greatly boosted on
public datasets. However, models trained on the source
domain often suffer from domain shifts, leading to a perfor-
mance decline on a different target domain.
∗ This work was done when Fengxiang Yang was an intern at Youtu Lab (yangfx@stu.xmu.edu.cn).
† Corresponding Author (zhiming.luo@xmu.edu.cn, winfredsun@tencent.com)
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
To alleviate this issue, recent works (Zhong et al. 2019b;
Zhong et al. 2018b) focus on unsupervised domain
adaptation (UDA), which aims to transfer knowledge
from the labeled source domain to the unlabeled target
domain. These works fall mainly into two categories:
distribution alignment (Wei et al. 2018; Deng et al. 2018; Chang
et al. 2019; Lin et al. 2018; Wang et al. 2018) and target
pseudo-label discovery (Fan, Zheng, and Yang 2018;
Song et al. 2018; Li, Zhu, and Gong 2018a). The former
aims to reduce the distribution gap between domains in
a common space, such as image-level (Wei et al. 2018;
Deng et al. 2018) and attribute-level (Chang et al. 2019;
Lin et al. 2018; Wang et al. 2018) spaces. The latter attempts
to leverage the underlying relations among target samples
and predict pseudo labels for model retraining, e.g. assign-
ing pseudo labels based on clustering (Fan, Zheng, and Yang
2018; Song et al. 2018; Li, Zhu, and Gong 2018a) and k-
nearest neighbors (Zhong et al. 2019a; Yang et al. 2018).
Among them, clustering based methods have reported very
competitive accuracy for UDA in person re-ID. These meth-
ods usually employ an iterative process of predicting pseudo
identities for unlabeled target samples according to the clus-
ters and fine-tuning the model with those predicted samples.
Despite their promising results, clustering based methods
are restricted by two main drawbacks. On the one hand,
clustering accuracy cannot be guaranteed even with
modern approaches, so the pseudo labels assigned by
clusters can be noisy. Training the model with noisy labels,
i.e., samples assigned to wrong identities, will undoubtedly damage
re-ID performance. On the other hand, most clustering
methods, e.g., DBSCAN (Ester et al. 1996), tend to leave
low-confidence samples as outliers and do not assign cluster
labels to them. These outliers are usually hard samples that
exhibit high image variations. If such samples are ignored
during training, the model may struggle to discriminate
high-variation test samples. However, directly assigning
them to the nearest cluster will introduce more
noisy labels, hindering the retraining of the model.
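The outlier behaviour described above can be illustrated with a minimal, pure-Python sketch of DBSCAN-style clustering. This is only an illustration under stated assumptions, not the pipeline used in this paper: the 2-D points stand in for re-ID feature vectors, and the values of `eps` and `min_pts` are hypothetical choices made for the toy data.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for an outlier."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster_id += 1
    return labels

# Toy "features": two dense groups and one isolated hard sample (assumed data).
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
            (2.5, 2.5)]
labels = dbscan(features, eps=0.3, min_pts=3)

# Clustered samples receive pseudo identities; outliers receive none.
pseudo_labeled = [(i, l) for i, l in enumerate(labels) if l != -1]
outliers = [i for i, l in enumerate(labels) if l == -1]
print(labels, outliers)
```

In this toy setting the isolated sample receives the label -1 and would be excluded from retraining, which is the loss of hard samples that the second drawback refers to.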
Co-Teaching (CT) (Han et al. 2018) is a commonly used
algorithm for training models with noisy labels, which trains
two networks by feeding the samples that one network assigns
small losses to the other network. However, most co-teaching
frameworks utilize symmetric inputs for both networks, which does not
apply effectively to the context of clustering based cross-
arXiv:1912.01349v1 [cs.CV] 3 Dec 2019