Unsupervised Domain Adaptation by Backpropagation
Yaroslav Ganin GANIN@SKOLTECH.RU
Victor Lempitsky LEMPITSKY@SKOLTECH.RU
Skolkovo Institute of Science and Technology (Skoltech), Moscow Region, Russia
Abstract
Top-performing deep architectures are trained on
massive amounts of labeled data. In the absence
of labeled data for a certain task, domain adap-
tation often provides an attractive option given
that labeled data of similar nature but from a dif-
ferent domain (e.g. synthetic images) are avail-
able. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data are necessary).
As the training progresses, the approach pro-
motes the emergence of “deep” features that are
(i) discriminative for the main learning task on
the source domain and (ii) invariant with respect
to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation.
Overall, the approach can be implemented with
little effort using any of the deep-learning pack-
ages. The method performs very well in a se-
ries of image classification experiments, achiev-
ing adaptation effect in the presence of big do-
main shifts and outperforming previous state-of-
the-art on Office datasets.
Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015. JMLR: W&CP volume 37. Copyright 2015 by the author(s).

1. Introduction
Deep feed-forward architectures have brought impressive advances to the state-of-the-art across a wide variety of machine-learning tasks and applications. At the moment, however, these leaps in performance come only when a large amount of labeled training data is available. At the
same time, for problems lacking labeled data, it may still be possible to obtain training sets that are big enough for training large-scale deep models but that suffer from a shift in distribution relative to the actual data encountered
at “test time”. One particularly important example is syn-
thetic or semi-synthetic training data, which may come in
abundance and be fully labeled, but which inevitably have
a distribution that is different from real data (Liebelt & Schmid, 2010; Stark et al., 2010; Vázquez et al., 2014; Sun & Saenko, 2014).
Learning a discriminative classifier or other predictor in
the presence of a shift between training and test distributions is known as domain adaptation (DA). A number of approaches to domain adaptation have been suggested in the context of shallow learning, i.e., in the situation where the data representation/features are given and fixed. The proposed
approaches then build the mappings between the source
(training-time) and the target (test-time) domains, so that
the classifier learned for the source domain can also be ap-
plied to the target domain, when composed with the learned
mapping between domains. The appeal of the domain
adaptation approaches is the ability to learn a mapping between domains when the target-domain data are either fully unlabeled (unsupervised domain adaptation) or have only a few labeled samples (semi-supervised domain adaptation). Below, we focus on the harder unsupervised
case, although the proposed approach can be generalized to
the semi-supervised case rather straightforwardly.
Unlike most previous papers on domain adaptation that
worked with fixed feature representations, we focus on
combining domain adaptation and deep feature learning
within one training process (deep domain adaptation). Our
goal is to embed domain adaptation into the process of
learning representations, so that the final classification decisions are made based on features that are both discriminative and invariant to the change of domains, i.e. have
the same or very similar distributions in the source and the
target domains. In this way, the obtained feed-forward network can be applied to the target domain without being hindered by the shift between the two domains.
We thus focus on learning features that combine (i)