In this paper, to perform adaptation to the target domain without source data, we first propose Local Structure Clustering (LSC), which clusters each target feature together with its nearest neighbors. The motivation is that a target feature should have a prediction similar to those of its semantically close neighbors. To preserve source performance, we propose sparse domain attention (SDA), applied to the output of the feature extractor, which activates different feature channels depending on the domain. The source domain attention is used to regularize the gradient during target adaptation to prevent forgetting of source information. With LSC and SDA, the adapted model achieves excellent performance on both source and target domains. In the experiments, we show that our target performance is on par with or better than existing DA and SFDA methods on several benchmarks, achieving state-of-the-art performance on VisDA (85.4%), while simultaneously maintaining good source performance. We also extend our method to Continual Source-free Domain Adaptation, where there is more than one target domain, further demonstrating the effectiveness of our method.
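To make the neighborhood-based clustering concrete, the following PyTorch-style sketch illustrates one possible form of the LSC objective. The memory banks feat_bank and score_bank, the number of neighbors k, and the exact dot-product loss are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of Local Structure Clustering (LSC).
# feat_bank (N, D): L2-normalized target features stored in a memory bank (assumed).
# score_bank (N, C): softmax predictions for the same target samples (assumed).
def lsc_loss(features, scores, feat_bank, score_bank, k=3):
    """Encourage each target sample to agree with the predictions of its k nearest neighbors."""
    feat = F.normalize(features, dim=1)                      # (B, D)
    sim = feat @ feat_bank.t()                               # cosine similarity to the bank, (B, N)
    _, idx = sim.topk(k + 1, dim=1)                          # +1: the sample itself is also in the bank
    neighbor_scores = score_bank[idx[:, 1:]]                 # (B, k, C), drop the self-match
    # maximize the dot product between each prediction and its neighbors' predictions
    agreement = torch.bmm(neighbor_scores, scores.unsqueeze(2)).squeeze(2)  # (B, k)
    return -agreement.sum(dim=1).mean()
```

In such a scheme, the banks would typically be refreshed with the current mini-batch's features and predictions after each step, and a diversity term is commonly added to avoid degenerate solutions where all samples collapse into one class.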
We summarize our contributions as follows:
• We propose a new domain adaptation paradigm denoted as Generalized Source-free Domain Adaptation (G-SFDA), where the source-pretrained model is adapted to target domains while keeping the performance on the source domain, in the absence of source data.
• We propose local structure clustering (LSC), which utilizes local neighborhood information in feature space to achieve source-free domain adaptation.
• We propose sparse domain attention (SDA), which activates different feature channels for different domains and regularizes the gradients during backpropagation in target adaptation to retain source-domain information (see the sketch after this list).
• In experiments, we show that where existing methods suffer from forgetting and perform poorly on the source domain, our method is able to maintain source-domain performance. Furthermore, when focusing on the target domain, our method is on par with or better than existing methods; in particular, we achieve state-of-the-art target performance on VisDA.
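As referenced in the SDA contribution above, the following sketch shows one way such per-domain channel attention and gradient masking could be wired up, assuming the classifier is a single linear layer on top of the gated features. The binary masks A_s and A_t, the sparsity level, and the form of the gradient masking are illustrative assumptions rather than the exact formulation used in the paper.

```python
import torch
import torch.nn as nn

# Illustrative sketch of Sparse Domain Attention (SDA): a (near-)binary attention
# vector per domain gates the channels of the feature extractor's output.
class SparseDomainAttention(nn.Module):
    def __init__(self, num_channels, sparsity=0.5):
        super().__init__()
        # Hypothetical fixed masks; in practice they could be learned or derived from source data.
        self.register_buffer('A_s', (torch.rand(num_channels) > sparsity).float())  # source mask
        self.register_buffer('A_t', (torch.rand(num_channels) > sparsity).float())  # target mask

    def forward(self, feat, domain='target'):
        mask = self.A_s if domain == 'source' else self.A_t
        return feat * mask  # activate only the channels assigned to this domain


# During target adaptation, gradients on channels important to the source domain
# can be suppressed so that source knowledge is not overwritten (sketch):
def mask_classifier_grad(classifier_weight, A_s):
    if classifier_weight.grad is not None:
        classifier_weight.grad *= (1.0 - A_s)  # broadcasts over the class dimension
```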
2. Related Work
Here we discuss related domain adaptation settings.
Domain Adaptation.
Early domain adaptation methods such as [21, 37, 39] adopt moment matching to align feature distributions. Inspired by adversarial learning, DANN [7] formulates domain adaptation as an adversarial two-player game. CDAN [22] trains a deep network conditioned on several sources of information. DIRT-T [35] performs domain adversarial training with an added term that penalizes violations of the cluster assumption. Domain adaptation has also been tackled from other perspectives. MCD [31] adopts prediction diversity between multiple learnable classifiers to achieve local or category-level feature alignment between source and target domains. DAMN [3] introduces a framework where each domain undergoes a different sequence of operations. AFN [44] shows that the erratic discrimination of target features stems from their much smaller norms compared to source features. SRDC [38] proposes to directly uncover the intrinsic target discrimination via discriminative clustering to achieve adaptation. The most relevant work to our LSC is DANCE [29], which targets universal domain adaptation and is based on neighborhood clustering. However, DANCE relies on instance discrimination [43] between all features, while our method applies consistency regularization to only a few semantically close neighbors.
Source-free Domain Adaptation.
Standard domain adaptation methods require access to source data during adaptation. Recently, several methods have investigated source-free domain adaptation. USFDA [14] and FS [15] explore source-free universal DA [48] and open-set DA [32], and DECISION [2] addresses multi-source DA. Most related to our work are SHOT [20] and 3C-GAN [18], both for closed-set DA. SHOT proposes to fix the source classifier and match the target features to the fixed classifier by maximizing mutual information and using pseudo-labeling. 3C-GAN synthesizes labeled target-style training images based on a conditional GAN. Recently, BAIT [46] extends diverse-classifier-based domain adaptation methods to also be applicable to SFDA. Though achieving good target performance, these methods cannot maintain source performance after adaptation. Unlike these methods, we aim to maintain source-domain performance after adaptation.
Continual Domain Adaptation.
Continual learning (CL) [13, 19, 23, 25] specifically focuses on avoiding catastrophic forgetting when learning new tasks, but it is not tailored for DA since new tasks in CL usually have labeled data. Recently, a few works [4, 26, 36] have emerged that aim to tackle the Continual Domain Adaptation (CDA) problem. [4] uses sample replay together with domain adversarial training to avoid forgetting, [26] builds a domain relation graph, and [36] builds a domain-specific memory buffer for each domain to regularize the gradient on both the target data and the memory buffer. Although these methods achieve good performance, they all demand access to source data. [16] is source-free, but it focuses on class-incremental single-target domain adaptation where there is only one-shot labeled target data per class, while our method is related to