Unsupervised Domain Adaptation Methods and Advances in Multi-Level UDA
Gotta Adapt 'Em All: Joint Pixel and Feature-Level Domain Adaptation for Recognition in the Wild

Luan Tran1∗  Kihyuk Sohn2  Xiang Yu2  Xiaoming Liu1  Manmohan Chandraker2,3
1Michigan State University  2NEC Labs America  3UC San Diego

Abstract

Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels. We propose that advantages may be derived by combining them, in the form of different insights that lead to a novel design and complementary properties that result in better performance. At the feature level, inspired by insights from semi-supervised learning, we propose a classification-aware domain adversarial neural network that brings target examples into more classifiable regions of the source domain. Next, we posit that computer vision insights are more amenable to injection at the pixel level. In particular, we use 3D geometry and image synthesis based on a generalized appearance flow to preserve identity across pose transformations, while using an attribute-conditioned CycleGAN to translate a single source into multiple target images that differ in lower-level properties such as lighting. Besides standard UDA benchmarks, we validate on a novel and apt problem of car recognition in unlabeled surveillance images using labeled images from the web, handling explicitly specified, nameable factors of variation through pixel-level adaptation and implicit, unspecified factors through feature-level adaptation.

1. Introduction

Deep learning has made an enormous impact on many applications in computer vision such as generic object recognition [22, 44, 48, 17], fine-grained categorization [59, 21, 41], object detection [26, 27, 28, 36, 37], semantic segmentation [6, 42] and 3D reconstruction [53, 52]. Much of its success is attributed to the availability of large-scale labeled training data [8, 15].
However, this is hardly true in many practical scenarios: since annotation is expensive, most data remains unlabeled. Consider the car recognition problem from surveillance images, where factors such as camera angle, distance, lighting or weather condition differ across locations. It is not feasible to exhaustively annotate all these images. Meanwhile, there exists abundant labeled data from the web domain [21, 62, 12], but with very different image characteristics that preclude direct transfer of discriminative CNN-based classifiers. For instance, web images might be from catalog magazines with professional lighting and ground-level camera poses, while surveillance images can originate from cameras atop traffic lights with challenging lighting and weather conditions.

∗This work was done while L. Tran was an intern at NEC Labs America.

[Figure in Table 1: initially source ≠ target; a semi-supervised learning insight aligns the domains at the feature level and vision insights align them at the pixel level, so that source ≈ target.]

Feature \ Pixel   | –    | CycleGAN | MKF+AC-CGAN (ours)
–                 | 55.0 | 64.3     | 79.7
DANN              | 60.4 | 64.8     | 78.0
DANN-CA (ours)    | 75.8 | 77.7     | 84.2

Table 1: Our framework for unsupervised domain adaptation at multiple semantic levels: at the feature level, we bring insights from semi-supervised learning to obtain highly discriminative domain-invariant representations; at the pixel level, we leverage complementary domain-specific vision insights, e.g., geometry and attributes. Our joint pixel- and feature-level DA demonstrates significant improvement over the individual adaptation counterparts as well as other competing methods such as CyCADA (CycleGAN+DANN) [18] on car recognition in the surveillance domain under the UDA setting. Please see Section 5 for complete experimental analysis.

Unsupervised domain adaptation (UDA) is a promising tool to overcome the lack of labeled training data in target domains. Several approaches aim to match distributions between source and target domains at different levels of representation, such as the feature [57, 56, 11, 45, 31] or pixel level [49, 43, 66, 3].
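One classical instantiation of feature-level distribution matching is the kernel-based maximum mean discrepancy (MMD) revisited in Section 2. As an illustration only (not this paper's method; the function names are mine), a minimal numpy sketch of the biased squared-MMD estimate with an RBF kernel:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel values k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2(xs, xt, gamma=1.0):
    # Biased estimate of squared MMD between source and target feature sets:
    # mean k(xs, xs) + mean k(xt, xt) - 2 * mean k(xs, xt).
    return (rbf_kernel(xs, xs, gamma).mean()
            + rbf_kernel(xt, xt, gamma).mean()
            - 2.0 * rbf_kernel(xs, xt, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0.0, 1.0, (200, 8)), rng.normal(0.0, 1.0, (200, 8)))
shifted = mmd2(rng.normal(0.0, 1.0, (200, 8)), rng.normal(3.0, 1.0, (200, 8)))
```

The estimate is near zero when source and target features follow the same distribution and grows with the domain shift, which is the quantity MMD-based DA methods minimize.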
Certain adaptation challenges are better handled in the feature space, but feature-level DA is a black-box algorithm for which adding domain-specific insights during adaptation is more difficult than in pixel space. On the contrary, pixel space is much higher-dimensional and the optimization problem is under-determined. How to effectively combine them has become an open challenge.

In this work we address this challenge by leveraging complementary tools that are better suited at each level (see the figure in Table 1). Specifically, we posit that feature-level DA is more amenable to techniques from semi-supervised learning (SSL), while pixel-level DA allows domain-specific insights from computer vision. In Section 3, we present our feature-level DA method, called classification-aware domain adversarial neural network (DANN-CA), which jointly parameterizes the classifier and domain discriminator, inspired by an instance of an SSL algorithm [40]. We show this to be a generalization of DANN [11] that incorporates constraints (Fig. 1) guiding the discriminator to easily find the major modes corresponding to classes in the feature space, and in turn pull target examples into more classifiable regions via the adversarial loss.

A challenge for pixel-level DA is to simultaneously transform source image properties at multiple semantic levels. In Section 4, we present pixel-level DA by image transformations that make use of vision concepts to deal with different factors of variation, such as photometric or geometric transformations (Fig. 2),¹ for recognition in the surveillance domain. To handle low-level transformations, we propose an attribute-conditioned CycleGAN (AC-CGAN) that extends [66] to generate multiple target images with different attributes. To handle high-level identity-preserving pose transformations, we use an appearance flow (AF) [65], a warping-based image synthesis tool.
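An appearance flow specifies, for each pixel of the synthesized view, a sampling coordinate in the source image; the output is obtained by bilinear interpolation at those coordinates. A minimal numpy sketch of this warping step on a grayscale image (the flow field here is hand-crafted for illustration; in [65] a network predicts it):

```python
import numpy as np

def warp_bilinear(img, flow_x, flow_y):
    # Sample img at the per-pixel coordinates (flow_x, flow_y) with bilinear
    # interpolation -- the core operation of appearance-flow view synthesis.
    h, w = img.shape
    x0 = np.clip(np.floor(flow_x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(flow_y).astype(int), 0, h - 2)
    ax = np.clip(flow_x - x0, 0.0, 1.0)   # fractional offsets
    ay = np.clip(flow_y - y0, 0.0, 1.0)
    return ((1 - ay) * (1 - ax) * img[y0, x0]
            + (1 - ay) * ax * img[y0, x0 + 1]
            + ay * (1 - ax) * img[y0 + 1, x0]
            + ay * ax * img[y0 + 1, x0 + 1])

img = np.arange(16.0).reshape(4, 4)
ys, xs = np.mgrid[0:4, 0:4].astype(float)
shifted = warp_bilinear(img, xs + 1.0, ys)  # shifts content one pixel left
```

Because every output pixel copies appearance from the source image rather than generating it, this kind of warping is attractive for identity-preserving pose transformation.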
To reduce the semantic gap between synthetic and real images, we propose a generalization of AF with 2D keypoints [25] as a domain bridge.

In Section 5, we evaluate our framework on car recognition in surveillance images from the comprehensive cars (CompCars) dataset [62]. We define an experimental protocol with web images as the labeled source domain and surveillance images as the unlabeled target domain. We explicitly handle nameable factors of variation such as pose and lighting through pixel-level DA, while other nuisance factors are handled by feature-level DA. As shown in Table 1, we achieve 84.20% accuracy, reducing the error by 64.9% relative to a model trained only on the source domain. We present ablation studies to demonstrate the importance of each adaptation component by extensively evaluating performance with various mixtures of components. We further validate the effectiveness of our proposed feature-level DA method on standard UDA benchmarks, namely digits and traffic signs [11] and Office-31 [38], achieving state-of-the-art recognition performance.

¹Our framework is unsupervised DA in the sense that we do not require recognition labels from the target domain for training, but it uses side annotations to inject insights from vision concepts for pixel-level adaptation.

In summary, the contributions of our work are:
• A novel UDA framework that adapts at multiple semantic levels from feature to pixel, with complementary insights for each type of adaptation.
• For feature-level DA, a connection of DANN to a semi-supervised variant, motivating a novel regularization via the classification-aware domain adversarial neural network.
• For pixel-level DA, an attribute-conditioned CycleGAN to translate a source image into multiple target images with different attributes, along with warping-based image synthesis for identity-preserving pose translation via a keypoint-based appearance flow.
• A new experimental protocol on car recognition in the surveillance domain, with detailed analysis of
various modules and efficacy of our UDA framework.
• State-of-the-art performance on standard UDA benchmarks, such as Office-31 and the digits and traffic signs adaptation tasks, with our feature-level DA method.

Due to the large volume of our work, we put additional detail in Sections S1–S6 of the supplementary material at www.nec-labs.com/˜mas/jointDA.

2. Related Work

Unsupervised domain adaptation. Following the theoretical development of domain adaptation [2, 1], a major challenge is to define a proper metric measuring the discrepancy between domains. Maximum mean discrepancy [29, 57, 9, 56, 47], which measures the discrepancy based on kernels, and domain adversarial neural networks [11, 4, 3, 45, 46], which measure it with a discriminator, have been successful. Noting the similarity between the problem settings of UDA and SSL, there have been attempts to incorporate ideas from SSL; for example, entropy minimization [14] has been used in addition to the domain adversarial loss [30, 31]. Our feature-level DA builds on DANN by resolving the difficulty the discriminator has in discovering modes in the feature space. Our formulation is also closely tied to SSL, and we explain why entropy minimization is necessary for DANN.

Perspective transformation. Previous works [61, 23, 51] proposed encoder-decoder networks to generate output images at target viewpoints. Adversarial learning of perspective transformation [54, 55, 63] has shown good performance in disentangling viewpoint from other appearance factors, but still suffers from switching of concepts (e.g., class labels) in the unpaired setting. Instead of learning the output distribution, [65, 34] proposed warping-based view synthesis by estimating a pixel-level flow field. We extend it with domain-invariant representations such as 2D keypoints [25] to improve generalization from synthetic to real images.

Image-to-image translation. With the success of GANs in image generation [13, 35], variants of conditional GANs [32] have been successfully applied to image-to-image translation problems in both paired [19] and unpaired [43, 49, 66] training settings. Our model extends the work of [66] on image translation in the unpaired setting to generate multiple outputs using control variables, or visual attributes [60].

Multi-level UDA. Combining pixel- and feature-level adaptation was attempted in [18], but we differ in several important aspects. Specifically, we further leverage insights from SSL for a novel regularization of feature-level DA, while utilizing 3D geometry and attribute-based conditioning in GANs to simultaneously handle high-level pose and low-level lighting variations. Our experiments include detailed studies of the complementary benefits as well as the effectiveness of the various adaptation modules. While [18] considers problems such as semantic segmentation, we study a car recognition problem that highlights the need for adaptation at all levels. We also demonstrate state-of-the-art results on standard UDA benchmarks.

3. Domain Adversarial Feature Learning

This section describes a classification-aware domain adversarial neural network (Fig. 1(b)) that improves upon the domain adversarial neural network [11] by jointly parameterizing the classifier and the discriminator.

Notation. Let X_S, X_T ⊂ X be the source and target domain datasets, and Y = {1, ..., N} the set of class labels. Let f : X → R^K be a feature generator, e.g., a CNN, with parameters θ_f, that maps an input x ∈ X to a K-dimensional vector.

3.1. Revisiting the Domain Adversarial Neural Network

The goal of domain adversarial training [11] is to adapt a classifier learned on the labeled source domain to the unlabeled target domain by making the feature distributions of the two domains indistinguishable. This is realized with a domain discriminator D : R^K → (0, 1) that tells whether features from the two domains are still distinguishable; f is then trained to confuse D while classifying the source data correctly:

max_{θ_c} {L_C = E_{X_S} log C(f, y)}                                (1)
max_{θ_d} {L_D = E_{X_S} log(1 − D(f)) + E_{X_T} log D(f)}           (2)
max_{θ_f} {L_F = L_C + λ E_{X_T} log(1 − D(f))}                      (3)

where C : R^K × Y → (0, 1) is a class score function that outputs the probability of an input x belonging to class y among the N classes, i.e., C(f(x), y) = P(y | f(x); θ_c), and λ balances the classification and domain adversarial losses. The parameters {θ_c, θ_d} and {θ_f} are updated alternately using stochastic gradient descent.

3.2. Classification-Aware Adversarial Learning

We note that the problem setting of unsupervised domain adaptation is no different from that of semi-supervised learning once we remove the notion of domains. Motivated by semi-supervised learning formulations of GANs [40, 7], we propose a new domain adversarial learning objective that jointly parameterizes the classifier and the discriminator as follows:

max_{θ_c} {L_C = E_{X_S} log C(f, y) + E_{X_T} log C(f, N+1)}        (4)
max_{θ_f} {L_F = E_{X_S} log C(f, y | Y) + λ E_{X_T} log(1 − C(f, N+1))}   (5)

where C is now an (N+1)-way class score function in which class N+1 plays the role of the domain discriminator, and C(f, y | Y) denotes the class probability renormalized over the N source classes.

[Figure 1: (a) DANN (baseline): a shared CNN feeds an N-way classifier and a separate binary domain discriminator (D = 2). (b) DANN-CA (ours): a shared CNN feeds a single (N+1)-way classifier whose extra class N+1 acts as the domain discriminator.]
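For concreteness, the per-example objectives of Eqs. (4)–(5) can be written directly on top of an (N+1)-way softmax. The following is a minimal numpy sketch (names are illustrative; the feature network, mini-batching, and the alternating SGD updates of Section 3.1 are omitted):

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over all N+1 classes.
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def classifier_objective(src_logits, y, tgt_logits):
    # Eq. (4): maximize log C(f, y) on source data and log C(f, N+1) on
    # target data; the last logit corresponds to the extra class N+1.
    return log_softmax(src_logits)[y] + log_softmax(tgt_logits)[-1]

def feature_objective(src_logits, y, tgt_logits, lam=0.1):
    # Eq. (5): the source term uses C(f, y | Y), i.e. the probability
    # renormalized over the N real classes (drop the last logit); the
    # target term pushes target features away from class N+1.
    src_term = log_softmax(src_logits[:-1])[y]
    p_extra = np.exp(log_softmax(tgt_logits)[-1])
    return src_term + lam * np.log(1.0 - p_extra)

src_logits = np.array([2.0, 0.1, -1.0, -0.5])  # N = 3 real classes + class N+1
tgt_logits = np.array([0.3, 0.2, 0.1, 1.5])
L_C = classifier_objective(src_logits, 0, tgt_logits)
L_F = feature_objective(src_logits, 0, tgt_logits)
```

Both objectives are sums of log-probabilities (hence ≤ 0) to be maximized: θ_c ascends L_C while θ_f ascends L_F, mirroring the alternating updates of Eqs. (1)–(3).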