Attention and Adaptive Normalization: The U-GAT-IT Method for Unsupervised Image-to-Image Translation

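This document is the paper "U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation" by Junho Kim, Minjae Kim, Hyeonwoo Kang, and Kwanghee Lee, who are affiliated with NCSOFT and the Boeing Korea Engineering and Technology Center. It proposes an unsupervised image-to-image translation method that is trained end to end. Its two main contributions are a new attention module and a new learnable normalization function, Adaptive Layer-Instance Normalization (AdaLIN).

The attention module uses the attention map obtained from an auxiliary classifier to guide the model toward the regions that best distinguish the source domain from the target domain. Earlier attention-based methods have difficulty with geometric changes between domains. U-GAT-IT can translate both images that require holistic changes and images that require large shape changes.

AdaLIN lets the model control, through learned parameters, how much the shape and the texture change, which makes it more adaptable across datasets. Compared with a fixed normalization scheme, it can adjust its behavior to the input image, preserving features of the original image while still carrying out the translation.

Overall, U-GAT-IT offers a new route to unsupervised image translation: its attention mechanism and adaptive normalization improve performance and adaptability, so that cross-domain translation works without paired or labeled data, including for translations that need substantial changes. The work is relevant to image generation in computer vision, particularly unsupervised learning and cross-domain image translation.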

Figure 1. The model architecture of U-GAT-IT. The detailed notations are described in Section 3.1.

Cycle loss To alleviate the mode collapse problem, we apply a cycle consistency constraint to the generator. Given an image $x \in X_s$, after the sequential translations of $x$ from $X_s$ to $X_t$ and from $X_t$ to $X_s$, the image should be successfully translated back to the original domain:

$$L^{s \to t}_{cycle} = \mathbb{E}_{x \sim X_s}\left[\,\lvert x - G_{t \to s}(G_{s \to t}(x)) \rvert_1\,\right] \tag{5}$$
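The cycle constraint in Eq. (5) can be illustrated with a minimal pure-Python sketch. It assumes images are flattened lists of floats and that the L1 term is averaged over pixels (a common implementation choice, not something the text specifies). The names `cycle_loss`, `g_st`, `g_ts_good`, and `g_ts_bad` are hypothetical stand-ins for $G_{s \to t}$ and $G_{t \to s}$, not the authors' code.

```python
from typing import Callable, List, Sequence

Image = List[float]  # a flattened image; stand-in for a real tensor


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_loss(
    batch: Sequence[Image],
    g_st: Callable[[Image], Image],
    g_ts: Callable[[Image], Image],
) -> float:
    """Batch estimate of Eq. (5): average of |x - G_ts(G_st(x))|_1."""
    return sum(l1_distance(x, g_ts(g_st(x))) for x in batch) / len(batch)


# Toy generators (assumptions for illustration only):
# g_st brightens the image, g_ts_good undoes it exactly, g_ts_bad only partly.
g_st = lambda img: [p + 0.5 for p in img]
g_ts_good = lambda img: [p - 0.5 for p in img]
g_ts_bad = lambda img: [p - 0.25 for p in img]

batch = [[0.0, 0.25, 0.5], [1.0, 0.75, 0.5]]
print(cycle_loss(batch, g_st, g_ts_good))  # 0.0
print(cycle_loss(batch, g_st, g_ts_bad))   # 0.25
```

A perfect round trip gives zero loss, and any residual shift after the return translation is penalized in proportion to its size.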

Identity loss To ensure that the color distributions of input image and output image are similar, we apply an identity consistency constraint to the generator. Given an image $x \in X_t$, after the translation of $x$ using $G_{s \to t}$, the image should not change.

$$L^{s \to t}_{identity} = \mathbb{E}_{x \sim X_t}\left[\,\lvert x - G_{s \to t}(x) \rvert_1\,\right] \tag{6}$$
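As a small illustration of Eq. (6), the sketch below feeds target-domain images to a source-to-target generator and measures how much they change. It assumes flattened images and a per-pixel mean for the L1 term. `identity_loss`, `g_noop`, and `g_tint` are hypothetical names used only for this example.

```python
from typing import Callable, List, Sequence

Image = List[float]  # a flattened image; stand-in for a real tensor


def identity_loss(
    target_batch: Sequence[Image],
    g_st: Callable[[Image], Image],
) -> float:
    """Batch estimate of Eq. (6): average of |x - G_st(x)|_1 for x from the target domain."""
    total = 0.0
    for x in target_batch:
        y = g_st(x)
        total += sum(abs(p - q) for p, q in zip(x, y)) / len(x)
    return total / len(target_batch)


# Toy generators (assumptions): one leaves target images alone, one shifts their colors.
g_noop = lambda img: list(img)
g_tint = lambda img: [p + 0.125 for p in img]

target_batch = [[0.25, 0.5], [0.75, 0.0]]
print(identity_loss(target_batch, g_noop))  # 0.0
print(identity_loss(target_batch, g_tint))  # 0.125
```

The generator that shifts every pixel is penalized, which is how this term discourages color drift on images that are already in the target domain.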

CAM loss By exploiting the information from the auxiliary classifiers $\eta_s$ and $\eta_{D_t}$, given an image $x \in \{X_s, X_t\}$, $G_{s \to t}$ and $D_t$ get to know where they need to improve or what makes the most difference between two domains in the current state:

$$L^{s \to t}_{cam} = -\left(\mathbb{E}_{x \sim X_s}\left[\log(\eta_s(x))\right] + \mathbb{E}_{x \sim X_t}\left[\log(1 - \eta_s(x))\right]\right) \tag{7}$$

$$L^{D_t}_{cam} = \mathbb{E}_{x \sim X_t}\left[(\eta_{D_t}(x))^2\right] + \mathbb{E}_{x \sim X_s}\left[(1 - \eta_{D_t}(G_{s \to t}(x)))^2\right] \tag{8}$$
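The generator-side CAM term in Eq. (7) is a binary cross-entropy on the auxiliary classifier's output: source images should score near 1 and target images near 0. The sketch below covers Eq. (7) only. It assumes the classifier outputs are already probabilities in (0, 1), and the small clamp `eps` is an added numerical safeguard that is not part of the text. `generator_cam_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def generator_cam_loss(
    eta_s_on_source: Sequence[float],
    eta_s_on_target: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Eq. (7): -(E_{x~X_s}[log eta_s(x)] + E_{x~X_t}[log(1 - eta_s(x))])."""

    def clamp(p: float) -> float:
        # Keep probabilities strictly inside (0, 1) so log() stays finite.
        return min(max(p, eps), 1.0 - eps)

    source_term = sum(math.log(clamp(p)) for p in eta_s_on_source) / len(eta_s_on_source)
    target_term = sum(math.log(1.0 - clamp(p)) for p in eta_s_on_target) / len(eta_s_on_target)
    return -(source_term + target_term)


# An undecided classifier (0.5 everywhere) versus a fairly confident one.
undecided = generator_cam_loss([0.5, 0.5], [0.5, 0.5])
confident = generator_cam_loss([0.9, 0.8], [0.1, 0.2])
print(undecided)  # 2*ln(2), about 1.386
print(confident < undecided)  # True
```

The loss falls as the classifier separates the two domains, which is what lets its weights indicate the regions that differ most between them.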

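Full objective Finally, we jointly train the encoders, decoders, discriminators, and auxiliary classifiers to optimize the final objective:

$$\min_{G_{s \to t},\, G_{t \to s},\, \eta_s,\, \eta_t} \; \max_{D_s,\, D_t,\, \eta_{D_s},\, \eta_{D_t}} \; \lambda_1 L_{gan} + \lambda_2 L_{cycle} + \lambda_3 L_{identity} + \lambda_4 L_{cam} \tag{9}$$

where $\lambda_1 = 1$, $\lambda_2 = 10$, $\lambda_3 = 10$, $\lambda_4 = 1000$. Here, $L_{gan} = L^{s \to t}_{gan} + L^{t \to s}_{gan}$, and the other losses ($L_{cycle}$, $L_{identity}$, and $L_{cam}$) are defined in a similar way.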
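The weighted sum in Eq. (9) can be sketched as follows. The weights 1, 10, 10, and 1000 are assumed to apply to the GAN, cycle, identity, and CAM terms in that order, and each argument is assumed to already be the sum of the two translation directions. `LossWeights` and `weighted_objective` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Assumed loss weights for the four terms of Eq. (9)."""
    gan: float = 1.0
    cycle: float = 10.0
    identity: float = 10.0
    cam: float = 1000.0


def weighted_objective(
    gan: float,
    cycle: float,
    identity: float,
    cam: float,
    weights: LossWeights = LossWeights(),
) -> float:
    """Combine the four loss terms (each summed over both directions) into one scalar."""
    return (
        weights.gan * gan
        + weights.cycle * cycle
        + weights.identity * identity
        + weights.cam * cam
    )


# Illustrative loss values only; they are not measurements from the paper.
total = weighted_objective(gan=1.0, cycle=0.5, identity=0.25, cam=0.5)
print(total)  # 508.5
```

With these weights the CAM term dominates the total unless its raw value is much smaller than the others, so it is worth logging each term separately during training.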

4. Implementation

4.1. Network architecture

The encoder of the generator is composed of two convolution layers with the stride size of two for down-sampling and four residual blocks. The decoder of the generator consists of four residual blocks and two up-sampling convolution layers with the stride size of one. Note that we use the instance normalization for the encoder and AdaLIN for the decoder, respectively. In general, LN does not perform better than batch normalization in classification problems [37]. Since the auxiliary classifier is connected from the encoder in the generator, to increase the accuracy of the auxiliary classifier we use the instance normalization (batch normalization with a mini-batch size of 1) instead of the AdaLIN. Spectral normalization [30] is used for the discriminator. We employ two different scales of PatchGAN [15] for the discriminator network, which classifies whether local (70 × 70) and global (286 × 286) image patches are real or fake. For the activation function, we use ReLU in the generator and leaky-ReLU with a slope of 0.2 in the discriminator.
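To make the difference between the encoder's instance normalization and the decoder's AdaLIN concrete, here is a minimal pure-Python sketch for a single sample. It assumes AdaLIN blends instance-normalized and layer-normalized features with a mixing weight `rho` clipped to [0, 1], then applies a per-channel scale `gamma` and shift `beta`. The scalar `rho`, the list-of-lists layout, and the function names are simplifications for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # [channel][flattened spatial position], one sample


def _mean_var(values: Sequence[float]) -> Tuple[float, float]:
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _standardize(values: Sequence[float], mean: float, var: float, eps: float) -> List[float]:
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def adalin(
    x: FeatureMap,
    gamma: Sequence[float],
    beta: Sequence[float],
    rho: float,
    eps: float = 1e-5,
) -> FeatureMap:
    """Blend instance norm (rho = 1) and layer norm (rho = 0), then scale and shift."""
    rho = min(max(rho, 0.0), 1.0)  # keep the mixing weight in [0, 1]

    # Instance normalization: statistics computed per channel.
    inst = [_standardize(ch, *_mean_var(ch), eps) for ch in x]

    # Layer normalization: statistics shared across all channels and positions.
    flat = [v for ch in x for v in ch]
    layer_mean, layer_var = _mean_var(flat)
    layer = [_standardize(ch, layer_mean, layer_var, eps) for ch in x]

    return [
        [gamma[c] * (rho * i + (1.0 - rho) * l) + beta[c] for i, l in zip(inst[c], layer[c])]
        for c in range(len(x))
    ]


x = [[0.0, 2.0], [10.0, 14.0]]
as_instance_norm = adalin(x, gamma=[1.0, 1.0], beta=[0.0, 0.0], rho=1.0)
as_layer_norm = adalin(x, gamma=[1.0, 1.0], beta=[0.0, 0.0], rho=0.0)
print(as_instance_norm)
print(as_layer_norm)
```

With `rho = 1` every channel is centered on its own mean, which removes the offset between channels. With `rho = 0` the offset between the low-valued and high-valued channel survives, since the statistics are shared across channels.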

4.2. Training

All models are trained using Adam [19] with $\beta_1 = 0.5$ and $\beta_2 = 0.999$. For data augmentation, we flipped the images horizontally with a probability of 0.5, resized them to 286 × 286, and randomly cropped them to 256 × 256. The batch size is set to one for all experiments. We train all models with a fixed learning rate of 0.0001 until 500,000 iterations and then linearly decay it up to 1,000,000 iterations. We also use weight decay at a rate of 0.0001. The weights are initialized from a zero-centered normal distribution with a standard deviation of 0.02.
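The learning-rate schedule described above can be sketched as a small function. The text does not say what value the rate reaches at 1,000,000 iterations, so this sketch assumes it decays linearly to zero. The function and parameter names are hypothetical.

```python
def learning_rate(
    iteration: int,
    base_lr: float = 1e-4,
    decay_start: int = 500_000,
    total_iterations: int = 1_000_000,
) -> float:
    """Constant rate until decay_start, then linear decay (assumed to reach zero)."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    if iteration <= decay_start:
        return base_lr
    if iteration >= total_iterations:
        return 0.0
    remaining = (total_iterations - iteration) / (total_iterations - decay_start)
    return base_lr * remaining


for step in (0, 500_000, 750_000, 1_000_000):
    print(step, learning_rate(step))
# 0 0.0001
# 500000 0.0001
# 750000 5e-05
# 1000000 0.0
```

In a training loop the returned value would be assigned to the optimizer's learning rate at each iteration.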
