[Figure 1: three line plots of Style Loss (×10^5, y-axis 0–10) versus Iteration (x-axis 0–5000), each comparing Batch Norm and Instance Norm. (a) Trained with original images. (b) Trained with contrast normalized images. (c) Trained with style normalized images.]
Figure 1. To understand the reason for IN’s effectiveness in style transfer, we train an IN model and a BN model with (a) original images
in MS-COCO [36], (b) contrast normalized images, and (c) style normalized images using a pre-trained style transfer network [24]. The
improvement brought by IN remains significant even when all training images are normalized to the same contrast, but are much smaller
when all images are (approximately) normalized to the same style. Our results suggest that IN performs a kind of style normalization.
$$\sigma_{nc}(x) = \sqrt{\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(x_{nchw} - \mu_{nc}(x)\bigr)^{2} + \epsilon} \tag{6}$$
Another difference is that IN layers are applied at test
time unchanged, whereas BN layers usually replace mini-
batch statistics with population statistics.
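To make the computation concrete, below is a minimal PyTorch sketch of an IN layer following Eq. (6); the (N, C, H, W) tensor layout, the value of $\epsilon$, and the function name are our own illustrative choices, not code from [52].

```python
import torch

def instance_norm(x, gamma, beta, eps=1e-5):
    # x: (N, C, H, W). Unlike BN, the mean and variance are computed per
    # sample n and per channel c, over the spatial axes H and W only.
    mu = x.mean(dim=(2, 3), keepdim=True)                 # mu_nc(x)
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    sigma = torch.sqrt(var + eps)                         # sigma_nc(x), Eq. (6)
    # The same per-instance statistics are used at training and test time;
    # no running (population) estimates are kept, in contrast to BN.
    return gamma.view(1, -1, 1, 1) * (x - mu) / sigma + beta.view(1, -1, 1, 1)
```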
3.3. Conditional Instance Normalization
Instead of learning a single set of affine parameters $\gamma$ and $\beta$, Dumoulin et al. [11] proposed a conditional instance normalization (CIN) layer that learns a different set of parameters $\gamma^{s}$ and $\beta^{s}$ for each style $s$:

$$\mathrm{CIN}(x; s) = \gamma^{s}\left(\frac{x - \mu(x)}{\sigma(x)}\right) + \beta^{s} \tag{7}$$
During training, a style image together with its index $s$ is randomly chosen from a fixed set of styles $s \in \{1, 2, \ldots, S\}$ ($S = 32$ in their experiments). The content image is then processed by a style transfer network in which the corresponding $\gamma^{s}$ and $\beta^{s}$ are used in the CIN layers. Surprisingly, the network can generate images in completely different styles by using the same convolutional parameters but different affine parameters in the IN layers.
Compared with a network without normalization layers, a network with CIN layers requires $2FS$ additional parameters, where $F$ is the total number of feature maps in the network [11]. Since the number of additional parameters
scales linearly with the number of styles, it is challenging to
extend their method to model a large number of styles (e.g.,
tens of thousands). Also, their approach cannot adapt to
arbitrary new styles without re-training the network.
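As a concrete illustration of where the $2FS$ parameters live, here is a hedged PyTorch sketch of a CIN layer in the spirit of Eq. (7); storing the per-style parameters in embedding tables, and all names below, are our own assumptions, not the implementation of [11].

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    def __init__(self, num_features, num_styles, eps=1e-5):
        super().__init__()
        # One (gamma, beta) pair per style: 2 * num_features parameters per
        # style, hence 2FS additional parameters summed over all layers.
        self.gamma = nn.Embedding(num_styles, num_features)
        self.beta = nn.Embedding(num_styles, num_features)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)
        self.eps = eps

    def forward(self, x, s):
        # x: (N, C, H, W); s: (N,) long tensor of style indices in {0..S-1}.
        mu = x.mean(dim=(2, 3), keepdim=True)
        sigma = torch.sqrt(x.var(dim=(2, 3), unbiased=False, keepdim=True) + self.eps)
        g = self.gamma(s)[:, :, None, None]   # select gamma^s, shape (N, C, 1, 1)
        b = self.beta(s)[:, :, None, None]    # select beta^s
        return g * (x - mu) / sigma + b
```

Adding a new style under this layout means growing both embedding tables and training the new row, which is why the method does not extend to unseen styles without re-training.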
4. Interpreting Instance Normalization
Despite the great success of (conditional) instance normalization, the reason why these layers work particularly well for style transfer remains elusive. Ulyanov et al. [52] attribute the success of IN to its invariance to the contrast of the content image. However, IN takes place in the feature space; it should therefore have more profound impacts than a simple contrast normalization in the pixel space. Perhaps even more surprising is the fact that the affine parameters in IN can completely change the style of the output image.
It has been known that the convolutional feature statistics of a DNN can capture the style of an image [16, 30, 33]. While Gatys et al. [16] use the second-order statistics as their optimization objective, Li et al. [33] recently showed that matching many other statistics, including the channel-wise mean and variance, is also effective for style transfer. Motivated by these observations, we argue that instance normalization performs a form of style normalization by normalizing feature statistics, namely the mean and variance. Although a DNN serves as an image descriptor in [16, 33], we believe that the feature statistics of a generator network can also control the style of the generated image.
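To illustrate this claim, the sketch below shifts the channel-wise mean and variance of one feature map to match another's, the statistics that [33] shows are sufficient to carry style; this is our own minimal moment-matching example, not the optimization procedure of [16] or [33].

```python
import torch

def match_channel_stats(content_feat, style_feat, eps=1e-5):
    # Re-center and re-scale each channel of the content features so that
    # its mean and variance match those of the style features.
    c_mu = content_feat.mean(dim=(2, 3), keepdim=True)
    c_sigma = torch.sqrt(content_feat.var(dim=(2, 3), unbiased=False, keepdim=True) + eps)
    s_mu = style_feat.mean(dim=(2, 3), keepdim=True)
    s_sigma = torch.sqrt(style_feat.var(dim=(2, 3), unbiased=False, keepdim=True) + eps)
    # Dividing out (c_mu, c_sigma) is exactly what IN does; the style is
    # then determined by the target statistics (s_mu, s_sigma).
    return s_sigma * (content_feat - c_mu) / c_sigma + s_mu
```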
We run the code of improved texture networks [52] to perform single-style transfer, with IN or BN layers. As expected, the model with IN converges faster than the BN model (Fig. 1 (a)). To test the explanation in [52], we then normalize all the training images to the same contrast by performing histogram equalization on the luminance channel. As shown in Fig. 1 (b), IN remains effective, suggesting that the explanation in [52] is incomplete. To verify our hypothesis, we normalize all the training images to the same style (different from the target style) using a pre-trained style transfer network provided by [24]. According to Fig. 1 (c), the improvement brought by IN becomes much smaller when the images are already style normalized. The remaining gap can be explained by the fact that the style normalization with [24] is not perfect. Also, models with BN trained on style normalized images can converge as fast as models with IN trained on the original images. Our results indicate that IN does perform a kind of style normalization.
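For reference, the contrast normalization step above can be sketched as follows; using OpenCV and the luminance channel of the YCrCb color space is our own assumption about the preprocessing, which the text only describes at a high level.

```python
import cv2

def contrast_normalize(img_bgr):
    # Equalize the histogram of the luminance channel only, so that all
    # training images share (approximately) the same contrast while their
    # colors are left untouched.
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
```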
Since BN normalizes the feature statistics of a batch of