Ultrafast Photorealistic Style Transfer via Neural Architecture Search


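This article introduces research on ultrafast photorealistic style transfer via neural architecture search. The work addresses how to transfer the style of a reference photo onto a content photo while preserving fine detail and photorealism. It proposes a method made of a construction step (C-step) and a pruning step (P-step): PhotoNet, a dense auto-encoder, is constructed as the photorealistic style transfer network, and the pruning step then accelerates it.

With the development of artificial intelligence, and in particular the wide use of deep learning in image processing, style transfer has become an active research area. Style transfer applies the style of one image (the reference image) to another image (the content image) while keeping the basic structure of the content image. Photorealistic style transfer is challenging because the output must carry the reference style and still look as if it was taken by a camera. Conventional photorealistic style transfer algorithms often rely on post-processing to make the output look real, which tends to lose detail or produce unnatural results; without that extra processing, their performance drops sharply.

To address this, the paper constructs and optimizes a neural network architecture for fast, high-quality style transfer, in two main steps. In the C-step, the authors design PhotoNet, a dense auto-encoder. An auto-encoder is a deep learning model that learns a compressed representation of its input and tries to reconstruct the original data. PhotoNet is distinguished by its carefully designed pre-training process, which helps capture fine image details and so keeps fidelity high during style transfer. In the P-step, the constructed network is pruned to improve efficiency and reduce computation cost. Network pruning is an optimization technique that removes parameters with little influence on the final result, reducing model complexity while maintaining performance. This makes generating photorealistic stylized images fast and feasible on resource-limited devices.

The paper may also discuss its experimental results in detail, comparing conventional methods and the new method on detail preservation, photorealism, and execution speed. Extensive experiments are said to show that the new method outperforms existing techniques on several evaluation metrics. In summary, the proposed method uses neural architecture search and network optimization to address the problems of earlier algorithms, offering a new option for real-time, high-quality style transfer. It is relevant to image processing on mobile devices and to real-time artistic creation.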

[Figure 4 panels: (a) Multi-level stylization (PhotoWCT); (b) Multi-stylization on Decoder and INSLs (Ours); (d) Style; (e) Vanilla; (f) Vanilla + MS-Dec; (g) Vanilla + MS-Dec + MS-INSL. Diagram labels: VGG-19 Stage 1-5, Inverse VGG, BFA, INSL, WCT.]

Figure 4: Multi-stylization comparison. (a) is the multi-level stylization strategy used by WCT/PhotoWCT, which cascades five distinct auto-encoders to perform style transfer. (b) is the architecture of our method. Note that the computation cost of (b) equals that of the auto-encoder in the top blue box. From (e) to (g), we progressively apply style transfer modules (i.e., WCT) at the bottleneck, decoder, and INSLs, where MS-Dec and MS-INSL denote placing the transfer module at the decoder and at the INSLs, respectively. As demonstrated in (e-g), MS-Dec and MS-INSL enhance style transfer effects without sacrificing fine details of the content. See the colors of the leaves in (e-g).

(a) Input (b) Result by Concat (c) Result by Sum

Figure 5: Comparison of “Concat” and “Sum”.

is that SCs placed at low-level layers of an auto-encoder short-circuit the network and block the information stream from flowing into the transfer modules that work at the bottleneck. Interestingly, as shown in Fig. 3 (e), we find that WCT² also fails to stylize if its proposed High-Frequency Components Skip Links (HFCS) are turned on and the input region masks are disabled. To solve this problem, we introduce Instance Normalized Skip Links (INSLs) as a replacement for the SC; an INSL applies Instance Normalization (Ulyanov, Vedaldi, and Lempitsky 2016) at the skip connection. We find that INSLs can alleviate the short circuit phenomenon and strengthen the detail preservation and distortion elimination abilities of photorealistic style transfer networks. Please refer to Fig. 3 (f) for the result produced with INSLs.
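As a rough illustration of what an instance-normalized skip link does to a feature map, here is a minimal NumPy sketch. It assumes features in (N, C, H, W) layout, no learned scale or shift, and a merge by summation; the function names `instance_norm` and `insl_sum` are hypothetical and are not taken from the paper.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of each sample over its spatial dimensions.

    x has shape (N, C, H, W). No learned affine parameters are applied
    (an assumption made for this sketch).
    """
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def insl_sum(encoder_feat, decoder_feat):
    """Hypothetical skip link: normalize the encoder feature, then add it."""
    return decoder_feat + instance_norm(encoder_feat)

# Toy data: an encoder feature with non-zero mean and non-unit variance.
rng = np.random.default_rng(0)
enc = rng.normal(loc=3.0, scale=2.0, size=(1, 4, 8, 8))
dec = np.zeros_like(enc)
out = insl_sum(enc, dec)
```

With a zero decoder feature, each channel of `out` has approximately zero mean and unit standard deviation, so the per-channel statistics of the content no longer pass through the skip link unchanged.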

(a) Input (b) Result by Upsampling (c) Result by Unpooling

Figure 6: Comparison of “Upsampling” and “Unpooling”.

(a) Input (b) Use AdaIN (c) Use WCT

Figure 7: Comparison of using AdaIN and WCT as the transfer module. Using WCT as the transfer module (c) achieves more faithful photorealistic stylization effects than using AdaIN (b).

Multi-stylization. Multi-stylization means performing style transfer repeatedly. As shown in Fig. 4 (a), WCT and PhotoWCT adopt a strategy called multi-level stylization: they train five auto-encoders and stylize in five rounds in a coarse-to-fine manner. Instead, WCT² proposes progressive stylization, which uses a single-round auto-encoder but progressively executes style transfer modules multiple times at every part of the auto-encoder. Following WCT², we adopt a single-round multi-stylization strategy but only transfer features at the decoder and INSLs. Fig. 4 (b) illustrates our strategy. As demonstrated in Fig. 4 (e-g), MS-Dec and MS-INSL can significantly improve the stylization effects of the produced results. Moreover, applying style transfer modules at INSLs (Fig. 4 (g)) can further eliminate the short circuit phenomenon caused by SCs and strengthen the stylization effects.
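The placement of transfer modules in a single pass can be sketched with a toy decoder. Everything here is an assumption for illustration: decoder stages are identity maps, all levels share one feature shape, skip links are merged by sum, and a simple mean/std matching stands in for the transfer module. The names `match_mean_std` and `decode_single_pass` are hypothetical.

```python
import numpy as np

def match_mean_std(content, style, eps=1e-5):
    """Stand-in transfer module: per-channel mean/std matching on (C, N) features."""
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True)
    return (content - c_mean) / (c_std + eps) * s_std + s_mean

def decode_single_pass(content_feats, style_feats, ms_dec=False, ms_insl=False):
    """Toy single-pass decoder over feature levels ordered from the bottleneck up.

    Returns the final feature and the number of transfer-module calls.
    """
    x = match_mean_std(content_feats[0], style_feats[0])  # bottleneck transfer
    calls = 1
    for level in range(1, len(content_feats)):
        skip = content_feats[level]
        if ms_insl:  # transfer on the skip link
            skip = match_mean_std(skip, style_feats[level])
            calls += 1
        x = x + skip  # skip link merged by sum (assumption)
        if ms_dec:  # transfer inside the decoder
            x = match_mean_std(x, style_feats[level])
            calls += 1
    return x, calls

rng = np.random.default_rng(0)
content = [rng.normal(size=(4, 50)) for _ in range(3)]
style = [rng.normal(loc=2.0, size=(4, 60)) for _ in range(3)]

_, vanilla_calls = decode_single_pass(content, style)
dec_out, dec_calls = decode_single_pass(content, style, ms_dec=True)
_, full_calls = decode_single_pass(content, style, ms_dec=True, ms_insl=True)
```

In this toy setting the three configurations mirror the "Vanilla", "Vanilla + MS-Dec", and "Vanilla + MS-Dec + MS-INSL" panels only in where the module is applied; all of them still run the auto-encoder once.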

Concat vs. Sum. The choice between the “concat” and “sum” operators for skip links may influence the performance of auto-encoders. However, we find that using “concat” makes no notable difference compared with using “sum”, apart from slight style fluctuation. Please refer to Fig. 5 (b) and (c) for a comparison.
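For readers unfamiliar with the two merge operators, the following NumPy sketch shows how they differ in output shape. The (N, C, H, W) layout and the function names `merge_sum` and `merge_concat` are assumptions made for this example.

```python
import numpy as np

def merge_sum(decoder_feat, skip_feat):
    """Element-wise addition; channel count is unchanged."""
    return decoder_feat + skip_feat

def merge_concat(decoder_feat, skip_feat):
    """Concatenation along the channel axis; channel count is doubled."""
    return np.concatenate([decoder_feat, skip_feat], axis=1)

dec = np.ones((1, 64, 32, 32))
skip = np.full((1, 64, 32, 32), 2.0)

summed = merge_sum(dec, skip)              # shape (1, 64, 32, 32)
concatenated = merge_concat(dec, skip)     # shape (1, 128, 32, 32)
```

The practical consequence is that a layer following a "concat" merge must accept twice as many input channels, while "sum" keeps the layer width unchanged.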

Upsampling vs. Unpooling. PhotoWCT argues that unpooling tends to make the network produce fewer distortions. However, we find that these two operators produce almost the same results in our settings. Please refer to Fig. 6 (b) and (c) for a comparison.
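The difference between the two operators can be seen on a small array. This is a minimal NumPy sketch assuming 2x2 max pooling on a single-channel map and nearest-neighbor upsampling; the helper names are hypothetical.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling on an (H, W) array with even H and W.

    Returns the pooled values and, for each window, the flat position (0..3)
    of its maximum.
    """
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    windows = windows.reshape(h // 2, w // 2, 4)
    return windows.max(axis=-1), windows.argmax(axis=-1)

def unpool_2x2(pooled, argmax):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    np.put_along_axis(windows, argmax[..., None], pooled[..., None], axis=-1)
    return windows.reshape(ph, pw, 2, 2).transpose(0, 2, 1, 3).reshape(ph * 2, pw * 2)

def upsample_nearest_2x(pooled):
    """Repeat each value over a 2x2 block."""
    return pooled.repeat(2, axis=0).repeat(2, axis=1)

x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 5.],
              [9., 0., 1., 1.],
              [0., 0., 1., 7.]])
pooled, idx = max_pool_2x2(x)
unpooled = unpool_2x2(pooled, idx)
upsampled = upsample_nearest_2x(pooled)
```

Unpooling restores each maximum at its original location and leaves the rest of the window empty, whereas upsampling spreads the value over the whole window and discards the location.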

WCT vs. AdaIN. WCT and AdaIN are two widely used transfer modules that come from artistic style transfer. As demonstrated in Fig. 7 (b) and (c), WCT produces more faithful transfer results. We think this is because AdaIN needs to work with an auto-encoder trained in a more complicated way, whereas we train the decoder only to reconstruct images in order to facilitate the subsequent pruning step.
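To make the two transfer modules concrete, here is a minimal NumPy sketch of both on flattened (C, N) feature matrices. It illustrates the general operators only (AdaIN matches per-channel mean and standard deviation, WCT matches the full channel covariance) and is not meant as the paper's implementation or as its explanation of the quality gap. The helper `_cov_power` and the `eps` regularizer are assumptions of this sketch.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Match per-channel mean and std of content to those of style."""
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True) + eps
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True)
    return (content - c_mean) / c_std * s_std + s_mean

def _cov_power(centered, power, eps):
    """Matrix power of the regularized channel covariance via eigendecomposition."""
    cov = centered @ centered.T / (centered.shape[1] - 1)
    cov = cov + eps * np.eye(cov.shape[0])
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** power) @ eigvecs.T

def wct(content, style, eps=1e-5):
    """Whiten the content features, then color them with the style covariance."""
    s_mean = style.mean(axis=1, keepdims=True)
    fc = content - content.mean(axis=1, keepdims=True)
    fs = style - s_mean
    whitened = _cov_power(fc, -0.5, eps) @ fc
    return _cov_power(fs, 0.5, eps) @ whitened + s_mean

rng = np.random.default_rng(0)
content = rng.normal(size=(3, 500))
mix = np.array([[1.0, 0.0, 0.0],
                [0.8, 0.6, 0.0],
                [0.0, 0.5, 1.0]])
style = mix @ rng.normal(size=(3, 400)) + 2.0  # correlated style channels

out_adain = adain(content, style)
out_wct = wct(content, style)
```

On this toy data, `out_wct` reproduces the cross-channel correlations of `style`, while `out_adain` matches each channel separately and leaves the correlations between channels as they were in `content`.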

C-Step

Based on the analysis of the architecture components that significantly influence photorealistic style transfer effects, we construct an auto-encoder named PhotoNet.
