and quantitatively, especially under large pose variation. Additionally, a human perceptual study
further indicates the superiority of our model, which achieves remarkably higher scores than other
methods and produces more realistic results.
2 Related Work
Image Synthesis. Driven by the remarkable results of GANs [10], many researchers have leveraged
GANs to generate images [12, 6, 18]. DCGAN [24] introduced an unsupervised learning method that
combines convolutional neural networks (CNNs) with GANs to effectively generate realistic images.
Pix2pix [13] exploited conditional adversarial networks (CGANs) [22] to tackle image-to-image
translation tasks, learning a mapping from condition images to target images. CycleGAN [35],
DiscoGAN [15], and DualGAN [33] each proposed an unsupervised method to translate images
between two domains using unlabeled images. Furthermore, StarGAN [5] proposed a unified model
for image-to-image translation across multiple domains, which is effective for transformations such
as young-to-old, angry-to-happy, and female-to-male. Pix2pixHD [30] used residual networks at
two different scales to generate high-resolution images in two steps. These approaches are capable
of learning to generate realistic images, but have limited scalability in handling pose-based person
synthesis, due to unseen target poses and complex conditional appearances. Unlike those methods,
our proposed Soft-Gated Warping-GAN attends to pose alignment in deep feature space and handles
texture rendering at the region level for synthesizing person images.
Person Image Synthesis. Recently, many studies have leveraged adversarial learning for person
image synthesis. PG2 [20] proposed a two-stage GAN architecture to synthesize person images
conditioned on pose keypoints. BodyROI7 [21] applied disentangling and restructuring methods to
generate person images from separately sampled features. DSCF [28] introduced a special U-Net [26]
structure with deformable skip connections as a generator, synthesizing person images from
decomposed and deformable images. AUNET [8] presented a variational U-Net for generating
images conditioned on a stickman (more artificial pose information), manipulating appearance and
shape with a variational autoencoder. Skeleton-Aided [32] proposed a skeleton-aided method for
video generation with a standard pix2pix [13] architecture, generating human images based on
poses. [1] proposed a modular GAN that separates the image into different parts and reconstructs
them according to the target pose. [23] essentially used CycleGAN [35] to generate person images,
applying conditioned bidirectional generators to reconstruct the original image from the pose.
VITON [11] used a coarse-to-fine strategy to transfer a clothing image onto a fixed-pose person
image. CP-VTON [29] learns a thin-plate spline transformation via a Geometric Matching Module
(GMM) to fit in-shop clothes to the body shape of the target person. However, all the methods above
share a common problem: they ignore the misalignment of deep feature maps between the condition
and target images. In this paper, we exploit a Soft-Gated Warping-GAN, including a pose-guided
parser that generates the target parsing, which guides texture rendering on specific part segmentation
regions, and a novel warping-block that aligns the image features, producing more realistic-looking
textures for synthesizing high-quality person images conditioned on different poses.
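As a rough illustration of the soft-gating idea (this particular convex-combination form is our
simplifying assumption for exposition, not the paper's exact warping-block design), a soft gate can
blend pose-aligned (warped) condition features into the target branch per position:

```python
import numpy as np

def soft_gated_warp_fusion(target_feat, warped_feat, gate):
    """Blend warped condition features into the target branch with a soft gate.

    gate is a per-position value in [0, 1]: 1 fully trusts the warped
    condition features, 0 keeps the target-branch features unchanged.
    """
    gate = np.clip(gate, 0.0, 1.0)
    return gate * warped_feat + (1.0 - gate) * target_feat

# Toy 1-D "feature maps": the gate passes warped features only where it is high.
f_target = np.array([1.0, 1.0, 1.0, 1.0])
f_warped = np.array([5.0, 5.0, 5.0, 5.0])
mask = np.array([0.0, 0.25, 0.75, 1.0])
fused = soft_gated_warp_fusion(f_target, f_warped, mask)
# fused -> [1.0, 2.0, 4.0, 5.0]
```

A soft (rather than hard, binary) gate keeps the fusion differentiable, so the gating signal can be
learned end-to-end with the rest of the network.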
3 Soft-Gated Warping-GAN
Our goal is to change the pose of a given person image to another while keeping the texture details,
leveraging the transformation mapping between the condition and target segmentation maps. We
decompose this task into two stages: pose-guided parsing and Warping-GAN rendering. We first
give an overview of our Soft-Gated Warping-GAN architecture. Then, we discuss pose-guided
parsing and Warping-GAN rendering in detail. Next, we present the warping-block design and the
pipeline for estimating transformation parameters and warping images, which helps generate
realistic-looking person images. Finally, we give a detailed description of the synthesis loss
functions applied in our network.
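To make the transformation-estimation step concrete, the following is a minimal sketch (not the
paper's implementation) of a thin-plate spline mapping fitted from paired control points with SciPy's
`RBFInterpolator`; the control points here are hypothetical stand-ins for matched keypoints:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical paired control points: the TPS maps each target-image
# coordinate back to a source-image coordinate (a backward warp).
dst = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.6, 0.4]])
src = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])

# With the default smoothing of 0, the TPS interpolates the control points
# exactly while bending the rest of the plane as little as possible.
tps = RBFInterpolator(dst, src, kernel='thin_plate_spline')

# Sample the mapping on a grid of target coordinates; an image warp would then
# read source pixels at these coordinates (e.g., with bilinear sampling).
xs, ys = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
src_coords = tps(grid)  # shape (16, 2): one source coordinate per grid point
```

An affine transformation handles only global rotation, scaling, and translation; the TPS adds smooth
local deformation, which is why the two are often estimated together for non-rigid body warping.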
3.1 Network Architectures
Our pipeline is a two-stage architecture for pose-guided parsing and Warping-GAN rendering,
respectively, which includes a human parser, a pose estimator, and an affine [7] / TPS (Thin-Plate
Spline) [2, 25] transformation estimator. Notably, we make the first attempt to estimate the