Unsupervised Person Image Generation with Semantic Parsing Transformation
Sijie Song¹, Wei Zhang², Jiaying Liu¹∗, Tao Mei²
¹Institute of Computer Science and Technology, Peking University, Beijing, China
²JD AI Research, Beijing, China
Abstract
In this paper, we address unsupervised pose-guided person image generation, which is known to be challenging due to non-rigid deformation. Unlike previous methods that learn a hard direct mapping between human bodies, we propose a new pathway that decomposes the hard mapping into two more accessible subtasks, namely, semantic parsing transformation and appearance generation. Firstly, a semantic generative network is proposed to transform between semantic parsing maps, in order to simplify the learning of non-rigid deformation. Secondly, an appearance generative network learns to synthesize semantic-aware textures. Thirdly, we demonstrate that training our framework in an end-to-end manner further refines the semantic maps and the final results accordingly. Our method generalizes to other semantic-aware person image generation tasks, e.g., clothing texture transfer and controlled image manipulation. Experimental results demonstrate the superiority of our method on the DeepFashion and Market-1501 datasets, especially in preserving clothing attributes and generating better body shapes.
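To make the two-stage decomposition concrete, the snippet below gives a minimal PyTorch-style sketch of the pipeline. It assumes parsing maps are per-pixel body-part label maps and poses are keypoint heatmaps; all module architectures, names, and dimensions are illustrative assumptions, not the paper's actual networks (see the project repository for the released implementation).

```python
# Minimal sketch of the two-stage decomposition described above.
# All shapes and layers are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class SemanticGenerator(nn.Module):
    """Stage 1: predict the target-pose semantic parsing map."""
    def __init__(self, n_parts=10, pose_dim=18, hidden=64):
        super().__init__()
        # Input: source parsing map (n_parts channels) concatenated with
        # source and target pose heatmaps (pose_dim channels each).
        self.net = nn.Sequential(
            nn.Conv2d(n_parts + 2 * pose_dim, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, n_parts, 3, padding=1),
        )

    def forward(self, src_parsing, src_pose, tgt_pose):
        x = torch.cat([src_parsing, src_pose, tgt_pose], dim=1)
        # Per-pixel scores over body parts; softmax yields a soft parsing map.
        return self.net(x).softmax(dim=1)

class AppearanceGenerator(nn.Module):
    """Stage 2: render semantic-aware textures onto the predicted map."""
    def __init__(self, n_parts=10, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + n_parts, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, src_image, tgt_parsing):
        x = torch.cat([src_image, tgt_parsing], dim=1)
        return self.net(x)

# End-to-end composition: gradients from the appearance stage can flow back
# and refine the predicted semantic map, as noted in the abstract.
sem_gen, app_gen = SemanticGenerator(), AppearanceGenerator()
src_img = torch.randn(1, 3, 128, 64)                # source person image
src_seg = torch.randn(1, 10, 128, 64).softmax(1)    # source parsing map
src_kp = torch.randn(1, 18, 128, 64)                # source pose heatmaps
tgt_kp = torch.randn(1, 18, 128, 64)                # target pose heatmaps
tgt_seg = sem_gen(src_seg, src_kp, tgt_kp)          # stage 1 output
out_img = app_gen(src_img, tgt_seg)                 # person in target pose
```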
1. Introduction
Pose-guided image generation, which aims to change the pose of a person in an image to a target pose while keeping the appearance details, has attracted great attention recently. This topic is of great importance in the fashion and art domains, with a wide range of applications from image / video editing and person re-identification to movie production.
∗Corresponding author. This work was done at JD AI Research. Our project is available at https://github.com/SijieSong/person_generation_spt.git.

Figure 1: Visual results of different methods on DeepFashion [18]. Compared with PG² [19], Def-GAN [27], and UPIS [21], our method successfully keeps the clothing attributes (e.g., textures) and generates better body shapes (e.g., arms).

With the development of deep learning and generative models [8], much research has been devoted to pose-guided image generation [19, 21, 5, 27, 26, 1, 20]. Initially, this problem was explored under the fully supervised setting [19, 27, 26, 1]. Though promising results have been presented, the training data has to be composed of paired images (i.e., the same person in the same clothing but in different poses). To tackle this data limitation and enable more flexible generation, more recent efforts have been devoted to learning the mapping with unpaired data [21, 5, 20]. However, without "paired" supervision, the results in [21] are far from satisfactory. Disentangling an image into multiple factors (e.g., background / foreground, shape / appearance) is explored in [20, 5], but ignoring the non-rigid deformation of human bodies and the clothing shapes leads to compromised generation quality.
Formally, the key challenges of this unsupervised task are threefold. First, due to the non-rigid nature of the human body, transforming spatially misaligned body parts is difficult for current convolution-based networks. Second, clothing attributes, e.g., sleeve lengths and textures, are generally difficult to preserve during generation, yet these attributes are crucial for human visual perception. Third, the lack of paired training data offers little guidance for establishing effective training objectives.
To address the aforementioned challenges, we propose a new pathway for unsupervised person image generation. Specifically, instead of directly transforming the person image, we propose to transform the semantic parsing between poses. On one hand, translating between person