Controllability using Unconditional GANs. Several methods have been proposed for editing unconditional GANs by manipulating the input latent vectors. Some approaches find meaningful latent directions via supervised learning from manual annotations or prior 3D models [Abdal et al. 2021; Leimkühler and Drettakis 2021; Patashnik et al. 2021; Shen et al. 2020; Tewari et al. 2020]. Other approaches compute the important semantic directions in the latent space in an unsupervised manner [Härkönen et al. 2020; Shen and Zhou 2020; Zhu et al. 2023]. Recently, controllability over coarse object position has been achieved by introducing intermediate "blobs" [Epstein et al. 2022] or heatmaps [Wang et al. 2022b]. All of these approaches enable editing of either image-aligned semantic attributes such as appearance, or coarse geometric attributes such as object position and pose. While Editing-in-Style [Collins et al. 2020] showcases some capability for editing spatial attributes, it can only achieve this by transferring local semantics between different samples. In contrast to these methods, our approach allows users to perform fine-grained control over spatial attributes using point-based editing.
GANWarping [Wang et al. 2022a] also uses point-based editing; however, it only enables out-of-distribution image editing. A few warped images can be used to update the generative model such that all generated images demonstrate similar warps. However, this method does not ensure that the warps lead to realistic images. Furthermore, it does not enable controls such as changing the 3D pose of the object. Similar to us, UserControllableLT [Endo 2022] enables point-based editing by transforming the latent vectors of a GAN. However, this approach only supports editing via a single point dragged on the image and does not handle multiple-point constraints well. In addition, the control is not precise, i.e., after editing, the target point is often not reached.
3D-aware GANs. Several methods modify the architecture of the GAN to enable 3D control [Chan et al. 2022, 2021; Chen et al. 2022; Gu et al. 2022; Pan et al. 2021; Schwarz et al. 2020; Tewari et al. 2022; Xu et al. 2022]. Here, the model generates 3D representations that can be rendered using a physically-based analytic renderer. However, unlike our approach, control is limited to global pose or lighting.
Diffusion Models. More recently, diffusion models [Sohl-Dickstein et al. 2015] have enabled image synthesis at high quality [Ho et al. 2020; Song et al. 2020, 2021]. These models iteratively denoise randomly sampled noise to create a photorealistic image. Recent models have shown expressive image synthesis conditioned on text inputs [Ramesh et al. 2022; Rombach et al. 2021; Saharia et al. 2022]. However, natural language does not enable fine-grained control over the spatial attributes of images, and thus, all text-conditional methods are restricted to high-level semantic editing. In addition, current diffusion models are slow since they require multiple denoising steps. While progress has been made toward efficient sampling, GANs are still significantly more efficient.
2.2 Point Tracking
To track points in videos, an obvious approach is optical flow estimation between consecutive frames. Optical flow estimation is a classic problem that estimates motion fields between two images. Conventional approaches solve optimization problems with hand-crafted criteria [Brox and Malik 2010; Sundaram et al. 2010], while deep learning-based approaches have come to dominate the field in recent years due to better performance [Dosovitskiy et al. 2015; Ilg et al. 2017; Teed and Deng 2020]. These deep learning-based approaches typically use synthetic data with ground-truth optical flow to train the deep neural networks. Among them, the most widely used method now is RAFT [Teed and Deng 2020], which estimates optical flow via an iterative algorithm. Recently, Harley et al. [2022] combined this iterative algorithm with a conventional "particle video" approach, giving rise to a new point tracking method named PIPs. PIPs considers information across multiple frames and thus handles long-range tracking better than previous approaches.
In this work, we show that point tracking on GAN-generated images can be performed without any of the aforementioned approaches or additional neural networks. We reveal that the feature spaces of GANs are discriminative enough that tracking can be achieved simply via feature matching. While some previous works also leverage these discriminative features for semantic segmentation [Tritrong et al. 2021; Zhang et al. 2021], we are the first to connect the point-based editing problem to the intuition of discriminative GAN features and to design a concrete method around it. Dispensing with additional tracking models allows our approach to run much more efficiently and thus support interactive editing. Despite the simplicity of our approach, we show that it outperforms state-of-the-art point tracking approaches, including RAFT and PIPs, in our experiments.
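To make this intuition concrete, the sketch below performs nearest-neighbor feature matching over a small search patch around the current handle point. The function name, the patch radius, and the use of an L2 distance are illustrative placeholder choices, not necessarily the exact settings of our method; `feat` is assumed to be an intermediate generator feature map resized to the image resolution.

```python
import torch

def track_point(feat, f0, p, radius=3):
    """Nearest-neighbor feature matching for point tracking (illustrative).

    feat:   (C, H, W) intermediate GAN feature map of the current image,
            assumed already resized to the image resolution.
    f0:     (C,) feature vector of the handle point in the initial image.
    p:      (x, y) current integer position of the handle point.
    radius: half-width of the square search patch around p.
    """
    C, H, W = feat.shape
    x0, y0 = p
    x_lo, x_hi = max(0, x0 - radius), min(W, x0 + radius + 1)
    y_lo, y_hi = max(0, y0 - radius), min(H, y0 + radius + 1)
    patch = feat[:, y_lo:y_hi, x_lo:x_hi]            # (C, h, w) search window
    # Distance of every feature in the patch to the reference feature f0.
    dist = (patch - f0[:, None, None]).norm(dim=0)   # (h, w), L2 for simplicity
    idx = torch.argmin(dist)                         # flattened row-major index
    dy, dx = divmod(idx.item(), dist.shape[1])
    return (x_lo + dx, y_lo + dy)                    # updated handle point
```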
3 METHOD
This work aims to develop an interactive image manipulation method for GANs where users only need to click on the images to define some pairs of (handle point, target point) and drive the handle points to reach their corresponding target points. Our study is based on the StyleGAN2 architecture [Karras et al. 2020]. Here we briefly introduce the basics of this architecture.
StyleGAN Terminology. In the StyleGAN2 architecture, a 512-dimensional latent code $\boldsymbol{z} \in \mathcal{N}(0, \boldsymbol{I})$ is mapped to an intermediate latent code $\boldsymbol{w} \in \mathbb{R}^{512}$ via a mapping network. The space of $\boldsymbol{w}$ is commonly referred to as $\mathcal{W}$. $\boldsymbol{w}$ is then sent to the generator $G$ to produce the output image $\mathbf{I} = G(\boldsymbol{w})$. In this process, $\boldsymbol{w}$ is copied several times and sent to different layers of the generator $G$ to control different levels of attributes. Alternatively, one can also use different $\boldsymbol{w}$ for different layers, in which case the input would be $\boldsymbol{w} \in \mathbb{R}^{l \times 512} = \mathcal{W}^+$, where $l$ is the number of layers. This less constrained $\mathcal{W}^+$ space is shown to be more expressive [Abdal et al. 2019]. As the generator $G$ learns a mapping from a low-dimensional latent space to a much higher-dimensional image space, it can be seen as modelling an image manifold [Zhu et al. 2016].
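The following sketch illustrates how $\boldsymbol{z}$, $\boldsymbol{w}$, $\mathcal{W}$, and $\mathcal{W}^+$ relate, using stand-in modules with a StyleGAN2-like interface; it is not the official StyleGAN2 implementation, and the layer count and mixing index are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in mapping network; the real 8-layer MLP lives in the official
# StyleGAN2 code. This only illustrates the W / W+ distinction.
mapping = nn.Sequential(nn.Linear(512, 512), nn.LeakyReLU(0.2),
                        nn.Linear(512, 512))
num_layers = 18                                   # l; depends on output resolution

z = torch.randn(1, 512)                           # z ~ N(0, I)
w = mapping(z)                                    # w in R^512, i.e., the W space

# W: the same w is copied to all l generator layers.
w_plus = w.unsqueeze(1).repeat(1, num_layers, 1)  # shape (1, l, 512)

# W+: different layers may receive different w vectors (more expressive),
# e.g., replacing w only for the finer layers:
w_plus[:, 6:, :] = mapping(torch.randn(1, 512))
# The generator then produces I = G(w_plus); G itself is omitted here.
```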
3.1 Interactive Point-based Manipulation
An overview of our image manipulation pipeline is shown in Fig. 2. For any image $\mathbf{I} \in \mathbb{R}^{3 \times H \times W}$ generated by a GAN with latent code $\boldsymbol{w}$, we allow the user to input a number of handle points $\{\boldsymbol{p}_i = (x_{p,i}, y_{p,i}) \mid i = 1, 2, ..., n\}$ and their corresponding target points $\{\boldsymbol{t}_i = (x_{t,i}, y_{t,i}) \mid i = 1, 2, ..., n\}$ (i.e., the corresponding target point of $\boldsymbol{p}_i$ is $\boldsymbol{t}_i$). The goal is to move the object in the image such that the handle points reach their corresponding target points.
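As a concrete, hypothetical instance of this input format, two handle-target pairs on a $512 \times 512$ image might be specified as follows; the coordinates are illustrative.

```python
# Hypothetical user input for n = 2 point pairs on a 512x512 image:
# each p_i = (x_{p,i}, y_{p,i}) is dragged toward its paired t_i = (x_{t,i}, y_{t,i}).
handle_points = [(230, 140), (300, 350)]   # {p_i | i = 1, ..., n}
target_points = [(260, 120), (330, 370)]   # {t_i | i = 1, ..., n}
assert len(handle_points) == len(target_points)
```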