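Deep Visual Analogy Generation: Neural-Network-Driven Image Transformation
"Deep Visual Analogy-Making is an important computer vision task that involves understanding image content, relating images, and generating related images. Deep convolutional networks have recently achieved notable breakthroughs in predicting image labels, annotations, and captions, but generating high-quality images remains a challenge. The paper's main contribution is a novel deep network, trained end-to-end, that performs visual analogy making: transforming a query image according to a pair of related example images. The core of the visual analogy problem is to accurately recognize a visual relationship and to generate the transformed image accordingly. To this end, the authors draw on recent advances in natural language processing and propose mapping images into a neural embedding space in which vector operations such as subtraction and addition simplify analogical reasoning. Specifically, the model learns a mapping between images and embedding vectors, which makes analogical reasoning intuitive and tractable. Experiments show that the model performs well on several visual analogy datasets: it transforms visual relationships accurately while preserving the features of the original image, and generates new images consistent with the analogy. This improves the quality of image generation and offers a new way to understand complex relationships between images. The work matters for the progress of computer vision, especially for generative models in image generation, and may in the future be applied to areas such as artistic creation, virtual reality, and image retrieval, strengthening the image understanding ability of AI systems."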
Deep Visual Analogy-Making
Scott Reed Yi Zhang Yuting Zhang Honglak Lee
University of Michigan, Ann Arbor, MI 48109, USA
{reedscot,yeezhang,yutingzh,honglak}@umich.edu
Abstract
In addition to identifying the content within a single image, relating images and
generating related images are critical tasks for image understanding. Recently,
deep convolutional networks have yielded breakthroughs in predicting image labels,
annotations and captions, but have only just begun to be used for generating
high-quality images. In this paper we develop a novel deep network trained
end-to-end to perform visual analogy making, which is the task of transforming a
query image according to an example pair of related images. Solving this problem
requires both accurately recognizing a visual relationship and generating a
transformed query image accordingly. Inspired by recent advances in language
modeling, we propose to solve visual analogies by learning to map images to a neural
embedding in which analogical reasoning is simple, such as by vector subtraction
and addition. In experiments, our model effectively models visual analogies on
several datasets: 2D shapes, animated video game sprites, and 3D car models.
1 Introduction
Humans are good at considering “what-if?” questions about objects in their environment. What if
this chair were rotated 30 degrees clockwise? What if I dyed my hair blue? We can easily imagine
roughly how objects would look according to various hypothetical questions. However, current
generative models of images struggle to perform this kind of task without encoding significant prior
knowledge about the environment and restricting the allowed transformations.
[Figure 1 diagram; stage labels: Infer Relationship, Transform query]
Figure 1: Visual analogy making concept. We learn an encoder function f mapping images into a
space in which analogies can be performed, and a decoder g mapping back to the image space.
Often, these visual hypothetical questions can be effectively answered by analogical reasoning.¹
Having observed many similar objects rotating, one could learn to mentally rotate new objects.
Having observed objects with different colors (or textures), one could learn to mentally
re-color (or re-texture) new objects.
Solving the analogy problem requires the ability to identify relationships among images and
transform query images accordingly. In this paper, we propose to solve the problem by directly
training on visual analogy completion; that is, to generate the transformed image output. Note
that we do not make any claim about how humans solve the problem, but we show that in many
cases thinking by analogy is enough to solve it, without exhaustively encoding first principles
into a complex model.
We denote a valid analogy as a 4-tuple A : B :: C : D, often spoken as “A is to B as C is to D”. Given
such an analogy, there are several questions one might ask:
• A ? B :: C ? D - What is the common relationship?
• A : B ? C : D - Are A and B related in the same way that C and D are related?
• A : B :: C : ? - What is the result of applying the transformation A : B to C?
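
The third question, A : B :: C : ?, can be illustrated with a toy sketch of the "vector subtraction and addition" idea from the abstract: encode the images with f, add the difference f(B) - f(A) to f(C), and decode with g. This is not the authors' network. The encoder and decoder below are hypothetical stand-ins (a running sum and its inverse), chosen only because they are linear and exactly invertible, so the analogy holds exactly; the paper instead learns f and g as deep networks trained end-to-end. The images are hypothetical four-element lists.

```python
from itertools import accumulate


def f(image):
    """Toy encoder (assumption): running sum, a linear invertible map."""
    return list(accumulate(image))


def g(embedding):
    """Toy decoder (assumption): adjacent differences, the exact inverse of f."""
    return [embedding[0]] + [y - x for x, y in zip(embedding, embedding[1:])]


def complete_analogy(a, b, c):
    """Answer A : B :: C : ? by decoding f(B) - f(A) + f(C)."""
    fa, fb, fc = f(a), f(b), f(c)
    query = [zb - za + zc for za, zb, zc in zip(fa, fb, fc)]
    return g(query)


# Hypothetical toy "images": B differs from A by a fixed offset t,
# and the expected answer D differs from C by the same offset.
t = [10, 0, -1, 5]
a = [1, 2, 3, 4]
b = [x + dx for x, dx in zip(a, t)]
c = [0, 5, 5, 0]
d = [x + dx for x, dx in zip(c, t)]

d_hat = complete_analogy(a, b, c)
print(d_hat == d)
```

Because the toy encoder is linear, the embedding arithmetic reduces to B - A + C in image space. Real visual relationships such as rotation are not additive in pixel space, which is why an embedding has to be learned for the same arithmetic to work.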
¹ See [2] for a deeper philosophical discussion of analogical reasoning.