Learning a Neural 3D Texture Space from 2D Exemplars

Philipp Henzler 1   p.henzler@cs.ucl.ac.uk
Niloy J. Mitra 1,2  n.mitra@cs.ucl.ac.uk
Tobias Ritschel 1   t.ritschel@ucl.ac.uk

1 University College London   2 Adobe Research

Abstract

We propose a generative model of 2D and 3D natural textures with diversity, visual fidelity and high computational efficiency. This is enabled by a family of methods that extend the idea of classic stochastic procedural texturing (Perlin noise) by learning, depth and non-linearity. The key idea is a hard-coded, tunable and differentiable step that feeds multiple transformed random 2D or 3D fields into an MLP that can be sampled over infinite domains. Our model encodes all exemplars from a diverse set of textures without a need to be retrained for each exemplar. Applications include texture interpolation and learning 3D textures from 2D exemplars. Project website: https://geometry.cs.ucl.ac.uk/projects/2020/neuraltexture.

1. Introduction

Textures are stochastic variations of attributes over 2D or 3D space, with applications in image understanding and synthesis. In this paper we present a generative model of natural textures. Previous texture models either capture only a single exemplar (e.g., wood), or address how appearance varies non-stochastically (non-stationarily) across space: which position on a chair should have wood color? Which position cloth? Which metal? Our work combines these two complementary views.

Requirements. We design this family of methods with several requirements in mind: completeness, generativity, compactness, interpolation, infinite domains, diversity, unlimited zoom and high speed. A texture space is complete if every natural texture has a compact code z in that embedding. To be generative, every texture code should map to a useful texture. This is important for intuitive design, where a user manipulates texture codes and expects the result to be a texture. Compactness is achieved if the code is low-dimensional. We further ask the method to provide interpolation: texture codes generated between coordinates z1 and z2 should also be valid.

Figure 1. Our method allows mapping casually captured 2D textures (blue) to latent texture codes and supports interpolation (from blue to red) as well as synthesis of projected or solid (volumetric) textures.

This is important for design, or when storing texture codes into (low-resolution) 2D images, 3D volumes or mesh vertices and hoping to interpolate them. The first four points are typical for generative modeling; additionally meeting the more texture-specific requirements (stochasticity, efficiency) is our main contribution. First, we wish to support infinite domains: holding a texture code e fixed, we want to be able to query that texture such that a patch around any position x has the statistics of the exemplar. This is important for querying textures in graphics applications to obtain extended virtual worlds, e.g., the grass on a soccer field, which extends the size of the texture. Second, for visual fidelity, the statistics of the texture should be similar to the exemplar. Gram matrices of VGG activations are an established metric to measure this similarity [5]. Third, unlimited zoom means that every texture should have variation at all scales, not limited to any fixed resolution that can be held in memory. This is to zoom in on geometric detail and appreciate subtle variation, e.g., of wood grain. In practice we are limited by the frequency content of the exemplars we train on, but the method should not impose any limit on scale. Finally, we aim for computational efficiency: textures need to be queryable in any dimension without consuming excessive memory or time; ideally, constant in both memory and parallelism. This rules out naive convolutional neural networks, whose memory consumption scales poorly to 3D.
2. Previous Work

Capturing natural variation using multiple scales of randomness has a long history [14]. Making noise useful for graphics and vision is due to the 1995 work of Perlin [17]. Here, textures are generated by computing noise at different frequencies and mixing them with linear weights. A key benefit is that such noise can be evaluated in 2D as well as in 3D, making it popular in many graphics applications.

Computer vision typically generates textures by example, e.g., through non-parametric sampling [4], vector quantization [25], optimization [12], or nearest-neighborhood synthesis (PatchMatch [2]). These approaches often still struggle to achieve spatial and temporal coherence as well as scalability to fine detail. These classic methods fulfill the requirement of human texture perception stated by Julesz [9]: a texture is an image full of features that have the same statistics in some representation.

The next level of quality was reached when that representation became learned, through the internal activations of the VGG network [22]. Neural style transfer [5] studies the statistics of these features, in particular their Gram matrices. By optimizing over pixel values, these methods can produce images with desired texture properties. If these properties are conditioned on an existing image structure, the process is called style transfer. VGG is also used in optimization-based multi-scale texture synthesis [20]. These methods require optimization for every individual exemplar. Ulyanov et al. [23] and Johnson et al. [8] suggest networks that directly generate textures without optimization. While a network now generates the texture, it remains limited to one exemplar, and no diversity was demonstrated. Notably, these approaches also use noise at different resolutions [17], an inspiration to our work. Follow-up work [24] addresses this difficulty by introducing an explicit diversity term, asking all results in a batch to be different. Unfortunately, this frequently introduces medium-frequency oscillations of brightness that appear acceptable to VGG, rather than producing true diversity. In our work, diversity is achieved by limiting the input of the network to nothing but random values, i.e., by construction.

There is a certain confusion around the term "texture". In the human vision [9] and computer vision literature [4, 6] it refers exclusively to stochastic variation. In computer graphics, e.g., OpenGL, a "texture" can model both stochastic and non-stochastic variation of color. For example, Visual Object Networks [29] generate a voxel representation of shape and diffuse albedo, and call the local color appearance "texture", e.g., that the wheels of a car are black, the rims silver, etc. Similarly, Oechsle et al. [16] and Saito et al. [19] use implicit functions to model appearance variation with detail beyond voxel resolution. Our comparisons will show that methods addressing spaces of non-stochastic texture variation [16, 29] are not applicable to model stochastic appearance. Our work makes progress towards learning spaces of both stochastic and non-stochastic texture.

Several works use adversarial training to capture the essence of textures [21, 3], including the non-stationary case [28] or even within a single image [21]. In particular, Style-GAN [10] transforms noise in adversarial training to generate images with detail. We avoid the challenges of adversarial training and instead train a network to match VGG statistics. Aittala et al. [1] extend the 2015 approach of Gatys et al. [5] to generate not just color but a set of 2D BRDF model parameter maps from a single 2D exemplar. Our approach is compatible with this idea, e.g., producing 3D bump, specular, etc. maps, but from 2D input.

In any case, no single texture work in graphics or vision [17, 5, 23, 4, 2, 26, 27] produces a space of textures; all operate on individual textures, and those that work on spaces of exemplars [29, 16] do not produce stochastic textures. Our work fills this gap, creating a space of stochastic textures. The graphics community, however, has looked into generating texture spaces [15], an approach we revisit from a deep learning perspective. Their method warps all exemplars to each other and builds a graph whose edges are valid interpolations wherever there is evidence that the warp succeeded. To blend between them, histogram adjustments are made. Consequently, interpolation between exemplars does not take a direct, straight path from one to the other, but a traversal along valid observations. Similarly, our approach can also construct valid paths in latent-space interpolation. Finally, all these methods require learning the texture in the same space in which it is used, while ours can operate in any dimension and across dimensions, including the important case of generating procedural 3D solid textures from only 2D observations [11] or slices [18].

Summary. Tbl.
1 depicts the state of the art. Rows list the different methods; columns list different aspects of each. A method is "Diverse" if it can generate multiple exemplars. MLPs [16] are not diverse, as the absolute position leads to overfitting. A method is said to have "Details" if it can generate features at all scales. A CNN has no details since, in particular in 3D, it needs to represent the entire domain in memory, while MLPs and ours do, as they are based on point operations. "Speed" refers to computational efficiency. Due to high bandwidth and lacking data parallelism, a CNN, in particular in 3D, is less efficient than ours. This prevents application to "3D". "Quality" refers to visual fidelity, a subjective property. CNN, MLP and ours achieve this, but Perlin is too simple a model. CNNs with diversity [24] have decent quality, but a step back from [23]. Our approach creates a "Space" of a class of textures, while all others only work with single exemplars. Finally, our approach allows learning from a single 2D observation, i.e., 2D-to-3D. MLPs [16] also learn from 2D images, but have multiple images of one exemplar, and pixels are labeled with depth.

Table 1.

Method                         Diverse Details Speed 3D Quality Space 2D-to-3D
Perlin             (perlin)    ✓       ✓       ✓     ✓  ✕       ✕     ✕
Perlin + transform (perlinT)   ✓       ✓       ✓     ✓  ✕       ✕     ✕
CNN                (cnn)       ✕       ✕       ✕     ✕  ✓       ✕     ✕
CNN + diversity    (cnnD)      ✓       ✕       ✕     ✕  ✕       ✕     ✕
MLP                (mlp)       ✕       ✕       ✓     ✓  ✕       ✕     ✓
Ours + position    (oursP)     ✕       ✓       ✓     ✓  ✕       ✓     ✓
Ours - transform   (oursNoT)   ✕       ✕       ✓     ✓  ✓       ✓     ✓
Ours               (ours)      ✓       ✓       ✓     ✓  ✓       ✓     ✓

3. Overview

Our approach has two steps. The first embeds the exemplar into a latent space using an encoder. The second provides sampling at any position by reading noise fields at that position and combining them using a learned mapping to match the exemplar statistics. We now detail both steps. At inference, changing the seed ξ while keeping the texture code e will yield diverse textures.

Figure 5. Sliced loss for learning 3D procedural textures from 2D exemplars: our method, as it is non-convolutional, can sample the 3D texture (a) at arbitrary 3D positions. This also enables sampling arbitrary 2D slices (b).
For learning, this allows us to simply slice 3D space along the three major axes (red, yellow, blue) and ask each slice to have the same VGG statistics as the exemplar (c).

The loss is the L2 distance of the Gram matrices of VGG feature activations [5, 8, 24, 23, 1] of the patches Pe and Ps. If the source and target domain are the same (synthesizing 2D textures from 2D exemplars), the slicing operation is the identity. However, it also allows for the important condition in which the target domain has more dimensions than the source domain, such as learning 3D from 2D exemplars.

Spaces-of. Our method can be used to either fit a single exemplar or an entire space of textures. In the single mode, we directly optimize the trainable parameters θ = {θd} of the decoder. When learning an entire space of textures, the full cascade of encoder g, translator h and sampler s parameters is trained jointly, i.e., θ = {θg, θh, θd}.

4. Learning stochastic space coloring

Here we introduce different implementations of samplers s: R^n → R^3 which "color" 2D or 3D space at position x. We discuss pros and cons with respect to the requirements from the introduction, ultimately leading to our approach.

Perlin noise is a simple and effective method to generate natural textures in 2D or 3D [17], defined as

    s(x|z) = Σ_{i=1}^{m} noise(2^{i−1} x, ξ_i) ⊗ w_i,    (1)

where h(z) = {w_1, w_2, . . .} are the RGB weights for m different noise functions noise_i which return bilinearly-sampled RGB values from an integer grid, and ⊗ is channel-wise multiplication. Here, e is a list of all linear per-layer RGB weights, e.g., an 8×3 vector for the m = 8 octaves we use. This is a simple latent code, but we will see increasingly complex ones later. Our encoder g is also designed such that it can cater to all decoders, even Perlin noise, i.e., we can also create a space of textures with a Perlin noise back-end. Coordinates x are multiplied by factors of two (octaves), so with increasing i, increasingly smooth noises are combined.
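The octave sum of Eq. (1) can be sketched in a few lines of pure Python. The hash-seeded integer lattice below is an assumption standing in for the paper's noise fields; it only reproduces the structure of Eq. (1): bilinearly sampled per-octave noise, scaled coordinates, channel-wise RGB weights, and an infinite domain (the lattice is procedural, so any position can be queried).

```python
import math
import random

def value_noise(seed):
    # RGB value noise: a seeded integer lattice read back with bilinear
    # interpolation. Stands in for the paper's noise(x, xi) functions that
    # "return bilinearly-sampled RGB values from an integer grid"; the
    # hash-based lattice is an assumption, not the paper's exact field.
    def lattice(ix, iy):
        g = random.Random(hash((seed, ix, iy)))
        return [g.random() for _ in range(3)]
    def sample(x, y):
        ix, iy = math.floor(x), math.floor(y)
        fx, fy = x - ix, y - iy
        c00, c10 = lattice(ix, iy), lattice(ix + 1, iy)
        c01, c11 = lattice(ix, iy + 1), lattice(ix + 1, iy + 1)
        return [(c00[c] * (1 - fx) + c10[c] * fx) * (1 - fy) +
                (c01[c] * (1 - fx) + c11[c] * fx) * fy
                for c in range(3)]
    return sample

def perlin_sample(x, y, weights, seeds):
    # Eq. (1): s(x|z) = sum_i noise(2^(i-1) x, xi_i) (.) w_i, where (.)
    # is the channel-wise product and weights = {w_1, ..., w_m} = h(z).
    rgb = [0.0, 0.0, 0.0]
    for i, (w, seed) in enumerate(zip(weights, seeds)):
        f = 2.0 ** i                   # octave scale 2^(i-1) for i = 1..m
        n = value_noise(seed)(f * x, f * y)
        rgb = [rgb[c] + w[c] * n[c] for c in range(3)]
    return rgb
```

Holding the weights fixed while changing the seeds yields a different draw of the same texture model, which is exactly the seed/code split used throughout the paper.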
This is well motivated by the spectra of natural signals [14, 17], but it is also limiting. Perlin's linear scaling allows the noise to have different colors, yet no linear operation can reshape a distribution to match a target. Our work seeks to overcome these two limitations while retaining the desirable properties of Perlin noise: simplicity and computational efficiency, as well as generalization to 3D.

Transformed Perlin relaxes the scaling by powers of two,

    s(x|z) = Σ_{i=1}^{m} noise(T_i 2^{i−1} x, ξ_i) ⊗ w_i,    (2)

by allowing each noise i to be independently scaled by its own transformation matrix T_i, since h(z) = {w_1, T_1, w_2, T_2, . . .}. Note that the choice of noise frequency is now achieved by scaling the coordinates that read the noise. This allows making use of anisotropic scaling for elongated structures, different orientations, or multiple random inputs at the same scale.

CNN utilizes the same encoder g as our approach to generate a texture code that is fed, in combination with noise, to a convolutional decoder similar to [24]:

    s(x|z) = cnn(x|e, noise(ξ)).    (3)

The CNN is conditioned on e without additional translation. Their visual quality is stunning: CNNs are powerful and the loss is able to capture perceptually important texture features, hence CNNs are a target to chase for us in 2D in terms of quality. However, there are two main limitations of this approach we seek to lift: efficiency and diversity.

CNNs do not scale well to 3D at high resolutions. To compute intermediate features at x, they need access to neighbors. While this is effective and output-sensitive in 2D, it is not in 3D: we need results for 2D surfaces embedded in 3D, and at high spatial resolution (say 1024×1024), but this requires CNNs to compute a full 3D volume with the same order of pixels.
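Returning to the transformed Perlin model of Eq. (2), a minimal sketch shows how each octave reads its noise through its own matrix T_i. The scalar value-noise lattice here is an assumption (the paper uses RGB fields); scalar weights w_i stand in for the RGB weights, and with T_i equal to the identity the expression reduces to Eq. (1).

```python
import math
import random

def noise1(seed, x, y):
    # Scalar value noise: seeded integer lattice plus bilinear interpolation
    # (the hash-based lattice is an assumption, not the paper's exact field).
    def lat(ix, iy):
        return random.Random(hash((seed, ix, iy))).random()
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    top = lat(ix, iy) * (1 - fx) + lat(ix + 1, iy) * fx
    bot = lat(ix, iy + 1) * (1 - fx) + lat(ix + 1, iy + 1) * fx
    return top * (1 - fy) + bot * fy

def transformed_perlin(x, y, octaves, seeds):
    # Eq. (2): s(x|z) = sum_i noise(T_i 2^(i-1) x, xi_i) * w_i. Each octave
    # carries its own 2x2 matrix T_i, enabling anisotropic scaling (elongated
    # structures) and rotation (orientation); octaves = [(w_i, T_i), ...].
    v = 0.0
    for i, ((w, T), seed) in enumerate(zip(octaves, seeds)):
        sx, sy = (2.0 ** i) * x, (2.0 ** i) * y   # power-of-two octave scale
        tx = T[0][0] * sx + T[0][1] * sy          # coordinates read through T_i
        ty = T[1][0] * sx + T[1][1] * sy
        v += w * noise1(seed, tx, ty)
    return v
```

An anisotropic T_i such as diag(4, 1) stretches the noise it reads along one axis, the mechanism the text invokes for elongated structures.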
While in 2D partial outputs can be achieved with sliding windows, it is less clear how to slide a window in 3D such that it covers all 3D points that are part of the visible surface.

The second issue is diversity: CNNs are great at producing a re-synthesis of the input exemplar, but most classic works [23, 8] have not demonstrated that changing the seed ξ leads to variation in the output, and in classic style transfer [5] diversity is eventually introduced by the randomness of SGD. Recent work by Ulyanov and colleagues [24] explicitly incentivizes diversity in the loss. The main idea is to increase the pixel variance inside all exemplars produced in one batch. Regrettably, this is often achieved by merely shifting the same exemplar slightly in space or introducing random brightness fluctuations.

MLP maps a 3D coordinate to appearance:

    s(x|z) = mlp(x|e),    (4)

where h(z) = e. Texture Fields [16] have used this approach to produce what they call "texture": detailed and high-quality appearance decoration of 3D surfaces, but producing diversity or any stochastic results was probably not intended. At least, there is no parameter that introduces any randomness, so all results are identical. We took inspiration from their work, as it makes use of 3D point operations that do not require accessing any neighbors, and no intermediate storage of features in any dimension, including 3D. It hence reduces bandwidth compared to a CNN, and is perfectly data-parallel and scalable. The only aspect missing to make it our colorization operator, required to create a space and evolve from 2D exemplars to 3D textures, is stochasticity.

Ours combines the noise from transformed Perlin for stochasticity, the losses used in style and texture synthesis CNNs for quality, as well as the point operations in MLPs for efficiency:

    s(x|z) = f(noise(T_1 2^0 x, ξ_1), . . .
             , noise(T_m 2^{m−1} x, ξ_m) | e).    (5)

Different from MLPs that take the coordinate x as input, position itself is hidden. Instead of position, we take multiple copies of spatially smooth noise noise(x) as input, with explicit control over how the noise is aligned in space, expressed by the transformations T. Hence, the MLP has to map the entire distribution of noise values such that it suits the loss, resulting in built-in diversity. We choose the number of octaves m to be 8, i.e., the transformation matrices T_1, . . . , T_m require 8 × 4 = 32 values in 2D. The texture code e has size 64 and the compact code z size 8. The decoder f consists of four stacked linear layers with 128 units each, followed by ReLUs. The last layer is 3-valued RGB.

Non-stochastic ablation investigates what happens if we do not limit our approach to random variables, but also provide access to the deterministic information x:

    s(x|z) = f(x, noise(2^0 x, ξ_1), . . . , noise(2^{m−1} x, ξ_m) | e)    (6)

is the same as MLP, but with access to noise. We will see that this effectively removes diversity.

Non-transformed ablation evaluates our method if it were to read only from multi-scale noise, without control over how it is transformed:

    s(x|z) = f(noise(2^0 x, ξ_1), . . . , noise(2^{m−1} x, ξ_m) | e).    (7)

5. Evaluation

Our evaluation covers qualitative (Sec. 5.2) and quantitative (Sec. 5.3) aspects as well as a user study (Sec. 5.4).

5.1. Protocol

We propose a data set on which we explore the relation of different methods, according to different metrics that quantify texture similarity and diversity.

Data set. Our data set contains four classes (WOOD, MARBLE, GRASS and RUST) of 2D textures, acquired from internet image sources. Each class contains 100 images.

Methods. We compare eight different methods: competitors, ablations and ours. As five competitors we study variants of Perlin noise, CNNs and MLPs. perlin implements Perlin noise (Eq. 1, [17]) and perlinT our variant extending it by a linear transformation (Eq. 2).
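The full sampler of Eq. (5) above can be sketched end to end. The paper uses m = 8 octaves, a 64-dimensional texture code e and four 128-unit layers; the sketch below uses smaller sizes (4 octaves, 8-dimensional e, 16 hidden units) and untrained random weights, so it illustrates the data flow only, not a trained model. The value-noise lattice is likewise an assumption.

```python
import math
import random

def noise1(seed, x, y):
    # Scalar value noise (seeded lattice + bilinear interpolation; assumption).
    def lat(ix, iy):
        return random.Random(hash((seed, ix, iy))).random()
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    top = lat(ix, iy) * (1 - fx) + lat(ix + 1, iy) * fx
    bot = lat(ix, iy + 1) * (1 - fx) + lat(ix + 1, iy + 1) * fx
    return top * (1 - fy) + bot * fy

def random_layers(n_in, hidden, n_out, rng):
    # Four stacked linear layers, as described above; the weights here are
    # random placeholders standing in for the trained parameters theta_d.
    dims = [n_in, hidden, hidden, hidden, n_out]
    return [([[rng.uniform(-1.0, 1.0) / math.sqrt(d_in) for _ in range(d_in)]
              for _ in range(d_out)], [0.0] * d_out)
            for d_in, d_out in zip(dims, dims[1:])]

def sampler(x, y, T, seeds, e, layers):
    # Eq. (5): the decoder f never sees the position x itself, only m
    # transformed noise reads plus the texture code e. A new seed set xi
    # therefore yields a new texture draw: diversity by construction.
    feats = []
    for i, (Ti, seed) in enumerate(zip(T, seeds)):
        sx, sy = (2.0 ** i) * x, (2.0 ** i) * y
        feats.append(noise1(seed, Ti[0][0] * sx + Ti[0][1] * sy,
                                  Ti[1][0] * sx + Ti[1][1] * sy))
    v = feats + list(e)
    for k, (W, b) in enumerate(layers):
        v = [sum(wi * vi for wi, vi in zip(row, v)) + bk
             for row, bk in zip(W, b)]
        if k < len(layers) - 1:        # ReLU after all but the last layer
            v = [max(0.0, u) for u in v]
    return v                            # 3-valued RGB
```

Because every output pixel is an independent point operation, the same code evaluates a 2D image, a surface point embedded in 3D, or a full volume, which is the scalability argument made above.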
Next, cnn is a classic TextureNet [23] and cnnD its extension to incentivize diversity ([24], Eq. 3). mlp uses an MLP following Eq. 4.

We study three ablations. First, we compare to oursP, which is our method but with the absolute position as input and no transform. Second, oursNoT omits the absolute position as input and the transformation, but still uses Perlin's octaves (Eq. 7). The final method is ours (Eq. 5).

Metrics. We evaluate methods with respect to three metrics: similarity, diversity, and a joint measure, success.

Similarity is high if the result has the same statistics as the exemplar in terms of L2 differences of VGG Gram matrices. This is identical to the loss used. Similarity is measured on a single exemplar.

Diversity is not part of the loss, but can be measured on a set of exemplars produced by a method. We measure diversity by looking at the VGG differences between all pairs of results in a set produced with different random seeds. Note that this does not utilize any reference. Diversity is maximized by generating random VGG responses, yet without similarity.

Success of the entire method is measured as the product of diversity and the maximum style error minus the style error. We apply this metric as it combines similarity and diversity, which are conflicting goals we jointly want to maximize.

Figure 6. Quantitative evaluation. Each plot shows the histogram of a quantity (from top to bottom: success, style error and diversity) for different data sets (from left to right: all space together, WOOD, MARBLE, GRASS). For a discussion, see the last paragraph in Sec. 5.2.

Memory and speed are measured at a resolution of 128 pixels/voxels on an Nvidia Titan Xp.

5.2. Quantitative results

Table 2.
Efficiency in terms of compute time and memory usage in 2D and 3D (columns) for different methods (rows).

Method     Time 2D   Time 3D     Memory 2D  Memory 3D
perlin     0.18 ms   0.18 ms     65 k       16 M
perlinT    0.25 ms   0.25 ms     65 k       16 M
cnn        1.45 ms   551.59 ms   8,000 k    646 M
cnnD       1.45 ms   551.59 ms   8,000 k    646 M
mlp        1.43 ms   1.43 ms     65 k       16 M
oursP      1.44 ms   1.44 ms     65 k       16 M
oursNoT    1.24 ms   1.24 ms     65 k       16 M
ours       1.55 ms   1.50 ms     65 k       16 M

Efficiency. We first look at computational efficiency in Tbl. 2. We see that our method shares the speed and memory efficiency of Perlin noise and MLPs / Texture Fields [16]. Using a CNN [23, 24] to generate 3D textures as volumes is not practical in terms of memory, even at
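The similarity, diversity and success metrics of Sec. 5.1 can be sketched on toy feature maps. The list-of-lists layout (C channels of N activations each) and the Gram normalization by N are assumptions; real measurements use VGG activations, which this stand-in does not reproduce.

```python
import math

def gram(feats):
    # Gram matrix of a feature map given as C channels of N activations each
    # (a stand-in for VGG activations; the layout and 1/N normalization are
    # assumptions of this sketch).
    n = len(feats[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in feats]
            for fi in feats]

def style_error(f_a, f_b):
    # Similarity metric: L2 distance between Gram matrices, identical in
    # form to the training loss described above.
    ga, gb = gram(f_a), gram(f_b)
    return math.sqrt(sum((ga[i][j] - gb[i][j]) ** 2
                         for i in range(len(ga)) for j in range(len(ga))))

def diversity(results):
    # Mean pairwise Gram distance across results drawn with different seeds;
    # note that no reference exemplar is involved.
    pairs = [(i, j) for i in range(len(results))
             for j in range(i + 1, len(results))]
    return sum(style_error(results[i], results[j]) for i, j in pairs) / len(pairs)

def success(style_err, max_style_err, div):
    # Joint score: diversity times (maximum style error minus style error),
    # rewarding both high similarity (low error) and high diversity.
    return div * (max_style_err - style_err)
```

A method that re-emits the exemplar verbatim scores zero diversity and hence zero success, which is exactly why the joint measure is used.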