A 4-RoSy Surface Convolution Kernel Method Combining Convolution with Sparse Point-Sampled Data Representations in 3D Deep Learning


TextureNet: Consistent Local Parametrizations for Learning from High-Resolution Signals on Meshes

Jingwei Huang¹, Haotian Zhang¹, Li Yi¹, Thomas Funkhouser²,³, Matthias Nießner⁴, Leonidas Guibas¹,⁵
¹Stanford University, ²Princeton University, ³Google, ⁴Technical University of Munich, ⁵Facebook AI Research

[Figure 1 labels: 3D Textured Model, Orientation Field, Sampled Points, Geodesic Patch, High-res Patch, High-res Network, Feature, TextureNet, Label, Semantic Segmentation]
Figure 1: TextureNet takes as input a 3D textured mesh. The mesh is parameterized with a consistent 4-way rotationally symmetric (4-RoSy) field, which is used to extract oriented patches from the texture at a set of sample points. Networks of 4-RoSy convolutional operators extract features from the patches, which are then used for 3D semantic segmentation.

Abstract

We introduce TextureNet, a neural network architecture designed to extract features from high-resolution signals associated with 3D surface meshes (e.g., color texture maps). The key idea is to utilize a 4-way rotationally symmetric (4-RoSy) field to define a domain for convolution on a surface. Though 4-RoSy fields have several properties favorable for convolution on surfaces (low distortion, few singularities, consistent parameterization, etc.), orientations are ambiguous up to 4-fold rotation at any sample point. So, we introduce a new convolutional operator invariant to the 4-RoSy ambiguity and use it in a deep network to extract features from high-resolution signals on geodesic neighborhoods of a surface. In comparison to alternatives, such as PointNet-based methods which lack a notion of orientation, the coherent structure given by these neighborhoods results in significantly stronger features. As an example application, we demonstrate the benefits of our architecture for 3D semantic segmentation of textured 3D meshes. The results show that our method outperforms all existing methods on the basis of mean IoU by a significant margin in both geometry-only (6.4%) and RGB+Geometry (6.9-8.2%) settings.

1. Introduction

In recent years, there has been tremendous progress in RGB-D scanning methods that allow reliable tracking and reconstruction of 3D surfaces using hand-held, consumer-grade devices [8, 18, 27, 28, 41, 21, 11]. Though these methods are now able to reconstruct high-resolution textured 3D meshes suitable for visualization, understanding the 3D semantics of the scanned scenes is still a relatively open research problem.

There has been a lot of recent work on semantic segmentation of 3D data using convolutional neural networks (CNNs). Typically, features extracted from the scanned inputs (e.g., positions, normals, height above ground, colors, etc.) are projected onto a coarse sampling of 3D locations, and then a network of 3D convolutional filters is trained to extract features for semantic classification, e.g., using convolutions over voxels [42, 25, 30, 36, 9, 13], octrees [33], point clouds [29, 31], or mesh vertices [24]. The advantage of these approaches over 2D image-based methods is that convolutions operate directly on 3D data, and thus are relatively unaffected by view-dependent image effects, such as perspective, occlusion, lighting, and background clutter. However, the resolution of current 3D representations is generally quite low (2 cm is typical), and so the ability of 3D CNNs to discriminate fine-scale semantic patterns is usually far below that of their color image counterparts [23, 16].

To address this issue, we propose a new convolutional neural network, TextureNet, that extracts features directly from high-resolution signals associated with 3D surface meshes. Given a map that associates high-resolution signals with a 3D mesh surface (e.g., RGB photographic texture), we define convolutional filters that operate on those signals within domains defined by geodesic surface neighborhoods.
This approach combines the advantages of feature extraction from high-resolution signals (as in [10]) with the advantages of view-independent convolution on 3D surface domains (as in [39]). This combination is important, for example, in labeling the chair in Figure 1, whose surface fabric is easily recognizable in a color texture map.

During our investigation of this approach, we had to address several research issues, the most significant of which is how to define convolutions on geodesic neighborhoods of a mesh. One approach could be to compute a global UV parameterization for the entire surface and then define convolutional operators directly in UV space; however, that approach may induce significant deformations due to flattening, may not always follow surface features, and/or may produce seams at surface cuts. Another approach could be to compute UV parameterizations for local neighborhoods independently; however, then adjacent neighborhoods might not be oriented consistently, reducing the ability of a network to properly learn orientation-dependent features. Instead, we compute a 4-RoSy (four-fold rotationally symmetric) field on the surface using QuadriFlow [17] and define a new 4-RoSy convolutional operator that explicitly accounts for the 4-fold rotational ambiguity of the cross field parameterization. A 4-RoSy field is a configuration of 4 orthogonal tangent directions associated with each vertex, in the shape of a cross, that varies smoothly over the mesh surface.
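To make the definition of a cross concrete, the following Python sketch builds the four directions at a single surface point from a unit normal and one representative tangent direction. The names `cross` and `rosy_directions` and the tuple-based vectors are illustrative assumptions, not part of TextureNet or QuadriFlow. The sketch only shows that rotating the representative by 90 degrees about the normal yields the other three directions, which is the 4-fold ambiguity mentioned above.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rosy_directions(normal: Vec3, tangent: Vec3) -> List[Vec3]:
    """Return the four directions of a 4-RoSy cross at one surface point.

    Assumes `normal` is unit length and `tangent` is a unit vector
    perpendicular to it. Each step rotates the previous direction by
    90 degrees about the normal (n x d), giving [d, n x d, -d, -(n x d)].
    """
    directions = [tangent]
    for _ in range(3):
        directions.append(cross(normal, directions[-1]))
    return directions


# Example: a point on the plane z = 0 with representative direction +x.
dirs = rosy_directions((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
for d in dirs:
    print(d)
```

Any of the four returned directions is an equally valid representative of the same cross, so a learning method built on such a field has to cope with not knowing which one it was given.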
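Since the 4-RoSy field from QuadriFlow has no seams, aligns to shape features, induces relatively little distortion, has few singularities, and consistently orients adjacent neighborhoods (up to 4-way rotations), it provides an attractive trade-off between distortion and orientation invariance.

Results on 3D semantic segmentation benchmarks show an improvement of the 4-RoSy convolution on surfaces over alternative geometry-only approaches (by 6.4%), plus significant further improvement when applied to high-resolution color signals (by 6.9-8.2%). With ablation studies, we verify the importance of the consistent orientation of a 4-RoSy field and demonstrate that our sampling and convolution operator works better than other alternatives.

Overall, our core research contributions are:
• a novel learning-based method for extracting features from high-resolution signals living on surfaces embedded in 3D, based on consistent local parameterizations,
• a new 4-RoSy convolutional operator designed for cross fields on general surfaces in 3D,
• a new deep network architecture, TextureNet, composed of 4-RoSy convolutional operators,
• an extensive experimental investigation of alternative convolutional operators for semantic segmentation of surfaces in 3D.

2. Related Work

3D Deep Learning. With the availability of 3D shape databases [42, 7, 36] and real-world labeled 3D scanning data [35, 1, 9, 6], there has been significant interest in deep learning on three-dimensional data. Early work developed CNNs operating on 3D volumetric grids [42, 25]. They have been used for 3D shape classification [30, 33], semantic segmentation [9, 13], object completion [12], and scene completion [13]. More recently, researchers have developed methods that take a 3D point cloud as input to a neural network and predict object classes or semantic point labels [29, 31, 39, 37, 2]. AtlasNet [14] learns to generate surfaces of 3D shapes. In our work, we also use a sparse point-sampled data representation; however, we exploit high-resolution signals on geometric surface structures with a new 4-RoSy surface convolution kernel.

Convolutions on Meshes. Several researchers have proposed methods for applying convolutional neural networks intrinsically on manifold meshes. FeaStNet [40] proposes a graph operator that establishes correspondences between filter weights and graph neighborhoods. Jiang et al. [20] apply differential operators on unstructured spherical grids. GCNN [24] proposes using discrete patch operators on tangent planes parameterized by radius and angle. However, the orientation of their selected geodesic patches is arbitrary, and the parameterization is highly distorted or inconsistent in regions of high Gaussian curvature. ACNN [3] observes this limitation and introduces anisotropic heat kernels derived from principal curvatures. MoNet [26] further generalizes the architecture with learnable Gaussian kernels for convolution. Principal-curvature-based frame selection has been adopted by Xu et al. [43] for segmentation of non-rigid surfaces, by Tatarchenko et al. [39] for semantic segmentation of point clouds, and by ADD [4] for shape correspondence in the spectral domain. It naturally removes the orientation ambiguity, but it does not account for frame inconsistency when performing feature aggregation. This problem is particularly pronounced in indoor scenes (which often have many planar regions where the principal curvature is undetermined) and in real-world scans (which often have noise and uneven sampling, where consistent principal curvatures are difficult to predict). In contrast, we define a 4-RoSy field that provides consistent orientations for neighboring convolution domains.

Multi-view and 2D-3D Joint Learning. Other researchers have investigated how to incorporate features from RGB inputs into 3D deep networks. The typical approach is to simply assign color values to voxels, points, or mesh vertices and treat them as additional feature channels. However, since geometry and RGB data differ greatly in resolution, this approach leads to significant downsampling of the color signal and thus does not take full advantage of the high-frequency patterns therein. An alternative approach is to combine features extracted from RGB images in a multi-view CNN [38]. This approach has been used for 3D semantic segmentation in 3DMV [10], where features are extracted from 2D RGB images, back-projected into a 3D voxel grid, and then merged and further processed with 3D voxel convolutions. Like our approach, 3DMV processes high-resolution RGB signals; however, it convolves them in the 2D image plane, where occlusions and background clutter are confounding. In contrast, our method convolves high-resolution signals intrinsically, directly on the 3D surface, which is view-independent.

3. The TextureNet Approach

Our approach convolves high-resolution signals with geodesic convolutions directly on 3D surface meshes. The input is a 3D mesh associated with a high-resolution surface signal (e.g., a color texture map), and the outputs are learned features for a dense set of sample points that can be used for semantic segmentation and other tasks.

Our main contribution is defining a smooth, consistently oriented domain for surface convolutions based on four-way rotationally symmetric (4-RoSy) fields. We observe that 3D surfaces can be mapped with low distortion to two-dimensional parameterizations anchored at dense sample points with locally consistent orientations and few singularities if we allow for a four-way ambiguity in the orientation at the sample points. We leverage that observation in TextureNet by computing a 4-RoSy field and point sampling using QuadriFlow [17] and then building a network using new 4-RoSy convolutional filters (TextureConv) that are invariant to the four-way rotational ambiguity.

We utilize this network design to learn and extract features from high-resolution signals on surfaces by extracting surface patches with high-resolution signals oriented by the 4-RoSy field at each sample point. The surface patches are convolved by a few TextureConv layers, pooled at sample points, and then convolved further with TextureConv layers in a UNet [34] architecture, as shown in Figure 2. For down-sampling and up-sampling, we use the furthest point sampling and three-nearest-neighbor interpolation methods proposed by PointNet++ [31]. The output of the network is a set of features associated with point samples that can be used for classification and other tasks. The following sections describe the main components of the network in detail.

[Figure 2 diagram labels: Geodesic Search, Grouping, Convolution, Aggregation, Downsample, Upsample + Interp, Concatenate, Skip Link, TextureConv, Maxpool 2x2, 10x10 Patch, 4x4, 1x1, High-res Feature, Texture Convolution, High-res Network]
Figure 2: TextureNet architecture. We propose a UNet [34] architecture for hierarchical feature extraction. The key innovation in the architecture is the texture convolution layer. We efficiently query the local geodesic patch for each surface point and associate each neighborhood with a local, orientation-consistent texture coordinate system. This allows us to extract local 3D surface features as well as the associated high-resolution signals, such as RGB input.

3.1. High-Resolution Signal Representation

Our network takes as input a high-resolution signal associated with a 3D surface mesh. In the first steps of processing, it generates a set of sample points on the mesh and defines a parameterized high-resolution patch for each sample (Section 3.2) as follows. For each sample point $p_i$, we first compute its geodesic neighborhood $\Omega_\rho(p_i)$ (Eq. 1) with radius $\rho$. Then, we sample an $N \times N$ point cloud $\{q_{xy} \mid -N/2 \le x, y < N/2\}$. The texture coordinates for $q_{xy}$ are $((x+0.5)d, (y+0.5)d)$, where $d$ is the distance between adjacent pixels in the texture patch. In practice, we select $N = 10$ and $d = 4$ mm. Finally, we use our newly proposed "TextureConv" and max-pooling operators (Section 3.3) to extract the high-res feature $f_i$ for each point $p_i$.

3.2. 4-RoSy Surface Parameterization

A critical aspect of our network is to define a consistently oriented geodesic surface parameterization for any position on a 3D mesh. Starting with some basic definitions, for a sampled point $p$ on the surface, we can locally parameterize its tangent plane by two orthogonal tangent vectors $i$ and $j$. Also, for any point $q$ on the surface, there exists a shortest path on the surface connecting $p$ and $q$, e.g., the orange path in Figure 3(a). By unfolding it to the tangent plane, we can map $q$ along the shortest path to $q^*$. Using these constructs, we define the local texture coordinate of $q$ in $p$'s neighborhood as

$$t_p(q) = \begin{bmatrix} i^T \\ j^T \end{bmatrix} (q^* - p).$$

We additionally define the local geodesic neighborhood of $p$ with receptive field $\rho$ as

$$\Omega_\rho(p) = \{\, q \mid \|t_p(q)\|_\infty < \rho \,\}. \qquad (1)$$

[Figure 3 panels: (a) Parameterization of $\Omega_\rho(p)$; (b) Visualization of the Geodesic Patches]
Figure 3: (a) Local texture coordinates. (b) Visualization of geodesic neighborhoods $\Omega_\rho$ ($\rho$ = 20 cm) on a set of randomly sampled vertices.

The choice of the set of mesh sampling positions $\{p\}$ and their tangent vectors $i$ and $j$ is critical for the success of learning on a surface domain. Ideally, we would select points that are evenly spaced and whose tangent directions are consistently oriented across neighborhoods, so that the underlying parameterization has no distortion or seams, as shown in Figure 4(a). With these properties, we could learn convolutional operators with translation invariance, just as we do for images. Unfortunately, these properties are achievable only if the surface is a flat plane. For a general 3D surface, we can only hope to select a set of point samples and tangent vectors that minimize the distortion of the spacing between points and of the local surface parameterizations. Figure 4(b) shows an example where a harmonic surface parameterization introduces large distortion in scale: a 2D convolution would have a large receptive field at the nose but a small one at the neck. Figure 4(c) shows a geometry image [15] parameterization with high distortion in orientation: convolutions on such a map would have randomly distorted and irregular receptive fields, making it difficult for a network to learn canonical features.

[Figure 4 panels: (a) QuadriFlow parameterization; (b) Harmonic parameterization; (c) Geometry image]
Figure 4: (a) With an appropriate method such as QuadriFlow, we can obtain a surface parameterization that aligns to shape features with negligible distortion. (b) Harmonic parameterization leads to high distortion in scale. (c) Geometry images [15] result in high distortion in orientation.

Unfortunately, a smoothly varying direction field on a surface is usually hard to obtain. According to the study of direction field design [32, 22], the best way to reduce distortion is to compute a four-way rotationally symmetric (4-RoSy) orientation field, which minimizes distortion by introducing directional ambiguity. Additionally, the orientation field needs to be consistently defined across different geometries, and the most intuitive way to achieve this is to align it with shape features such as the principal curvatures. Fortunately, [19, 17] achieve this with an extrinsic energy.

To this end, we use QuadriFlow [17] to compute the extrinsic 4-RoSy orientation field at a uniform distribution of point samples and use it to define the tangent vectors at any position on the surface. Because of the directional ambiguity, we randomly pick one direction as $i$ and compute $j = n \times i$, where $n$ is the surface normal at that position. Although there is a 4-way rotational ambiguity in this local parameterization of the surface (which will be addressed in the next section with a new convolutional operator), the resulting 4-RoSy field provides a way to extract geodesic neighborhoods consistently across the entire surface, even near singularities. Figure 5(a,b,c) shows the ambiguity of possible unfolded neighborhoods at a singularity. Since QuadriFlow [17] treats singularities as faces rather than vertices, all sampled positions have a well-defined orientation field. More importantly, our shortest-path patch parameterization guarantees that the parameterization of each geodesic neighborhood is well defined. For example, only Figure 5(a) is a valid parameterization for the purple point, while the locations of the blue and orange points in Figure 5(b) and (c) are unfolded along paths that are not the shortest. Unfolding a geodesic neighborhood around a singularity also raises another potential problem: a seam cut is usually required, leading to a gap at a 3-singularity or to multiple-surface coverage at a 5-singularity. For example, there is a gap at the bottom-right corner in Figure 5(a) caused by the seam cut shown as the green dotted line. Fortunately, our shortest-path definition also unambiguously determines the location of the seam: it must be the shortest geodesic path passing through the singularity. Hence, our definition of the local neighborhood guarantees a canonical way of parameterizing the surface around corners and singularities.

[Figure 5 panels: (a) Cut along the green line, with the gap marked; (b) Cut along the blue line; (c) Cut along the orange line]
Figure 5: At the singularity of a cube vertex, (a)-(c) show three different ways to unfold the local neighborhood. Using the shortest-path definition of texture coordinates removes the ambiguity around the singularity. For the purple point, (a) is a valid neighborhood, while the blue point in (b) and the orange point in (c) are unfolded along paths that are not the shortest. The ambiguity in the location of the gap is removed in the same way.

3.3. 4-RoSy Surface Convolution Operator

TextureNet is a network architecture composed of convolutional operators acting on geodesic neighborhoods of sample points with 4-RoSy parameterizations. The input to each convolutional layer is three-fold: 1) a set of 3D sample points associated with features (e.g., RGB, normals, or features computed from high-resolution surface patches or previous layers); 2) a coordinate system stored as two tangent vectors representing the 4-RoSy cross field for each point sample; and 3) a coarse triangle mesh, where each face is associated with the set of extracted sampled points and connectivity indices that support fast geodesic patch query and texture coordinate computation for the samples inside a geodesic neighborhood, much like the PTex [5] representation for textures.

[Figure 6 panels: (a) Image Coordinate; (b) 3D parametrization; (c) Inconsistent Frame]
Figure 6: (a) Traditional convolution kernel on a regular grid. (b) Frames defined by the orientation field on a 3D cube. (c) For the patch highlighted in orange in (b), multi-layer feature aggregation would be problematic with traditional convolution due to the frame inconsistency caused by the directional ambiguity of the orientation field.

Our key contribution in this section is the design of a convolution operator suitable for 4-RoSy fields. The problem is that we cannot use traditional 3x3 convolution kernels on domains parameterized with 4-RoSy fields without inducing inconsistent feature aggregation at higher levels. Figure 6 demonstrates the problem for a simple example. Figure 6(a) shows a 3x3 convolution in a traditional flat domain. Figure 6(b) shows the frames defined by our 4-RoSy orientation field on the 3D cube, where red spots represent the singularities. Although the cross field in the orange patch is consistent under the 4-RoSy metric, the frames are not parallel when they are unfolded into a plane (Figure 6(c)). Aggregation of features inside such a patch is therefore problematic.

“TextureConv” is our solution to remove the di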
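As an illustration of the furthest point sampling step mentioned in Section 3 for down-sampling, the following is a minimal Python sketch of the generic greedy algorithm. It assumes plain Euclidean distance and a fixed starting index; the source does not state which metric or seed TextureNet uses, and the function name `farthest_point_sampling` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def farthest_point_sampling(points: Sequence[Point], k: int, start: int = 0) -> List[int]:
    """Greedily pick k indices; each new point is the one farthest from those already chosen."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[m] = distance from points[m] to its nearest chosen point so far
    min_dist = [math.dist(pt, points[start]) for pt in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for m, pt in enumerate(points):
            min_dist[m] = min(min_dist[m], math.dist(pt, points[nxt]))
    return chosen


# Four collinear points; ask for three well-spread samples starting at index 0.
pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
sampled = farthest_point_sampling(pts, 3)
print(sampled)
```

The point at index 1 is skipped because it sits almost on top of the starting point, which is the behavior that makes this scheme useful for building coarser levels of a hierarchy.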
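The patch sampling rule of Section 3.1 can be sketched in a few lines of Python. The function name `patch_texture_coordinates`, the row-major ordering, and the use of meters as the unit are assumptions made for this example; only the formula ((x + 0.5)d, (y + 0.5)d) and the values N = 10 and d = 4 mm come from the text.

```python
from typing import List, Tuple


def patch_texture_coordinates(n: int = 10, d: float = 0.004) -> List[Tuple[float, float]]:
    """Texture coordinates ((x + 0.5) d, (y + 0.5) d) for -n/2 <= x, y < n/2.

    Defaults follow the values quoted in Section 3.1 (N = 10, d = 4 mm,
    expressed here in meters). Assumes n is even; row-major order is an
    arbitrary choice made for this sketch.
    """
    half = n // 2
    return [
        ((x + 0.5) * d, (y + 0.5) * d)
        for y in range(-half, half)
        for x in range(-half, half)
    ]


coords = patch_texture_coordinates()
print(len(coords), coords[0], coords[-1])
```

The half-pixel offset centers the grid on the sample point, so with the default values the patch covers a 40 mm by 40 mm square symmetric about the origin of the local texture coordinate system.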
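The local texture coordinate and neighborhood test of Section 3.2 (Eq. 1) can be sketched as follows. The sketch assumes the unfolded position q* is already available; computing it requires tracing the shortest surface path from p to q, which is not shown here. The helper names `dot`, `texture_coordinate`, and `in_neighborhood` are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def texture_coordinate(p: Vec3, i: Vec3, j: Vec3, q_star: Vec3) -> Tuple[float, float]:
    """t_p(q) = [i^T; j^T] (q* - p): project the unfolded offset onto the frame (i, j)."""
    offset = (q_star[0] - p[0], q_star[1] - p[1], q_star[2] - p[2])
    return (dot(i, offset), dot(j, offset))


def in_neighborhood(t: Tuple[float, float], rho: float) -> bool:
    """Membership test of Eq. 1: the infinity norm of t_p(q) is below rho."""
    return max(abs(t[0]), abs(t[1])) < rho


# Example on the plane z = 0, where unfolding is the identity (q* = q).
p = (0.0, 0.0, 0.0)
i, j = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
t = texture_coordinate(p, i, j, (0.15, -0.05, 0.0))
print(t, in_neighborhood(t, rho=0.2))
```

Because Eq. 1 uses the infinity norm, the neighborhood is a square of half-width ρ in texture space rather than a disk: a point with coordinates (0.19, 0.19) passes the test for ρ = 0.2 even though its Euclidean distance from p exceeds ρ.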
