KPConv: Deformable Convolution for Point Clouds

KPConv: Flexible and Deformable Convolution for Point Clouds

Hugues Thomas¹, Charles R. Qi², Jean-Emmanuel Deschaud¹, Beatriz Marcotegui¹, François Goulette¹, Leonidas J. Guibas²,³

¹ Mines ParisTech  ² Facebook AI Research  ³ Stanford University

Abstract

We present Kernel Point Convolution¹ (KPConv), a new design of point convolution, i.e. one that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points, and applied to the input points close to them. Its capacity to use any number of kernel points gives KPConv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks, or rigid KPConv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.

¹ Project page: https://github.com/HuguesTHOMAS/KPConv

Figure 1. KPConv illustrated on 2D points. Input points with a constant scalar feature (in grey) are convolved through a KPConv that is defined by a set of kernel points (in black) with filter weights on each point.

1. Introduction

The dawn of deep learning has boosted modern computer vision with discrete convolution as its fundamental building block. This operation combines the data of local neighborhoods on a 2D grid. Thanks to this regular structure, it can be computed with high efficiency on modern hardware, but without this regular structure, the convolution operation has yet to be defined properly, with the same efficiency as on 2D grids.

Many applications relying on such irregular data have grown with the rise of 3D scanning technologies. For example, 3D point cloud segmentation or 3D simultaneous localization and mapping rely on non-grid structured data: point clouds. A point cloud is a set of points in 3D (or higher-dimensional) space. In many applications, the points are coupled with corresponding features like colors. In this work, we will always consider a point cloud as those two elements: the points $P \in \mathbb{R}^{N \times 3}$ and the features $F \in \mathbb{R}^{N \times D}$. Such a point cloud is a sparse structure that is unordered, which makes it very different from a grid. However, it shares a common property with a grid which is essential to the definition of convolutions: it is spatially localized. In a grid, the features are localized by their index in a matrix, while in a point cloud, they are localized by their corresponding point coordinates. Thus, the points are to be considered as structural elements, and the features as the real data.

Various approaches have been proposed to handle such data, and can be grouped into different categories that we will develop in the related work section. Several methods fall into the grid-based category, whose principle is to project the sparse 3D data on a regular structure where a convolution operation can be defined more easily [23, 28, 33]. Other approaches use multilayer perceptrons (MLP) to process point clouds directly, following the idea proposed by [47, 25].

More recently, some attempts have been made to design a convolution that operates directly on points [2, 44, 19, 14, 13]. These methods use the spatial localization property of a point cloud to define point convolutions with spatial kernels. They share the idea that a convolution should define a set of customizable spatial filters applied locally in the point cloud.

This paper introduces a new point convolution operator named Kernel Point Convolution (KPConv).
KPConv also consists of a set of local 3D filters, but overcomes previous point convolution limitations as shown in related work. KPConv is inspired by image-based convolution, but in place of kernel pixels, we use a set of kernel points to define the area where each kernel weight is applied, as shown in Figure 1. The kernel weights are thus carried by points, like the input features, and their area of influence is defined by a correlation function. The number of kernel points is not constrained, making our design very flexible. Despite the resemblance of vocabulary, our work differs from [31], which is inspired by point cloud registration techniques and uses kernel points without any weights to learn local geometric patterns.

Furthermore, we propose a deformable version of our convolution [7], which consists of learning local shifts applied to the kernel points (see Figure 3). Our network generates different shifts at each convolution location, meaning that it can adapt the shape of its kernels for different regions of the input cloud. Our deformable convolution is not designed the same way as its image counterpart. Due to the different nature of the data, it needs a regularization to help the deformed kernels fit the point cloud geometry and avoid empty space. We use Effective Receptive Field (ERF) [21] and ablation studies to compare rigid KPConv with deformable KPConv.

As opposed to [40, 2, 44, 19], we favor radius neighborhoods over k-nearest neighbors (KNN). As shown by [13], KNN is not robust in non-uniform sampling settings. The robustness of our convolution to varying densities is ensured by the combination of radius neighborhoods and regular subsampling of the input cloud [37]. Compared to normalization strategies [13, 14], our approach also alleviates the computational cost of our convolution.

In our experiments section, we show that KPConv can be used to build very deep architectures for classification and segmentation, while keeping fast training and inference times.
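The combination of regular subsampling and radius neighborhoods described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that "regular subsampling" means keeping one barycenter per non-empty cubic grid cell, it uses a brute-force distance search, and the function names `grid_subsample`, `radius_neighbors` and `knn_extent` are hypothetical.

```python
import numpy as np


def grid_subsample(points, cell_size):
    """Return one barycenter per non-empty cubic cell of side `cell_size`.

    Assumption: this is one possible reading of "regular subsampling".
    """
    # Integer cell coordinates of every point.
    cells = np.floor(points / cell_size).astype(np.int64)
    # Map each occupied cell to a consecutive id (cells come out sorted).
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keeps the index 1-D across NumPy versions
    n_cells = int(inverse.max()) + 1
    # Accumulate point coordinates per cell, then divide by the cell counts.
    sums = np.zeros((n_cells, points.shape[1]))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse, minlength=n_cells)
    return sums / counts[:, None]


def radius_neighbors(points, center, radius):
    """Indices of the points lying within `radius` of `center` (brute force)."""
    dists = np.linalg.norm(points - center, axis=1)
    return np.flatnonzero(dists <= radius)


def knn_extent(points, center, k):
    """Distance from `center` to its k-th nearest point (brute force)."""
    dists = np.sort(np.linalg.norm(points - center, axis=1))
    return dists[k - 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    center = np.full(3, 0.5)
    for name, n in [("dense", 5000), ("sparse", 100)]:
        cloud = rng.uniform(0.0, 1.0, size=(n, 3))
        sub = grid_subsample(cloud, cell_size=0.1)
        print(name,
              "| points after subsampling:", len(sub),
              "| radius neighbors:", len(radius_neighbors(sub, center, 0.25)),
              "| extent of 16-NN on raw cloud:", knn_extent(cloud, center, 16))
```

In this sketch the spatial extent of a k-nearest-neighbor query depends on the local density, while a radius query always covers the same region of space. Subsampling first bounds the number of points that can fall inside that region.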
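Overall, rigid and deformable KPConv both perform very well, topping competing algorithms on several datasets. We find that rigid KPConv achieves better performance on simpler tasks, like object classification, or small segmentation datasets. Deformable KPConv thrives on more difficult tasks, like large segmentation datasets offering many object instances and greater diversity. We also show that deformable KPConv is more robust to a lower number of kernel points, which implies a greater descriptive power. Last but not least, a qualitative study of KPConv ERF shows that deformable kernels improve the network's ability to adapt to the geometry of the scene objects.

2. Related Work

In this section, we briefly review previous deep learning methods to analyze point clouds, paying particular attention to the methods closer to our definition of point convolutions.

Projection networks. Several methods project points to an intermediate grid structure. Image-based networks are often multi-view, using a set of 2D images rendered from the point cloud at different viewpoints [34, 4, 17]. For scene segmentation, these methods suffer from occluded surfaces and density variations. [35] proposed projecting local neighborhoods to local tangent planes and processing them with 2D convolutions. However, this method relies heavily on tangent plane estimation.

In the case of voxel-based methods, the points are projected on 3D grids in Euclidean space [23, 29, 3]. Using sparse structures like octrees or hash maps allows larger grids and better performance [28, 9], but these networks still lack flexibility because their kernels are constrained to use $3^3 = 27$ or $5^3 = 125$ voxels. Using a permutohedral lattice instead of a Euclidean grid reduces the kernel to 15 lattices [33], but this number is still constrained, while KPConv allows any number of kernel points. Moreover, avoiding intermediate structures should make the design of more complex architectures, such as instance mask detectors or generative models, more straightforward.

Graph convolution networks. The definition of a convolution operator on a graph has been addressed in different ways. A convolution on a graph can be computed as a multiplication on its spectral representation [8, 46], or it can focus on the surface represented by the graph [22, 5, 32, 24]. Despite the similarity between point convolutions and the most recent graph convolutions [38, 42], the latter learn filters on edge relationships instead of the relative positions of points. In other words, a graph convolution combines features on local surface patches, while being invariant to the deformations of those patches in Euclidean space. In contrast, KPConv combines features locally according to the 3D geometry, thus capturing the deformations of the surfaces.

Pointwise MLP networks. PointNet [25] is considered a milestone in point cloud deep learning. This network uses a shared MLP on every point individually, followed by a global max-pooling. The shared MLP acts as a set of learned spatial encodings, and the global signature of the input point cloud is computed as the maximal response among all the points for each of these encodings. The network's performance is limited because it does not consider local spatial relationships in the data. Following PointNet, some hierarchical architectures have been developed to aggregate local neighborhood information with MLPs [26, 18, 20].

$$(F * g)(x) = \sum_{x_i \in \mathcal{N}_x} g(x_i - x)\, f_i \qquad (1)$$

Figure 2. Comparison between an image convolution (left) and a KPConv (right) on 2D points for a simpler illustration. In the image, each pixel feature vector, with a weight matrix $(W_k)_k$ assigned by the alignment of the kernel with the image, …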
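Equation (1) can be sketched in NumPy as a sum over the radius neighborhood of a query location. The excerpt does not define the kernel function $g$, so the sketch makes two assumptions that should be checked against the full paper: $g$ is a sum over kernel points of a linear correlation $\max(0, 1 - \lVert y - x_k \rVert / \sigma)$ multiplied by a weight matrix $W_k$, and each $W_k$ maps $D_{in}$ input features to $D_{out}$ output features. The names `kernel_function` and `point_convolution` are hypothetical.

```python
import numpy as np


def kernel_function(offsets, kernel_points, weights, sigma):
    """Evaluate g at each neighbor offset y = x_i - x.

    Assumed form: g(y) = sum_k max(0, 1 - ||y - x_k|| / sigma) * W_k.
    offsets:       (M, 3) neighbor positions relative to the query point
    kernel_points: (K, 3) kernel point positions relative to the query point
    weights:       (K, D_in, D_out) one weight matrix per kernel point
    returns:       (M, D_in, D_out)
    """
    # Distance between every neighbor offset and every kernel point: (M, K).
    dists = np.linalg.norm(offsets[:, None, :] - kernel_points[None, :, :], axis=2)
    # Linear correlation: 1 at the kernel point, 0 beyond distance sigma.
    corr = np.maximum(0.0, 1.0 - dists / sigma)
    # Blend the weight matrices with the correlations.
    return np.einsum("mk,kio->mio", corr, weights)


def point_convolution(x, points, features, radius, kernel_points, weights, sigma):
    """(F * g)(x) = sum over x_i in N_x of g(x_i - x) f_i, as in Equation (1).

    N_x is taken to be the set of points within `radius` of x.
    """
    offsets = points - x
    in_ball = np.linalg.norm(offsets, axis=1) <= radius
    g = kernel_function(offsets[in_ball], kernel_points, weights, sigma)
    # Apply each neighbor's blended weight matrix to its feature and sum up.
    return np.einsum("mi,mio->o", features[in_ball], g)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(-1.0, 1.0, size=(200, 3))    # P in R^{N x 3}
    features = rng.normal(size=(200, 4))              # F in R^{N x D}
    kernel_points = rng.uniform(-0.3, 0.3, size=(15, 3))
    weights = rng.normal(size=(15, 4, 8))
    out = point_convolution(np.zeros(3), points, features, 0.5,
                            kernel_points, weights, sigma=0.3)
    print(out.shape)
```

Because the weights are attached to kernel points rather than to grid cells, changing the number of kernel points only changes the first dimension of `kernel_points` and `weights`.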
