Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation
Jiazhao Zhang1,*   Chenyang Zhu1,*   Lintao Zheng1   Kai Xu1,2,†
1National University of Defense Technology   2SpeedBot Robotics Ltd.
Abstract
Online semantic 3D segmentation performed in conjunction with real-time RGB-D reconstruction poses special challenges, such as how to perform 3D convolution directly over the progressively fused 3D geometric data and how to smartly fuse information from frame to frame. We propose a novel fusion-aware 3D point convolution which operates directly on the geometric surface being reconstructed and effectively exploits inter-frame correlation for high-quality 3D feature learning. This is enabled by a dedicated dynamic data structure which organizes the online acquired point cloud with global-local trees. Globally, we compile the online reconstructed 3D points into an incrementally growing coordinate interval tree, enabling fast point insertion and neighborhood query. Locally, we maintain the neighborhood information for each point using an octree whose construction benefits from the fast query of the global tree. Both levels of trees are updated dynamically and help the 3D convolution effectively exploit temporal coherence for information fusion across RGB-D frames. Through evaluation on public benchmark datasets, we show that our method achieves state-of-the-art semantic segmentation accuracy with online RGB-D fusion running at 10 FPS.
1. Introduction
Semantic segmentation of 3D scenes is a fundamental task in 3D vision. The recent state-of-the-art methods mostly apply deep learning either to 3D geometric data alone [25] or to the fusion of 2D and 3D data [20]. These approaches, however, are usually offline, working with an already reconstructed 3D scene geometry [5, 14]. Online scene understanding associated with real-time RGB-D reconstruction [13, 22], on the other hand, is deemed more appealing due to its potential applications in robotics and AR. Technically, online analysis can also fully exploit the spatio-temporal information available during RGB-D fusion.
*Joint first authors
†Corresponding author: kevin.kai.xu@gmail.com
Figure 1: We present a fusion-aware 3D point convolution which operates directly over the progressively acquired and online reconstructed scene surface. The point-wise labeling is gradually improved (the chairs are recognized) as more and more frames (first row: Frame 180, Frame 260, Frame 350) are fused in.
For the task of semantic scene segmentation performed alongside RGB-D fusion, deep-learning-based approaches commonly adopt the frame-feature-fusion paradigm. Such methods first perform 2D convolution in the individual RGB-D frames and then fuse the extracted 2D features across consecutive frames. Previous works conduct such feature fusion through either a max-pooling operation [14] or Bayesian probability updating [20]. We advocate the adoption of direct convolution over 3D surfaces for frame feature fusion. 3D convolution on surfaces learns features of the intrinsic structure of the geometric surfaces [2] that cannot be well captured by view-based convolution and fusion. During online RGB-D fusion, however, the scene geometry changes progressively with the incremental scanning and reconstruction, making it difficult to perform 3D convolution directly over the time-varying geometry. Moreover, to attain powerful 3D feature learning, special designs are needed to exploit the temporal correlation between adjacent frames.
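To make the frame-feature-fusion paradigm concrete, the following minimal Python sketch fuses per-frame 2D features into per-point features by element-wise max-pooling, the baseline operation mentioned above; the function and variable names are illustrative and not taken from any of the cited systems.

```python
import numpy as np

def fuse_frame_features(fused, frame_feats, point_ids):
    """Fuse per-frame 2D CNN features into a global per-point feature map
    by element-wise max-pooling across frames.

    fused       : dict {point_id: (C,) feature vector} accumulated so far
    frame_feats : (N, C) array of features for the N pixels of the current
                  frame that were back-projected onto reconstructed 3D points
    point_ids   : length-N sequence of the 3D point ids hit by those pixels
    """
    for pid, feat in zip(point_ids, frame_feats):
        if pid in fused:
            fused[pid] = np.maximum(fused[pid], feat)  # max-pool with earlier frames
        else:
            fused[pid] = feat.copy()                   # first observation of this point
    return fused

# Toy usage: two consecutive frames observing overlapping 3D points.
fused = {}
fused = fuse_frame_features(fused, np.random.rand(4, 8), [0, 1, 2, 3])
fused = fuse_frame_features(fused, np.random.rand(3, 8), [2, 3, 4])
```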
In this work, we argue that a fast and powerful 3D convolution for online segmentation necessitates an efficient and versatile in-memory organization of dynamic 3D geometric data. To this end, we propose a tree-based global-local dynamic data structure to enable efficient data maintenance and 3D convolution over time-varying geometry. Globally, we organize the online fused 3D points with an incrementally growing coordinate interval tree, which enables fast point insertion and neighborhood query.
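The paragraph above outlines the global level of this structure; as a rough illustration only (not our implementation), the following Python sketch replaces the coordinate interval tree and the per-point octrees with a simple coordinate-interval hash and plain neighbor lists, keeping just the two properties the text relies on: incremental point insertion and fast neighborhood query. The class name, parameters, and methods are all hypothetical.

```python
import numpy as np
from collections import defaultdict

class GlobalLocalPoints:
    """Toy stand-in for the global-local trees: the global level bins points
    into coordinate intervals; the local level caches, for every point, the
    ids of its spatial neighbors so that surface convolution can gather them
    without rebuilding any search structure from scratch."""

    def __init__(self, interval=0.1, radius=0.05):
        self.interval = interval        # width of one coordinate interval (metres)
        self.radius = radius            # neighborhood radius used by queries
        self.points = []                # all fused 3D points, in insertion order
        self.bins = defaultdict(list)   # interval key -> point ids (global level)
        self.neighbors = []             # per-point neighbor id lists (local level)

    def _key(self, p):
        # Coarse interval index of a point along each coordinate axis.
        return tuple(int(c) for c in np.floor(np.asarray(p) / self.interval))

    def insert(self, p):
        """Insert a newly fused point and incrementally update neighborhoods."""
        p = np.asarray(p, dtype=float)
        pid = len(self.points)
        self.points.append(p)
        nbrs = self.query(p)            # only nearby intervals are scanned
        self.neighbors.append(nbrs)
        for q in nbrs:                  # keep existing points' neighborhoods current
            self.neighbors[q].append(pid)
        self.bins[self._key(p)].append(pid)
        return pid

    def query(self, p):
        """Return ids of all stored points within `radius` of p."""
        p = np.asarray(p, dtype=float)
        k = self._key(p)
        out = []
        for off in np.ndindex(3, 3, 3):     # scan the 27 surrounding intervals
            key = (k[0] + off[0] - 1, k[1] + off[1] - 1, k[2] + off[2] - 1)
            for pid in self.bins.get(key, []):
                if np.linalg.norm(self.points[pid] - p) <= self.radius:
                    out.append(pid)
        return out

# Toy usage: stream points in one by one, as frames are fused.
index = GlobalLocalPoints()
for pt in [(0.0, 0.0, 0.0), (0.02, 0.0, 0.0), (1.0, 1.0, 1.0)]:
    index.insert(pt)
print(index.neighbors)   # [[1], [0], []]
```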