RGB-D Indoor Segment 451
objects are recognized using optical character recognition (OCR) software on
extracted text regions. Lee et al. [5] incorporate visual odometry and feature-
based metric-topological Simultaneous Localization And Mapping (SLAM) into
the navigation system. Then a vicinity map based on dense 3D data obtained
from an RGB-D camera is built for path planning. These methods focus only on
specific parts of the scene, so the user cannot receive information about the
whole scene.
Semantic scene analysis could help the visually impaired better understand
the surrounding environment, and there have been many recent developments in
RGB-D scene analysis. Silberman et al. [6] use depth for bottom-up
segmentation and use context features to infer support relationships in the scene.
Ren et al. [7] use kernel descriptors on superpixels and a Markov Random
Field (MRF) over superpixels with a segmentation tree to model the context of
the scene. Choi et al. [8] use a 3D geometric phrase model to capture the
semantic and geometric relationships between objects that frequently co-occur
in the same 3D spatial configuration and thereby understand indoor scenes.
Gupta et al. [9]
propose algorithms for object boundary detection and hierarchical segmentation.
Their algorithms revisit the segmentation problem from the ground up and
develop gPb-like machinery to combine depth information naturally. Wang
et al. [10] propose a label propagation method to utilize the existing massive
2D semantic labeled datasets such as ImageNet. Koppula et al. [11] parse the
indoor scene with RGB-D data on a mobile robot. A full 3D reconstruction is
built from multiple views of the scene acquired with a Kinect sensor. Then the
3D point cloud is over-segmented and used as the underlying structure for an MRF
model. These methods focus on the algorithm for general scene segmentation
and labeling, while lacking specific analysis for the visually impaired. Wang et
al. [12] use the Hough transform to extract concurrent parallel lines in the RGB
channels and then use depth information to distinguish stairs from pedestrian
crosswalks. Stairs are then further classified as upstairs or downstairs. These
methods mainly focus on the accuracy of scene segmentation while neglecting
the efficiency of the algorithm, which is a key factor in our work. Liu
et al. [13] use a graph-based segmentation algorithm which combines the result
of plane segmentation and RGB-D data. Their method is more focused on the
efficiency of the algorithm. However, to help the visually impaired better
understand the scene, more semantic analysis, such as identifying the types of
different structures, should be conducted.
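The parallel-line cue behind the stair detection of [12] can be illustrated with a bare-bones Hough transform. The sketch below is an illustrative assumption rather than the authors' implementation: the function name, bin layout, and vote threshold are chosen for the example, and a real system would feed in a Canny edge map of the RGB channels.

```python
import numpy as np

def hough_parallel_lines(edge_img, n_theta=180, threshold=90):
    """Minimal Hough transform: every edge pixel votes for all (rho, theta)
    bins it lies on; bins with at least `threshold` votes are returned as
    lines in the normal form rho = x*cos(theta) + y*sin(theta)."""
    h, w = edge_img.shape
    diag = int(np.ceil(np.hypot(h, w)))            # maximum possible |rho|
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    accum = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edge_img)
    for t_idx, theta in enumerate(thetas):
        # Shift rho by `diag` so accumulator indices are non-negative.
        rhos = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        np.add.at(accum[:, t_idx], rhos, 1)
    peaks = np.argwhere(accum >= threshold)
    return [(int(rho) - diag, float(thetas[t])) for rho, t in peaks]

# Synthetic edge map with three horizontal "stair edge" candidates.
edges = np.zeros((100, 100), dtype=np.uint8)
edges[20, :] = edges[50, :] = edges[80, :] = 1

lines = hough_parallel_lines(edges)
# All detected lines share theta = pi/2, i.e. they are mutually parallel.
```

Grouping the returned (rho, theta) pairs by theta yields sets of concurrent parallel lines; depth information sampled across those lines would then separate stairs (step-like depth discontinuities) from flat crosswalk markings.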
In man-made indoor environments, there exist many planes that carry rich
structural information. Extracting these planes can be very helpful in scene
segmentation, and many plane segmentation algorithms exist in the literature.
One way to extract planes is to apply 2D segmentation methods to 3D data.
However, this approach performs badly if two planes are very close to each
other. To take full advantage of 3D data, many new methods have been proposed.
Holz et al. [14] compute local surface normals of point clouds using integral
images; the points are then clustered, segmented, and classified in both normal
space and spherical coordinates. This method achieves