COCO background images [21] while varying brightness and contrast. This lets the network generalize to real images and enables 6D detection at 10 Hz. Like us, they rely on Iterative Closest Point (ICP) post-processing using depth data for very accurate distance estimation. In contrast, we do not treat 3D orientation estimation as a classification task.
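As a rough illustration of this kind of augmentation, the following Python sketch (using PIL; the function name, crop size and jitter ranges are illustrative assumptions rather than details from [17]) pastes a rendered object crop onto a randomly chosen COCO background and varies brightness and contrast:

    import random
    from PIL import Image, ImageEnhance

    def augment_render(render_rgba, coco_paths, size=(128, 128)):
        """Paste an RGBA object rendering onto a random COCO background and jitter it."""
        bg = Image.open(random.choice(coco_paths)).convert("RGB").resize(size)
        obj = render_rgba.resize(size)
        bg.paste(obj, mask=obj.split()[-1])  # alpha channel masks out the rendering background
        bg = ImageEnhance.Brightness(bg).enhance(random.uniform(0.6, 1.4))
        bg = ImageEnhance.Contrast(bg).enhance(random.uniform(0.6, 1.4))
        return bg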
2.2 Learning representations of 3D orientations
We describe the difficulties of training with fixed SO(3) parameterizations, which will motivate the learning of object-specific representations.
Regression. Since rotations live in a continuous space, it seems natural to directly regress a fixed SO(3) parameterization such as quaternions. However, representational constraints and pose ambiguities can introduce convergence issues [32]. In practice, direct regression approaches to full 3D object orientation estimation have not been very successful [23].
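For illustration, a minimal PyTorch sketch of such direct quaternion regression (a generic, assumed formulation, not the method of this paper or of [32,23]) already has to renormalize the network output and account for the fact that q and -q describe the same rotation:

    import torch

    def quaternion_loss(pred, target):
        # pred: raw network output, target: ground-truth unit quaternion
        pred = pred / pred.norm(dim=-1, keepdim=True)  # enforce unit norm
        # q and -q encode the same rotation, so take the smaller of the two distances
        return torch.min((pred - target).pow(2).sum(-1),
                         (pred + target).pow(2).sum(-1)).mean()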
Classification of 3D object orientations requires a discretization of SO(3). Even rather coarse intervals of ∼5° lead to over 50,000 possible classes. Since each class appears only sparsely in the training data, this hinders convergence. In SSD6D [17] the 3D orientation is learned by separately classifying a discretized viewpoint and in-plane rotation, thus reducing the complexity to O(n²). However, for non-canonical views, e.g. if an object is seen from above, a change of viewpoint can be nearly equivalent to a change of in-plane rotation, which yields ambiguous class combinations. In general, the relation between different orientations is ignored when performing one-hot classification.
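A back-of-the-envelope count illustrates this reduction; the bin sizes below are illustrative assumptions and not values taken from [17]:

    n = 360 // 5                   # 72 bins per rotational degree of freedom (~5 degrees)
    joint_so3 = n * (n // 2) * n   # naive Euler-angle grid over SO(3): O(n^3) = 186624 classes
    separate  = n * (n // 2) + n   # viewpoint classes plus in-plane classes: O(n^2) = 2664 classes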
Symmetries are a severe issue when relying on fixed representations of 3D orientations since they cause pose ambiguities (Fig. 2). If not manually addressed, identical training images can be assigned different orientation labels, which can significantly disturb the learning process. In order to cope with ambiguous objects, most approaches in the literature are manually adapted [40,9,17,28]. The strategies range from ignoring one axis of rotation [40,9], over adapting the discretization to the object [17], to training an extra CNN to predict symmetries [28]. These are tedious, manual ways to filter out object symmetries (Fig. 2a) in advance, and ambiguities caused by self-occlusions (Fig. 2b) and occlusions (Fig. 2c) are even harder to address. Symmetries affect not only regression and classification methods, but any learning-based algorithm that discriminates object views solely by fixed SO(3) representations.
Descriptor Learning can be used to learn a representation that relates object views in a low-dimensional space. Wohlhart et al. [40] introduced a CNN-based descriptor learning approach using a triplet loss that minimizes the Euclidean distance between views with similar object orientations and maximizes it between dissimilar ones. Although