TABLE II: Classification-based methods (CNN: Convolutional Neural Network, CRF: Conditional Random Field, ICP: Iterative Closest Point, NMS: Non-Maximum Suppression, R: Real data, RF: Random Forest, S: Synthetic data, s-SVM: structured-Support Vector Machine, ✗: not employed; for other symbols see Table I)

| method | input | input pre-processing | training data | classification parameters | classifier training | trained classifier | refinement | filtering step | level |

2D-DRIVEN 3D
| GS3D [104] | RGB | ✗ | R | θ_y | L_ce | CNN | CNN | ✗ | category |
| – refinement step | RGB | ✗ | R | x, d, θ_y | L_ce | CNN | ✗ | ✗ | |
| Papon et al. [88] | RGB-D | intensity & normal | R & S | θ_y, z | L_ce | CNN | ✗ | NMS | category |
| Gupta et al. [81] | RGB-D | normal | S | θ_y | L_ce | CNN | ICP | ✗ | category |

3D
| Sliding Shapes [89] | Depth | 3D grid | R & S | x, d, θ_y | L_hinge | SVM | ✗ | NMS | category |
| Ren et al. [149] | RGB-D | 3D grid | R | x, d, θ_y | IoU_3D | s-SVM | s-SVM | NMS | category |
| – refinement step | RGB-D | 3D grid | R | x, d, θ_y | IoU_3D | s-SVM | ✗ | ✗ | |
| Wang et al. [96] | LIDAR | 3D grid | R | x | L_hinge | SVM | ✗ | NMS | category |
| Vote3Deep [77] | LIDAR | 3D grid | R | x, θ_y | L_hinge | CNN | ✗ | NMS | category |

6D
| Bonde et al. [27] | Depth | 3D grid | S | Θ | IG | RF | ✗ | ✗ | instance |
| Brachmann et al. [28] | RGB-D | ✗ | R & S | x | IG | RF | ICP | ✗ | instance |
| Krull et al. [29] | RGB-D | ✗ | R & S | x | IG | RF | CNN | ✗ | instance |
| – refinement step | Depth | ✗ | R & S | x, Θ | log-like | CNN | ✗ | ✗ | |
| Michel et al. [33] | RGB-D | ✗ | R & S | x | IG | RF | CRF & ICP | ✗ | instance |
| – refinement step | RGB-D | ✗ | R & S | x, Θ | ✗ | CRF | ICP | ✗ | |
more accurate results [128], [129], [130], [131], [132]. In classification forests, information gain is often used as the quality function Q_cla, while in regression tasks the training objective Q_reg is to minimize the variance of the translation offset vectors and rotation parameters. For pose regression problems, a Hough voting process [133] is usually employed.
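To make the two quality functions concrete, the following Python sketch, assuming labelled samples stored as numpy arrays of class labels and translation-offset vectors, scores a candidate split with the information gain Q_cla and the variance-reduction objective Q_reg; the function and variable names are illustrative, not taken from the cited works.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a discrete label set (0 for an empty set)."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def q_cla(labels, left_mask):
    """Information gain of a candidate split in a classification forest."""
    n = len(labels)
    left, right = labels[left_mask], labels[~left_mask]
    return entropy(labels) - len(left) / n * entropy(left) \
                           - len(right) / n * entropy(right)

def q_reg(offsets, left_mask):
    """Variance reduction over translation-offset vectors; maximizing this
    is equivalent to minimizing the variance within the child nodes."""
    def total_var(v):
        return float(np.sum(np.var(v, axis=0))) if len(v) else 0.0
    n = len(offsets)
    left, right = offsets[left_mask], offsets[~left_mask]
    return total_var(offsets) - len(left) / n * total_var(left) \
                              - len(right) / n * total_var(right)
```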
A. Classification
The overall schematic representation of the classification-based methods is shown in Fig. 1. In the figure, the blocks drawn with solid lines are employed by all methods, while, depending on the architecture design, the dashed-line blocks are additionally operated by clusters of specific methods.
Training Phase. During an off-line stage, classifiers are trained on synthetic or real data. Synthetic data are generated using the 3D model M of an object of interest O, from which a set of RGB/D/RGB-D images are rendered from different camera viewpoints. The 3D model M can either be a CAD or a reconstructed model, and the following factors are considered when deciding the size of the dataset:
• Reasonable viewpoint coverage. In order to capture reasonable viewpoint coverage of the target object, synthetic images are rendered by placing a virtual camera at each vertex of a subdivided icosahedron of a fixed radius (see the sketch after this list). The hemisphere or the full sphere of the icosahedron can be used, depending on the scenario.
• Object distance. Synthetic images are rendered at different scales, depending on the range in which the target object is located.
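As a concrete illustration of the viewpoint-sampling scheme above, the sketch below, assuming SciPy is available, generates virtual camera positions at the vertices of a recursively subdivided icosahedron; all names and the radius value are illustrative choices, not details from the cited works.

```python
import numpy as np
from scipy.spatial import ConvexHull

def icosahedron_vertices():
    """The 12 vertices of an icosahedron, projected onto the unit sphere."""
    phi = (1 + 5 ** 0.5) / 2  # golden ratio
    v = np.array([(-1, phi, 0), (1, phi, 0), (-1, -phi, 0), (1, -phi, 0),
                  (0, -1, phi), (0, 1, phi), (0, -1, -phi), (0, 1, -phi),
                  (phi, 0, -1), (phi, 0, 1), (-phi, 0, -1), (-phi, 0, 1)],
                 dtype=float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def subdivide(verts, levels=1):
    """Insert the midpoint of every triangle edge (triangulation taken from
    the convex hull) and project the new points back onto the sphere."""
    for _ in range(levels):
        tris = ConvexHull(verts).simplices
        edges = {tuple(sorted((t[i], t[(i + 1) % 3])))
                 for t in tris for i in range(3)}
        mids = np.array([(verts[i] + verts[j]) / 2 for i, j in edges])
        verts = np.vstack([verts, mids])
        verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    return verts

# Virtual cameras at a fixed radius (illustrative value), all looking at the
# object centre; keep only the upper hemisphere if the scenario allows it.
radius = 0.8  # metres
cameras = radius * subdivide(icosahedron_vertices(), levels=2)
cameras = cameras[cameras[:, 2] >= 0]  # hemisphere instead of full sphere
```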
Computer graphics systems provide precise data annotation, and hence synthetic data generated by these systems are used by the classification-based methods [88], [81], [89], [27], [28], [29], [33]. Although it is hard to obtain accurate object pose annotations for real images, there are classification-based methods that use real training data [104], [88], [89], [149], [77], [96], [28], [29], [33]. Training data are annotated with pose parameters, i.e., the 3D translation x = (x, y, z), the 3D rotation Θ = (θ_r, θ_p, θ_y), or both. Once the training data are generated, the classifiers are trained using the related loss functions.
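As an illustration of such annotations and losses, the sketch below pairs a pose annotation with the bin-based cross-entropy L_ce used by several methods in Table II: the yaw angle θ_y is discretized into bins and the classifier is penalized with cross-entropy over those bins. The bin count and all names are illustrative choices rather than details from the cited works.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PoseAnnotation:
    """Pose labels attached to one training image."""
    x: np.ndarray      # 3D translation (x, y, z)
    theta: np.ndarray  # 3D rotation (theta_r, theta_p, theta_y)

N_BINS = 16  # illustrative; finer bins trade angular resolution for data per bin

def yaw_to_bin(theta_y):
    """Map a yaw angle in [-pi, pi) to one of N_BINS discrete classes."""
    return int((theta_y + np.pi) / (2 * np.pi) * N_BINS) % N_BINS

def cross_entropy(logits, target_bin):
    """Bin-based cross-entropy L_ce for a single sample."""
    logits = logits - logits.max()                       # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum())
    return float(-log_softmax[target_bin])

# Example: the loss for a classifier output over yaw bins.
ann = PoseAnnotation(x=np.zeros(3), theta=np.array([0.0, 0.0, 0.3]))
loss = cross_entropy(np.random.randn(N_BINS), yaw_to_bin(ann.theta[2]))
```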
Testing Phase. During an on-line stage, a real test image is taken as input by the classifiers. 2D-driven 3D methods [104], [88], [81] first extract a 2D BB around the object of interest (2D BB generation block), which is then lifted to 3D. Depending on the input, the methods in [88], [81], [89], [149], [96], [77], [27] employ a pre-processing step on the input image and then generate 3D hypotheses (input pre-processing block). 6D object pose estimators [27], [28], [29], [33] extract features from the input images (feature extraction block) and, using the trained classifiers, estimate objects' 6D poses. Several methods further refine the output of the trained classifiers [104], [81], [149], [28], [29], [33] (refinement block), and finally hypothesize the object pose after filtering.
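The on-line stage can be summarized by the following structural sketch, assuming each block of Fig. 1 is available as a callable; which optional (dashed-line) blocks actually run depends on the specific method, as listed in Table II, and all names here are illustrative placeholders.

```python
def estimate_pose(test_image, blocks):
    """Run the solid-line blocks always and the dashed-line blocks when a
    method provides them; `blocks` maps block names to callables."""
    x = test_image
    if "bb2d" in blocks:        # 2D-driven 3D: detect a 2D BB, then lift to 3D
        x = blocks["bb2d"](x)
    if "preprocess" in blocks:  # e.g., surface normals or a 3D voxel grid
        x = blocks["preprocess"](x)
    if "features" in blocks:    # feature extraction for 6D pose estimators
        x = blocks["features"](x)
    hypotheses = blocks["classifier"](x)  # trained classifier: always present
    if "refine" in blocks:      # e.g., ICP, or a second CNN / s-SVM pass
        hypotheses = blocks["refine"](hypotheses)
    if "filter" in blocks:      # e.g., non-maximum suppression
        hypotheses = blocks["filter"](hypotheses)
    return hypotheses           # final 3D/6D object pose hypotheses
```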
Table II details the classification-based methods. GS3D [104] concentrates on extracting the 3D information hidden in a 2D image in order to generate accurate 3D BB hypotheses. It modifies Faster R-CNN [117] to classify the rotation θ_y in RGB images in addition to the 2D BB parameters. Utilizing another CNN architecture, it further refines the object's pose parameters by classifying x, d, and θ_y. Papon et al. [88] estimate semantic poses of common furniture classes in complex cluttered scenes. The input is converted into an intensity (RGB) & surface normal (D) image. 2D BB proposals are generated using the 2D GOP detector [122] and are then lifted to 3D space by further classifying θ_y and z using the bin-based cross-entropy loss L_ce. Gupta et al. [81] start with the segmented