3D Pose Estimation and 3D Model Retrieval for Objects in the Wild

Alexander Grabner¹   Peter M. Roth¹   Vincent Lepetit²,¹

¹ Institute of Computer Graphics and Vision, Graz University of Technology, Austria
² Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, France

{alexander.grabner,pmroth,lepetit}@icg.tugraz.at
Abstract
We propose a scalable, efficient and accurate approach to retrieve 3D models for objects in the wild. Our contribution is twofold. We first present a 3D pose estimation approach for object categories which significantly outperforms the state of the art on Pascal3D+. Second, we use the estimated pose as a prior to retrieve 3D models which accurately represent the geometry of objects in RGB images. For this purpose, we render depth images from 3D models under our predicted pose and match learned image descriptors of RGB images against those of rendered depth images using a CNN-based multi-view metric learning approach. In this way, we are the first to report quantitative results for 3D model retrieval on Pascal3D+, where our method chooses the same models as human annotators for 50% of the validation images on average. In addition, we show that our method, which was trained purely on Pascal3D+, retrieves rich and accurate 3D models from ShapeNet given RGB images of objects in the wild.
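The following Python sketch (PyTorch) is a minimal illustration of the retrieval step described above, not the authors' implementation; the network architecture, descriptor dimension, image sizes, and all names are assumptions. Two small CNNs embed an RGB query crop and depth renderings of candidate 3D models, rendered under the pose predicted for the query, into a common descriptor space, and the model whose descriptor is most similar to the query is retrieved. In the actual method, the two networks would be trained with a multi-view metric learning loss so that matching RGB/depth pairs lie close in descriptor space; here the weights are random, so the output is for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorNet(nn.Module):
    # Small CNN mapping an image (RGB: 3 channels, depth: 1 channel)
    # to an L2-normalized descriptor. The architecture is an assumption.
    def __init__(self, in_channels, descriptor_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, descriptor_dim)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return F.normalize(self.fc(x), dim=1)  # unit-length descriptors

rgb_net = DescriptorNet(in_channels=3)    # embeds real RGB crops
depth_net = DescriptorNet(in_channels=1)  # embeds rendered depth images

# One RGB query and depth renderings of 10 candidate 3D models,
# all rendered under the 3D pose predicted for the query object.
rgb_query = torch.randn(1, 3, 224, 224)
depth_renderings = torch.randn(10, 1, 224, 224)

q = rgb_net(rgb_query)            # descriptor of shape (1, 128)
d = depth_net(depth_renderings)   # descriptors of shape (10, 128)

# Descriptors are unit length, so the dot product equals cosine similarity;
# the candidate 3D model with the highest similarity is retrieved.
similarity = (q @ d.t()).squeeze(0)
best_model = similarity.argmax().item()
print("retrieved 3D model index:", best_model)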
1. Introduction
Retrieving 3D models for objects in 2D images, as shown in Fig. 1, is extremely useful for 3D scene understanding, augmented reality applications, and tasks like object grasping or object tracking. Recently, the emergence of large databases of 3D models such as ShapeNet [3] initiated substantial interest in this topic and motivated research on matching 2D images of objects against 3D models. However, there is no straightforward way to compare 2D images and 3D models, since they have considerably different representations and characteristics.
Figure 1: Given an RGB image (top), we predict a 3D pose and a 3D model for objects of different categories (bottom).

One approach to address this problem is to project 3D models onto 2D images, which is known as rendering [24]. This converts the task to comparing 2D images, which is, however, still challenging, because the appearance of objects in real images and synthetic renderings can significantly differ. In general, the geometry and texture of available 3D models do not exactly match those of objects in real images. Therefore, recent approaches [2, 10, 23, 28] use convolutional neural networks (CNNs) [7, 8, 22] to extract features from images which are partly invariant to these variations. In particular, these methods compute image descriptors from real RGB images and from synthetic RGB images generated by rendering 3D models under multiple poses. While this allows them to train a single CNN purely on synthetic data, there are two main disadvantages:
First, there is a significant domain gap between real and synthetic RGB images: real images are affected by complex lighting, uncontrolled degradation, and natural backgrounds, which makes it hard to render photo-realistic images from the available 3D models. Therefore, using a single CNN for feature extraction from both domains is limited in performance, and even domain adaptation [13] does not fully account for the different characteristics of real and synthetic images.