Real-time 3D pose estimation with a monocular camera (monocular pose estimation)

Updated 2023-05-04



Real-time 3D Pose Estimation with a Monocular Camera Using Deep Learning and Object Priors On an Autonomous Racecar

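Background

Projecting a 3D object onto an image plane loses one dimension, so the distance to the object is unknown. With prior information about the 3D object, however, its distance can be recovered.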

To this end, we propose a low-latency real-time pipeline to detect and estimate 3D position of multiple objects of interest using just a single measurement, i.e. a single image, without the need for any special external markers.

We propose a novel “keypoint regression” scheme that exploits prior information about the object’s shape and size to regress and find specific feature points on the image.

We propose a complete pipeline that allows object detection and simultaneously estimates the pose of these multiple objects using just a single image by exploiting object priors.

As per the rules of the competition, the track is marked by cones. The left and right track limits are marked by blue and yellow traffic cones respectively.

A novel feature regression scheme, “keypoint regression”, is introduced, which is used to match 2D-3D correspondences.

This section shifts the focus to how to estimate the 3D position of multiple objects from a single image. Although it is an ill-posed problem, with a priori information in the form of the shape, size and geometry of the object of interest it is solvable, as elaborated in this chapter.
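As a rough illustration of why a size prior makes the problem solvable, the sketch below applies the pinhole relation h = f * H / Z to recover the depth Z of an object whose real height H is known. The function name and all numbers (a 1000 px focal length, a 0.30 m cone, a 60 px image height) are illustrative assumptions, not values from the paper, and the paper itself estimates pose from 2D-3D correspondences rather than from this single-height shortcut.

```python
def depth_from_known_height(focal_px: float, real_height_m: float,
                            pixel_height: float) -> float:
    """Depth from the pinhole relation pixel_height = focal_px * real_height_m / depth."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * real_height_m / pixel_height


# Hypothetical values: 1000 px focal length, 0.30 m tall cone seen as 60 px.
depth_m = depth_from_known_height(focal_px=1000.0, real_height_m=0.30,
                                  pixel_height=60.0)
print(f"estimated depth: {depth_m:.2f} m")
```

Without the known height, the same 60 px observation would be consistent with any object whose height-to-distance ratio is the same, which is the ambiguity the prior removes.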

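Advantages of using ROS

1. ROS communicates through nodes and provides message types for a wide range of sensors and for navigation.

2. ROS is open source and comes with a set of visualization and simulation tools.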

The pipeline’s sub-modules are run as nodes using the Robot Operating System, or ROS [5], as the framework that eases handling of communication and data messages across multiple systems as well as different nodes. Different sub-modules communicate via messages; they receive data and output processed information. Another important aspect is that ROS is open-source and provides tools for visualization, monitoring and simulation, making it easy to integrate, test, diagnose and develop the complete software system.

Visual perception system (two parts)

Stereo (binocular)

Monocular

[...] the stereo and the monocular pipeline. The stereo pipeline uses the sub-modules explained in this section to have an extremely efficient way of triangulating and estimating depth from binocular vision. This methodology of drastically reducing the search space and cleverly tackling the issue of having numerous and often incorrect feature matches [...]

Monocular pipeline

The monocular pipeline has 3 crucial sub-modules which enable it to detect multiple objects of interest and accurately estimate their 3D position up to a distance of 15 meters by making use of a single measurement in the form of an image captured by the monocular camera.

Three sub-modules

The monocular pipeline can be broken down into three parts: (1) multiple object detection, (2) keypoint regression and (3) 2D-3D correspondence followed by 3D pose estimation from a single image.

4.2 Multiple object detection

Object recognition has 4 main categories of tasks:

(1) classification, (2) classification and localization, (3) object detection and (4) instance segmentation

Instead of using slow and computationally intensive cascade and sliding window approaches, we employ a quick, real-time and powerful object detector in our pipeline in the form of YOLOv2.

4.2.1 Importance of color information

The path planning then has a cost function with a penalization term for potential paths that drive the car through same-colored cones.

How is the cone color obtained?

We design the detector such that the cone color information can be directly obtained from it. In other words, we treat each colored cone as a different class for the object detector.

4.2.2 Customizing YOLOv2 for Formula Student Driverless

Controlling the thresholds

We choose YOLOv2 for the purpose of detecting different colored cones. Thresholds for it are chosen such that false positives, incorrect detections and misclassification are avoided at any cost, even if that translates to not being able to detect all cones in a given image.
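The precision-over-recall choice described above can be sketched as a simple confidence filter. This is only an illustration: the `Detection` type, the scores and the threshold of 0.8 are hypothetical, since the thresholds actually used are not stated in the excerpt.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: str         # cone color class, e.g. "blue"
    confidence: float  # detector score in [0, 1]


def keep_confident(detections: List[Detection],
                   threshold: float = 0.8) -> List[Detection]:
    """Drop every detection below the threshold, accepting that some cones are missed."""
    return [d for d in detections if d.confidence >= threshold]


# Hypothetical detector output for one image.
raw = [
    Detection("blue", 0.95),
    Detection("yellow", 0.55),  # uncertain: discarded rather than risk a false positive
    Detection("orange", 0.88),
]
kept = keep_confident(raw)
print([d.label for d in kept])
```

Raising the threshold trades missed cones for fewer wrong ones, which matches the stated preference for avoiding false positives and misclassifications.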

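I did not fully follow this part; judging from the passage below, it is about recomputing the anchor boxes (the box shape priors) from the dataset annotations.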

Since the annotations for cones are long and thin rectangular bounding boxes, we exploit such prior information by re-calculating the anchor boxes used by YOLOv2. This is done by performing k-means clustering on the aspect ratio of the rectangle annotations in the dataset and improves the object detector’s performance.

The detector needs to distinguish and detect ‘yellow’, ‘blue’ and ‘orange’ cones that provide information about the track.
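The clustering step can be sketched as follows. This is a minimal illustration that assumes plain Lloyd's k-means with absolute distance on scalar width/height ratios and a deterministic initialization; the function name, the initialization and the annotation sizes are hypothetical, and the excerpt does not say which distance or how many clusters were used.

```python
from typing import List


def kmeans_1d(values: List[float], k: int, iterations: int = 50) -> List[float]:
    """Lloyd's k-means on scalars; returns the cluster centers in ascending order."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic initialization: k evenly spaced samples of the sorted data.
    if k == 1:
        centers = [ordered[len(ordered) // 2]]
    else:
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


# Hypothetical (width, height) annotations in pixels.
boxes = [(10, 30), (11, 30), (12, 30), (24, 30), (25, 30), (26, 30)]
ratios = [w / h for w, h in boxes]
anchor_ratios = kmeans_1d(ratios, k=2)
print(anchor_ratios)
```

The resulting centers would then be used to shape the anchor boxes so that they resemble the boxes that actually occur in the dataset.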

4.2.3 Training to detect cones (training samples)

4.3 Keypoint Regression

How is the geometry in the prior information known?

However, since there is prior information about the 3D shape, size and geometry of the cone, one has hope to recover 3D pose from a single measurement.

4.3.1 From patches to features - The need for “keypoint regression”

Using an object detector, cones can be detected in an image. However, one needs more information to go from detections on the image to 3D positions. We exploit a priori knowledge about the cone and a calibrated camera to help estimate its depth via 2D-3D correspondences.
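To make the idea of 2D-3D correspondences concrete, the sketch below projects known 3D points of a cone into the image with a pinhole camera model. Pose estimation inverts this forward model: given the pixels and the 3D model, it looks for the pose that reproduces them. The cone dimensions, the intrinsics (`fx`, `fy`, `cx`, `cy`), the translation and the omission of rotation and lens distortion are all simplifying assumptions made for illustration.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]


def project_point(point_cam: Point3D, fx: float, fy: float,
                  cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D point in the camera frame (z forward) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical cone model in its own frame (meters; y points down as in the image).
cone_model = {
    "apex": (0.0, -0.3, 0.0),
    "base_left": (-0.1, 0.0, 0.0),
    "base_right": (0.1, 0.0, 0.0),
}
# Hypothetical pose: pure translation of the cone frame in the camera frame.
translation = (0.5, 0.2, 5.0)

pixels = {}
for name, (x, y, z) in cone_model.items():
    point_cam = (x + translation[0], y + translation[1], z + translation[2])
    pixels[name] = project_point(point_cam, fx=1000.0, fy=1000.0,
                                 cx=640.0, cy=360.0)
print(pixels)
```

Each entry pairs one 3D model point with one 2D image point; the keypoint regression described in this section is what supplies the 2D side of such pairs on real images.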

When the resolution is low, or in other unfavorable conditions, not enough 3D information can be extracted.

To this end, we introduce a feature extraction scheme that is inspired by classical computer vision but has a flavor of learning from data via machine learning.

4.3.2 Design and architecture of the “keypoint regressor”

Convolutional neural network

The primary difference between this scheme and any other feature extraction process is that this is very specific as compared to commonly used techniques.

In our case, we want to find the position of very specific points on the image that correspond to 3D counterparts whose locations can be measured in 3D from an arbitrary world frame F_w.
