Real-time 3D Pose Estimation with a Monocular Camera Using Deep Learning and Object Priors On an Autonomous Racecar
Background
When a 3D object is projected onto an image plane, one dimension is lost, i.e., the object's distance is unknown. With prior information about the 3D object, however, its distance can be recovered.
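The background claim can be made concrete with the pinhole camera model. Suppose the real height H of an object is known in advance, and it spans h pixels in an image taken with focal length f (in pixels). Then its distance is roughly Z = f·H/h. The sketch below only illustrates why a size prior resolves the lost depth. It assumes an upright, roughly fronto-parallel object and is not the pipeline's actual method, which uses keypoints and 2D-3D correspondences. The function name and the numbers, including the 0.325 m cone height, are hypothetical.

```python
def depth_from_height(focal_px: float, object_height_m: float, bbox_height_px: float) -> float:
    """Pinhole-camera distance estimate Z = f * H / h.

    focal_px: focal length in pixels.
    object_height_m: known real height H of the object (the prior), in metres.
    bbox_height_px: height h of the object's image, in pixels.
    """
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_px * object_height_m / bbox_height_px


# Hypothetical values: f = 1000 px, a 0.325 m tall cone spanning 50 px.
print(depth_from_height(1000.0, 0.325, 50.0))
```

Halving the pixel height doubles the estimated distance. This also explains why distant cones yield noisier depth: a one-pixel error in h matters more when h is small.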
To this end, we propose a low-latency real-time pipeline to detect and estimate the 3D positions of multiple objects of interest using just a single measurement, i.e., a single image, without the need for any special external markers.
We propose a novel “keypoint regression” scheme that exploits prior information about the object’s shape and size to regress and find specific feature points on the image.
We propose a complete pipeline that detects objects and simultaneously estimates the pose of these multiple objects from just a single image by exploiting object priors.
As per the rules of the competition, the track is marked by cones: the left and right track limits are marked by blue and yellow traffic cones, respectively.
A novel feature regression scheme, “keypoint regression”, is introduced to establish 2D-3D correspondences.
This section shifts the focus to estimating the 3D positions of multiple objects from a single image. Although this is an ill-posed problem, it becomes solvable with a priori information in the form of the shape, size and geometry of the object of interest, as elaborated in this chapter.
The pipeline’s sub-modules run as nodes in the Robot Operating System (ROS). ROS is a framework that eases the handling of communication and data messages across multiple systems as well as different nodes. The sub-modules communicate via messages: each one receives data and outputs processed information. Another important aspect is that ROS is open-source. It provides tools for visualization, monitoring and simulation, which makes it easy to integrate, test, diagnose and develop the complete software system.
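The message-passing idea can be illustrated without ROS itself. Each sub-module subscribes to a topic, processes the incoming message and publishes its result on another topic. The sketch below is a toy, synchronous, in-process stand-in. It is not the rospy API, and the class, node and topic names are hypothetical. Real ROS nodes run as separate processes with typed messages and asynchronous delivery.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class MessageBus:
    """Toy in-process stand-in for ROS topics; this is NOT the rospy API."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message synchronously to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)


bus = MessageBus()
received: List[int] = []


def detector_node(image: Any) -> None:
    # Hypothetical detector node: consumes an image, publishes (dummy) bounding boxes.
    bus.publish("/cone_detections", {"image": image, "boxes": [(10, 20, 30, 80)]})


def keypoint_node(msg: dict) -> None:
    # Hypothetical downstream node: only records how many boxes it received.
    received.append(len(msg["boxes"]))


bus.subscribe("/camera/image", detector_node)
bus.subscribe("/cone_detections", keypoint_node)
bus.publish("/camera/image", "frame_0")
print(received)  # [1]
```

Because each node only knows the topics it reads and writes, a sub-module can be replaced or tested in isolation. That decoupling is what the framework provides at system scale.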
the stereo and the monocular pipeline. The stereo pipeline uses the sub-modules explained in this section to triangulate and estimate depth from binocular vision extremely efficiently. This methodology of drastically reducing the search space and cleverly tackling the issue of having numerous and often incorrect feature matches
The monocular pipeline has three crucial sub-modules. Together they enable it to detect multiple objects of interest and accurately estimate their 3D positions up to a distance of 15 meters. It needs only a single measurement in the form of an image captured by the monocular camera.
The monocular pipeline can be broken down into three parts: (1) multiple object detection, (2) keypoint regression, and (3) 2D-3D correspondence followed by 3D pose estimation from a single image.
Object recognition comprises four main categories of tasks:
(1) classification, (2) classification and localization, (3) object detection and (4) instance segmentation.
Instead of using slow and computationally intensive cascade and sliding-window approaches, we employ a quick, real-time and powerful object detector in our pipeline in the form of YOLOv2.
4.2.1 Importance of color information
The path planner then uses a cost function with a penalization term for candidate paths that drive the car between two same-colored cones. A valid path keeps the blue cones on its left and the yellow cones on its right.
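A penalization term of this kind might look like the sketch below. A candidate path is summarized by the color pairs of the cones it drives between. Every pair of identical colors adds a fixed penalty to the path's base cost. The gate representation, the function name and the penalty weight are hypothetical, and orange cones are ignored in this toy example. The sketch only shows why the planner needs the cone color from the detector.

```python
from typing import List, Tuple

# A "gate" is the pair of cone colors a path segment drives between (left, right).
Gate = Tuple[str, str]


def path_cost(gates: List[Gate], base_cost: float, same_color_penalty: float = 10.0) -> float:
    """Base cost plus a fixed penalty for every gate formed by two same-colored cones."""
    n_same_color = sum(1 for left, right in gates if left == right)
    return base_cost + same_color_penalty * n_same_color


valid = [("blue", "yellow"), ("blue", "yellow")]
crossing = [("blue", "yellow"), ("blue", "blue")]  # second segment crosses a track boundary
print(path_cost(valid, 5.0), path_cost(crossing, 5.0))  # 5.0 15.0
```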
We design the detector so that the cone color can be obtained directly from it. In other words, we treat each cone color as a separate class for the object detector.
4.2.2 Customizing YOLOv2 for Formula Student Driverless
We choose YOLOv2 to detect the different colored cones. Its thresholds are chosen so that false positives, incorrect detections and misclassifications are avoided at any cost, even if that means not detecting every cone in a given image.
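The "precision over recall" policy can be expressed as a strict per-class confidence filter applied to the detector's output. The sketch below assumes each detection carries a class label, a confidence in [0, 1] and a box. It drops anything below a high threshold, and also drops any class it has no threshold for. The data class, the function name and the 0.8 thresholds are hypothetical, not values from the authors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    label: str                               # cone class, e.g. "blue"
    confidence: float                        # detector score in [0, 1]
    box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max) in pixels


def filter_detections(dets: List[Detection], thresholds: Dict[str, float]) -> List[Detection]:
    """Keep detections that meet a strict per-class confidence threshold.

    Classes without a threshold are dropped, so unexpected labels never pass through.
    """
    kept = []
    for d in dets:
        t = thresholds.get(d.label)
        if t is not None and d.confidence >= t:
            kept.append(d)
    return kept


# Hypothetical, deliberately high thresholds.
THRESHOLDS = {"yellow": 0.8, "blue": 0.8, "orange": 0.8}
dets = [
    Detection("blue", 0.95, (100, 200, 130, 260)),
    Detection("yellow", 0.60, (400, 210, 425, 250)),  # dropped: low confidence
    Detection("orange", 0.85, (300, 190, 340, 270)),
]
print([d.label for d in filter_detections(dets, THRESHOLDS)])  # ['blue', 'orange']
```

Raising a threshold trades missed cones for fewer wrong ones. The text states that missing some cones is the acceptable side of that trade.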
Since the cone annotations are tall, thin rectangular bounding boxes, we exploit this prior information by re-calculating the anchor boxes used by YOLOv2. To do so, we perform k-means clustering on the aspect ratios of the rectangular annotations in the dataset. This improves the object detector's performance.
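Clustering the aspect ratios is a one-dimensional k-means problem. The sketch below runs plain Lloyd iterations on width/height ratios computed from annotation boxes. The function name, parameters and sample boxes are hypothetical. Note that the YOLOv2 paper clusters full (width, height) pairs with a 1 − IoU distance. Clustering only the ratio, as described here, fixes the anchor shape and leaves the anchor scale to be chosen separately.

```python
import numpy as np


def kmeans_aspect_ratios(widths, heights, k=3, iters=100, seed=0):
    """Cluster bounding-box aspect ratios (width / height) with 1-D k-means (Lloyd's algorithm)."""
    ratios = np.asarray(widths, dtype=float) / np.asarray(heights, dtype=float)
    rng = np.random.default_rng(seed)
    centers = rng.choice(ratios, size=k, replace=False)  # initialise from the data
    for _ in range(iters):
        # Assignment step: index of the nearest center for each ratio.
        labels = np.argmin(np.abs(ratios[:, None] - centers[None, :]), axis=1)
        # Update step: mean of each cluster (keep the old center if a cluster empties).
        new_centers = np.array([
            ratios[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)


# Hypothetical annotations: two groups of tall, thin boxes.
w = [10, 11, 12, 30, 31, 32]
h = [40, 40, 40, 40, 40, 40]
print(kmeans_aspect_ratios(w, h, k=2))  # ≈ [0.275, 0.775]
```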
The detector needs to distinguish and detect ‘yellow’, ‘blue’ and ‘orange’ cones, which provide information about the track.
4.2.3 Training to detect cones
(Figure: training samples)
4.3 Keypoint Regression
However, since there is prior information about the 3D shape, size and geometry of the cone, it is possible to recover its 3D pose from a single measurement.
4.3.1 From patches to features: the need for “keypoint regression”
Using an object detector, cones can be detected in an image. However, more information is needed to go from 2D detections in the image to 3D positions. We exploit a priori knowledge about the cone and a calibrated camera to estimate its depth via 2D-3D correspondences.
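Given a calibrated camera (intrinsic matrix K) and known 3D points on the cone, the 2D-3D correspondences constrain the cone's pose. Full rotation plus translation is usually estimated with a Perspective-n-Point solver such as OpenCV's `cv2.solvePnP`. The sketch below swaps that for a simpler linear least-squares solve. It assumes the rotation is already known (identity by default). Each correspondence then gives two equations that are linear in the translation t. The function names, the cone keypoint coordinates and the camera parameters are hypothetical, and this is not presented as the authors' method.

```python
import numpy as np


def estimate_translation(points_3d, points_2d, K, R=None):
    """Linear least-squares translation t from 2D-3D correspondences, rotation R assumed known.

    With P = R @ X, each correspondence (u, v) <-> X gives:
        fx * tx - (u - cx) * tz = (u - cx) * Pz - fx * Px
        fy * ty - (v - cy) * tz = (v - cy) * Pz - fy * Py
    """
    R = np.eye(3) if R is None else np.asarray(R, dtype=float)
    P = np.asarray(points_3d, dtype=float) @ R.T          # rows are R @ X
    uv = np.asarray(points_2d, dtype=float)
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    du, dv = uv[:, 0] - cx, uv[:, 1] - cy
    n = len(P)
    A = np.zeros((2 * n, 3))
    b = np.zeros(2 * n)
    A[0::2, 0], A[0::2, 2] = fx, -du                      # u-equations
    A[1::2, 1], A[1::2, 2] = fy, -dv                      # v-equations
    b[0::2] = du * P[:, 2] - fx * P[:, 0]
    b[1::2] = dv * P[:, 2] - fy * P[:, 1]
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t


def project(points_3d, K, t):
    """Pinhole projection of object points translated by t (rotation = identity)."""
    cam = np.asarray(points_3d, dtype=float) + t
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
# Hypothetical cone keypoints in metres (x right, y down, z forward; apex at y = -0.325).
cone_points = np.array([
    [0.0, -0.325, 0.0],
    [-0.07, -0.16, 0.0], [0.07, -0.16, 0.0],
    [-0.114, 0.0, 0.0], [0.114, 0.0, 0.0],
])
t_true = np.array([0.5, 0.3, 8.0])
keypoints_2d = project(cone_points, K, t_true)
print(estimate_translation(cone_points, keypoints_2d, K))  # close to [0.5, 0.3, 8.0]
```

The depth component tz comes out of the same solve as tx and ty. That is the sense in which known 3D geometry turns a single image into a 3D position.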
To this end, we introduce a feature extraction scheme that is inspired by classical computer vision but has a flavor of learning from data via machine learning.
4.3.2 Design and architecture of the “keypoint regressor”
The primary difference between this scheme and other feature extraction processes is that it is highly specific compared with commonly used techniques.
In our case, we want to find the positions of very specific points in the image that correspond to 3D counterparts whose locations can be measured in 3D from an arbitrary world
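The keypoint regressor works on a patch cropped around each detection, so its outputs must be mapped back to full-image pixel coordinates. Only then can they be paired with their 3D counterparts. The sketch below assumes the regressor outputs (x, y) coordinates normalized to [0, 1] relative to the crop, and that the crop was resized without padding. The function name, box and keypoint values are hypothetical.

```python
import numpy as np


def patch_to_image(keypoints_norm, box):
    """Map keypoints given in normalized patch coordinates back to full-image pixels.

    keypoints_norm: (N, 2) array of (x, y) in [0, 1] relative to the cropped patch.
    box: (x_min, y_min, x_max, y_max) of the detection the patch was cropped from.
    """
    x_min, y_min, x_max, y_max = box
    kp = np.asarray(keypoints_norm, dtype=float)
    size = np.array([x_max - x_min, y_max - y_min], dtype=float)
    return kp * size + np.array([x_min, y_min], dtype=float)


# Hypothetical regressor output for one detection: apex and two base corners.
kp = [[0.5, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(patch_to_image(kp, (100, 200, 140, 280)))
```

Normalizing to the crop makes the output independent of the network's fixed input resolution. Resizing the patch therefore does not change the mapping.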