Are we ready for Autonomous Driving?
The KITTI Vision Benchmark Suite
Andreas Geiger and Philip Lenz
Karlsruhe Institute of Technology
{geiger,lenz}@kit.edu
Raquel Urtasun
Toyota Technological Institute at Chicago
rurtasun@ttic.edu
Abstract
Today, visual recognition systems are still rarely employed in robotics applications. Perhaps one of the main reasons for this is the lack of demanding benchmarks that mimic such scenarios. In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry / SLAM and 3D object detection. Our recording platform is equipped with four high resolution video cameras, a Velodyne laser scanner and a state-of-the-art localization system. Our benchmarks comprise 389 stereo and optical flow image pairs, stereo visual odometry sequences of 39.2 km length, and more than 200k 3D object annotations captured in cluttered scenarios (up to 15 cars and 30 pedestrians are visible per image). Results from state-of-the-art algorithms reveal that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world. Our goal is to reduce this bias by providing challenging benchmarks with novel difficulties to the computer vision community. Our benchmarks are available online at:
www.cvlibs.net/datasets/kitti
1. Introduction
Developing autonomous systems that are able to assist humans in everyday tasks is one of the grand challenges in modern computer science. One example is autonomous driving systems, which can help decrease fatalities caused by traffic accidents. While a variety of novel sensors have been used in the past few years for tasks such as recognition, navigation and manipulation of objects, visual sensors are rarely exploited in robotics applications: autonomous driving systems rely mostly on GPS, laser range finders, radar and very accurate maps of the environment.
Figure 1. Recording platform with sensors (top-left), trajectory from our visual odometry benchmark (top-center), disparity and optical flow map (top-right) and 3D object labels (bottom).

In the past few years an increasing number of benchmarks have been developed to push forward the performance of visual recognition systems, e.g., Caltech-101 [17], Middlebury for stereo [41] and optical flow [2] evaluation. However, most of these datasets are simplistic, e.g., are taken in a controlled environment. A notable exception is the PASCAL VOC challenge [16] for detection and segmentation.
In this paper, we take advantage of our autonomous driving platform to develop novel challenging benchmarks for stereo, optical flow, visual odometry / SLAM and 3D object detection. Our benchmarks are captured by driving around a mid-size city, in rural areas and on highways. Our recording platform is equipped with two high resolution stereo camera systems (grayscale and color), a Velodyne HDL-64E laser scanner that produces more than one million 3D points per second, and a state-of-the-art OXTS RT 3003 localization system which combines GPS, GLONASS, an IMU and RTK correction signals. The cameras, laser scanner and localization system are calibrated and synchronized, providing us with accurate ground truth. Table 1 summarizes our benchmarks and provides a comparison to existing datasets.
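Because the laser scanner and cameras are jointly calibrated, 3D laser points can be projected into the rectified camera images, which is how per-pixel ground truth is derived. The sketch below illustrates such a projection in its standard pinhole form; the matrix names and all numeric values are illustrative placeholders, not the actual calibration of our platform.

```python
import numpy as np

# Illustrative placeholder calibration (real values come from the
# dataset's calibration files, not from this paper excerpt).
P = np.array([[721.5, 0.0, 609.6, 44.9],
              [0.0, 721.5, 172.9, 0.2],
              [0.0, 0.0, 1.0, 0.003]])  # 3x4 rectified camera projection
R_rect = np.eye(4)                       # rectifying rotation (4x4)
Tr_velo_to_cam = np.eye(4)               # laser-to-camera rigid transform (4x4)

def project_velo_to_image(pts_velo):
    """Project Nx3 laser points into Nx2 pixel coordinates."""
    pts_h = np.hstack([pts_velo, np.ones((len(pts_velo), 1))])   # homogeneous Nx4
    cam = (P @ R_rect @ Tr_velo_to_cam @ pts_h.T).T              # Nx3 image-plane coords
    return cam[:, :2] / cam[:, 2:3]                              # perspective divide

# One point 10 m in front of the (identity-transformed) camera:
uv = project_velo_to_image(np.array([[2.0, 0.0, 10.0]]))
print(uv)  # pixel coordinates inside the 1240 x 376 image
```

Pixels for which no laser return exists remain unlabeled, which is why the resulting ground truth is semi-dense rather than complete.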
Our stereo matching and optical flow estimation benchmark comprises 194 training and 195 test image pairs at a resolution of 1240 × 376 pixels after rectification, with semi-dense (50%) ground truth. Compared to previous datasets [41, 2, 30, 29], this is the first one with realistic non-synthetic imagery and accurate ground truth. Dif-