Fig. 4. Example for hole-filling based on the bilateral filter [25]. (a) Raw
depth image. (b) Depth image after filtering.
turns out that Kinect is able to capture the relative 3-D coordinates of markers with minor errors (< 1 cm), provided that the sensor is positioned within an ideal range (1 m to 3 m) and has an effective field of view. In [15], the authors examine the accuracy of joint localization and the robustness of pose estimation with respect to more realistic setups. In the experiment, six exercises are conducted in which the subject is either seated or positioned next to a chair. These exercises are generally challenging for human pose recognition, since self-occlusion occurs frequently and the capturing view angle changes over time. The acquired 3-D location of each joint is then compared to the data generated by a marker-based motion capture system, which can be considered ground-truth data.
According to the results, Kinect has significant potential as a low-cost alternative for real-time motion capture and body tracking in healthcare applications. The accuracy of the Kinect joint estimation is comparable to marker-based motion capture for more controlled body poses (e.g., standing and exercising arms). However, for general poses, the typical error of Kinect skeletal tracking is about 10 cm. Moreover, the current Kinect algorithm frequently fails due to occlusions, indistinguishable depth (limbs close to the body), or clutter (other objects in the scene).
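To make such a comparison concrete, the following sketch computes the mean per-joint Euclidean error between Kinect skeletal data and time-aligned marker-based ground truth; the function name and array layout are illustrative assumptions, not part of the evaluation protocol of [15].

    import numpy as np

    def per_joint_error(kinect_joints, mocap_joints):
        # kinect_joints, mocap_joints: (T, J, 3) arrays of 3-D joint positions
        # over T time-aligned frames, both expressed in the same coordinate
        # system (e.g., after registering the Kinect data to the mocap frame).
        diff = kinect_joints - mocap_joints           # per-frame residuals
        dist = np.linalg.norm(diff, axis=2)           # (T, J) Euclidean errors
        return dist.mean(axis=0)                      # (J,) mean error per joint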
III. Preprocessing
The data obtained with Kinect normally cannot be fed directly into the designed computer vision algorithms. Most of the algorithms take advantage of the rich information (RGB and depth) attached to each point. In order to correctly combine the RGB image with the depth data, it is necessary to spatially align the output of the RGB camera with that of the depth camera. In addition, the raw depth data are very noisy, and many pixels in the image may have no depth due to multiple reflections, transparent objects, or scattering on certain surfaces (such as human tissue and hair). These inaccurate or missing depth values (holes) need to be recovered before use. Therefore, many systems based on Kinect start with a preprocessing module, which conducts application-specific camera recalibration and/or depth data filtering.
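As a rough illustration of the spatial alignment step, the sketch below back-projects a depth pixel into 3-D using the depth camera intrinsics, transforms it into the RGB camera's frame, and reprojects it with the RGB intrinsics; all parameter names are placeholders that would come from a prior calibration, not values read from the device.

    import numpy as np

    def map_depth_pixel_to_rgb(u, v, z, K_d, K_rgb, R, t):
        # K_d, K_rgb: 3x3 intrinsic matrices of the depth and RGB cameras.
        # R (3x3), t (3,): rigid transform from the depth frame to the RGB frame.
        # All parameters are assumed to come from a prior calibration.
        p_depth = z * np.linalg.inv(K_d) @ np.array([u, v, 1.0])  # back-project
        p_rgb = R @ p_depth + t                                    # change of frame
        uvw = K_rgb @ p_rgb                                        # reproject
        return uvw[:2] / uvw[2]                                    # RGB pixel coordinates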
A. Kinect Recalibration
In fact, Kinect is calibrated during manufacturing, and the resulting camera parameters are stored in the device's memory, where they can be used to fuse the RGB and depth information.
This calibration information is adequate for casual usage, such
as object tracking. However, it is not accurate enough for
reconstructing a 3-D map, for which a highly precise cloud of
3-D points should be obtained. Moreover, the manufacturer’s
calibration does not correct the depth distortion, and is thus
incapable of recovering the missing depth.
Zhang et al. [16] and Herrera et al. [17] develop a calibration-board-based technique, which is derived from Zhang's camera calibration technique for the RGB camera [18]. In this method, the 3-D coordinates of the feature points on the calibration card are obtained in the RGB camera's coordinate system. Feature-point matching between the RGB image and the depth image spatially correlates those feature points across the two images, and this spatial mapping assigns each feature point its true depth value in the RGB camera's coordinate system. Meanwhile, the depth camera measures the 3-D coordinates of those feature points in the IR camera's coordinate system. The method assumes that the depth values obtained by the depth camera can be transformed into the true depth values by an affine model. As a result, the key is to estimate the parameters of the affine model, which can be done by minimizing the distances between the two point sets.
This technique, combined with a calibration card, allows users to recalibrate the Kinect sensor when the initial calibration is not accurate enough for certain applications. The weakness of this method is that it does not specifically address depth distortion, whose correction may be unavoidable in most 3-D mapping scenarios.
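The affine-model estimation step can be pictured as a least-squares fit between the two 3-D point sets; the sketch below is a generic formulation under that assumption, not the exact procedure of [16], [17].

    import numpy as np

    def fit_affine_3d(src, dst):
        # src: (N, 3) points measured in the depth (IR) camera's frame.
        # dst: (N, 3) corresponding "true" points in the RGB camera's frame.
        # Solve dst_i ~ A @ src_i + b for all i in the least-squares sense.
        src_h = np.hstack([src, np.ones((len(src), 1))])   # homogeneous (N, 4)
        M, *_ = np.linalg.lstsq(src_h, dst, rcond=None)    # (4, 3) solution
        return M[:3].T, M[3]                               # A (3x3), b (3,)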
There are a few publications that discuss solutions for Kinect depth distortion correction. Smisek et al. [11] observe that the Kinect device exhibits radially symmetric distortions. To correct this distortion, a spatially varying offset is applied to the calculated depth. The offset at a given pixel position is calculated as the mean difference between the measured depth and the expected depth in metric coordinates.
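A possible realization of such a spatially varying offset is sketched below, assuming several calibration frames of a target with known geometry are available; the frame-averaging scheme and variable names are illustrative, not the exact procedure of [11].

    import numpy as np

    def estimate_offset_map(measured, expected):
        # measured, expected: (F, H, W) stacks of depth images (in metres) of a
        # target with known geometry (e.g., a flat wall at measured distances).
        # The offset at each pixel is the mean measured-minus-expected depth.
        return np.nanmean(measured - expected, axis=0)     # (H, W) offset map

    def correct_depth(depth, offset_map):
        # Subtract the spatially varying offset from a newly captured depth image.
        return depth - offset_map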
In [19], a disparity distortion correction method is proposed
based on the observation that a more accurate calibration can
be made by correcting the distortion directly in disparity units.
An interesting paper [20] deals with more practical issues, investigating the possible influence of thermal and environmental conditions when calibrating Kinect. The experiments show that variations in temperature and air draft have a notable influence on Kinect's images and range measurements. Based on these findings, temperature-related rules are established in the paper to reduce errors in the calibration and measurement process of the Kinect.
B. Depth Data Filtering
Another preprocessing step is depth data filtering, which can be used for depth-image denoising or for recovering missing depth (holes). A naive approach treats the depth data as a monochromatic image and applies existing image filters to it, such as a Gaussian filter. This simple method works only in regions where the signal statistics favor the underlying filter.
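A naive filter of this kind might look like the following sketch, which treats the depth map as a single-channel image, masks out missing (zero-valued) pixels, and keeps the Gaussian-smoothed result only where enough valid neighbors contribute; the validity threshold is an arbitrary assumption.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def naive_depth_smoothing(depth, sigma=2.0, min_weight=0.25):
        # Treat the depth map as a monochromatic image; missing depth is assumed
        # to be encoded as 0. Smoothed values are kept only where the local
        # fraction of valid pixels exceeds min_weight (an arbitrary threshold).
        valid = (depth > 0).astype(np.float64)
        blurred = gaussian_filter(depth * valid, sigma)    # sum of valid depths
        weight = gaussian_filter(valid, sigma)             # local valid fraction
        return np.where(weight > min_weight,
                        blurred / np.maximum(weight, 1e-6), 0.0)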
A more sophisticated algorithm [21] investigates the specific characteristics of a depth map created by Kinect, and finds that there are two types of occlusions/holes with different causes. The algorithm automatically separates