Human-Computer Interaction Driven by Visual Hand-Gesture Recognition

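"This article explores human-computer interaction based on visual hand-gesture recognition, using volumetric spatiograms of local binary patterns for recognition. The work develops a robust vision-based hand-gesture recognition system and creates a test database. The system has three main stages: hand detection, feature extraction, and classification. Keywords: gesture recognition, image descriptor, video descriptor, pattern, segmentation, spatio-temporal, LBP (Local Binary Patterns), SVM (Support Vector Machine), classification."

In human-computer interaction, gesture recognition is an emerging and promising technology. It lets users interact with a computer through natural gestures, without touching a physical device, which makes interaction more intuitive and convenient. This article focuses on vision-based gesture recognition, a non-intrusive approach in which the user only has to perform specific gestures in front of a camera for the system to recognize them and carry out the corresponding operations.

First, the article describes a robust visual hand-gesture recognition system. Such systems typically include preprocessing, feature extraction, and classification. Preprocessing may involve image enhancement, noise filtering, and hand detection. Hand detection is the key step; common methods include skin-color models, background subtraction, and machine-learning algorithms, all of which aim to separate the hand region from the image accurately.

Next, feature extraction is the core of recognition. Here the authors use volumetric spatiograms of Local Binary Patterns (LBP). LBP is a texture descriptor that compares each pixel with its neighbors to produce a binary code; taken together, these codes describe local texture. Extending LBP to a third dimension (time) captures the spatio-temporal characteristics of a gesture, which is especially useful for recognizing dynamic gestures.

Finally, the feature vectors are fed into a classifier such as a Support Vector Machine (SVM). An SVM is a supervised learning model that works well on small-sample classification problems and finds a maximum-margin separating hyperplane, which is then used to recognize the gestures.

To validate the proposed system, the authors created a test database. This is essential to the study because it provides realistic test scenarios and a variety of gesture samples against which the robustness and accuracy of the system can be evaluated.

Overall, the paper contributes a new hand-gesture recognition method that exploits the spatio-temporal properties of LBP, opening the way to more natural and intuitive human-computer interaction. The technique could be applied in areas such as smart homes, virtual reality, and passenger interaction in self-driving cars.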

A.I. Maqueda et al. / Computer Vision and Image Understanding 000 (2015) 1–12 3


Fig. 1. Local binary pattern (LBP) from a pixel neighborhood. (a) 3 × 3 gray scale neighborhood. (b) Differences between the neighbor pixels and the center one. (c) Thresholded neighborhood differences. (d) Histogram of LBP (H-LBP) from the whole image.

Fig. 2. Circularly symmetric neighbor sets for different P and R (extracted from [15]).

Fig. 3. Step 1: H-LBP computation.
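The steps named in the captions of Figs. 1 and 3 (neighborhood differences, thresholding, histogram over the whole image) can be sketched as below. This is a minimal illustration for the basic 3 × 3 case only, not the authors' multi-scale implementation. The function names, the clockwise neighbor ordering, and the `>= 0` threshold convention are assumptions made for the example.

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 gray-scale patch given as a list of 3 rows."""
    center = patch[1][1]
    # Neighbors visited clockwise from the top-left corner (assumed ordering).
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbors):
        # Threshold the difference with the center: bit is 1 if it is >= 0.
        if value - center >= 0:
            code |= 1 << bit
    return code


def lbp_image(image):
    """LBP codes for every interior pixel of a 2-D gray-scale image."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code([image[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)])
             for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


def h_lbp(codes, num_bins=256):
    """Histogram of LBP codes (H-LBP) accumulated over the whole code image."""
    hist = [0] * num_bins
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
codes = lbp_image(patch)   # a single interior pixel, so a 1x1 code image
histogram = h_lbp(codes)
```

Border pixels are skipped here for brevity; a real implementation would pad the image or use the circular neighbor sets of Fig. 2.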

2.2. S-LBP computation

The second step consists of extracting spatial information from the image of LBPs, as shown in Fig. 4. First, the coordinates of all the LBP patterns that have contributed to a specific bin in the H-LBP histogram (representing a specific LBP type) are computed. From the algorithmic viewpoint, this computation is not necessary as it is previously performed during the multi-scale LBP computation. Second, a uniform sub-sampling of the image region coordinates is carried out, obtaining a total of M × N sampled coordinates, defining M as the number of rows, and N as the number of columns. The set of coordinates of each LBP bin contributes to one histogram of M × N sampled coordinates, which are called $S_0, S_1, \ldots, S_{M \times N - 1}$ in Fig. 4, using a bilinear interpolation. This way, a histogram of spatial coordinates is generated for each LBP bin of the computed H-LBP (spatial histograms). As a result, we obtain $2^P$ spatial histograms whose length is M × N, where P was the number of neighbors in the $\mathrm{LBP}_{P,R}$. The H-LBP itself and the set of spatial histograms are all concatenated to form a super-descriptor called Spatiogram of Local Binary Patterns (S-LBP), whose dimension is $2^P + [2^P \times (M \times N)]$.
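One plausible reading of this step is sketched below: every pixel of the LBP code image casts a vote into the spatial histogram of its own LBP bin, and the vote is shared among the four surrounding nodes of an M × N grid using bilinear weights. The function name `s_lbp`, the placement of the grid nodes (uniformly spaced, with the corner nodes on the image corners), and the absence of any normalization are assumptions of this example; the excerpt does not specify them.

```python
def s_lbp(codes, P, M, N):
    """Concatenate the H-LBP with one M*N spatial histogram per LBP bin.

    codes: 2-D list of integer LBP codes, each in [0, 2**P).
    """
    num_bins = 2 ** P
    H, W = len(codes), len(codes[0])
    h_lbp = [0] * num_bins
    spatial = [[0.0] * (M * N) for _ in range(num_bins)]
    for r in range(H):
        # Continuous row position of this pixel on the M-node grid.
        gy = r * (M - 1) / (H - 1) if H > 1 else 0.0
        y0 = min(int(gy), M - 1)
        y1 = min(y0 + 1, M - 1)
        wy = gy - y0
        for c in range(W):
            # Continuous column position on the N-node grid.
            gx = c * (N - 1) / (W - 1) if W > 1 else 0.0
            x0 = min(int(gx), N - 1)
            x1 = min(x0 + 1, N - 1)
            wx = gx - x0
            code = codes[r][c]
            h_lbp[code] += 1
            hist = spatial[code]
            # Bilinear weights: the four contributions always sum to 1.
            hist[y0 * N + x0] += (1 - wy) * (1 - wx)
            hist[y0 * N + x1] += (1 - wy) * wx
            hist[y1 * N + x0] += wy * (1 - wx)
            hist[y1 * N + x1] += wy * wx
    descriptor = list(h_lbp)
    for hist in spatial:
        descriptor.extend(hist)
    return descriptor


tiny = s_lbp([[0, 1], [1, 3]], P=2, M=2, N=2)
full = s_lbp([[0] * 5 for _ in range(5)], P=8, M=4, N=4)
```

With P = 8 and a 4 × 4 grid the descriptor has 256 + 256 × 16 = 4352 entries, which matches the dimension formula in the text.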

The S-LBP descriptor is highly discriminative since it contains both local information (the H-LBP) and global spatial information (histograms of spatial coordinates of all the LBP patterns). The uniform sub-sampling of the image coordinates shortens the histograms and keeps the computational cost manageable, establishing a trade-off between the computational cost and the discrimination ability. On the other hand, the bilinear interpolation approach increases the robustness against slight image translations and against the grid effect.
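The robustness argument can be illustrated with a deliberately simplified one-dimensional example, which is not taken from the paper. A single point votes into a histogram of grid nodes, either with hard (nearest-node) assignment or with linear interpolation, and is then shifted by one pixel across a cell boundary. The function names and the values (9 pixels, 3 nodes) are hypothetical.

```python
def vote_1d(x, width, n_nodes, soft):
    """Histogram over n_nodes grid nodes produced by one point at pixel x."""
    g = x * (n_nodes - 1) / (width - 1)   # continuous position on the grid
    hist = [0.0] * n_nodes
    if soft:
        # Linear interpolation: share the vote between the two nearest nodes.
        i0 = min(int(g), n_nodes - 1)
        i1 = min(i0 + 1, n_nodes - 1)
        w = g - i0
        hist[i0] += 1.0 - w
        hist[i1] += w
    else:
        # Hard assignment: the whole vote goes to the nearest node.
        hist[min(int(g + 0.5), n_nodes - 1)] += 1.0
    return hist


def l1(a, b):
    """L1 distance between two histograms of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Shift a point from pixel 1 to pixel 2 on a 9-pixel line with 3 nodes.
hard_change = l1(vote_1d(1, 9, 3, soft=False), vote_1d(2, 9, 3, soft=False))
soft_change = l1(vote_1d(1, 9, 3, soft=True), vote_1d(2, 9, 3, soft=True))
```

In this toy case the hard histogram changes by 2.0 (the vote jumps to another bin), while the interpolated histogram changes by only 0.5, which is the kind of smoothing across grid cells that the paragraph above attributes to bilinear interpolation.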

2.3. Temporal sampling

The last step consists of adding temporal information to the S-LBP framework by carrying out a random, quasi-equally spaced temporal sampling scheme in the video sequence. Images that are close in time hardly change in appearance, so they contain redundant information for identifying the action that is being performed. This strategy also makes it possible to deal with variations in the execution speed of the hand gestures by considering several sampling steps.

The random, quasi-equally spaced sampling is carried out as follows. An additive random shift is applied to those images corresponding to an equally spaced sampling in the temporal dimension defined by […], as shown in Fig. 5. The random shifting is performed following a discrete uniform distribution over the considered maximum interval […]_max. Once all the sampled images have been obtained, the S-LBP descriptors from those selected images are concatenated to form Volumetric Spatiograms of Local Binary Patterns.
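A minimal sketch of this sampling scheme follows. The symbols for the sampling step and the maximum shift did not survive in this excerpt, so the parameter names `step` and `max_shift` are placeholders. The symmetric shift range `[-max_shift, max_shift]` and the clamping of indices to the sequence bounds are assumptions of this example rather than details confirmed by the text.

```python
import random


def sample_frame_indices(num_frames, step, max_shift, rng):
    """Quasi-equally spaced frame indices, each with a random additive shift."""
    indices = []
    for base in range(0, num_frames, step):
        # Assumed: shift drawn from a discrete uniform on [-max_shift, max_shift].
        shift = rng.randint(-max_shift, max_shift)
        # Clamp so the shifted index stays inside the sequence (added safeguard).
        indices.append(min(max(base + shift, 0), num_frames - 1))
    return indices


def concatenate_descriptors(frame_descriptors, indices):
    """Concatenate the per-frame S-LBP descriptors of the selected frames."""
    volumetric = []
    for i in indices:
        volumetric.extend(frame_descriptors[i])
    return volumetric


rng = random.Random(0)                      # seeded for repeatability
indices = sample_frame_indices(30, step=5, max_shift=2, rng=rng)
# Stand-in descriptors: two numbers per frame instead of real S-LBP vectors.
fake_descriptors = [[i, i] for i in range(30)]
volumetric = concatenate_descriptors(fake_descriptors, indices)
```

Choosing `max_shift` smaller than half of `step` keeps the sampled frames in temporal order; repeating the procedure with several values of `step` would correspond to the "several sampling steps" mentioned above.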

Please cite this article as: A.I. Maqueda et al., Human–computer interaction based on visual hand-gesture recognition using volumetric spatiograms of local binary patterns, Computer Vision and Image Understanding (2015), http://dx.doi.org/10.1016/j.cviu.2015.07.009

