R.-D. Vatavu
with regard to the feedback mechanisms that translate into events in the human-computer dialogue. The main purpose of the chapter is thus to further the current understanding of gesture-based interfaces by means of meaningful events, looking at the topic from two different but equally important perspectives.
2 Events for Spotting and Detecting Gestures
Techniques for spotting gestures in video sequences are implemented by detecting and monitoring (or tracking) custom events defined using location, time and posture criteria. The successful detection of one event triggers gesture recording, while the detection of another starts the gesture recognizer that classifies the recorded motion. This section analyses various event types and their associated criteria, as well as the algorithms and techniques employed for segmenting gestures from sequences of continuous motion. The focus is primarily on vision-based computing, but other capture technologies are mentioned where the techniques and algorithms they use for segmenting human motion are relevant to the discussion. Jaimes and Sebe [45], Poppe [38], Moeslund et al. [33, 34] and Erol et al. [12] provide extensive surveys of vision-based motion capture and analysis, as well as of multimodal human-computer interaction, and are good starting points for a general overview of the advances in these fields.
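The event-driven scheme described at the start of this section (one event starts recording, another closes the segment and hands it to the recognizer) can be sketched as a two-state machine. This is a minimal illustration under stated assumptions, not the chapter's method: the `Frame` fields, the posture labels `"open"` and `"fist"`, the rectangular `START_REGION`, the `max_duration` timeout (standing in for a time criterion), the `min_frames` guard and `toy_recognizer` are all hypothetical placeholders for a real tracker, posture classifier and gesture recognizer.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Frame:
    t: float        # timestamp in seconds
    pos: Point      # tracked hand position (x, y), normalized image coordinates
    posture: str    # label from a hypothetical posture classifier


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


@dataclass
class EventSegmenter:
    is_start: Callable[[Frame], bool]           # start event (e.g. posture + location)
    is_end: Callable[[Frame], bool]             # end event
    recognize: Callable[[List[Frame]], str]     # classifies a closed segment
    min_frames: int = 3                         # assumed: drop too-short segments
    max_duration: float = 2.0                   # assumed time criterion, in seconds
    state: State = State.IDLE
    buffer: List[Frame] = field(default_factory=list)

    def feed(self, frame: Frame) -> Optional[str]:
        """Consume one frame; return a gesture label when a segment closes."""
        if self.state is State.IDLE:
            if self.is_start(frame):            # start event: begin recording
                self.state = State.RECORDING
                self.buffer = [frame]
            return None
        self.buffer.append(frame)
        if self.is_end(frame):                  # end event: hand segment to recognizer
            segment, self.buffer = self.buffer, []
            self.state = State.IDLE
            if len(segment) >= self.min_frames:
                return self.recognize(segment)
        elif frame.t - self.buffer[0].t > self.max_duration:
            self.buffer = []                    # timed out: abandon the segment
            self.state = State.IDLE
        return None


def in_region(p: Point, region: Tuple[float, float, float, float]) -> bool:
    """True if point p lies inside the rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1


START_REGION = (0.0, 0.0, 0.2, 1.0)  # hypothetical: left strip of the image


def toy_recognizer(segment: List[Frame]) -> str:
    """Stand-in recognizer: labels a segment by its net horizontal displacement."""
    dx = segment[-1].pos[0] - segment[0].pos[0]
    return "swipe_right" if dx > 0 else "swipe_left"


seg = EventSegmenter(
    is_start=lambda f: f.posture == "open" and in_region(f.pos, START_REGION),
    is_end=lambda f: f.posture == "fist",
    recognize=toy_recognizer,
)

stream = [
    Frame(0.00, (0.50, 0.5), "open"),  # open palm outside the start region: ignored
    Frame(0.03, (0.10, 0.5), "open"),  # start event: posture and location both match
    Frame(0.06, (0.30, 0.5), "open"),
    Frame(0.09, (0.60, 0.5), "open"),
    Frame(0.12, (0.80, 0.5), "fist"),  # end event: recognizer is invoked
]
results = [r for r in (seg.feed(f) for f in stream) if r is not None]
print(results)  # ['swipe_right']
```

Keeping the start and end predicates as plain callables means location, posture, touch or custom criteria can be swapped in or combined without changing the segmenter itself.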
We identify four event types that have been used extensively, either singly or in various combinations, for segmenting gestures in video sequences:
• Location represents a powerful cue for detecting postures as well as for segmenting interesting gestures from continuous movement. Requiring that a gesture start or end in a predefined region, or knowing (or learning) that some locations in the scene are more likely to contain valid gestures, greatly reduces algorithmic complexity;
• Posture information allows gesture commands to be marked in a way that users find natural to perform and easy to model cognitively: for example, one posture could mark the beginning of a gesture while another signals its end. Posture is a robust cue that gives both the user and the system the certainty that a gesture command is being entered: the system can filter out the majority of movements and attend only to the actual gesture commands. Users, in turn, build a mental model of the interaction process: commands are issued only when specified postures are performed, much as click events work in standard WIMP interfaces;
• Tap and touch events can be detected by touch-sensitive materials as well as by video cameras (most horizontal interactive surfaces use IR video cameras to detect touch events on the tabletop). A tap or a touch is clearly perceived as a marking event from both the system's and the user's perspective, since touching signifies both intent and command during the interaction process;
• Custom-based events other than the above may additionally be used to ease the gesture detection process even further. They usually relate to various