Chapter 7, A Generic Framework for 2D and 3D Upper Body Tracking, targets upper body tracking,
the problem of tracking the pose of the human body from video sequences. The problem is difficult because of
the high dimensionality of the state space, self-occlusion, appearance changes, and so on. In this
chapter, the authors propose a generic framework that can be used for both 2D and 3D upper body tracking
and can be easily parameterized without heavily depending on supervised training. They first construct
a Bayesian Network (BN) to represent the human upper body structure and then incorporate into the BN
various generic physical and anatomical constraints on the parts of the upper body. They also explicitly
model part occlusion, which makes it possible to detect self-occlusion automatically
and to minimize the effect of occlusion-induced measurement errors on tracking accuracy. Using the
proposed model, upper body tracking can be performed through probabilistic inference over time. A
series of experiments were performed on both monocular and stereo video sequences to demonstrate the
effectiveness and capability of the model in improving upper body tracking accuracy and robustness.
Chapter 8, Real-Time Recognition of Basic Human Actions, describes a simple and computationally
efficient, appearance-based approach for real-time recognition of basic human actions. The authors apply a
technique that depicts the differences between two or more successive frames, accompanied by a threshold
filter, to detect the regions of the video frames where some type of human motion is observed. From each
frame difference, the algorithm extracts an incomplete and unformed human body shape and generates a
skeleton model which represents it in an abstract way. Finally, the recognition process is formulated
as a time-series problem and handled by a robust and accurate prediction method (Support Vector
Regression). The proposed technique could be employed in applications such as vision-based autonomous
robots and surveillance systems.
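The frame-differencing step summarized above can be illustrated with a minimal sketch. It is not the chapter's implementation: the function name `motion_mask`, the default threshold of 25, and the representation of frames as nested lists of grayscale intensities are all assumptions made for illustration.

```python
from typing import List

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities (assumed format)


def motion_mask(frames: List[Frame], threshold: int = 25) -> List[List[bool]]:
    """Mark pixels whose intensity changes by more than `threshold`
    between any pair of successive frames (hypothetical helper)."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    height, width = len(frames[0]), len(frames[0][0])
    mask = [[False] * width for _ in range(height)]
    # Compare each frame with its successor and accumulate the changed pixels.
    for prev, curr in zip(frames, frames[1:]):
        for y in range(height):
            for x in range(width):
                if abs(curr[y][x] - prev[y][x]) > threshold:
                    mask[y][x] = True
    return mask


# Toy 3x3 sequence: a bright pixel moves from the centre to the right.
f0 = [[0, 0, 0], [0, 200, 0], [0, 0, 0]]
f1 = [[0, 0, 0], [0, 0, 200], [0, 0, 0]]
mask = motion_mask([f0, f1])
```

In this toy sequence only the two pixels touched by the moving bright spot are flagged. In a real pipeline the resulting mask would be the input to the shape-extraction and skeleton-generation stages.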
Chapter 9, Fast Categorisation of Articulated Human Motion, explores the problem of visual
categorisation of human motion in video clips. Most published methods either analyse an entire video and
assign it a single category label, or use a relatively large look-ahead to classify each frame. In contrast to
these strategies, the human visual system shows that simple categories can be recognised almost
instantaneously. Here the authors present a system for categorisation from very short sequences (“snippets”) of
1–10 frames, and systematically evaluate it on several data sets. It turns out that even local shape and
optic flow for a single frame are enough to achieve 80–90% correct classification, and snippets of 5–7
frames (0.2–0.3 seconds of video) yield results on par with those that state-of-the-art methods obtain on
entire video sequences.
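To make the notion of a snippet concrete, the sketch below cuts a clip into overlapping runs of consecutive frames and converts a snippet length into a duration. The helper names and the stride of one frame are illustrative assumptions. The 25 frames-per-second default is also an assumption, chosen because it is consistent with the stated 5–7 frames corresponding to roughly 0.2–0.3 seconds.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def snippets(frames: Sequence[T], length: int) -> List[Sequence[T]]:
    """Return every run of `length` consecutive frames (stride 1, assumed)."""
    if length < 1:
        raise ValueError("length must be positive")
    return [frames[i:i + length] for i in range(len(frames) - length + 1)]


def duration_seconds(length: int, fps: float = 25.0) -> float:
    """Duration of a snippet of `length` frames at an assumed frame rate."""
    return length / fps


clip = list(range(10))           # stand-in for 10 decoded frames
five_frame = snippets(clip, 5)   # overlapping 5-frame snippets
```

Each snippet would then be classified on its own, so a label is available after only a few frames rather than after the whole clip.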
Chapter 10, Human Action Recognition with Expandable Graphical Models, proposes an action
recognition system that is independent of the subjects who perform the actions, independent of the
speed at which the actions are performed, robust against noisy extraction of the features used to characterize
the actions, scalable to a large number of actions, and expandable with new actions. In this chapter,
the authors describe a recently proposed expandable graphical model of human actions that promises
to realize such a system. The chapter first presents a brief review of recent developments in human
action recognition. Then, the expandable graphical model is presented in detail and a system that learns
and recognizes human actions from sequences of silhouettes using the expandable graphical model is
developed.
Chapter 11, Detection and Classification of Interacting Persons, presents a way to classify interactions
between people. Examples of the interactions investigated include people meeting one another,
walking together, and fighting. A new feature set is proposed along with a corresponding classification
method. Results are presented which show the new method performing significantly better than the
previous state-of-the-art method proposed by Oliver et al.
Chapter 12, Action Recognition, first reviews the current action recognition methods from the following
two aspects: action representation and recognition strategy. Then, a novel method for classifying