Combining 3D Joints Moving Trend and Geometry
Property for Human Action Recognition
Bangli Liu¹, Hui Yu², Xiaolong Zhou¹,³, Honghai Liu¹
¹School of Computing, University of Portsmouth, UK
²School of Creative Technologies, University of Portsmouth, UK
³College of Computer Science and Technology, Zhejiang University of Technology, China
Abstract—Depth image based human action recognition has attracted much attention due to the popularity of depth sensors. However, accurate recognition remains a challenge because of variations in object appearance, pose and video sequences. In this paper, a novel skeleton joints descriptor based on the 3D Moving Trend and Geometry (3DMTG) property is proposed for human action recognition. Specifically, a histogram of 3D moving directions between consecutive frames is constructed for each joint to represent the 3D moving trend feature in the spatial domain. The geometry information of the joints in each frame is modelled by their motion relative to the initial status. The proposed feature descriptor is evaluated on two popular datasets. The experimental results demonstrate the superior performance of our method over state-of-the-art methods, and in particular higher recognition rates for complex actions.
Index Terms—Human action recognition, 3D Moving Trend,
geometry property.
I. INTRODUCTION
Owing to its immense range of applications in human-machine interaction, video surveillance, elderly care and entertainment, human action recognition has been attracting extensive attention in computer vision. Early strategies mainly recognize human actions from 2D sequences captured by RGB cameras [1][2][3][4]. However, sensitivity to illumination changes and variations in subject texture often degrades recognition accuracy. These problems can be alleviated by using depth information acquired by depth sensors such as the Microsoft Kinect and ASUS Xtion, which have been promoting research on human action recognition. Because depth images provide an additional dimension of information (the depth data), they have encouraged many depth sensor based recognition methods.
With the availability of 3D joint positions extracted by a real-time skeleton tracking algorithm [5], many researchers use these joints to build action representations. For example, a histogram of 3D joint locations (HOJ3D) is proposed in [6] to represent human postures. Gowayyed et al. [7] propose a 2D trajectory descriptor for each skeleton joint, where the 3D joint trajectory is projected onto three planes, and a histogram of oriented displacements (HOD) records the angles of the displacements between consecutive frames in each plane.
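To make the HOD idea concrete, the following is a minimal sketch (not the exact formulation of [7], which additionally weights histogram bins by displacement length): a joint's 3D trajectory is projected onto the xy, yz and xz planes, and the orientation of each between-frame displacement is accumulated into a per-plane histogram. The array shapes and bin count are illustrative assumptions.

```python
import numpy as np

def hod_descriptor(trajectory, n_bins=8):
    """Simplified histogram of oriented displacements (HOD), after [7].

    trajectory: (T, 3) array holding one joint's 3D position in T frames.
    Returns the concatenation of one orientation histogram per 2D plane.
    """
    histograms = []
    for a, b in [(0, 1), (1, 2), (0, 2)]:             # xy, yz, xz planes
        disp = np.diff(trajectory[:, [a, b]], axis=0) # 2D displacements
        angles = np.arctan2(disp[:, 1], disp[:, 0])   # orientations in [-pi, pi]
        hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
        histograms.append(hist / max(hist.sum(), 1))  # normalised counts
    return np.concatenate(histograms)
```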
Inspired by, yet quite different from, [7], we partition the moving directions of joints into m even bins according to m reference vectors, and introduce a histogram of 3D directions. The histogram records the moving trend of each joint over the entire sequence. Moreover, we also propose a sequenced motion feature by extracting the geometry property of each joint. The final feature descriptor is the concatenation of these two types of features; a brief sketch of both is given after the list below. The contributions of this paper are as follows.
1) A new histogram projection method is proposed to extract the 3D moving trend of each joint, which describes its specific tendency in 3D space.
2) The geometry property of the joints is constructed from the motion of each frame relative to the initial status, to represent the evolution of actions.
3) A novel scale-invariant skeleton joints feature descriptor based on the 3D Moving Trend and Geometry (3DMTG) property, named the 3DMTG descriptor, is proposed for human action recognition. Experimental results show that the proposed feature descriptor outperforms many leading state-of-the-art methods, and in particular recognizes actions better in cross-subject tests.
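Section III gives the full formulation; the sketch below only illustrates the two feature types under stated assumptions. The moving-trend histogram lets each between-frame displacement of a joint vote for the nearest of m given unit reference direction vectors, and the geometry property records each frame's joint positions relative to the initial frame. The function names, the voting rule and the normalisation are illustrative assumptions rather than the paper's exact method, and the scale-normalisation preprocessing is omitted.

```python
import numpy as np

def moving_trend_histogram(joint_seq, directions):
    """Histogram of 3D moving directions for one joint (sketch).

    joint_seq:  (T, 3) positions of one joint over T frames.
    directions: (m, 3) unit vectors defining the m direction bins.
    Each frame-to-frame displacement votes for its nearest direction.
    """
    disp = np.diff(joint_seq, axis=0)                      # (T-1, 3)
    norms = np.linalg.norm(disp, axis=1, keepdims=True)
    disp = disp / np.maximum(norms, 1e-8)                  # unit displacements
    bins = np.argmax(disp @ directions.T, axis=1)          # max cosine = nearest
    hist = np.bincount(bins, minlength=len(directions))
    return hist / max(hist.sum(), 1)                       # normalised histogram

def geometry_property(joint_seq):
    """Per-frame motion of one joint relative to its initial status (sketch)."""
    return (joint_seq - joint_seq[0]).ravel()

def descriptor_3dmtg(skeleton_seq, directions):
    """Concatenate both feature types over all joints (sketch).

    skeleton_seq: (T, J, 3) positions of J joints over T frames.
    """
    feats = []
    for j in range(skeleton_seq.shape[1]):
        feats.append(moving_trend_histogram(skeleton_seq[:, j], directions))
        feats.append(geometry_property(skeleton_seq[:, j]))
    return np.concatenate(feats)
```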
The remainder of this paper is organized as follows: Section II reviews related work on human action recognition. Section III introduces the process of modelling the 3DMTG feature descriptor. Section IV reports the experimental results as well as comparisons with state-of-the-art methods. Section V summarizes the work of this paper.
II. RELATED WORK
In recent years, an extensive literature has emerged on depth image based human motion recognition. Depending on the feature types used, these methods can be broadly divided into two categories: depth map based methods and skeletal joints/body parts based methods.
Depth map based methods mainly extract spatial features over time [8]. Some authors [9][10] project depth images onto three 2D orthogonal planes to capture action features from diverse viewpoints. In [9], a depth motion map (DMM)
is generated by accumulating motion energy over the whole sequence, and a histogram of oriented gradients (HOG) is computed for each DMM to describe actions.
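As a rough illustration of this construction (a simplified sketch of [9], not its exact projection and thresholding scheme), the following accumulates thresholded frame-to-frame depth differences into a single 2D motion-energy map; the threshold value is an assumption.

```python
import numpy as np

def depth_motion_map(depth_frames, eps=10.0):
    """Simplified depth motion map (DMM) sketch, after [9].

    depth_frames: (T, H, W) depth maps of one projected view.
    eps: assumed motion-energy threshold (illustrative).
    Returns an (H, W) map counting how often each pixel moved.
    """
    diffs = np.abs(np.diff(depth_frames.astype(np.float32), axis=0))
    return (diffs > eps).sum(axis=0)
```

A HOG descriptor would then be computed on each DMM (for instance with skimage.feature.hog) to obtain the final action representation, as in [9].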
Local interest points and occupancy patterns have also been presented as action descriptors [11][12]. Vieira et al. [12] apply space-time occupancy
patterns (STOP), where the depth map sequence is represented
as a 4D grid of same-size cells whose occupancy values are recorded. A saturation scheme is used to enhance the
cells containing more information about either silhouettes or
moving parts of the body. In [13], the 4D spatio-temporal
feature is captured using information from both RGB and