Human Action Recognition With Trajectory Based
Covariance Descriptor In Unconstrained Videos
Hanli Wang∗, Yun Yi, Jun Wu
Department of Computer Science and Technology, Tongji University, Shanghai, China
Key Laboratory of Embedded System and Service Computing, Ministry of Education,
Tongji University, Shanghai, China
{hanliwang,13yunyi,wujun}@tongji.edu.cn
ABSTRACT
Human action recognition from realistic videos plays a key role in multimedia event detection and understanding. In this paper, a novel Trajectory Based Covariance (TBC) descriptor is proposed, which is formulated along dense trajectories. To map the descriptor matrix into a vector space and trim out data redundancy, the TBC descriptor matrix is projected to Euclidean space by the Logarithm Principal Components Analysis (LogPCA). Our method is tested on the challenging Hollywood2 and TV Human Interaction datasets. Experimental results show that the proposed TBC descriptor outperforms three baseline descriptors (i.e., histogram of oriented gradient, histogram of optical flow and motion boundary histogram), and our method achieves better recognition performance than a number of state-of-the-art approaches.
Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Understanding
General Terms
Algorithms, Experimentation, Performance
Keywords
TBC Descriptor; Motion Trajectory; LogPCA; Covariance
∗ H. Wang is the corresponding author. This work was supported in part by the National Natural Science Foundation of China under Grant 61472281, the “Shu Guang” project of Shanghai Municipal Education Commission and Shanghai Education Development Foundation under Grant 12SG23, the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. GZ2015005), and the Fundamental Research Funds for the Central Universities under Grant 0800219270.
MM’15, October 26–30, 2015, Brisbane, Australia. © 2015 ACM. ISBN 978-1-4503-3459-4/15/10 $15.00. DOI: http://dx.doi.org/10.1145/2733373.2806310
1. INTRODUCTION
The past few years have witnessed the great success of social networks and multimedia technologies, leading to the generation of vast amounts of Internet videos. To organize these videos well and provide value-added services to users, it is increasingly important to automatically understand human activities from videos. The success of many applications (e.g., intelligent visual surveillance, human-computer interaction, video retrieval and smart cameras) is conditioned on the accuracy of human action recognition. A number of research studies have focused on this challenging topic, such as [1, 2, 3], to name a few.
A typical human action recognition algorithm generally consists of two main processing steps. The first is feature extraction, in which the human action is described by feature vectors. The second is detection, in which the feature vectors are utilized for event classification. This paper focuses on the first step by describing human actions with dense trajectories and Riemannian manifolds.
The major contributions of this work are summarized as follows. First, a novel Trajectory Based Covariance (TBC) descriptor is proposed to describe human actions. Unlike other covariance descriptors, the TBC descriptor is formulated along the dense trajectories, which enhances its ability to describe human actions. Second, the TBC descriptor is projected to Euclidean space by the Logarithm Principal Components Analysis (LogPCA) to further improve its describing ability; a minimal illustration of this mapping is sketched at the end of this section.
The rest of this paper is organized as follows. The proposed TBC descriptor for human action recognition is introduced in Section 2. The experimental setup and results are presented in Section 3. Finally, Section 4 concludes this paper.
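To make the LogPCA projection mentioned above concrete, the following minimal sketch (our own Python illustration; the function names, the ridge term and the use of upper-triangle vectorization are our assumptions, not details taken from the paper) maps symmetric positive definite covariance descriptors into Euclidean space by taking the matrix logarithm and then applying PCA to the vectorized result.

    import numpy as np
    from scipy.linalg import logm
    from sklearn.decomposition import PCA

    def log_map(cov, eps=1e-6):
        # Matrix logarithm of an SPD covariance matrix; the small ridge eps
        # is added only for numerical stability (our choice).
        cov = cov + eps * np.eye(cov.shape[0])
        return logm(cov).real

    def log_pca(cov_list, n_components=64):
        # Hypothetical LogPCA: vectorize the upper triangle of each log-mapped
        # covariance, then reduce dimensionality with ordinary PCA.
        d = cov_list[0].shape[0]
        iu = np.triu_indices(d)
        X = np.stack([log_map(c)[iu] for c in cov_list])
        pca = PCA(n_components=min(n_components, X.shape[0], X.shape[1]))
        return pca, pca.fit_transform(X)

The matrix logarithm flattens the Riemannian manifold of covariance matrices onto the vector space of symmetric matrices, so standard Euclidean tools such as PCA can then be applied; the PCA dimensionality used by the authors is not assumed here.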
2. TBC DESCRIPTOR
2.1 Descriptor Formulation
As stated in [3], the dense trajectory based method with camera motion estimation is able to achieve excellent human action recognition performance on challenging datasets. Inspired by this, the proposed TBC descriptor is designed along dense trajectories. Given a video, it is first divided into $N$ trajectories $T = \{T_1, T_2, \cdots, T_N\}$, and a trajectory can be defined as

$$T_n^{\tau} = \{R_1(W, H), R_2(W, H), \cdots, R_L(W, H)\}, \quad (1)$$
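As a concrete illustration of formulating a covariance descriptor along one trajectory, the sketch below (our own Python illustration, not the authors' code) computes the sample covariance over per-point feature vectors; it assumes a d-dimensional feature vector, e.g., gradient and optical flow statistics from the region $R_l(W, H)$, is already available at each of the $L$ trajectory points.

    import numpy as np

    def trajectory_covariance(features):
        # features: array of shape (L, d), one d-dimensional feature vector per
        # trajectory point; how these per-point features are extracted from the
        # regions R_l(W, H) is assumed and not shown here.
        f = np.asarray(features, dtype=np.float64)
        centered = f - f.mean(axis=0, keepdims=True)
        # Sample covariance over the L points of the trajectory:
        # a d x d symmetric positive semi-definite matrix.
        return centered.T @ centered / max(f.shape[0] - 1, 1)

    # Hypothetical usage: 15 points per trajectory, 12-dimensional per-point features.
    cov = trajectory_covariance(np.random.rand(15, 12))

A covariance matrix built this way can then be mapped to Euclidean space with the LogPCA step sketched in Section 1.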