SPATIAL-TEMPORAL RECOVERY FOR HIERARCHICAL FRAME BASED VIDEO
COMPRESSED SENSING
Wenbin Che, Xinwei Gao, Xiaopeng Fan, Feng Jiang, Debin Zhao
Dept. of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
{chewenbin, xwgao.cs, fxp, fjiang, dbzhao}@hit.edu.cn
ABSTRACT
In this paper, a hierarchical frame based video compressed
sensing (CS) framework is proposed. It outperforms the
traditional framework through better exploitation of the
correlation between each frame and its reference frames,
unequal subrates for frames in different layers, and reduced
error propagation. By considering the spatial and temporal
correlations of the video sequence, a spatial-temporal sparse
representation based recovery is proposed for this framework.
Similar blocks in both the current frame and the recovered
reference frames are collected into a spatial-temporal group,
which is defined as the unit of the sparse representation. By
exploiting the low-dimensional subspace description of each
group, video CS recovery is converted into a low-rank matrix
approximation problem, which can be solved using hard
thresholding and gradient descent. Experimental results show
that the proposed method achieves better performance than
both the state-of-the-art still-image CS recovery algorithms
and the existing residual domain based video CS
reconstruction approaches.
Index Terms— Video compressed sensing, hierarchical
structure framework, spatial-temporal sparse representation
1. INTRODUCTION
As a new methodology for signal sampling and recovery,
compressed sensing (CS) has been extensively studied in
recent years. As applied to video frames, this theory makes
the sampling process faster than traditional sampling
methods. Significant progress in video CS has been made with
single-pixel cameras [1], based on representing a video in
the Fourier domain or the wavelet domain. However, video CS
still faces challenges, including achieving high recovery
quality at a relatively low subrate [2]. A low subrate makes
it easier for a camera to capture video sequences at high
speed, but it results in poor recovery performance when
still-image CS recovery algorithms are used. By considering
the spatial and temporal correlations, it is possible to
achieve high-quality recovery even at a low subrate [3]. Mun
et al. [4] proposed a residual recovery based on motion
compensation (MC), which utilizes the temporal redundancy and
the sparsity of the residual in a video sequence. Two
subrates are used in the sampling stage of the residual
recovery: a high subrate is adopted for key frames and a low
subrate for non-key frames.
This work has been supported in part by the Major State Basic Research
Development Program of China (973 Program 2015CB351804), the National
Science Foundation of China under Grant No. 61272386.
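The two-subrate sampling described above can be illustrated
with a minimal sketch. It is not the implementation of [4]:
it assumes block-based sampling with a random matrix whose
rows are orthonormal, and the block size (16), the subrates
(0.7 and 0.1), the key-frame period (4), and all function
names are hypothetical choices made for illustration.

```python
import numpy as np

def make_sensing_matrix(block_size, subrate, rng):
    """Return an m x n matrix with orthonormal rows, m = round(subrate * n)."""
    n = block_size * block_size
    m = max(1, int(round(subrate * n)))
    # QR of a Gaussian matrix gives an orthogonal matrix; keep its first m rows.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q[:m, :]

def sample_frame(frame, phi, block_size):
    """Split a frame into non-overlapping blocks and measure each block."""
    h, w = frame.shape
    b = block_size
    blocks = (frame.reshape(h // b, b, w // b, b)
                   .swapaxes(1, 2)
                   .reshape(-1, b * b))      # one vectorized block per row
    return blocks @ phi.T                    # shape: (num_blocks, m)

def sample_video(video, phi_key, phi_non_key, block_size, key_period):
    """Measure key frames with phi_key and all other frames with phi_non_key."""
    return [
        sample_frame(f, phi_key if t % key_period == 0 else phi_non_key,
                     block_size)
        for t, f in enumerate(video)
    ]

# Hypothetical settings: 16x16 blocks, subrate 0.7 (key) and 0.1 (non-key).
rng = np.random.default_rng(0)
B = 16
phi_key = make_sensing_matrix(B, 0.7, rng)
phi_non_key = make_sensing_matrix(B, 0.1, rng)
video = rng.random((5, 32, 32))              # 5 synthetic 32x32 frames
measurements = sample_video(video, phi_key, phi_non_key, B, 4)
```

Under these assumed settings, each key-frame block yields far
more measurements than a non-key-frame block, which is the
imbalance the recovery stage has to compensate for.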
In CS theory, a signal can be well recovered if it is sparse
enough in some domain. Mun et al. [5] cast the CS
reconstruction in the domain of the contourlet transform or
the complex-valued dual-tree wavelet transform (DWT),
resulting in better performance compared to the conventional
fixed domain based recovery methods. However, it is almost
impossible to find a universal domain in which all kinds of
signals are sparse. As an alternative CS reconstruction
scheme, iterative algorithms based on non-local patches have
been proposed recently (e.g., [6, 7]). In [6], the number of
nonzero 3-D transform coefficients of a group, formed by
stacking non-local patches, was used to measure the non-local
sparsity. Additionally, a collaborative sparsity measure was
established in [6], enforcing local smoothness and non-local
sparsity simultaneously. A group sparse representation (GSR)
model was further developed in [7], also using the non-local
grouping technique. In essence, this model efficiently
exploits the intrinsic low-rank property of natural images,
which reflects the similarity among the patches within a
group. GSR modeling also improves recovery performance over
conventional fixed domain based recovery methods.
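To make the link between patch similarity and low rank
concrete, the following sketch stacks similar patches as the
columns of a group matrix and hard-thresholds its singular
values. It is an illustration under stated assumptions, not
the algorithm of [6] or [7]: the Euclidean patch distance,
the group size k, the threshold tau, the synthetic data, and
the function names are all hypothetical.

```python
import numpy as np

def build_group(patches, ref_index, k):
    """Stack the k patches closest to the reference patch as columns."""
    dist = np.sum((patches - patches[ref_index]) ** 2, axis=1)
    nearest = np.argsort(dist)[:k]           # includes the reference itself
    return patches[nearest].T                # shape: (patch_dim, k)

def hard_threshold_low_rank(group, tau):
    """Zero every singular value <= tau; return the approximation and its rank."""
    u, s, vt = np.linalg.svd(group, full_matrices=False)
    s_kept = np.where(s > tau, s, 0.0)
    return (u * s_kept) @ vt, int(np.count_nonzero(s_kept))

# Synthetic data (assumption): 20 noisy copies of one vectorized 8x8 patch
# plus 20 unrelated patches that should not be selected into the group.
rng = np.random.default_rng(0)
base = rng.random(64)
similar = base + 1e-3 * rng.standard_normal((20, 64))
unrelated = 5.0 + rng.random((20, 64))
patches = np.vstack([similar, unrelated])

group = build_group(patches, ref_index=0, k=10)
approx, rank = hard_threshold_low_rank(group, tau=0.1)
```

Because the columns of the group are nearly identical, a
single singular value carries almost all of its energy, so
the thresholded matrix has very low rank while staying close
to the original group.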
In this paper, we consider the block compressed sensing (BCS)
recovery of video sequences, in which a hierarchical
structure and a group sparse representation based method are
used to aid the recovery process. We employ different
subrates for different layers. 3-D patch matching, hard
thresholding and gradient descent are also adopted in the
recovery stage. Experimental simulations show that the
proposed CS recovery based on the hierarchical structure
outperforms the state-of-the-art still-image recovery method.
Additionally, the proposed technique exceeds the quality of
residual domain based reconstruction by a large margin.
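As a sketch of how layers and per-layer subrates might be
assigned, the code below uses a dyadic hierarchy of the kind
used in hierarchical-B video coding. This section does not
specify the exact structure, so the group-of-pictures size
(8), the layer rule, the choice of reference frames, and the
subrate values are all assumptions made for illustration.

```python
def layer_of(t, gop):
    """Dyadic temporal layer of frame t; gop must be a power of two."""
    pos = t % gop
    if pos == 0:
        return 0                       # GOP boundary frames form layer 0
    layer = gop.bit_length() - 1       # log2(gop) = deepest layer
    while pos % 2 == 0:
        pos //= 2
        layer -= 1
    return layer

def references(t, gop):
    """Nearest lower-layer frames on both sides of frame t (none for layer 0)."""
    layer = layer_of(t, gop)
    if layer == 0:
        return ()
    step = gop >> layer                # distance to the nearest lower layer
    return (t - step, t + step)

# Hypothetical settings: GOP of 8 frames, subrate decreasing with layer index.
GOP = 8
SUBRATE_BY_LAYER = {0: 0.7, 1: 0.3, 2: 0.2, 3: 0.1}

# One entry per frame: (frame index, layer, subrate, reference frames).
schedule = [
    (t, layer_of(t, GOP), SUBRATE_BY_LAYER[layer_of(t, GOP)], references(t, GOP))
    for t in range(GOP + 1)
]
```

Under this assumed schedule, frames in lower layers receive
more measurements and are recovered first, so that frames in
higher layers can use them as references.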