by formulating it as a graph coloring problem.
3. Graph construction and coloring
In this section, we first analyze the potential collision relationship
between tubes extracted from the video. Then the construction process
of potential collision graph from the tubes is elaborated based on the
analysis. Finally, a greedy algorithm for coloring the graph to rearrange
the tubes appropriately is presented and discussed in detail.
3.1. Relationship between tubes
The original video with N frames is represented in a 3D space–time
volume as
xyt(, ,
, where (x,y) is the spatial coordinates of the pixel
and t
tN(1 ≤ ≤
is the frame number. A tube is a frame sequence of the
same moving object and can be depicted in a 3D spatial–temporal
volume. The base of the volume cube represents the x–y spatial
dimension (image domain) and the cube's height represents the
temporal dimension (time domain), as shown in Fig. 2. The extraction
of tubes will be described in detail in Section 4.1.
Let T
i
be the tube of a dynamic object with index i and its related
information is obtained from a tracking procedure. During the tube's
existing interval
ttt≤≤
i
s
i
e
, a sequence of rectangle bounding boxes
R
i
are used to denote its spatial locations formally. Although the term
“tube” and “trajectory” are used interchangeably in some similar works,
we differentiate the two terms here. The trajectory of an object is
depicted by the central points of a sequence of rectangle bounding
boxes
R
i
while the tubes are represented by the rectangle bounding
boxes
tt R=([ , ],{ }
i
i
s
i
e
i
t
, where t
i
s
and t
i
e
stand for the starting-time
and the ending-time of tube i, respectively.
To generate video synopsis, the existing methods usually shift the
tubes in time dimension only [19] or in both time and spatial
dimension simultaneously [28]. However, collisions may occur in the
generated synopsis when the tubes are shifted inappropriately. It is
easy to see that from Fig. 2(a), the underlying reason for unpleasant
collisions in synopsis is that two tubes intersect or overlap in the 3D
space, i.e., two objects are rearranged to pass the same area in the
image domain at the same time. Therefore, to mitigate or eliminate
collisions, one should guarantee that no two tubes intersect or overlap
in the 3D space–time volume. However, traditional methods usually
judge two tubes collide or not and how seriously they collide in the 3D
space iteratively during the whole optimization process: the judgment
is conducted repeatedly once a tube changes its synopsis label, which
causes large amounts of computation redundancy. Essentially, the fact
whether or not two tubes collide in the final synopsis can be
determined beforehand by projecting them onto the image plane. As
shown in Fig. 2(c), tube (green) and tube (yellow) intersect at point
xy(′, ′
, while tube (green) and tube (blue) overlap at a series of points
xy x y(, ),…,(,
k
k
1
1
. These points indicate potential collisions of the tubes
in synopsis and will become real ones if they are rearranged to pass
these points at the same frame, whereas other parts of tubes will never
cause collision at all. Accordingly, much emphasis shall be put on these
potential collision points instead of the whole tubes. To avoid collisions
happening in the synopsis, we should guarantee that tubes are passing
these potential collision points at different times. Below three types of
relationship between tubes will be defined to judge whether two tubes
collide or not before rearranging them.
Given two tubes i and j, three types of relationships between them
can be defined according to their spatial relationship after projecting
them onto the image plane, as shown in Fig. 3.
(a) Irrelevant: Two tubes are said to be irrelevant if they have no
intersecting points after being projected onto the image plane, i.e., the
x–y plane as shown in Fig. 3(a), where t
i
s
, t
i
e
and t
j
s
, t
j
e
are the
starting and ending frame of tube i and tube j, respectively. In this case,
the two tubes do not pass the same area of the image plane at all during
their appearance in the camera view, which means that they will never
cause collision when being shifted along the time axis in synopsis video.
As a result, tubes with this kind of relationship can be rearranged free
of collisions arbitrarily, if other issues, such as chronological order and
so on, are not considered.
(b) Intersecting: Two tubes are said to be intersecting if their
projections onto the image plane intersect, as shown in Fig. 3(b).
Although the two tubes do not meet in the original video volume, as
shown by the left figure of Fig. 3(b), they probably cause collision if
they are rearranged improperly. Specifically, let t
i
c
and t
j
c
be the frame
number of the intersection point of tube i and j, respectively, collision
will definitely occur if t
i
c
and t
j
c
are labeled with the same frame-label
in the synopsis video.
(c) Overlapping: Two tubes are said to be overlapping if their parts
or whole tubes share the same spatial regions after being projected
onto the image plane, as shown in Fig. 3(c) and (d). This corresponds to
the case that the original tubes are in the same plane of the 3D spatial–
temporal volume which is perpendicular to the image plane. Two
different kinds of relationship exist in this case: overlapping in the
same direction and in the opposite direction, as shown in Fig. 3(c) and
(d), respectively. The overlapping starting frames of tube i and j are
denoted by
t
i
os
and t
j
os
, and the overlapping ending frames of them are
represented by t
i
oe
and t
j
oe
. Note that t
i
os
, t
i
oe
, t
j
os
and t
j
oe
are all
ordinal numbers in frame counting from the tubes’ starting time. This
is the most difficult case for tube rearrangement and is prone to cause
serious collisions in the synopsis video, especially for the case of
overlapping in the opposite direction. Moreover, it is also one of the
factors that affects the condensation ratio of synopsis since there is not
too much free space for the objects to shift in time dimension without
Fig. 2. The video volume with three tubes. (a) Three tubes in 3D spatial–temporal domain. (b) The 2D x–t axis mapped from (a) to present the tubes in time-axis. (c) The 2D spatial
domain of (a), where the red dots represent the potential collision parts of tubes when rearranging the tubes in synopsis. (For interpretation of the references to color in this figure
caption, the reader is referred to the web version of this paper.)
Y. He et al.
Neurocomputing 225 (2017) 64–79
67