Graph Coloring-Based Video Synopsis: Reducing Collisions and Improving the Viewing Experience

Updated 2024-07-14 · PDF, 2.64 MB

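This paper presents a graph coloring-based approach to surveillance video synopsis. In conventional synopsis, poorly rearranged tubes (the object tracks in the surveillance footage) collide with one another, which degrades the viewing experience. To address this, the authors formulate video synopsis as a graph coloring problem.

The method first projects all tubes in the scene onto the image plane to identify the potential collision relationships between them. These relationships are represented as a graph in which each node stands for a tube and each edge between two nodes marks a possible collision point. The key step is to find a color assignment in which every tube receives a color, so that collisions are reduced as far as possible while the synopsis stays compact. The number of colors is controlled by a parameter q, which the user can adjust to trade off collision reduction against synopsis length. Solving this coloring problem yields a synopsis that is more compact and has fewer collisions, and is therefore more comfortable to watch.

Compared with the conventional approach of minimizing a global energy function, the graph coloring formulation recasts a complex tube optimization problem as a classical graph theory problem, which is described as more efficient and easier to interpret. The reported experiments indicate that the method reduces collisions while producing higher-quality synopses than existing methods, which benefits fast browsing and retrieval of large-scale video data. The work is relevant to video synopsis, tube rearrangement, and the application of graph theory in computer vision.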

by formulating it as a graph coloring problem.

3. Graph construction and coloring

In this section, we first analyze the potential collision relationship between tubes extracted from the video. Then the construction process of potential collision graph from the tubes is elaborated based on the analysis. Finally, a greedy algorithm for coloring the graph to rearrange the tubes appropriately is presented and discussed in detail.
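To make the idea of coloring a collision graph concrete, here is a minimal sketch of a generic greedy coloring heuristic in Python. It is not the algorithm of this paper, whose details are not part of this excerpt. The function names, the largest-degree-first visiting order, and the toy edge list are all assumptions made for illustration.

```python
def build_conflict_graph(num_nodes, conflicts):
    """Build adjacency sets for an undirected graph.

    `conflicts` is a list of (i, j) pairs that may collide (toy input).
    """
    graph = {i: set() for i in range(num_nodes)}
    for i, j in conflicts:
        graph[i].add(j)
        graph[j].add(i)
    return graph


def greedy_coloring(graph):
    """Give each node the smallest color not used by an already-colored neighbor.

    Nodes are visited by decreasing degree, a common heuristic (an assumption
    here, not taken from the paper).
    """
    order = sorted(graph, key=lambda n: (-len(graph[n]), n))
    color = {}
    for node in order:
        used = {color[nb] for nb in graph[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color


# Toy example: four nodes, four potential conflicts.
graph = build_conflict_graph(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
colors = greedy_coloring(graph)
print(colors)  # {2: 0, 0: 1, 1: 2, 3: 1}
```

In this toy graph, nodes 0, 1 and 2 conflict pairwise, so three colors are needed; node 3 only conflicts with node 2 and can reuse color 1.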

3.1. Relationship between tubes

The original video with $N$ frames is represented as a 3D space–time volume over $(x, y, t)$, where $(x, y)$ are the spatial coordinates of a pixel and $t$ ($1 \le t \le N$) is the frame number. A tube is a frame sequence of the same moving object and can be depicted in a 3D spatial–temporal volume. The base of the volume cube represents the x–y spatial dimension (image domain) and the cube's height represents the temporal dimension (time domain), as shown in Fig. 2. The extraction of tubes will be described in detail in Section 4.1.

Let $T_i$ be the tube of a dynamic object with index $i$, whose related information is obtained from a tracking procedure. During the tube's existing interval $t_i^s \le t \le t_i^e$, a sequence of rectangular bounding boxes is used to denote its spatial locations formally. Although the terms “tube” and “trajectory” are used interchangeably in some similar works, we differentiate the two terms here. The trajectory of an object is depicted by the central points of a sequence of rectangular bounding boxes, while a tube is represented by the bounding boxes themselves, $T_i = ([t_i^s, t_i^e], \{R_i^t\})$, where $t_i^s$ and $t_i^e$ stand for the starting time and the ending time of tube $i$, respectively.
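The distinction between a tube and a trajectory can be sketched as a small data structure. The following Python snippet is an illustration only: the class name `Tube`, its field names, and the `(x, y, width, height)` box layout are assumptions, not definitions from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box layout: (x, y, width, height) of the top-left-anchored rectangle.
Box = Tuple[int, int, int, int]


@dataclass
class Tube:
    index: int        # tube index i
    start: int        # starting frame of the tube in the original video
    boxes: List[Box]  # one bounding box per frame, from start to end

    @property
    def end(self) -> int:
        """Ending frame, derived from the start frame and the number of boxes."""
        return self.start + len(self.boxes) - 1

    def trajectory(self) -> List[Tuple[float, float]]:
        """Central points of the bounding boxes, i.e. the object's trajectory."""
        return [(x + w / 2, y + h / 2) for x, y, w, h in self.boxes]


# A toy tube that lives for three frames and moves to the right.
tube = Tube(index=0, start=10, boxes=[(0, 0, 4, 2), (2, 0, 4, 2), (4, 0, 4, 2)])
print(tube.end)           # 12
print(tube.trajectory())  # [(2.0, 1.0), (4.0, 1.0), (6.0, 1.0)]
```

The tube keeps the full boxes, so it knows which image area the object occupies in each frame; the trajectory keeps only the centers.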

To generate a video synopsis, existing methods usually shift the tubes in the time dimension only [19] or in both the time and spatial dimensions simultaneously [28]. However, collisions may occur in the generated synopsis when the tubes are shifted inappropriately. As Fig. 2(a) suggests, the underlying reason for unpleasant collisions in a synopsis is that two tubes intersect or overlap in the 3D space, i.e., two objects are rearranged to pass the same area of the image domain at the same time. Therefore, to mitigate or eliminate collisions, one should guarantee that no two tubes intersect or overlap in the 3D space–time volume. Traditional methods, however, usually judge whether two tubes collide, and how seriously, in the 3D space iteratively during the whole optimization process: the judgment is repeated every time a tube changes its synopsis label, which causes a large amount of redundant computation. In fact, whether two tubes can collide in the final synopsis can be determined beforehand by projecting them onto the image plane. As shown in Fig. 2(c), the green tube and the yellow tube intersect at point $(x', y')$, while the green tube and the blue tube overlap at a series of points $(x_1, y_1), \ldots, (x_n, y_n)$. These points indicate potential collisions of the tubes in the synopsis; they become real collisions if the tubes are rearranged to pass these points at the same frame, whereas the other parts of the tubes never cause a collision. Accordingly, emphasis should be placed on these potential collision points rather than on the whole tubes. To avoid collisions in the synopsis, we should guarantee that tubes pass these potential collision points at different times. Below, three types of relationship between tubes are defined to judge whether two tubes may collide before rearranging them.
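The projection argument above can be illustrated with a small Python sketch. It assumes tubes are given as per-frame boxes `(x, y, width, height)` on an integer pixel grid; the helper names `footprint`, `potential_collision_points` and `collides` are hypothetical and do not come from the paper.

```python
def footprint(boxes):
    """Project a tube onto the image plane: all pixels covered by any of its boxes."""
    pixels = set()
    for x, y, w, h in boxes:
        pixels.update((px, py) for px in range(x, x + w) for py in range(y, y + h))
    return pixels


def potential_collision_points(boxes_a, boxes_b):
    """Pixels shared by both projections; only here can the two tubes ever collide."""
    return footprint(boxes_a) & footprint(boxes_b)


def collides(boxes_a, start_a, boxes_b, start_b):
    """True if, for the given synopsis start frames, two boxes overlap in the same frame."""
    for k, (xa, ya, wa, ha) in enumerate(boxes_a):
        j = start_a + k - start_b  # box of tube b shown in the same synopsis frame
        if 0 <= j < len(boxes_b):
            xb, yb, wb, hb = boxes_b[j]
            if xa < xb + wb and xb < xa + wa and ya < yb + hb and yb < ya + ha:
                return True
    return False


# Toy tubes with 1x1 boxes: a moves right along row y=2, b moves down column x=2.
a = [(x, 2, 1, 1) for x in range(5)]
b = [(2, y, 1, 1) for y in range(5)]

print(potential_collision_points(a, b))  # {(2, 2)}
print(collides(a, 0, b, 0))              # True: both reach (2, 2) in frame 2
print(collides(a, 0, b, 1))              # False: b reaches (2, 2) one frame later
```

The projection is computed once per pair of tubes, whereas the frame-by-frame check depends on the chosen start frames. This is the source of the redundancy that the text attributes to iterative methods.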

Given two tubes $i$ and $j$, three types of relationships between them can be defined according to their spatial relationship after projecting them onto the image plane, as shown in Fig. 3.

(a) Irrelevant: Two tubes are said to be irrelevant if they have no intersecting points after being projected onto the image plane, i.e., the x–y plane, as shown in Fig. 3(a), where $t_i^s$, $t_i^e$ and $t_j^s$, $t_j^e$ are the starting and ending frames of tube $i$ and tube $j$, respectively. In this case, the two tubes do not pass the same area of the image plane at any time during their appearance in the camera view, which means that they will never collide when shifted along the time axis in the synopsis video. As a result, tubes with this kind of relationship can be rearranged arbitrarily without collisions, provided that other issues, such as chronological order, are not considered.

(b) Intersecting: Two tubes are said to be intersecting if their projections onto the image plane intersect, as shown in Fig. 3(b). Although the two tubes do not meet in the original video volume, as shown in the left part of Fig. 3(b), they are likely to collide if they are rearranged improperly. Specifically, let $t_i$ and $t_j$ be the frame numbers at which tube $i$ and tube $j$ pass the intersection point, respectively; a collision will definitely occur if $t_i$ and $t_j$ are assigned the same frame label in the synopsis video.

(c) Overlapping: Two tubes are said to be overlapping if parts of the tubes or whole tubes share the same spatial regions after being projected onto the image plane, as shown in Fig. 3(c) and (d). This corresponds to the case in which the original tubes lie in the same plane of the 3D spatial–temporal volume, perpendicular to the image plane. Two different kinds of relationship exist in this case: overlapping in the same direction and overlapping in the opposite direction, as shown in Fig. 3(c) and (d), respectively. The overlapping starting frames of tube $i$ and tube $j$ are denoted by $t_i^{os}$ and $t_j^{os}$, and their overlapping ending frames by $t_i^{oe}$ and $t_j^{oe}$. Note that $t_i^{os}$, $t_j^{os}$, $t_i^{oe}$ and $t_j^{oe}$ are all ordinal frame numbers counted from the tubes' starting times. This is the most difficult case for tube rearrangement and is prone to cause serious collisions in the synopsis video, especially when the tubes overlap in opposite directions. Moreover, it is also one of the factors that affect the condensation ratio of the synopsis, since there is not much free space for the objects to shift in the time dimension without

Fig. 2. The video volume with three tubes. (a) Three tubes in the 3D spatial–temporal domain. (b) The 2D x–t view mapped from (a), presenting the tubes along the time axis. (c) The 2D spatial domain of (a), where the red dots represent the potential collision parts of the tubes when they are rearranged in the synopsis. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)
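The three relationship types described in Section 3.1 (irrelevant, intersecting, overlapping in the same or opposite direction) can be illustrated with a toy classifier. This Python sketch makes strong simplifying assumptions that are not from the paper: each projected tube is reduced to a list of grid cells visited by its trajectory, a single shared cell counts as an intersection, and several shared cells count as an overlap. The function names are hypothetical.

```python
def first_visits(path):
    """Map each grid cell to the ordinal frame (counted from the tube's start) of its first visit."""
    visits = {}
    for k, cell in enumerate(path):
        visits.setdefault(cell, k)
    return visits


def classify(path_i, path_j):
    """Classify two projected trajectories given as lists of (x, y) grid cells."""
    vi, vj = first_visits(path_i), first_visits(path_j)
    shared = set(vi) & set(vj)
    if not shared:
        return "irrelevant"
    if len(shared) == 1:
        return "intersecting"
    # Order the shared cells as tube i visits them, then check tube j's visiting order.
    ordered = sorted(shared, key=vi.get)
    same_direction = vj[ordered[0]] < vj[ordered[-1]]
    if same_direction:
        return "overlapping (same direction)"
    return "overlapping (opposite direction)"


row = [(x, 2) for x in range(5)]              # moves right along y = 2
far_row = [(x, 4) for x in range(5)]          # parallel row, never shares a cell
column = [(2, y) for y in range(5)]           # crosses the row at (2, 2)
follower = [(x, 2) for x in range(2, 7)]      # same row, same direction
oncoming = [(x, 2) for x in range(6, 1, -1)]  # same row, opposite direction

print(classify(row, far_row))   # irrelevant
print(classify(row, column))    # intersecting
print(classify(row, follower))  # overlapping (same direction)
print(classify(row, oncoming))  # overlapping (opposite direction)
```

Real tubes have extended bounding boxes and noisy tracks, so a practical classifier would need tolerances that this sketch omits.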

Y. He et al.

Neurocomputing 225 (2017) 64–79
