视频真实性检测：基于场景帧指纹的新方法

110 浏览量更新于2024-08-29 收藏 3.36MB PDF 举报

"基于场景帧指纹的视频真实性检测方法是一篇研究论文，主要探讨了如何利用视频中的场景帧指纹来验证视频的真实性。该方法由浙江工业大学、南京信息工程大学网络监控江苏工程研究中心以及南京信息科技大学计算机与软件学院的研究人员提出。文章在2015年5月首次提交，经过修订后于同年7月接受，并于9月1日被接受发表，由胡平 Liu 沟通审阅，9月10日在线发布。关键词包括图像处理、视频指纹、场景帧、视频真实性以及倒排索引的二进制搜索和元数据。" 正文: 视频真实性检测是数字内容保护领域的重要课题，特别是在版权保护需求日益增长的背景下，视频指纹技术成为了一种有效的认证手段。本文提出的基于场景帧指纹的视频真实性检测方法，旨在通过分析视频中的关键帧信息来鉴别视频是否被篡改或伪造。首先，理解“场景帧”是关键。在视频中，场景帧通常是指视频流中显著变化的帧，比如镜头切换、动作开始或结束等时刻的帧。这些帧包含了大量的视觉信息，能够反映出视频内容的结构和特征。因此，它们成为了视频指纹生成的基础。该方法首先对视频进行预处理，提取出每个场景帧的关键特征。这些特征可能包括颜色分布、纹理信息、空间布局等多个维度，形成一种独特的“指纹”。然后，通过使用二进制搜索算法在倒排索引中进行快速匹配，可以高效地对比和查找相似的场景帧指纹，从而判断视频片段之间的对应关系是否一致。此外，论文还考虑到了元数据在视频真实性检测中的作用。元数据包含了关于视频的创作、编辑、传输等信息，它可以作为辅助证据来验证视频的原始性和完整性。结合场景帧指纹和元数据，可以构建一个更强大的验证系统，提高检测的准确性和鲁棒性。整个方法的创新之处在于其结合了场景帧的视觉特性与高效的检索策略，为视频真实性的鉴定提供了一个新的视角。通过这种方法，可以有效地检测出视频中的篡改痕迹，对于数字媒体的安全性评估和司法鉴定具有重要的实际应用价值。这篇研究论文展示了在视频真实性检测领域的一种新思路，即利用场景帧指纹作为识别标志，结合现代信息检索技术和元数据，提高了检测的效率和可靠性。这种方法对于打击非法篡改视频的行为，保护知识产权，以及维护数字内容的真实性和可信度具有重要意义。

into 3/4QCIF (108 132). The size of QCIF has been widely applied

in video processing ﬁeld.

In Refs. [10–12], the down sampling methods are adopted to

reduce image resolution into 32 32. The ﬁngerprint of the frame

is made up of two parts: (1) local characteristic, (2) global

characteristic-conﬁdence of the frame. The ﬁrst one presents the

average value of local luminance and the differential of multi-

resolution. Each average value or differential is called frame ele-

ment. However, we believe that most of the information will be

eliminated or obfuscated by down sample the whole image into

32 32, and the extracted ﬁngerprint will be thus insufﬁcient to

represent the whole image. So, here we just borrow the ideas from

the above references and extract the ﬁngerprint from the QCIF

image. The details are shown in Fig. 2 (In the 9 11 sub-region, a

to h is the average value of the local pixels). The exaction method

of the frame element is: (1) the mean element of 9 11 sub-

region; (2) four differential elements a–b, c–d, e–f and g–h. There

are 720 elements in total, including 144 mean elements marked as

A elements and 576 differential elements marked as D elements.

For a large video database, normally the elements are stored as

ﬂoat-point format. This consumes a lot of storage space and is thus

not good for the implementation of video copyright authenticity,

tracing, and retrieving. To improve the situation, the elements are

commonly operated as integers. In Refs. [8,16,17], the Hashing

value is obtained by binarizing these elements. In Refs. [10–12]

these elements are operated by ternary qualiﬁcation. Though these

methods can save storage memory and increase the efﬁciency of

the computer work; however the accuracy of the authentication

will be reduced correspondingly. Comparing with those methods,

the technique of quaternion qualiﬁcation proposed in this paper

can deliver more accurate results.

Let A

represents the value of element A. A

is quantiﬁed into

quaternion value x

by using Eq. (1). Where i¼1, 2,…,144 denotes

the dimension index of A.

3; if ðA

1284 ThAÞ

2; if 0o A

128r ThAÞ

1; if ðThAo A

128r 0Þ

0; if ðA

128r ThAÞ

ð1Þ

The threshold value ThA can be simply deﬁned as (max(A

)

min(A

))/4. However, this approach only divides the value between

the maximum value and minimum value of A

into four equivalent

values. If an abrupt change value in A element occurs, such as a

large difference between the maximum and second largest or

between the minimum and second smallest, all the values of

element A will be quantiﬁed into a binary value. To avoid such a

situation, in this paper, the threshold value ThA is set dynamically

according to Refs. [11,12].

1. Let a

¼abs(A

128), where abs(⊡) is the operator to get the

absolute value, sorting a

in the ascending order. Let a

; a

; ⋯; a

denote the values sorted in the ascending order.

Note that k¼1, 2,⋯,144 is an index representing the order of the

sorted values, and is different from dimension index i.

2. The threshold ThA¼a

, where k¼ﬂoor(0.25*N), N¼144.

Similarly, for the element with the D

dimensions from 145

to 720, they are quantiﬁed into the quaternion value x

3; if ðD

4 ThDÞ

2; if ð0o D

r ThDÞ

1; if ðThDo D

r 0Þ

0; if ðD

r ThDÞ

ð2Þ

The process to get the threshold ThD is as follows:

1. Take d

¼abs(D

), sorting d

in ascending order, and let d

¼{d

…,d

} denote the values sorted in ascending order. Note that

k¼1, 2,…, 576 is an index representing the order of sorted

values, and is different from dimension index i.

2. The threshold ThD¼d

, ThD¼d

, where k¼ﬂoor(0.25*N),

N¼576.

Seven hundred twenty elements will occupy 720 bytes, if we

take the quaternion value {3,2,1,0} to represent the video scene

frame without any special operation. The space required to store

the ﬁngerprint data is therefore becoming huge for the large

amount of video database for video authentication. So, the four

dimensions value is converted to 8 bits, called word, by storing the

extracted four weight value in binary formats, and the total bits

will be 720/4¼180 words (1440 bits).

Let word

, i¼1, 2,…, 180 denotes the encoded value of each 4-

dimensional unit which will be calculated as

word

¼4

 x

ði 1Þ4 þ1

þ4

 x

ði 1Þ4 þ2

þ4

 x

ði 1Þ4 þ3

þx

ði 1Þ4 þ4

ð3Þ

Obviously, word

A [0, 255]. As the method encodes four units to

one-byte, it saves 3/4¼75% storage space. Thus, 180 words are

extracted from each scene frame.

3. Video ﬁngerprint extraction based on scene frame

In order to compare our ﬁngerprint extraction method with

other extraction methods, we will ﬁrst review the existing video

ﬁngerprint extraction methods.

3.1. The extraction methods for video ﬁngerprint at present

Recently, the ﬁngerprint technology is widely used in the ﬁeld

of copyright authentication, copy detection, multimedia index and

so on. The algorithms of video ﬁngerprint put forward by

researchers can be classiﬁed into 4 categories [8,14,18]: color-

space-based, temporal, spatial and spatial–temporal.

The color-space-based approach relies on the color histogram

of video spatial–temporal domain [14] to extract the ﬁngerprint by

the statistics of color characteristics. Due to the 24 bits true color,

the statistics is too big to get a reasonable ﬁngerprint extraction

speed. Besides, the color will be changed very much [8] in the

different formats of videos. Furthermore, the ﬁngerprint extraction

method for the color space cannot be applied to the white–black

videos. It is the reason why this method is not widely used.

The temporal algorithm is to extract temporal characteristics

from the video frame sequence [13]. It can be applied to the long

duration videos, but not the short-time segments. Consequently it

is not suitable for online application which is full of the short-time

videos.

Fig. 2. Extraction of frame ﬁngerprint elements, various average and difference

elements are used to create the complete frame ﬁngerprint.

J. Mao et al. / Neurocomputing 173 (2016) 2022–20322024

剩余10页未读，继续阅读

weixin_38737144

粉丝: 4
资源: 942

视频真实性检测：基于场景帧指纹的新方法

一种基于场景帧指纹的视频真实性方法

基于MATLAB的指纹检测方法实现.pdf

Ridge-Slope-Valley功能用于指纹活动性检测

OpenCV人脸识别与深度学习融合：探索人脸识别新境界，解锁更多可能性

YOLO表情识别在安防领域的应用：提升安全性和效率，保障社会稳定

基于Java语言的蓝牙遥控器设计源码，支持键盘、鼠标、影音遥控器

数据手册-74HC573-datasheet.zip

苏州科技大学在辽宁2020-2024各专业最低录取分数及位次表.pdf

c++的概要介绍与分析

锻压成型机_三维3D设计图纸.zip

最新资源