the multi-class classification task to two-class classification. The design complexity and the computational requirements of each 3D-CNN are reduced at the same time. However, the storage needed for the network parameters may increase as more 3D-CNNs are included.
The contributions of this paper are: 1) Parallel 3D-CNNs are proposed for multi-class classification. In the proposed parallel structure, each 3D-CNN is used as a two-class classifier for one specific video class. This makes the training of each 3D-CNN much easier and reduces the computational requirements, so that the parallel 3D-CNNs can be implemented on a computer with an ordinary CPU, GPU, and memory configuration while still achieving good performance. 2) Temporally downsampled versions of the videos are used to increase the volume of the dataset, and in particular the number of positive training samples. During the training of a 3D-CNN, each video is downsampled into sub-videos at a fixed frame interval, and these sub-videos represent the video at a low temporal resolution (see the first sketch after this paragraph). Downsampling not only increases the number of positive samples, which helps guarantee the performance of the 3D-CNNs, but also makes it possible to classify videos based only on their sub-videos, which lightens the input load of video classification. 3) The proposed parallel structure can grow as new classes appear. Each 3D-CNN in the proposed model decides whether the input video belongs to its class according to a defined threshold. If a video belongs to none of the existing classes, it is assigned to a new class, and an additional 3D-CNN for that class is constructed in the same way as each of the existing ones (see the second sketch below). The feasibility of the proposed parallel 3D-CNN model for video classification is verified through its application to video copy detection on the CC_WEB_VIDEO dataset.
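As an illustration of the fixed-interval downsampling in contribution 2, the sketch below splits a video of T frames into `interval` sub-videos, where sub-video k keeps frames k, k+interval, k+2*interval, and so on. The function name and array layout are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def temporal_downsample(video: np.ndarray, interval: int) -> list[np.ndarray]:
    """Split a video (frames, height, width, channels) into `interval`
    sub-videos, each holding every `interval`-th frame with a different
    starting offset. Each sub-video is a low temporal-resolution
    version of the original video."""
    return [video[offset::interval] for offset in range(interval)]

# Example: a 120-frame video downsampled with interval 4 yields
# four 30-frame sub-videos, multiplying the training data by 4.
video = np.zeros((120, 60, 40, 3), dtype=np.uint8)
sub_videos = temporal_downsample(video, interval=4)
assert len(sub_videos) == 4 and sub_videos[0].shape[0] == 30
```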
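Contributions 1 and 3 together define the decision rule: every binary 3D-CNN scores the input video, the scores are compared against the per-class thresholds, and a video rejected by every network is treated as a new class whose own 3D-CNN is then trained. A minimal sketch of that rule follows; the scoring functions here are toy stand-ins for trained 3D-CNNs, and all names are hypothetical.

```python
def classify(video, networks, thresholds):
    """networks: dict mapping class label -> scoring function (a trained
    binary 3D-CNN). Returns the accepted label with the highest score,
    or None when every network rejects the video, i.e. it belongs to a
    new class."""
    best_label, best_score = None, float("-inf")
    for label, net in networks.items():
        score = net(video)  # positive-class probability for this label
        if score >= thresholds[label] and score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy stand-ins for two trained binary 3D-CNNs (scores are made up).
networks = {"news": lambda v: 0.3, "sports": lambda v: 0.8}
thresholds = {"news": 0.5, "sports": 0.5}

label = classify("some video", networks, thresholds)  # -> "sports"
# A None result would trigger growth of the parallel structure: train
# one more binary 3D-CNN for the new class and register it in
# `networks` and `thresholds`, in the same way as the existing ones.
```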
2 Parallel 3D-CNNs
2.1 3D-CNN
3D-CNN was proposed by Ji et al. [6] for action recognition. A 3D-CNN can extract features directly from input video streams, which makes it good at deriving local motion information from the video. In the 3D-CNN of [6], a hardwired layer follows the video input and generates multiple channels of information from the input frames: gray, gradient-x, gradient-y, optflow-x, and optflow-y. These features mainly capture motion or motion-induced differences. For video classification, however, these features alone are not enough, so we keep the main structure of the 3D-CNN presented in [6] but remove its hardwired layer and feed the video frames directly into the first convolutional layer, allowing more information to be analyzed. The 3D-CNN model used in this paper is shown in Fig. 1.
The feature map of a convolutional layer is defined as:
$$f_{ij}^{xyz} = \mathrm{sigm}\Bigg(b_{ij} + \sum_{n}\sum_{p=0}^{P_i-1}\sum_{q=0}^{Q_i-1}\sum_{r=0}^{R_i-1} w_{ijn}^{pqr}\, f_{(i-1)n}^{(x+p)(y+q)(z+r)}\Bigg) \qquad (1)$$
where $f_{ij}^{xyz}$ is the value at position $(x, y, z)$ of the $j$th feature map in the $i$th layer, $\mathrm{sigm}(\cdot)$ is the sigmoid function, and $b_{ij}$ is the bias of the $j$th feature map in the $i$th layer. $w_{ijn}^{pqr}$ is the $(p, q, r)$th value of the kernel connected to the $n$th feature map in the previous layer, and $(P_i, Q_i, R_i)$ is the kernel size of the $i$th layer.
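Eq. (1) is an ordinary 3D convolution followed by a sigmoid. The direct (unoptimized) NumPy sketch below makes the index arithmetic explicit; the array shapes and function names are our own.

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv3d_feature_map(prev_maps, kernels, bias):
    """Compute one feature map f_ij of layer i per Eq. (1).
    prev_maps: (N, X, Y, Z)   the N feature maps of layer i-1
    kernels:   (N, P, Q, R)   one (P_i, Q_i, R_i) kernel per previous map
    bias:      scalar b_ij
    Returns:   (X-P+1, Y-Q+1, Z-R+1) feature map."""
    N, X, Y, Z = prev_maps.shape
    _, P, Q, R = kernels.shape
    out = np.full((X - P + 1, Y - Q + 1, Z - R + 1), bias)
    for n in range(N):            # sum over previous feature maps n
        for p in range(P):        # kernel offsets (p, q, r)
            for q in range(Q):
                for r in range(R):
                    out += kernels[n, p, q, r] * prev_maps[
                        n, p:p + out.shape[0], q:q + out.shape[1], r:r + out.shape[2]
                    ]
    return sigm(out)

# Two 8x8x5 input maps and 3x3x2 kernels give a 6x6x4 output map.
f = conv3d_feature_map(np.random.rand(2, 8, 8, 5), np.random.rand(2, 3, 3, 2), 0.1)
assert f.shape == (6, 6, 4)
```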