A Joint Framework for Shot Segmentation and Key Frame Extraction


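This paper proposes a framework for joint video shot boundary detection and key frame extraction. It considers three quantities: the prior probability of key frames, the conditional probability of shot boundaries, and the conditional probability of each video frame. It treats key frame extraction as a maximum a posteriori (MAP) problem and solves it with an alternating strategy. Experimental results show that the method preserves the scene-level structure of a video and extracts representative, discriminative key frames, which supports more efficient video browsing and retrieval.

In computer vision, key frame extraction and shot boundary detection are two core steps of video processing. Key frame extraction selects the most representative frames of a video so that its content can be browsed and retrieved efficiently. It is challenging because video content is diverse and complex.

The proposed framework integrates the two tasks through three probabilistic components:

First, the prior probability of key frames estimates how likely a frame is to be a key frame, based on prior knowledge or statistics.

Second, the conditional probability of shot boundaries matters because boundaries usually mark significant scene changes and are important for understanding video content.

Finally, the conditional probability of each video frame captures how similar or different frames are from one another.

MAP estimation is a method from statistical decision theory. It selects the model parameters that are most probable given the observed data by combining the likelihood of the data with a prior. Here, key frame extraction is cast as a MAP problem over the three components above. An alternating optimization strategy then refines the key frame selection step by step, so that the extracted frames represent the scene structure of the whole video.

The experiments indicate that the method preserves the logical structure of scenes and yields representative, discriminative key frames. This lets users grasp the main content of a video quickly when browsing or searching. Because it combines shot boundary detection and key frame extraction in one probabilistic model, the work is relevant to video analysis, content-based retrieval, and service robotics.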

Joint Shot Boundary Detection and Key Frame Extraction

Xiao Liu, Mingli Song, Luming Zhang, Senlin Wang, Jiajun Bu, Chun Chen and Dacheng Tao

Zhejiang Provincial Key Laboratory of Service Robot

College of Computer Science, Zhejiang University

{ender liux, brooksong, snail wang, bjj, chenc}@zju.edu.cn

Center for Quantum Computation and Information Systems, UTS

dacheng.tao@gmail.com

Abstract

Representing a video by a set of key frames is useful for efficient video browsing and retrieval. However, key frame extraction remains a challenge in the computer vision field. In this paper, we propose a joint framework that integrates shot boundary detection and key frame extraction. Three probabilistic components are taken into account: the prior of the key frames, the conditional probability of shot boundaries, and the conditional probability of each video frame. Key frame extraction is thus treated as a Maximum A Posteriori (MAP) problem, which can be solved by adopting an alternating strategy. Experimental results show that the proposed method preserves the scene-level structure and extracts key frames that are representative and discriminative.
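The three components named in the abstract can be read as the factors of Bayes' rule. Below is one plausible factorization, written in notation of our own choosing; the paper's own equations are not part of this excerpt. Here V = {f_1, ..., f_N} denotes the frames, K the set of key frames, and B the set of shot boundaries. Frames are assumed to be conditionally independent given K and B. The alternating strategy fixes one block of variables and maximizes over the other:

```latex
\begin{aligned}
P(K, B \mid V) &\propto P(K)\, P(B \mid K) \prod_{t=1}^{N} P(f_t \mid K, B), \\
K^{(i+1)} &= \arg\max_{K}\; P(K)\, P(B^{(i)} \mid K) \prod_{t=1}^{N} P(f_t \mid K, B^{(i)}), \\
B^{(i+1)} &= \arg\max_{B}\; P(B \mid K^{(i+1)}) \prod_{t=1}^{N} P(f_t \mid K^{(i+1)}, B).
\end{aligned}
```

In the B-step the factor P(K^{(i+1)}) is constant and drops out. Because each step maximizes the same posterior over one block of variables, the posterior never decreases between iterations. The iteration may, however, stop at a local rather than a global maximum.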

1 Introduction

Millions of cameras around the world capture a gigantic amount of video data every day, which raises a new challenge: mass storage and frequent retrieval inevitably incur high time and storage costs. Hence it is valuable to let people retrieve a video, or gain certain perspectives on it, without watching all of the video data.

To transfer as many cues as possible from a video into a limited number of key frames, Zhang et al. [5] proposed selecting a frame as a key frame if its histogram differs significantly from that of the previously selected key frame. This method fails to guarantee the representativeness of the key frames. By representing each frame as a color histogram, Zhuang et al. [6] clustered the frames of a video into several clusters and then chose one key frame to describe each cluster. This algorithm completely ignores temporal information, which is very important for key frame representation. Won et al. [4] detected video shot boundaries using the luminance variance; their method is based on differences in the modelling error with respect to an ideally modelled transition. Černeková et al. [1] first detected shots and then extracted key frames using mutual information and joint entropy. Kelm et al. [2] segmented video into shots by detecting gradual transitions and abrupt cuts, and extracted key frames using visual attention features. Sun et al. [3] extracted key frames at the peaks of the curve of color-distribution distances between frames.
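Two of the sequential baselines above are simple enough to sketch. The following is a minimal Python illustration, not the original authors' code. It rests on three assumptions:

- Each frame is already reduced to a normalized color histogram, stored as a plain list of floats.
- Histograms are compared with the L1 distance.
- The "distance curve" of Sun et al. [3] is the distance between consecutive frames.

The function names are hypothetical.

```python
from typing import List, Sequence

Histogram = Sequence[float]


def l1_distance(a: Histogram, b: Histogram) -> float:
    """L1 (city-block) distance between two equally sized histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def threshold_key_frames(hists: Sequence[Histogram], threshold: float) -> List[int]:
    """Sequential rule in the spirit of Zhang et al. [5] (assumed form):
    frame 0 is a key frame, and a later frame becomes a key frame when it
    differs from the most recently selected key frame by more than `threshold`."""
    if not hists:
        return []
    keys = [0]
    for t in range(1, len(hists)):
        if l1_distance(hists[t], hists[keys[-1]]) > threshold:
            keys.append(t)
    return keys


def peak_key_frames(hists: Sequence[Histogram]) -> List[int]:
    """Peak picking in the spirit of Sun et al. [3] (assumed form): build the
    curve of distances between consecutive frames and return the later frame
    of every pair where the curve has a strict local maximum."""
    # curve[i] is the distance between frame i and frame i + 1
    curve = [l1_distance(hists[t - 1], hists[t]) for t in range(1, len(hists))]
    peaks = []
    for i, value in enumerate(curve):
        left = curve[i - 1] if i > 0 else float("-inf")
        right = curve[i + 1] if i + 1 < len(curve) else float("-inf")
        if value > left and value > right:
            peaks.append(i + 1)
    return peaks


# Toy two-bin histograms: a hard change at frame 2, a blended frame at 4.
hists = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5]]
```

On this toy input both rules flag frames 2 and 4, and the threshold rule additionally keeps frame 0. Neither rule measures how well a selected frame stands for the frames around it, which is the representativeness concern raised above.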

These methods rely on effective shot boundary detection. Unfortunately, shot boundary detection is data dependent, and it is difficult to obtain accurate detection across different videos. Furthermore, even when semantically meaningful shots are obtained, these algorithms cannot guarantee that each shot contains a unique, qualified key frame.
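To make the data dependence concrete, here is a hypothetical threshold-based hard-cut detector of the kind such two-stage pipelines rely on. It is a sketch with names of our own and uses an L1 histogram distance. The toy clip has one abrupt cut followed by a gradual transition. The choice of threshold decides whether the transition is missed entirely or split into several boundaries.

```python
from typing import List, Sequence


def detect_cuts(hists: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Hypothetical hard-cut detector: report frame t as the first frame of a
    new shot when the L1 distance between the histograms of frames t-1 and t
    exceeds `threshold`."""
    cuts = []
    for t in range(1, len(hists)):
        d = sum(abs(x - y) for x, y in zip(hists[t - 1], hists[t]))
        if d > threshold:
            cuts.append(t)
    return cuts


# One abrupt cut at frame 2, then a gradual transition starting at frame 4.
clip = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0],
        [0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
```

With threshold 1.0, only the abrupt cut at frame 2 is found. With 0.3, the gradual transition adds three boundaries at frames 4, 5 and 6. No single fixed threshold handles both cases well, and a value tuned on one video need not transfer to another.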

Previous algorithms first detect shot boundaries and then extract key frames based on the resulting division. To solve, or at least alleviate, the problems above, we instead propose a joint framework that integrates shot boundary detection and key frame extraction in a single probabilistic model. The proposed algorithm divides a video into a fixed number of shots and selects one key frame for each shot, such that the selected key frames best match the original video. This formulation lets shot boundary detection and key frame extraction benefit from each other. Key frame extraction is then treated as a Maximum A Posteriori (MAP) problem, which can be solved by adopting an alternating strategy.
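The alternating strategy can be sketched under strong simplifying assumptions of our own:

- Key frames and boundaries have uniform priors.
- Each frame follows an isotropic Gaussian likelihood centred on the key frame of its shot.

Under these assumptions, maximizing the posterior is equivalent to minimizing the total squared distance between each frame and its shot's key frame. The alternation then becomes coordinate descent. With boundaries fixed, each shot's medoid becomes its key frame. With key frames fixed, each boundary between neighbouring key frames moves to its cheapest position. This is a stand-in for, not a reproduction of, the paper's probabilistic model, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def sq_dist(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance, i.e. the negative log of an isotropic
    Gaussian likelihood up to scale and an additive constant."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def medoid(frames: Sequence[Frame], start: int, end: int) -> int:
    """Frame index in [start, end) with the smallest summed distance to the
    frames of that shot (ties go to the earliest frame)."""
    return min(range(start, end),
               key=lambda c: sum(sq_dist(frames[t], frames[c]) for t in range(start, end)))


def gap_cost(frames: Sequence[Frame], k_left: int, k_right: int, b: int) -> float:
    """Cost of the frames strictly between two key frames when the later
    shot starts at frame b."""
    left = sum(sq_dist(frames[t], frames[k_left]) for t in range(k_left + 1, b))
    right = sum(sq_dist(frames[t], frames[k_right]) for t in range(b, k_right))
    return left + right


def place_boundaries(frames: Sequence[Frame], keys: List[int]) -> List[int]:
    """With key frames fixed, put each boundary in (k_left, k_right] at its
    cheapest position; the gaps are independent, so each is searched alone."""
    starts = [0]
    for k_left, k_right in zip(keys, keys[1:]):
        candidates = range(k_left + 1, k_right + 1)
        starts.append(min(candidates, key=lambda b: gap_cost(frames, k_left, k_right, b)))
    return starts


def total_cost(frames: Sequence[Frame], starts: List[int], keys: List[int]) -> float:
    """Objective: summed distance of every frame to its shot's key frame."""
    ends = starts[1:] + [len(frames)]
    return sum(sq_dist(frames[t], frames[k])
               for s, e, k in zip(starts, ends, keys) for t in range(s, e))


def joint_segment_and_select(frames: Sequence[Frame], num_shots: int,
                             max_iter: int = 50) -> Tuple[List[int], List[int]]:
    """Alternate key frame selection and boundary placement; returns the
    start frame of each shot and the key frame of each shot."""
    n = len(frames)
    if not 1 <= num_shots <= n:
        raise ValueError("need 1 <= num_shots <= number of frames")
    starts = [j * n // num_shots for j in range(num_shots)]  # equal-length init
    keys: List[int] = []
    for _ in range(max_iter):
        ends = starts[1:] + [n]
        keys = [medoid(frames, s, e) for s, e in zip(starts, ends)]  # key frame step
        new_starts = place_boundaries(frames, keys)                  # boundary step
        if new_starts == starts:  # converged: keys are the medoids of these shots
            break
        starts = new_starts
    return starts, keys


# Toy 1-D frame features with jumps at frames 3 and 6.
toy = [[0.0], [0.1], [0.0], [5.0], [5.1], [5.0], [9.0], [9.1]]
shot_starts, key_frames = joint_segment_and_select(toy, num_shots=3)
```

On the toy sequence, the equal-length initialization [0, 2, 5] moves to the jumps at frames 3 and 6 after one round. Neither step can raise the objective, so the loop stops once the boundaries stop changing. The max_iter cap guards against cycling on ties.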

2 A Probabilistic Model for Representing Video by Key Frames

For frame-based video summarization and retrieval, shot boundary detection is usually taken as the first

21st International Conference on Pattern Recognition (ICPR 2012)

November 11-15, 2012. Tsukuba, Japan
