Creating Video Summarization From Emotion
Perspective
Yijie LAN, Shikui WEI*, Ruoyu LIU, Yao ZHAO
Institute of Information Science,
Beijing Jiaotong University, Beijing 100044, China
E-mail: 14120320@bjtu.edu.cn, shkwei@bjtu.edu.cn, 12112062@bjtu.edu.cn, yzhao@bjtu.edu.cn
* Corresponding Author
Abstract—This paper proposes a novel approach to summarizing non-professionally edited videos, which contain a large amount of redundant information, from the viewpoint of emotion. Ground-truth emotion scores for each frame are first obtained from our human-annotated dataset. Then, we extract emotional features of each frame from the training-set videos. After that, we train our predictive model on the feature vectors and emotion scores by using linear regression. Meanwhile, videos are partitioned into several segments. We select a subset of segments whose total length is below a specified value by optimizing the sum of their emotion scores. This subset of segments can be treated as the desired emotional video summarization. The experimental results show that the proposed scheme can achieve an effective emotional video summarization.
Keywords—Video summarization; emotion; regression model;
video segment
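As a rough illustration of the training step described in the abstract, the following minimal sketch (Python, assuming NumPy and scikit-learn) fits a linear regression model that maps per-frame emotion feature vectors to annotated emotion scores. The toy colour-statistics feature extractor and the function names are illustrative assumptions and only stand in for the combined low-, mid- and high-level emotion features used in the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def extract_emotion_features(frames):
    """Toy stand-in for the paper's emotion feature vector.

    Each frame (an H x W x 3 uint8 array) is reduced to per-channel colour
    means and standard deviations; the paper instead combines low-, mid-
    and high-level visual features.
    """
    feats = []
    for f in frames:
        f = f.astype(np.float64) / 255.0
        feats.append(np.concatenate([f.mean(axis=(0, 1)), f.std(axis=(0, 1))]))
    return np.vstack(feats)                          # shape: (n_frames, 6)

def train_emotion_model(train_frames, train_scores):
    """Fit the per-frame emotion-score predictor with linear regression."""
    X = extract_emotion_features(train_frames)
    y = np.asarray(train_scores, dtype=float)        # human-voted ground truth
    return LinearRegression().fit(X, y)

def predict_frame_scores(model, frames):
    """Estimate an emotion score for every frame of an unseen video."""
    return model.predict(extract_emotion_features(frames))
```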
I. INTRODUCTION
With the rapid development of multimedia and Internet technologies, multimedia data has experienced explosive growth over the past decades. Digital videos, as a major carrier of multimedia information, have been applied widely in many aspects of our lives. However, the majority of these videos contain much redundant information, and searching a tedious video for interesting segments or significant parts consumes a lot of the viewer's time. Video summarization can solve this problem efficiently. Video summarization, which is similar to text summarization, aims to condense an original video by finding the useful or needed parts and composing them into a short, compact and informative summary. This makes it easy for users to find the video parts that they want to watch and share, as with movie trailers.
It cannot be denied that every video carries some emotion. However, almost all previous methods neglect the fact that emotion is a key piece of the information we want to obtain from a video. Generally speaking, users often focus on the video parts that carry strong emotion. For example, a professional movie trailer is usually composed of shots that evoke strong feelings in viewers. Another example is that videos of a birthday party often express a delightful emotion. When we make a summary of such a video, the summarized parts tend to be those with smiling faces or blowing out candles, rather than the parts about preparing for the party, even though they carry the equivalent semantic meaning of "birthday party". Therefore, emotional video summarization can provide great assistance in the intuitive understanding of a video.
The contributions of the proposed scheme can be summarized as follows:
1. A new dataset is constructed, annotated by humans to obtain "emotion scores". This dataset consists of several movie fragments edited by non-professional users, which contain redundant information. Emotion scores are obtained by voting and reflect the intensity of emotion.
2. An emotion feature vector is proposed, which combines low-, mid- and high-level visual features. It stores emotional information by establishing a "bridge" between emotion and visual features.
3. A new approach to video summarization from the emotion perspective is proposed, which estimates the emotion scores of segments and optimizes their sum (a sketch of this selection step is given after this list).
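To make the selection step in the third contribution concrete, the sketch below treats it as a 0/1 knapsack problem: choose a subset of segments whose total length stays within a budget while maximizing the sum of their estimated emotion scores. The variable names and the dynamic-programming formulation are illustrative assumptions; the paper only states that the sum of segment emotion scores is optimized under a length constraint.

```python
def select_segments(seg_lengths, seg_scores, max_length):
    """Pick a subset of segments maximizing total emotion score subject to
    a total-length budget (0/1 knapsack solved by dynamic programming).

    seg_lengths : list[int]   segment lengths (e.g. in frames or seconds)
    seg_scores  : list[float] estimated emotion score of each segment
    max_length  : int         maximum allowed summary length
    Returns the indices of the selected segments.
    """
    # best[l] = (best total score, chosen indices) achievable within budget l
    best = [(0.0, [])] * (max_length + 1)
    for i, (length, score) in enumerate(zip(seg_lengths, seg_scores)):
        new_best = best[:]
        for l in range(length, max_length + 1):
            cand_score = best[l - length][0] + score
            if cand_score > new_best[l][0]:
                new_best[l] = (cand_score, best[l - length][1] + [i])
        best = new_best
    return max(best, key=lambda t: t[0])[1]

# Example: three segments of lengths 40, 90 and 60 frames with predicted
# emotion scores 2.5, 4.0 and 3.5; under a 100-frame budget the first and
# third segments are selected.
print(select_segments([40, 90, 60], [2.5, 4.0, 3.5], 100))   # -> [0, 2]
```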
II. RELATED WORKS
Two related techniques are introduced in this section, i.e.,
video summarization and emotion recognition.
A. Video Summarization
Since the 1990s, video summarization techniques have drawn much research and industrial interest. In [1], Truong et al. present a detailed review of the video summarization works before 2007. They describe two main types of video abstracts: key-frames and video skims.
Key-frames, also called a static storyboard, consist of a collection of salient images extracted from the source video. Early works in this form extracted key-frames by using optical flow computation [2] or low-level features [3]. In recent years, key-frames have been selected using clustering based on visual features [4], objects [5] or change detection [6]. However, these key-frame based approaches are not sufficient because they discard the most important motion information. The second form is the video skim, also called a dynamic summary, which consists of a collection of