视频序列紧凑描述符的高效编码方法

122 浏览量更新于2024-08-30 收藏 1.05MB PDF 举报

"用于从视频序列中提取紧凑描述符的高效编码框架" 这篇研究论文探讨了在视频序列中提取紧凑描述符的高效编码方法，旨在优化现有的图像匹配和检索任务。MPEG（Moving Picture Experts Group）制定的一项新兴标准——Compact Descriptors for Visual Search (CDVS) 已经为静止图像提供了压缩的局部和全局描述符。然而，针对视频序列的CDVS描述符的帧级编码并未解决帧间冗余问题，这可能导致大量的带宽和存储资源的消耗。在本文中，作者提出了一个有效的CDVS描述符编码框架，用于生成适用于视频序列的紧凑描述符。对于局部描述符，他们引入了一种多参考预测技术，利用时间连续性来减少相邻帧之间的相似信息重复。这种方法可以显著降低冗余，提高编码效率，从而节省存储空间并减少传输成本。此外，对于全局描述符，作者可能还考虑了视频序列的时空特性，可能采用了某种形式的时空一致性分析，以捕捉帧间的运动和变化模式。这可能包括运动估计、光流计算或其他运动补偿技术，以进一步压缩全局信息，同时保持足够的描述符精度。论文还可能涉及错误恢复策略，因为在实际应用中，数据丢失或传输错误是常见的问题。通过采用鲁棒的编码策略，比如前向错误纠正码或基于上下文的自适应变长编码，可以提高系统对这些情况的容忍度。该研究工作为视频处理和检索领域提供了一种创新的解决方案，通过高效的编码框架降低了视频描述符的存储和传输负担。这一技术对于视频分析、监控、内容检索等应用场景具有重要意义，可以提高系统的性能和效率。未来的研究可能会进一步优化这个框架，使其更加适应不同场景的需求，或者与其他视觉特征编码方法结合，以实现更高级别的视频理解和分析。

AN EFFICIENT CODING FRAMEWORK FOR COMPACT DESCRIPTORS EXTRACTED

FROM VIDEO SEQUENCE

Zhangshuai Huang

?

, Ling-Yu Duan

?

, Jie Lin

†

, Shiqi Wang

◦

, Siwei Ma

?

, Tiejun Huang

?

Institute of Digital Media, School of EE & CS, Peking University, Beijing, 100871, China



Cooperative Medianet Innovation Center, Shanghai, China

†

Institute for Infocomm Research, 119613, Singapore

◦

Dept. of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada

ABSTRACT

Towards effective and efﬁcient image matching or retrieval tasks, the

emerging MPEG standard, named Compact Descriptors for Visual

Search (CDVS), has fulﬁlled compact descriptors for still images,

consisting of compressed local and global descriptor. Nevertheless,

the frame-level coding of CDVS descriptors from a video sequence

does not address the inter-frame redundancy issue, which may con-

sume considerable bandwidth and storage resources. In this work,

we propose an efﬁcient coding framework of CDVS descriptors to

generate compact descriptors for video sequences. For local descrip-

tors, we propose a multiple reference predictive technique to exploit

the temporal correlation of local descriptors and location coordinates

over a sequence of frames. To further improve the prediction perfor-

mance, keypoint tracking is applied to identify temporally repeated

keypoints. For global descriptors, a propagation coding way is em-

ployed to compress the global descriptors of adjacent frames. The

empirical evaluation has shown that the proposed coding approach

has yielded a low bit rate of less than 40kbps on average, while main-

taining comparable matching and retrieval performance. Compared

to the sequence of original frame-level CDVS descriptors, the pro-

posed approach has achieved over 25× bit rate reduction.

Index Terms— Compact descriptor, MPEG CDVS, Interest

points tracking, Predictive coding, Propagation coding.

1. INTRODUCTION

Video analysis applications, such as mobile augmented reality, vi-

sual sensor network and distributed surveillance, usually transmit

visual data from mobile client to remote server for the subsequent

matching or retrieval tasks. Instead of sending raw data of images or

videos, recent works [1] [2] have proposed to directly extract low bit

rate visual descriptors on mobile client, towards low latency delivery

in wireless environment. In general, visual descriptors can be broad-

ly categorized into two groups. The ﬁrst group is local descriptor,

such as SIFT [3], SURF [4]. The second group is global descriptor,

such as Bag-of-Words [5], Fihser Vector (FV) [6] and Vectors of Lo-

cally Aggregated Descriptors (VLAD) [7]. These global descriptors

are usually aggregated from the statistics of local descriptors.

Compact representation of local and global descriptors has

drawn many research attentions. For compact local descriptor,

Chandrasekhar et al. proposed a Compact Histogram of Gradients

(CHoG) descriptor with ∼50 bits. Other representative works in-

clude BRIEF [8], ORB [9] and BRISK [10]. For compact global

descriptor, Chen et al. [11] introduced Residual Enhanced Visual

Vector (REVV) by reducing the VLAD dimension with Linear Dis-

CDVS

Descriptors

Video

Encoding

Retrieval or

Matching

Decoding

Compressed

stream

Client

Remote Server

CDVS

Descriptors

Fig. 1. Overview of our proposed approach. CDVS descriptors ex-

tracted from video are encoded at client and transmitted to server for

further retrieval or matching task.

criminative Analysis (LDA) followed by sign binarization. Lin et al.

[12] proposed Scalable Compressed Fisher Vector (SCFV) to direct-

ly binarize FV followed by centroid basis bit selection. In particular,

the emerging MPEG standrad [13], Compact Descriptors for Visual

Search (CDVS), has standardized both compact local and compact

global descriptors. It has shown that CDVS obtains state-of-the-art

image matching and retrieval performance at a low bit rate [14].

Nevertheless, there is few work on compressing descriptors ex-

tracted from video sequence, especially for CDVS descriptor. Unlike

still image, video is born with the so called temporal redundancy is-

sue. Recent work has proposed to address this issue on either local or

global descriptor. For local descriptor, Markar [15] proposed a tem-

porally coherent keypoint detector and inter-frame canonical patches

coding techniques. Barofﬁo [16][17] adopted both intra- and inter-

frame coding to compress SIFT- and BRIEF-like [8] descriptors,

where a coding mode decision scheme was proposed to improve the

coding efﬁciency. For global descriptor, Chen [18] proposed inter-

frame coding of scalable residual-based global signatures REVV by

propagating either codewords or residual vectors.

In this paper, we propose an efﬁcient coding framework to com-

press CDVS descriptors steam. An overview of the our proposed

approach is illustrated in Fig.1. In the details of our framework, we

investigate a coding pipeline for both local and global descriptors

(see Fig.2). We ﬁrst introduce a tracking process, before the feature

selection stage of CDVS framework, to recognize the repeated key-

points for better utilizing the temporal consistency. Then we employ

a multiple reference predictive coding technique to reduce the tem-

poral redundancy of local descriptors and location coordinates. At

last, an efﬁcient propagation coding technique is designed to com-

press the global descriptors. Extensive experiments have been con-

ducted over the Stanford MAR Dataset [15]. The results have shown

that our approach can achieve a signiﬁcant bit rate reduction, while

with little effect on the image matching and retrieval performance.

下载后可阅读完整内容，剩余4页未读，立即下载

weixin_38556205

粉丝: 4
资源: 938

视频序列紧凑描述符的高效编码方法

从MPEG2 TS流中提取视频序列流

dragon计算描述符

【Python数据序列化专家指南】：深入解析JSON在Python中的高级应用

【转换性能优化】：大数据集string to int的高效策略

【高效Python编程】：字符串转列表的10大实用技巧

递归树递归下降解析器构建：编译原理中的应用

PostgreSQL中JSON数据的扩展存储与查询：探索数据存储的新维度

【图解深度学习优势】：如何在图像识别中显著超越传统机器学习

【Java 8编程技巧】：方法引用在日期时间API与事件驱动中的巧妙应用

C#ASP.NET网络进销存管理系统源码数据库 SQL2008源码类型 WebForm

最新资源