Example-Based Video Stereolization
With Foreground Segmentation and
Depth Propagation
Lei Wang and Cheolkon Jung, Member, IEEE
Abstract—With advances in 3DTV technology, video stereolization has attracted much attention in recent years. Although video stereolization can enrich stereoscopic 3D contents, it is hard to create good depth maps from monocular 2D videos. In this paper, we propose an automatic example-based video stereolization method with foreground segmentation and depth propagation, called EBVS. To consider both performance and computational complexity, we estimate depth maps separately for key and non-key frames. In the key frames, we first estimate an initial depth map based on examples from the RGB-D training data set, and then refine it to preserve the boundaries of foreground objects. In the non-key frames, we generate depth maps by propagating the depth map of the key frame using motion compensation. Finally, we employ depth-image-based rendering (DIBR) to generate stereoscopic views from the 2D videos and their depth maps. Extensive experiments verify that the proposed EBVS produces visually pleasing and realistic stereoscopic 3D views from 2D videos.
Index Terms—3DTV, depth generation, depth propagation, depth-image-based rendering, learning-based, stereoscopic views, video stereolization.
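For readers unfamiliar with DIBR, the following minimal Python sketch (not the implementation used in this paper) illustrates the basic idea of synthesizing a second view by shifting pixels horizontally in proportion to depth; the maximum-disparity value, the depth convention (larger means closer), and the naive hole filling are assumptions made only for illustration.

import numpy as np

def render_right_view(left, depth, max_disparity=16):
    """Toy DIBR: shift left-view pixels horizontally by a depth-dependent
    disparity to synthesize a right view.  `left` is H x W x 3 (uint8),
    `depth` is H x W with values in [0, 255] (assumed: larger = closer)."""
    h, w = depth.shape
    disp = (depth.astype(np.float32) / 255.0) * max_disparity
    right = np.zeros_like(left)
    zbuf = np.full((h, w), -1.0)               # keep the nearest source pixel per target
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disp[y, x]))    # closer pixels shift farther
            if 0 <= xr < w and disp[y, x] > zbuf[y, xr]:
                right[y, xr] = left[y, x]
                zbuf[y, xr] = disp[y, x]
        for x in range(1, w):                  # crude hole filling from the left neighbor
            if zbuf[y, x] < 0:
                right[y, x] = right[y, x - 1]
                zbuf[y, x] = 0.0
    return right

Practical DIBR pipelines typically add depth-map preprocessing and more careful hole filling than this toy version.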
I. INTRODUCTION

BECAUSE 3DTV provides realistic 3D effects to viewers based on stereoscopic 3D contents, it is expected to take a dominant position in the next-generation digital TV market. However, the promotion of 3DTV is constrained by the lack of available stereoscopic 3D contents. Although new stereoscopic contents have recently been captured by stereoscopic cameras and active depth sensors [1], converting the existing large amounts of monocular 2D videos into stereoscopic 3D contents, called video stereolization, still remains an open problem. The visual ability to perceive stereoscopic 3D contents is closely related to human depth perception. That is, the slight difference between the left-eye and right-eye views, i.e., horizontal disparity, is transformed into different depth information and leads to different stereoscopic visual perceptions: outward perception, on-screen perception, and inward perception [2].

Manuscript received February 06, 2014; revised May 20, 2014; accepted July 14, 2014. Date of publication July 22, 2014; date of current version October 13, 2014. This work was supported by the National Natural Science Foundation of China under Grant 61271298 and the International S&T Cooperation Program of China under Grant 2014DFG12780. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jing-Ming Guo.
The authors are with the Key Lab of Intelligent Perception and Image Understanding, Ministry of Education of China, Xidian University, Xi'an 710071, China (e-mail: lwang@stu.xidian.edu.cn; zhengzk@xidian.edu.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2014.2341599
A. Related Work
A key component for video stereolization is depth map estimation from 2D videos. Up to the present, a number of studies have been conducted to estimate depth maps from monocular 2D videos automatically and semi-automatically [3]–[24].
Representative methods for automatic depth map estimation are structure from motion (SFM) [3], depth-from-defocus [4], depth from geometric perspective [5], and depth from models [6], [7], among others. SFM obtained the depth information based on the tracked feature points and the camera poses. With the calculated camera poses, multiple view stereo was applied to each frame to produce dense depth maps; this relies on the assumption of orthographic projection to estimate the 3D structure and camera motion [8].
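As a concrete illustration of the sparse side of such SFM pipelines, the sketch below uses OpenCV to track features between two frames, recover the relative camera pose, and triangulate 3D points whose z coordinates give per-feature depth. It is only a schematic stand-in built from standard perspective-projection calls (not the orthographic model of [8]), and the intrinsic matrix K is assumed to be known.

import cv2
import numpy as np

def sparse_sfm_depth(frame1, frame2, K):
    """Schematic SFM step: track features between two frames, recover the
    relative camera pose, and triangulate sparse 3D points; the z values
    serve as per-feature depth.  `K` is an assumed 3x3 intrinsic matrix."""
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    p1 = cv2.goodFeaturesToTrack(g1, maxCorners=500, qualityLevel=0.01, minDistance=7)
    p2, status, _ = cv2.calcOpticalFlowPyrLK(g1, g2, p1, None)
    p1, p2 = p1[status.ravel() == 1], p2[status.ravel() == 1]

    E, _ = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K)

    # Projection matrices of the two views (first camera at the origin).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, p1.reshape(-1, 2).T, p2.reshape(-1, 2).T)
    pts3d = (pts4d[:3] / pts4d[3]).T           # N x 3; the z column is the depth
    return p1.reshape(-1, 2), pts3d[:, 2]

Dense depth would then require interpolating or propagating these sparse values, which is where the multi-view stereo stage mentioned above comes in.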
Disparis et al. [9] and Moustakas et al. [10] dealt with dynamic scenes by segmenting rigid objects into layers and employing SFM [3] for each layer to reconstruct 3D structures. Knorr and Sikora [11], Rotem [12], and Zhang et al. [13] generated dense depth maps by synthesizing one view from the other frames in an input video to achieve lower computational complexity. However, these methods were designed to handle static scenes, and certain assumptions regarding pixel correspondence and the projection model had to be made to reconstruct the scene geometry and camera positions from two or more images. In [4], the wavelet transform was used to measure defocus information in the image, and depth values were then assigned to the high-frequency areas of the image. The depth values were obtained by analyzing the high-frequency wavelet subbands of an image, with the number of high-value wavelet transform coefficients taken as a blurring measure.
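The idea can be sketched as follows: for each image block, a 2D wavelet transform is computed and the number of large detail coefficients serves as a focus measure, which is reused as a relative depth cue. The wavelet type, block size, and threshold below are illustrative assumptions, not the settings of [4].

import numpy as np
import pywt

def defocus_depth_map(gray, block=16, wavelet="haar", thresh=10.0):
    """Toy depth-from-defocus cue: for each block, count high-magnitude
    wavelet detail coefficients (a blur/focus measure) and treat sharper
    blocks as closer.  `gray` is a 2D grayscale image."""
    h, w = gray.shape
    gray = gray.astype(np.float32)
    depth = np.zeros((h // block, w // block), dtype=np.float32)
    for by in range(depth.shape[0]):
        for bx in range(depth.shape[1]):
            patch = gray[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            _, (cH, cV, cD) = pywt.dwt2(patch, wavelet)
            details = np.concatenate([cH.ravel(), cV.ravel(), cD.ravel()])
            depth[by, bx] = np.count_nonzero(np.abs(details) > thresh)
    depth /= depth.max() + 1e-6                # normalize to [0, 1]
    return depth                               # one relative depth value per block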
In [5], depth maps were generated based on the positions of lines and vanishing points, where the vanishing point generally corresponds to the farthest distance.
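A minimal illustration of this cue, assuming the vanishing point has already been detected (e.g., by intersecting dominant lines) and assuming a simple linear depth ramp, assigns depth by image distance to that point:

import numpy as np

def vanishing_point_depth(height, width, vp):
    """Toy geometric-perspective cue: pixels closer to the vanishing point
    `vp` = (vx, vy) are treated as farther away; depth falls off linearly
    with image distance to the vanishing point."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.sqrt((xs - vp[0]) ** 2 + (ys - vp[1]) ** 2)
    return dist / (dist.max() + 1e-6)          # 0 = farthest (at the VP), 1 = nearest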
Since more than one depth cue exists in most cases, [14] utilized hybrid depth cues such as perspective geometry, defocus, and visual saliency; the final depth map was generated by fusing them together.
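One simple way to realize such fusion is a normalized weighted average of the individual cue maps; the cues and weights below are placeholders for illustration rather than those of [14].

import numpy as np

def fuse_depth_cues(cues, weights=None):
    """Fuse several per-pixel depth-cue maps (e.g., perspective, defocus,
    saliency) into one depth map by a normalized weighted average."""
    cues = [c.astype(np.float32) for c in cues]
    cues = [(c - c.min()) / (c.max() - c.min() + 1e-6) for c in cues]  # normalize each cue
    if weights is None:
        weights = np.ones(len(cues), dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32)
    return sum(w * c for w, c in zip(weights, cues)) / weights.sum()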
Another approach was model-based automatic depth estimation, which constructs one or several depth models for natural scenes and blends them together [6], [7], [15]. Chen et al. [6] utilized edge information to segment regions, and then generated depth maps by assigning each region to an a priori hypothesis of the depth gradient. Yamada et al. [7] generated depth maps with three simple models based on color theory. Lin et al. [15] adopted a depth estimation strategy based on foreground-background separation. They adopted a three-layer back-propagation neural