However, these methods typically need a good starting point
(e.g., a visual hull model [25]).
2.5 Recovering Consistent View-Dependent Depth Maps
Instead of reconstructing a complete 3D model, we focus on
recovering a set of consistent view-dependent depth maps
from a video sequence. This focus is mainly motivated
by applications such as view interpolation, depth-based
segmentation, and video enhancement. Our work is closely
related to [19] and [15], which also aim to infer
consistent depth maps from multiple images. Kang and
Szeliski [19] proposed simultaneously optimizing a set of
depth maps at multiple key frames by adding a temporal
smoothness term. This method makes the disparities across
frames vary smoothly. However, it is sensitive to outliers
and may cause blending artifacts around object
boundaries. Gargallo and Sturm [15] formulated
3D modeling from images as a Bayesian MAP problem,
and solved it using the expectation-maximization (EM)
algorithm. They use the estimated depth map to determine
the visibility prior. Hidden variables are computed in a
probabilistic way to deal with occlusions and outliers. A
multiple-depth-map prior is finally used to smooth and
merge the depths while preserving discontinuities. In
comparison, our method statistically incorporates the
photo-consistency and geometric coherence constraints in
the data term definition. This scheme is especially effective
for processing video data because it can effectively suppress
temporal outliers by making use of the statistical informa-
tion available from multiple frames. Moreover, we use
efficient loopy belief propagation [10] to perform the overall
optimization. By combining the photo-consistency and
geometric coherence constraints, the distribution of our
data cost becomes distinctive, making the BP optimization
stable and fast to converge.
Temporal coherence constraints have also been used in
optical flow estimation [1] and occlusion detection [30], [37].
Larsen et al. [24] presented an approach for 3D reconstruc-
tion from multiple synchronized video streams. In order to
improve the final reconstruction quality, they used optical
flow to find corresponding pixels in the subsequent frames
of the same camera, and enforced the temporal consistency
in reconstructing successive frames. With the observation
that the depth error in conventional stereo methods grows
quadratically with depth, Gallup et al. [14] proposed a
multibaseline and multiresolution stereo method to achieve
constant depth accuracy by varying the baseline and
resolution proportionally to depth.
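The quadratic growth of depth error that motivates Gallup et al. [14] follows from first-order error propagation in the standard two-view stereo model. A hypothetical illustration (not code from any of the cited papers; the camera values are made up):

```python
# Sketch of stereo depth error propagation. With z = f * b / d
# (focal length f in pixels, baseline b, disparity d in pixels),
# a disparity error eps_d propagates to roughly
#   dz ~= z**2 / (f * b) * eps_d,
# i.e. the depth error grows quadratically with depth z.

def depth_error(z, f, b, eps_d):
    """First-order depth uncertainty caused by disparity noise eps_d."""
    return z ** 2 / (f * b) * eps_d

f, b, eps_d = 1000.0, 0.1, 0.5  # hypothetical camera and noise values
near = depth_error(2.0, f, b, eps_d)
far = depth_error(8.0, f, b, eps_d)
print(near, far)  # quadrupling the depth multiplies the error by 16
```

This is why scaling the baseline and resolution proportionally to depth, as in [14], keeps the accuracy roughly constant: both f*b and z² then grow together.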
In summary, although many approaches have been
proposed to model 3D objects or to estimate depths using
multiple input images, the problem of how to appropriately
extract information and recover consistent depths from a
video remains challenging. In this paper, we show that by
appropriately maintaining the temporal coherence, surpris-
ingly consistent and accurate dense depth maps can be
obtained from the video sequences. The recovered depth
maps have high quality and are readily usable in many
applications such as 3D modeling, view interpolation, layer
separation, and video enhancement.
3 FRAMEWORK OVERVIEW
Given a video sequence $\hat{I}$ with $n$ frames taken by a freely
moving camera, we denote $\hat{I} = \{I_t \mid t = 1, \ldots, n\}$, where
$I_t(\mathbf{x})$ represents the color (or intensity) of pixel $\mathbf{x}$ in frame $t$.
It is either a 3-vector in a color image or a scalar in a
grayscale image. In our experiments, we assume it is an
RGB color vector. Our objective is to estimate a set of
disparity maps $\hat{D} = \{D_t \mid t = 1, \ldots, n\}$. By convention,
disparity $D_t(\mathbf{x})$ ($d_\mathbf{x}$ for short) is defined as $d_\mathbf{x} = 1/z_\mathbf{x}$, where
$z_\mathbf{x}$ is the depth value of pixel $\mathbf{x}$ in frame $t$. For simplicity, the
terms “depth” and “disparity” are used interchangeably in
the following sections.
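Under this convention, depth and disparity are simple reciprocals, and sampling uniformly in disparity concentrates depth levels near the camera. A minimal sketch (the level count and depth range below are made-up values, not the paper's settings):

```python
# Minimal sketch of the d_x = 1 / z_x convention: disparity is
# inverse depth. Uniform disparity levels are dense in depth near
# the camera and sparse far away.

def depth_to_disparity(z):
    return 1.0 / z

def disparity_levels(z_min, z_max, m):
    """m disparity levels spaced uniformly between 1/z_max and 1/z_min."""
    d_min, d_max = 1.0 / z_max, 1.0 / z_min
    step = (d_max - d_min) / (m - 1)
    return [d_min + i * step for i in range(m)]

levels = disparity_levels(z_min=1.0, z_max=10.0, m=4)
print(levels)  # approximately [0.1, 0.4, 0.7, 1.0]
```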
The set of camera parameters for frame $t$ in a video
sequence is denoted as $C_t = \{K_t, R_t, T_t\}$, where $K_t$ is the
intrinsic matrix, $R_t$ is the rotation matrix, and $T_t$ is the
translation vector. The parameters for all frames can be
estimated reliably by structure from motion (SFM)
techniques [17], [29], [50]. Our system employs the SFM
method of Zhang et al. [50].
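With these per-frame parameters, projecting a world point into frame $t$ can be sketched as below. This is a generic pinhole-camera sketch, assuming the common convention $\mathbf{x}_{\mathrm{cam}} = R\,\mathbf{X} + T$; the paper may adopt a different sign or normalization.

```python
import numpy as np

# Sketch of pinhole projection with per-frame parameters
# C_t = {K_t, R_t, T_t}, assuming x_cam = R @ X + T.

def project(K, R, T, X):
    """Project a world point X (3,) to pixel coordinates (2,)."""
    x_cam = R @ X + T           # world -> camera coordinates
    x_img = K @ x_cam           # camera -> homogeneous image coords
    return x_img[:2] / x_img[2] # perspective division

# Toy camera: identity pose, hypothetical intrinsics.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
T = np.zeros(3)

# A point on the optical axis lands at the principal point.
print(project(K, R, T, np.array([0.0, 0.0, 2.0])))  # [320. 240.]
```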
In order to robustly estimate a set of disparity maps, we
define the following energy in a video:
$$E(\hat{D}; \hat{I}) = \sum_{t=1}^{n} \big( E_d(D_t; \hat{I}, \hat{D} \setminus D_t) + E_s(D_t) \big), \qquad (1)$$
where the data term $E_d$ measures how well disparity $\hat{D}$
fits the given sequence $\hat{I}$ and the smoothness term $E_s$
encodes the disparity smoothness. For each pixel in
disparity map $D_t$, because it maps to one point in 3D,
there should exist corresponding pixels in other nearby
frames. These pixels not only satisfy the photo-consis-
tency constraint, but also have their geometric informa-
tion consistent. We thus propose a bundle optimization
framework to model the explicit correlation among the
pixels and use the collected statistics to optimize the
disparities jointly.
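The per-frame structure of (1) can be sketched schematically. The `data_term` and `smooth_term` callables below are hypothetical placeholders standing in for $E_d$ and $E_s$, whose actual definitions are given later in the paper:

```python
# Schematic sketch of the energy in (1): a sum over frames of a
# data term (which also sees all other disparity maps, D-hat
# minus D_t) and a per-frame smoothness term.

def total_energy(D, I, data_term, smooth_term):
    """E(D, I) = sum_t [ E_d(D_t; I, D \\ D_t) + E_s(D_t) ]."""
    E = 0.0
    for t, D_t in enumerate(D):
        others = D[:t] + D[t + 1:]  # all disparity maps except D_t
        E += data_term(D_t, I, others) + smooth_term(D_t)
    return E

# Toy check with trivial placeholder terms.
D = [[1.0, 2.0], [3.0, 4.0]]
E = total_energy(D, None,
                 data_term=lambda D_t, I, others: sum(D_t),
                 smooth_term=lambda D_t: 0.0)
print(E)  # 10.0
```

The point of the structure is that each frame's data cost is coupled to the other frames' disparity maps, which is what the bundle optimization exploits.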
Fig. 2 gives an overview of our framework. With an
input video sequence, we first employ the SFM method to
recover the camera parameters. Then, we initialize the
disparity map for each frame independently. Segmentation
prior is incorporated into initialization for improving the
disparity estimation in large textureless regions. After
initialization, we perform bundle optimization to iteratively
976 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 31, NO. 6, JUNE 2009
Fig. 2. Overview of our method.