Light Field Depth Estimation via Epipolar Plane Image Analysis and
Locally Linear Embedding
Yongbing Zhang, Huijin Lv, Yebin Liu, Haoqian Wang, Xingzheng Wang, Qian Huang, Xinguang Xiang,
and Qionghai Dai
Abstract—In this paper, we propose a novel method for 4D light field depth estimation that exploits the special linear structure of the epipolar plane image (EPI) and locally linear embedding (LLE). Without incurring high computational complexity, depth maps are estimated locally by locating the optimal slope of each line segment on the EPIs, each of which is the projection of a corresponding scene point. For each pixel to be processed, we build and then minimize a matching cost that aggregates pixel intensity, pixel gradient, spatial consistency, and a reliability measure to select the optimal slope from a predefined set of directions. Next, a sub-angle estimation method is proposed to further refine the obtained optimal slope of each pixel. Furthermore, based on a local reliability measure, all pixels are classified as either reliable or unreliable. For the unreliable pixels, LLE is employed to propagate depth from the reliable pixels, based on the assumption that natural images maintain a manifold-preserving property. We demonstrate the effectiveness of our approach on a number of synthetic light field examples and real-world light field datasets, and show that it achieves higher performance than typical and recent state-of-the-art light field stereo matching methods.
Index Terms—Depth estimation, epipolar plane image (EPI),
light field, locally linear embedding (LLE).
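The LLE-based propagation step above can be summarized in a short sketch: under the manifold-preserving assumption, the weights that best reconstruct an unreliable pixel's appearance from its nearest reliable neighbors also transfer those neighbors' depths. The Python snippet below is a generic LLE formulation with hypothetical inputs (feature vectors and neighbor depths); the exact features, neighbor selection, and cost terms of our method are defined in the sequel.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Reconstruction weights for locally linear embedding.

    x         : (d,) feature vector of the query pixel (e.g. a color patch).
    neighbors : (k, d) feature vectors of its k nearest reliable pixels.
    Returns w : (k,) weights with sum(w) == 1 minimizing ||x - w @ neighbors||^2.
    """
    diff = neighbors - x                               # (k, d) local differences
    G = diff @ diff.T                                  # (k, k) Gram matrix
    G += (reg * np.trace(G) + 1e-12) * np.eye(len(G))  # regularize for stability
    w = np.linalg.solve(G, np.ones(len(G)))            # solve G w = 1
    return w / w.sum()                                 # enforce the sum-to-one constraint

def propagate_depth(x, neighbors, neighbor_depths):
    """Transfer depth to an unreliable pixel with the same weights that
    reconstruct its appearance from reliable pixels."""
    return lle_weights(x, neighbors) @ neighbor_depths
```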
I. INTRODUCTION
Light field (LF) is a function that describes the amount of light flowing in every direction through every point in space. Unlike traditional 2D images, an LF contains not only the accumulated intensity at each image point but also the separate intensity values of the light rays in all directions, which enables a wide range of applications, especially in computer graphics, e.g., LF rendering, scene reconstruction, synthetic aperture photography, and 3D display.
This work was partially supported by the National High-tech R&D Program of China (863 Program, 2015AA015901) and the National Natural Science Foundation of China under Grants 61571254, 61571259, 61300122, U1301257, and U1201255. This paper was recommended by Associate Editor Peter Eisert. (Corresponding author: H. Wang.)
Y. Zhang, H. Lv, H. Wang, and X. Wang are with the Graduate School at Shenzhen, Tsinghua University, Shenzhen, China (e-mail: zhang.yongbing@sz.tsinghua.edu.cn; lvhj13@mails.tsinghua.edu.cn; wanghaoqian@tsinghua.edu.cn; xingzheng.wang@sz.tsinghua.edu.cn).
Y. Liu and Q. Dai are with TNLIST and the Department of Automation, Tsinghua University, Beijing, China (e-mail: liuyebin@tsinghua.edu.cn; qhdai@tsinghua.edu.cn).
Q. Huang is with the College of Computer and Information, Hohai University, Nanjing, China (e-mail: huangqian@hhu.edu.cn).
X. Xiang is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China (e-mail: xgxiang@njust.edu.cn).
LFs are typically produced either by rendering a 3D model or by photographing a real scene. In either case, a large collection of viewpoints must be obtained to produce the LF views. Nowadays, there are many devices for capturing LFs photographically, such as camera arrays or a gantry consisting of a single moving camera [1]. However, camera arrays are hardware-intensive and need a complex calibration procedure, while the less expensive gantry with a single moving camera is limited to static scenes. Recently, plenoptic cameras such as Lytro [2] and Raytrix [3] have become commercially available, making it possible to acquire a large number of LFs of various scenes that can be applied in many specific applications, in particular depth estimation.
The quality of depth maps has a significant influence on LF-related applications; however, obtaining a dense and accurate depth map is a great challenge due to the large number of views in an LF. To derive accurate and reliable depth maps, many pioneering works on LF depth estimation have been reported in the literature. According to whether or not they employ the epipolar plane image (EPI, a 2D slice of the LF obtained by fixing one angular and one spatial coordinate), LF depth estimation methods can be divided into two categories.
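To make the EPI notion concrete, a minimal sketch follows. It assumes the 4D LF is stored as an array indexed by angular coordinates (s, t) and spatial coordinates (y, x), an assumed layout rather than one prescribed here. Fixing t and y yields a 2D slice in which a Lambertian scene point at depth Z traces a straight line whose slope equals its disparity d = fB/Z between adjacent views (focal length f, baseline B); scoring a predefined set of slopes, here with simple intensity variance in place of any particular matching cost, recovers a depth estimate.

```python
import numpy as np

def extract_epi(lf, y0, t0):
    """lf has shape (S, T, Y, X): angular (s, t) by spatial (y, x).
    Fixing the angular row t0 and the image row y0 yields the EPI E(s, x)."""
    return lf[:, t0, y0, :]

def best_slope(epi, x0, slopes):
    """Brute-force slope search at pixel x0 of the central view: a point at
    disparity d appears at x0 + d * (s - s_c) in view s, so the slope with
    the lowest intensity variance along the line is the depth estimate."""
    S, X = epi.shape
    s_c = S // 2
    scores = []
    for d in slopes:
        xs = np.round(x0 + d * (np.arange(S) - s_c)).astype(int)
        valid = (xs >= 0) & (xs < X)                 # drop samples leaving the EPI
        scores.append(np.var(epi[np.arange(S)[valid], xs[valid]]))
    return slopes[int(np.argmin(scores))]
```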
Depth estimation approaches employing EPI. To the best of our knowledge, the first attempt to utilize the EPI for depth estimation was presented by Bolles et al. [4], who detected edges in the EPI and then fitted straight-line segments to the edges to estimate the 3D structure. However, this basic line fitting is not robust enough, and consequently the reconstruction is sparse and noisy. Another approach was proposed by Criminisi et al. [5], who decomposed the scene into a set of spatio-temporal layers and obtained the disparities by exploiting the high degree of regularity in the EPI volume. To achieve higher quality, Wanner and Goldlucke [6], [7] applied the structure tensor to yield high-quality depth maps from 4D LFs. Their approach generates depth maps with higher accuracy; however, the global optimization process is computationally expensive, which hampers its practical usage.
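As a rough illustration of the structure-tensor idea in [6], [7] (the local orientation estimate only, without their global optimization), the sketch below recovers the dominant line direction at every EPI pixel from smoothed gradient products; the axis and sign conventions follow the layout assumed earlier and may differ from the original formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def epi_disparity_structure_tensor(epi, sigma=1.0, rho=2.0):
    """Per-pixel EPI line slope dx/ds via the 2D structure tensor.

    epi : 2D array E(s, x). Near-vertical lines (very large disparities)
    are numerically unstable and would be clipped in practice.
    """
    E = gaussian_filter(epi.astype(float), sigma)  # inner (gradient) smoothing
    Es = sobel(E, axis=0)                          # derivative along views s
    Ex = sobel(E, axis=1)                          # derivative along x
    Jxx = gaussian_filter(Ex * Ex, rho)            # outer smoothing of the
    Jss = gaussian_filter(Es * Es, rho)            # structure tensor entries
    Jxs = gaussian_filter(Ex * Es, rho)
    # Angle of the dominant eigenvector (gradient direction); the
    # iso-intensity line is perpendicular to it, with slope -tan(phi).
    phi = 0.5 * np.arctan2(2.0 * Jxs, Jxx - Jss)
    return -np.tan(phi)
```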
Depth estimation approaches without employing EPI. Yu et al. [8] encoded 3D line constraints and applied constrained Delaunay triangulation to implement LF stereo matching; however, this comes at a very high memory cost and is vulnerable to severe occlusions. Chen et al. [9] introduced a cost aggregation method based on a bilateral consistency metric on the surface camera (SCam) [10]. However, since [9] utilizes the color of the reference pixel as the mean of the bilateral filter, it is biased towards the reference view and consequently performs poorly when the input images are noisy. Kim et al. [11] leveraged coherence in massive LFs