1051-8215 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TCSVT.2014.2367356, IEEE Transactions on Circuits and Systems for Video Technology
resolution of the right view. This method belongs to the
category of reconstruction-based methods, which set up an
energy function and then minimize it to obtain the optimal
solution. The energy function usually consists of a data term
and some other constraint terms.
Here, we start constructing the energy function with the following data term. Mathematically,

$E_{\text{data}} = \left\| S K I_N^R - I_N^R(\text{low}) \right\|_2^2,$  (1)
where $S$ is a down-sampling operator and $K$ is a blurring operator, $I_N^R$ is a variable referring to the expected full-resolution right view of the $N$th frame, and $I_N^R(\text{low})$ is the initial low-resolution input of the $N$th frame. $\|\cdot\|_2$ denotes the Euclidean norm. This is a common term used in reconstruction-based methods [23], [24]. It enforces a constraint on the expected full-resolution image $I_N^R$ so that it is consistent with the low-resolution input after the blurring and down-sampling process.
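For concreteness, the data term in (1) can be sketched as follows. The blurring kernel $K$ and the down-sampling operator $S$ are not fixed by the paper; this sketch assumes a simple box blur and integer-factor decimation, both illustrative choices:

```python
import numpy as np

def data_term(I_R, I_R_low, scale=2, blur_size=3):
    """Squared L2 residual ||S K I_R - I_R_low||^2, as in Eq. (1).

    K: box blur (an illustrative stand-in for the paper's unspecified kernel).
    S: decimation by an integer factor `scale` (also an assumption).
    """
    # Box blur via a uniform convolution with edge padding.
    pad = blur_size // 2
    padded = np.pad(np.asarray(I_R, dtype=float), pad, mode='edge')
    blurred = np.zeros(I_R.shape, dtype=float)
    for dy in range(blur_size):
        for dx in range(blur_size):
            blurred += padded[dy:dy + I_R.shape[0], dx:dx + I_R.shape[1]]
    blurred /= blur_size ** 2
    # Down-sample by decimation.
    low = blurred[::scale, ::scale]
    return float(np.sum((low - I_R_low) ** 2))
```

When the candidate full-resolution image reproduces the low-resolution input exactly after blurring and decimation, this term is zero; any inconsistency is penalized quadratically.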
In addition, the high-frequency information of the left full-resolution view can be used to enhance the resolution of the right view, since the two views share many scene points. Therefore, once the correspondence between the left and right views is obtained, we can add a mapping term to the energy function. A similar disparity-based pixel mapping strategy is also applied in [10] and [17]. The explicit form of this term is:
$E_{\text{map}} = \sum_{(m,n)\in\Lambda} c_{mn} \left\| I_N^R(m,n) - I_N^L\left(m,\, n + D'_N(m,n)\right) \right\|_2^2,$  (2)
where $\Lambda$ is the pixel index set of the image grid, $I_N^L$ is the $N$th frame of the left full-resolution view, and $D'_N$ denotes the stereo correspondence of the $N$th frame (the depth map^1 of $I_N^R$ relative to $I_N^L$). It can be obtained from the corresponding left-view depth map $D_N$ of $I_N^L$ relative to $I_N^R$, and we use linear interpolation to deal with the non-integer case. $c_{mn}$ is a binary confidence value for $D'_N(m,n)$. It is necessary because the depth map may be inaccurate, especially in occlusion regions and in the non-overlapping regions of the two views. In this paper, we determine $c_{mn}$ by measuring the similarity (mean square error, MSE) between the local patch centered at $I_N^R(m,n)$ and the local patch centered at $I_N^L(m, n + D'_N(m,n))$.
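The confidence test for $c_{mn}$ can be sketched as a threshold on the patch MSE. The threshold `tau` and patch radius `r` below are illustrative choices, not values specified by the paper, and disparities are rounded to the nearest integer rather than linearly interpolated to keep the sketch short:

```python
import numpy as np

def patch(img, m, n, r=2):
    """(2r+1) x (2r+1) patch centered at (m, n), edge-padded at borders."""
    padded = np.pad(np.asarray(img, dtype=float), r, mode='edge')
    return padded[m:m + 2 * r + 1, n:n + 2 * r + 1]

def confidence(I_R, I_L, D, m, n, tau=25.0, r=2):
    """Binary confidence c_mn for D'_N(m, n): 1 if the MSE between the
    patch around (m, n) in the right view and the disparity-shifted
    patch in the left view is below tau, else 0. tau and r are
    hypothetical parameters chosen for illustration."""
    d = int(round(D[m, n]))                       # nearest-integer disparity
    n_left = int(np.clip(n + d, 0, I_L.shape[1] - 1))
    mse = np.mean((patch(I_R, m, n, r) - patch(I_L, m, n_left, r)) ** 2)
    return 1 if mse < tau else 0
```

Pixels whose mapped patches disagree strongly (typically occlusions or non-overlapping regions) receive $c_{mn} = 0$ and so contribute nothing to $E_{\text{map}}$.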
Besides the above observation about the point-to-point mapping between the two views, there is another useful property of natural images: the nonlocal prior [25]. This prior is based on the observation that image content is likely to repeat itself within some neighborhood. This self-similarity of natural images is beneficial for solving the super-resolution problem, because it means we can exploit the redundant information hidden in the full-resolution view. Leveraging the nonlocal prior, we enforce an additional nonlocal constraint between the left and right views under the guidance of stereo correspondences. The explicit form of this nonlocal regularization term is:
$E_{\text{nonlocal}} = \sum_{(m,n)\in\Lambda} c_{mn} \sum_{(p,q)\in\Omega_{nr}\left(m,\, n+D'_N(m,n)\right)} w_{mn,pq} \left\| T I_N^R(m,n) - T I_N^L(p,q) \right\|_2^2.$  (3)
Here $\Omega_{nr}(i,j)$ denotes the nonlocal neighborhood at position $(i,j)$, whose size is $(2 \times nr + 1) \times (2 \times nr + 1)$. $T$ is a vectorizing patch-extraction operator, $T I_N^R(m,n)$ is the vectorized representation of a patch centered at $(m,n)$ in image $I_N^R$, and $w_{mn,pq}$ is the nonlocal weight calculated by measuring the similarity (mean square error, MSE) between the patches $T I_N^R(m,n)$ and $T I_N^L(p,q)$ [25], [26].

^1 Depth and disparity are two interdependent terms in stereo vision. We use them interchangeably whenever appropriate.
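The weights $w_{mn,pq}$ can be sketched in the usual nonlocal-means fashion [25]: an exponentially decaying function of the patch MSE, normalized over the neighborhood. The Gaussian form and bandwidth `h` below follow that convention and are assumptions; the paper only states that the weights are computed from the patch MSE:

```python
import numpy as np

def nonlocal_weights(I_R, I_L, center_r, center_l, nr=1, r=1, h=10.0):
    """Nonlocal weights w_{mn,pq} for Eq. (3): for each (p, q) in the
    (2*nr+1)^2 neighborhood of the mapped position in the left view,
    weight = exp(-MSE(patch_R, patch_L) / h^2), normalized to sum to 1.
    h, nr, and r are illustrative parameters."""
    pad = nr + r
    I_Rp = np.pad(np.asarray(I_R, dtype=float), pad, mode='edge')
    I_Lp = np.pad(np.asarray(I_L, dtype=float), pad, mode='edge')
    m, n = center_r                # (m, n) in the right view
    i, j = center_l                # mapped position (m, n + D'_N(m, n))
    ref = I_Rp[m + pad - r:m + pad + r + 1, n + pad - r:n + pad + r + 1]
    weights = {}
    for p in range(i - nr, i + nr + 1):
        for q in range(j - nr, j + nr + 1):
            cand = I_Lp[p + pad - r:p + pad + r + 1,
                        q + pad - r:q + pad + r + 1]
            mse = np.mean((ref - cand) ** 2)
            weights[(p, q)] = np.exp(-mse / h ** 2)
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}
```

Similar left-view patches receive large weights, so the constraint pulls the reconstructed right-view patch toward the redundant high-resolution content.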
As can be seen, minimizing the above energy function relies on the calculation of stereo correspondence. The result of a stereo matching algorithm depends on the co-occurrence of distinct details in both views: the more distinct details the two views share, the more reliable the matching result is. However, since we only have mixed-resolution videos as inputs, directly matching the left view with the interpolated right view may not yield satisfactory results. We need to restore the details of the right view to obtain a reliable depth map. Therefore, we combine the calculation of stereo correspondence and super-resolution, and propose a unified energy function as follows:
$E_{SR} = E_{\text{data}} + \lambda_1 E_{\text{map}} + \lambda_2 E_{\text{nonlocal}} + \lambda_3 E_{\text{depth}},$  (4)
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are regularization parameters and $E_{\text{depth}}$ is the depth energy function, whose explicit form is given in the following part.
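As a minimal sketch, the unified objective in (4) is simply a weighted sum of the four terms. The $\lambda$ values below are placeholders; the paper leaves the regularization parameters to be tuned:

```python
def total_energy(e_data, e_map, e_nonlocal, e_depth,
                 lam1=0.5, lam2=0.2, lam3=0.1):
    """Unified super-resolution energy E_SR of Eq. (4).
    lam1, lam2, lam3 are illustrative values, not the paper's settings."""
    return e_data + lam1 * e_map + lam2 * e_nonlocal + lam3 * e_depth
```

In practice such a joint objective is typically minimized by alternating between updating the depth maps with the current right view fixed and updating the right view with the current depth fixed.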
C. Depth energy function
In [19], we proposed a region-based stereo matching algorithm using cooperative optimization. This method achieves high-quality depth maps with relatively high efficiency. In this paper, we extend it to the stereoscopic video case. By exploiting the temporal consistency of depth information in stereoscopic video, the extended method obtains temporally consistent depth maps. In the following, we briefly describe the idea of region-based stereo matching using cooperative optimization; we refer the reader to [19] for a detailed description.
Supposing that $R_1, \ldots, R_n$ are regions obtained by the mean-shift segmentation algorithm [27], we define a total energy function that decomposes into the sum of several subtarget energy functions. Mathematically,

$E_{\text{depth}} = \sum_{i \in \Lambda_{\text{seg}}} E_i,$  (5)
where $\Lambda_{\text{seg}}$ is the index set of regions and $E_i$ is the energy function of the $i$th region $R_i$.
Next, we give the explicit form of every subtarget $E_i$. Here, we mainly concentrate on four aspects: data energy, occlusion energy, smoothness energy, and temporal consistency energy. Mathematically, we define the energy function of the $i$th region $R_i$ as follows:

$E_i = E_i^{\text{data}} + E_i^{\text{occlusion}} + E_i^{\text{smooth}} + E_i^{\text{consistency}}.$  (6)
The first term is the data term. It evaluates the validity of the depth at position $(m,n)$ in region $R_i$ by calculating the color difference between two corresponding pixels. Its explicit form is:

$E_i^{\text{data}} = \sum_{(m,n)\in V_i^R,\,(p,q)\in V_i^L} \left\| I_N^R(m,n) - I_N^L(p,q) \right\|_\infty,$  (7)
where $\|\cdot\|_\infty$ denotes the maximum norm (infinity norm), and $V_i^L$ and $V_i^R$ denote the visible pixel sets [19], [28] on the current region of the left and right images, respectively.
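The per-region data energy of (7) can be sketched as below. The list `pairs` is a hypothetical stand-in for the visible-pixel correspondences between $V_i^R$ and $V_i^L$, whose construction follows the visibility reasoning of [19], [28] and is outside this sketch:

```python
import numpy as np

def region_data_energy(I_R, I_L, pairs):
    """Data energy of one region, Eq. (7): sum over corresponding
    visible pixel pairs of the infinity norm (maximum absolute
    difference over color channels) of the color difference.
    `pairs` is a list of ((m, n), (p, q)) correspondences (assumed)."""
    energy = 0.0
    for (m, n), (p, q) in pairs:
        diff = np.abs(np.asarray(I_R[m, n], dtype=float)
                      - np.asarray(I_L[p, q], dtype=float))
        energy += float(np.max(diff))     # infinity norm over channels
    return energy
```

Using the maximum over color channels makes the term sensitive to a mismatch in any single channel, which is stricter than an averaged color distance.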