Image Quality Assessment: From Error Visibility to Structural Similarity (SSIM)


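"Image Quality Assessment: From Error Visibility to Structural Similarity" is an influential paper on image quality assessment that introduced the Structural Similarity Index (SSIM). It was written by Zhou Wang, Alan Bovik, Eero P. Simoncelli, and co-authors, and was published in IEEE Transactions on Image Processing in 2004. It has been cited more than 19,300 times.

SSIM is an image quality measure motivated by the perceptual characteristics of the human visual system; it aims to model more accurately how sensitive the eye is to image distortions. Traditional quality assessment methods usually focus on error visibility, that is, on whether the pixel errors in an image can be seen. SSIM instead compares higher-level structural information, which brings the assessment closer to how people actually perceive image quality. The authors argue that image quality is not only a matter of pixel-level differences; what matters more is the structural fidelity of the image content.

SSIM assesses quality by measuring the similarity of two images in three respects: luminance, contrast, and structure. It relies on simple statistics (means, variances, and the cross-correlation of the two images) and produces a value between -1 and 1. A value of 1 means the two images are identical, a value near 0 means there is no structural similarity, and negative values indicate inversely correlated structure.

The SSIM formula has three components: luminance similarity \( l \), contrast similarity \( c \), and structure similarity \( s \). The luminance term compares the mean intensities of the two images, the contrast term compares their standard deviations, and the structure term is based on the correlation coefficient between them. The final SSIM value is the product of the three components, each raised to a weighting exponent:

\[ SSIM(x,y) = l(x,y)^\alpha \cdot c(x,y)^\beta \cdot s(x,y)^\gamma \]

where \( l(x,y) \), \( c(x,y) \), and \( s(x,y) \) are the luminance, contrast, and structure similarities, and \( \alpha \), \( \beta \), and \( \gamma \) are positive parameters that adjust the relative importance of the three components.

The paper has had a lasting influence on image processing and compression research, and SSIM has become one of the standard measures for judging image quality and evaluating compression algorithms. It also influenced later visual quality models such as Multi-Scale Structural Similarity (MS-SSIM), which extends SSIM across multiple scales, and Visual Information Fidelity (VIF). In practice, SSIM is widely used in image coding, video coding, image restoration, and image enhancement, where it helps researchers and engineers tune algorithms for better visual quality at lower transmission or storage cost. Although critics point out that SSIM may ignore some properties of vision, it remains an effective quality measure that correlates with human visual perception.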
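As a rough illustration of the product form above, the following Python sketch computes one SSIM value from the global statistics of two grayscale images, with all three exponents set to 1. The function name `global_ssim`, the stabilizing constants (K1 = 0.01, K2 = 0.03, C3 = C2/2), and the use of a single global window are assumptions made for this example and are not stated in the summary; practical implementations usually evaluate the index over local windows and average the results.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM sketch with alpha = beta = gamma = 1 (illustrative)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)

    # Small constants that keep the ratios stable when denominators approach zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0

    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(ddof=1), y.std(ddof=1)
    sigma_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.size - 1)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return float(luminance * contrast * structure)
```

Comparing an image with itself gives a value of 1 (up to rounding), while comparing it with its intensity-inverted copy gives a negative value, because the structure term follows the sign of the covariance.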

2 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 4, APRIL 2004

[Figure: block diagram. The reference signal and the distorted signal pass through Pre-processing, CSF Filtering, Channel Decomposition, Error Normalization, and Error Pooling to produce a Quality/Distortion Measure.]

Fig. 1. A prototypical quality assessment system based on error sensitivity. Note that the CSF feature can be implemented either as a separate stage (as shown) or within “Error Normalization”.

II. Image Quality Assessment Based on Error Sensitivity

An image signal whose quality is being evaluated can be thought of as a sum of an undistorted reference signal and an error signal. A widely adopted assumption is that the loss of perceptual quality is directly related to the visibility of the error signal. The simplest implementation of this concept is the MSE, which objectively quantifies the strength of the error signal. But two distorted images with the same MSE may have very different types of errors, some of which are much more visible than others. Most perceptual image quality assessment approaches proposed in the literature attempt to weight different aspects of the error signal according to their visibility, as determined by psychophysical measurements in humans or physiological measurements in animals. This approach was pioneered by Mannos and Sakrison [10], and has been extended by many other researchers over the years. Reviews on image and video quality assessment algorithms can be found in [4], [11]–[13].
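The point that equal MSE does not imply equal error type can be checked numerically. The sketch below is illustrative only: the synthetic reference image and the two distortions (a constant brightness shift and a random-sign perturbation of the same magnitude) are hypothetical choices, not examples taken from the paper.

```python
import numpy as np

def mse(reference, distorted):
    """Mean squared error between two equally sized images."""
    reference = np.asarray(reference, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    return float(np.mean((reference - distorted) ** 2))

rng = np.random.default_rng(0)
# Synthetic reference with values well inside [0, 255], so no clipping is needed.
reference = rng.uniform(50.0, 200.0, size=(64, 64))

# Distortion A: every pixel is brightened by the same amount.
shifted = reference + 10.0

# Distortion B: every pixel moves by the same magnitude, but with a random sign.
signs = rng.choice([-1.0, 1.0], size=reference.shape)
noisy = reference + 10.0 * signs

print(mse(reference, shifted), mse(reference, noisy))
```

Both calls return essentially the same MSE, because every pixel error has magnitude 10 in both cases, yet one distortion is a uniform brightness change and the other is a noise-like pattern.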

A. Framework

Fig. 1 illustrates a generic image quality assessment framework based on error sensitivity. Most perceptual quality assessment models can be described with a similar diagram, although they differ in detail. The stages of the diagram are as follows:

Pre-processing. This stage typically performs a variety of basic operations to eliminate known distortions from the images being compared. First, the distorted and reference signals are properly scaled and aligned. Second, the signal might be transformed into a color space (e.g., [14]) that is more appropriate for the HVS. Third, quality assessment metrics may need to convert the digital pixel values stored in the computer memory into luminance values of pixels on the display device through pointwise nonlinear transformations. Fourth, a low-pass filter simulating the point spread function of the eye optics may be applied. Finally, the reference and the distorted images may be modified using a nonlinear point operation to simulate light adaptation.
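The third operation, the pointwise mapping from stored pixel values to displayed luminance, might be sketched as a simple gamma curve. The display parameters used here (gamma of 2.2, peak luminance of 100 cd/m², black level of 0) and the function name are hypothetical values chosen for illustration; the text does not specify a display model.

```python
import numpy as np

def pixels_to_luminance(pixels, gamma=2.2, peak_luminance=100.0, black_level=0.0):
    """Map 8-bit pixel values to display luminance with an assumed gamma curve."""
    normalized = np.asarray(pixels, dtype=np.float64) / 255.0
    # Pointwise nonlinearity: each pixel is transformed independently of its neighbors.
    return black_level + (peak_luminance - black_level) * normalized ** gamma
```

Because the mapping is nonlinear, the same pixel-value error corresponds to a larger luminance change in bright regions than in dark ones under this assumed model.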

CSF Filtering. The contrast sensitivity function (CSF) describes the sensitivity of the HVS to different spatial and temporal frequencies that are present in the visual stimulus. Some image quality metrics include a stage that weights the signal according to this function (typically implemented using a linear filter that approximates the frequency response of the CSF). However, many recent metrics choose to implement CSF as a base-sensitivity normalization factor after channel decomposition.
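A linear CSF filter of the kind described can be sketched as a frequency-domain weighting. The excerpt does not prescribe a CSF formula; the code below assumes one widely quoted analytic approximation attributed to Mannos and Sakrison, and the viewing parameter `pixels_per_degree` (which converts cycles per pixel into cycles per degree of visual angle) is a hypothetical value.

```python
import numpy as np

def csf_mannos_sakrison(f):
    """Assumed CSF approximation; f is spatial frequency in cycles/degree."""
    f = np.asarray(f, dtype=np.float64)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def csf_filter(image, pixels_per_degree=32.0):
    """Weight the spatial frequencies of a grayscale image by the assumed CSF."""
    image = np.asarray(image, dtype=np.float64)
    rows, cols = image.shape
    # fftfreq returns cycles/pixel; scale to cycles/degree for the viewing setup.
    fy = np.fft.fftfreq(rows) * pixels_per_degree
    fx = np.fft.fftfreq(cols) * pixels_per_degree
    radial = np.hypot(fy[:, None], fx[None, :])
    weights = csf_mannos_sakrison(radial)
    # The weights depend only on frequency magnitude, so the output stays real.
    return np.real(np.fft.ifft2(np.fft.fft2(image) * weights))
```

This approximation is band-pass: it peaks at mid frequencies and attenuates both very low and very high frequencies, so the filtered image de-emphasizes components the eye is assumed to be less sensitive to.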

Channel Decomposition. The images are typically separated into subbands (commonly called “channels” in the psychophysics literature) that are selective for spatial and temporal frequency as well as orientation. While some quality assessment methods implement sophisticated channel decompositions that are believed to be closely related to the neural responses in the primary visual cortex [2], [15]–[19], many metrics use simpler transforms such as the discrete cosine transform (DCT) [20], [21] or separable wavelet transforms [22]–[24]. Channel decompositions tuned to various temporal frequencies have also been reported for video quality assessment [5], [25].
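The simpler DCT-based decomposition mentioned above can be sketched as a block transform in which each DCT frequency index is treated as one channel. The 8×8 block size, the orthonormal DCT-II scaling, and the function names are assumptions made for this illustration and are not taken from the cited metrics.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def block_dct_channels(image, block=8):
    """Return an array channels[u, v, by, bx]: coefficient (u, v) of each block."""
    image = np.asarray(image, dtype=np.float64)
    rows, cols = image.shape
    if rows % block or cols % block:
        raise ValueError("image dimensions must be multiples of the block size")
    d = dct_matrix(block)
    # Split into non-overlapping blocks: shape (rows/block, cols/block, block, block).
    blocks = image.reshape(rows // block, block, cols // block, block).swapaxes(1, 2)
    # 2-D DCT of every block via matrix products broadcast over the block grid.
    coeffs = d @ blocks @ d.T
    # Regroup so that each (u, v) frequency index forms one channel.
    return coeffs.transpose(2, 3, 0, 1)
```

Here `channels[0, 0]` holds the DC coefficient of every block, and higher indices hold progressively higher horizontal and vertical frequencies; the same function would be applied to both the reference and the distorted image before comparing them channel by channel.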

Error Normalization. The error (difference) between the decomposed reference and distorted signals in each channel is calculated and normalized according to a certain masking model, which takes into account the fact that the presence of one image component will decrease the visibility of another image component that is proximate in spatial or temporal location, spatial frequency, or orientation. The normalization mechanism weights the error signal in a channel by a space-varying visibility threshold [26]. The visibility threshold at each point is calculated based on the energy of the reference and/or distorted coefficients in a neighborhood (which may include coefficients from within a spatial neighborhood of the same channel as well as other channels) and the base-sensitivity for that channel. The normalization process is intended to convert the error into units of just noticeable difference (JND). Some methods also consider the effect of contrast response saturation (e.g., [2]).
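One way to picture this normalization is a threshold that rises with the local energy of the reference coefficients. The masking rule below (a power-law elevation of a base threshold, computed from a 3×3 spatial neighborhood within a single channel) is a hypothetical model written for illustration; the function name, window size, and exponent are assumptions, and it is not the model of any specific metric cited in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalize_errors(ref_coeffs, dist_coeffs, base_threshold, window=3, exponent=0.7):
    """Divide channel errors by a space-varying threshold (hypothetical masking model)."""
    ref = np.asarray(ref_coeffs, dtype=np.float64)
    dist = np.asarray(dist_coeffs, dtype=np.float64)

    # Mean energy of the reference coefficients in an odd-sized spatial neighborhood.
    pad = window // 2
    padded = np.pad(ref ** 2, pad, mode="edge")
    local_energy = sliding_window_view(padded, (window, window)).mean(axis=(-2, -1))

    # The threshold never drops below the channel's base sensitivity and grows
    # where the surrounding reference content is strong (masking).
    elevation = np.maximum(1.0, (np.sqrt(local_energy) / base_threshold) ** exponent)
    threshold = base_threshold * elevation

    # Errors are now expressed in multiples of the local visibility threshold.
    return (dist - ref) / threshold
```

Under this sketch, an error equal to the base threshold in a flat region maps to 1 (one JND in the model's units), while the same error on top of strong reference content maps to a smaller value.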

Error Pooling. The final stage of all quality metrics must combine the normalized error signals over the spatial extent of the image, and across the different channels, into a single value. For most quality assessment methods, pooling takes the form of a Minkowski norm:

\[ E(\{e_{l,k}\}) = \left( \sum_{l} \sum_{k} |e_{l,k}|^{\beta} \right)^{1/\beta} \qquad (1) \]

where \( e_{l,k} \) is the normalized error of the \( k \)-th coefficient in the \( l \)-th channel, and \( \beta \) is a constant exponent typically chosen to lie between 1 and 4. Minkowski pooling may be performed over space (index \( k \)) and then over frequency (index \( l \)), or vice-versa, with some non-linearity between them, or possibly with different exponents \( \beta \). A spatial
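The Minkowski pooling of Eq. (1) can be sketched in a few lines of Python. The function name `minkowski_pool` and the example exponents are illustrative assumptions; the code pools over all channels and positions in one step, rather than over space and frequency separately.

```python
import numpy as np

def minkowski_pool(errors, beta=2.0):
    """Pool normalized errors e[l, k] into one value: (sum |e|^beta)^(1/beta)."""
    magnitudes = np.abs(np.asarray(errors, dtype=np.float64))
    return float(np.sum(magnitudes ** beta) ** (1.0 / beta))

# Rows are channels (index l), columns are coefficients (index k).
normalized_errors = np.array([[0.5, -1.0, 0.25],
                              [2.0, 0.0, -0.5]])
for beta in (1.0, 2.0, 4.0):
    print(beta, minkowski_pool(normalized_errors, beta))
```

With beta = 1 the result is the sum of absolute errors; as beta grows, the pooled value is increasingly dominated by the largest error, which is why the choice of exponent changes how much isolated severe errors count.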
