Deep Plug-and-Play Super-Resolution for LR Images with Arbitrary Blur Kernels


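Kai Zhang 1,2, Wangmeng Zuo 1,3,*, Lei Zhang 2,4
1 School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
2 Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
3 Peng Cheng Laboratory, Shenzhen, China
4 DAMO Academy, Alibaba Group
cskaizhang@gmail.com, wmzuo@hit.edu.cn, cslzhang@comp.polyu.edu.hk
https://github.com/cszn/DPSR
* Corresponding author.

Abstract

While deep neural network (DNN) based single image super-resolution (SISR) methods are rapidly gaining popularity, they are mainly designed for the widely used bicubic degradation, and the fundamental challenge of super-resolving low-resolution (LR) images with arbitrary blur kernels remains. Meanwhile, plug-and-play image restoration has been recognized as highly flexible because its modular structure allows denoiser priors to be plugged in easily. In this paper, we propose a principled formulation and framework that extends bicubic-degradation-based deep SISR, with the help of the plug-and-play framework, to handle LR images with arbitrary blur kernels. Specifically, we design a new SISR degradation model so as to take advantage of existing blind deblurring methods for blur kernel estimation. To optimize the energy function induced by the new degradation model, we derive a plug-and-play algorithm via the variable splitting technique, which allows us to plug in any super-resolver prior, rather than only a denoiser prior, as the modular part. Quantitative and qualitative evaluations on synthetic and real LR images demonstrate that the proposed deep plug-and-play super-resolution framework is flexible and effective in dealing with blurry LR images.

1. Introduction

Single image super-resolution (SISR), whose goal is to estimate the clean high-resolution (HR) counterpart x of a given low-resolution (LR) image y, is a classical problem with high academic and practical value [3]. Basically, the relationship between the LR and HR images is characterized by the degradation model, which defines how the LR image is degraded from the HR image. Empirical and theoretical studies have shown that an accurate degradation model is crucial to the success of SISR [20, 60]. It is therefore important to first review the degradation models for SISR.

In fact, most existing SISR methods are designed under the assumption of a certain degradation model. There are two widely used degradation models. The first one, which is acknowledged as a general degradation model for SISR, is given by

$$ y = (x \otimes k)\downarrow_s + n, \tag{1} $$

where $x \otimes k$ represents the convolution between the blur kernel $k$ and the HR image $x$, $\downarrow_s$ is a subsequent downsampling operation with scale factor $s$, and $n$ is additive white Gaussian noise (AWGN) with noise level $\sigma$. This degradation model has been extensively studied for developing model-based optimization methods [13, 19]. However, these methods mostly assume an a priori known blur kernel, which in practice is difficult to estimate. Although several works such as [41, 53] focus on estimating the blur kernel, their source code is unfortunately not publicly available.

The second and perhaps the most widely used one, which we refer to as bicubic degradation, is formalized as

$$ y = x\downarrow_s. \tag{2} $$

Here, unless otherwise specified, $\downarrow_s$ represents the bicubic downsampler (Matlab default function imresize) with scale factor $s$. Due to its simplicity, the bicubic degradation model has become the benchmark setting for evaluating SISR methods [29, 36]. In particular, it greatly facilitates the development of powerful deep neural networks (DNN) for SISR [2]. However, such a simple degradation model inevitably gives rise to poor results in many practical scenarios [20, 65]. Nevertheless, little work has been done on extending to more realistic degradation models.

Given the above considerations, it is necessary to address the following two issues: 1) designing an alternative degradation model, and 2) extending existing DNN-based methods for bicubic degradation to the new degradation model so as to exploit the power of DNN. To this end, we first propose a simple yet effective degradation model which assumes that the LR image is a bicubicly downsampled, blurred and noisy version of an HR image. Compared with the general degradation model given by Eqn. (1), the proposed one has two merits. First, it generalizes the well-studied bicubic degradation model given by Eqn. (2). Second, it enables us to adopt existing blind deblurring methods to estimate the blur kernel from the given LR image. To extend DNN-based SISR methods to the new degradation model, we propose a well-principled deep plug-and-play super-resolution (DPSR) framework based on a variable-splitting iterative optimization scheme. It turns out that the distortion of blur can be effectively handled in the Fourier domain. As a result, arbitrary blur kernels can be handled, which is one of the main goals of this paper. Moreover, different from existing plug-and-play frameworks [13], which generally plug in an off-the-shelf Gaussian denoiser as the modular part, the proposed method implements the plug-in step with a small modification to any existing DNN-based super-resolver.

It is worth emphasizing that we mainly focus on non-blind SISR for arbitrary uniform blur kernels rather than blind SISR for arbitrary non-uniform blur kernels. On the one hand, non-blind SISR is important to blind SISR, which generally involves alternately updating the blur kernel and applying non-blind SISR to update the super-resolved image. While some recent works attempt to train a DNN to directly estimate the clean image for blind deblurring, their practicability needs further evaluation. On the other hand, although a non-uniform blur kernel tends to be a more realistic assumption, it is too complex and still remains a difficult problem for image deblurring [31]. In fact, the arbitrary uniform blur kernel assumption is already a much better choice than the simple bicubic kernel for practical applications. In short, our work takes a valuable intermediate step from existing bicubic degradation based SISR towards ultimate blind SISR.

The contributions of this work are summarized as follows:

• A more realistic degradation model than the bicubic degradation model is proposed for SISR. It considers arbitrary blur kernels and enables the use of existing deblurring methods for blur kernel estimation.

• A deep plug-and-play super-resolution framework is proposed to solve SISR with the new degradation model. DPSR is applicable beyond bicubic degradation and can handle LR images with arbitrary blur kernels.

• The proposed DPSR is well-principled, since the iterative scheme aims to solve the energy function induced by the new degradation model.

• The proposed DPSR extends existing plug-and-play frameworks, showing that the plug-and-play prior for SISR is not limited to the Gaussian denoiser.

2. Related work

2.1. DNN-based SISR

1) Bicubic degradation. The first DNN-based SISR method, termed SRCNN [17], employs a relatively shallow network and follows previous SISR methods such as A+ [55] and ScSR [61] in synthesizing LR images with bicubic interpolation. Since then, by fixing the degradation model to bicubic degradation, researchers have improved SISR performance via DNN from different aspects, including PSNR and SSIM values, efficiency, and perceptual visual quality. To improve SISR performance in terms of PSNR and SSIM, the VDSR network proposed by Kim et al. [29] shows that the most direct way is to increase the network depth. However, VDSR operates on bicubicly interpolated LR images, which hinders efficiency. To this end, FSRCNN [18] and ESPCN [50] were proposed; they operate directly on the LR input and adopt an upscaling operation at the end of the network. Considering that visual results tend to be oversmoothed at a large scale factor (e.g., 4), VGG [52] loss and generative adversarial network (GAN) [24] loss are used in [34, 49, 58] to improve perceptual visual quality. While achieving great success on bicubic degradation [36, 44, 68], these methods perform poorly on most real images due to the mismatch of degradation models.

2) Beyond bicubic degradation. In [20], the authors pointed out that an accurate estimate of the blur kernel is more important than a sophisticated image prior. Since then, several attempts have been made to tackle LR images beyond bicubic degradation. Zhang et al. [63] proposed a plug-and-play framework (IRCNN) to solve the energy function induced by Eqn. (1). Although in theory IRCNN can handle arbitrary blur kernels (please refer to [13]), the blur kernel of such a degradation model is difficult to estimate in practice. Zhang et al. [65] proposed a general DNN-based solution (SRMD) which takes two key degradation parameters as input. However, SRMD only considers Gaussian blur kernels. In [51], the authors proposed a zero-shot super-resolution (ZSSR) method which trains an image-specific DNN on the testing LR image and can also take the degradation parameters, such as the estimated blur kernel, to improve the performance. While showing impressive results for LR images with internal repetitive structures, ZSSR is less effective for severely blurred LR images.

As discussed, the above methods have two main drawbacks. First, they have difficulty in blur kernel estimation. Second, they are generally designed for Gaussian-like blur kernels and thus cannot effectively handle severely blurred LR images. It should be noted that a deep blind SISR method for motion blur is proposed in [66]. However, it has limited ability to handle the distortion of arbitrary blur kernels.

2.2. Plug-and-play image restoration

Plug-and-play image restoration, which was first introduced in [15, 57, 69], has attracted significant attention due to its flexibility and effectiveness in handling various inverse problems. Its main idea is to unroll the energy function by a variable splitting technique and replace the prior-associated subproblem with any off-the-shelf Gaussian denoiser. Different from traditional image restoration methods which employ hand-crafted image priors, it implicitly defines the plug-and-play prior by the denoiser. Remarkably, the denoiser can be learned by a DNN with large capability, which gives rise to promising performance.

During the past few years, a flurry of plug-and-play works have been developed from the following aspects: 1) different variable splitting algorithms, such as the half-quadratic splitting (HQS) algorithm [1], the alternating direction method of multipliers (ADMM) algorithm [8], FISTA [4], and the primal-dual algorithm [11, 42]; 2) different applications, such as Poisson denoising [47], demosaicking [26], deblurring [56], super-resolution [9, 13, 28, 63], and inpainting [40]; 3) different types of denoiser priors, such as BM3D [14, 21], DNN-based denoisers [6, 62] and their combinations [25]; and 4) theoretical analysis of the convergence from the aspects of fixed point [13, 37, 38] and Nash equilibrium [10, 16, 45].

To the best of our knowledge, existing plug-and-play image restoration methods mostly treat the Gaussian denoiser as the prior.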
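We will show that, for the application of plug-and-play SISR, the prior is not limited to the Gaussian denoiser. Instead, a simple super-resolver prior can be employed to solve a much more complex SISR problem.

3. Method

3.1. New degradation model

In order to ease the blur kernel estimation, we propose the following degradation model

$$ y = (x\downarrow_s) \otimes k + n, \tag{3} $$

where $\downarrow_s$ is the bicubic downsampler with scale factor $s$. Simply speaking, Eqn. (3) conveys that the LR image $y$ is a bicubicly downsampled, blurred and noisy version of a clean HR image $x$.

Since existing methods widely use the bicubic downsampler to synthesize or augment LR images, it is a reasonable assumption that the bicubicly downsampled HR image (i.e., $x\downarrow_s$) is also a clean image. Following this assumption, Eqn. (3) actually corresponds to a deblurring problem followed by a SISR problem with bicubic degradation. Thus, we can fully employ existing well-studied deblurring methods to estimate $k$. Clearly, this is a distinctive advantage over the degradation model given by Eqn. (1).

Once the degradation model is defined, the next step is to formulate the energy function. According to the Maximum A Posteriori (MAP) principle, the energy function is formally given by

$$ \min_x \frac{1}{2\sigma^2}\|y - (x\downarrow_s) \otimes k\|^2 + \lambda\Phi(x), \tag{4} $$

where $\frac{1}{2\sigma^2}\|y - (x\downarrow_s) \otimes k\|^2$ is the data fidelity (likelihood) term determined by the degradation model of Eqn. (3), $\Phi(x)$ is the regularization (prior) term, and $\lambda$ is the regularization parameter. Note that the data fidelity term is $\frac{1}{2\sigma^2}\|y - (x\downarrow_s) \otimes k\|^2$ rather than $\frac{1}{2}\|y - (x\downarrow_s) \otimes k\|^2$. For discriminative learning methods, the inference model actually corresponds to an energy function in which the degradation model is implicitly defined by the training LR and HR pairs. This explains why existing DNN-based SISR methods trained on bicubic degradation perform poorly on real images.

3.2. Deep plug-and-play SISR

To solve Eqn. (4), we first adopt the variable splitting technique to introduce an auxiliary variable $z$, leading to the following equivalent constrained optimization formulation:

$$ \hat{x} = \arg\min_x \frac{1}{2\sigma^2}\|y - z \otimes k\|^2 + \lambda\Phi(x), \quad \text{subject to } z = x\downarrow_s. \tag{5} $$

We then address Eqn. (5) with the half-quadratic splitting (HQS) algorithm. Note that other algorithms such as ADMM can also be exploited. We use HQS for its simplicity. Typically, HQS tackles Eqn. (5) by minimizing the following problem, which involves an additional quadratic penalty term:

$$ \mathcal{L}_\mu(x, z) = \frac{1}{2\sigma^2}\|y - z \otimes k\|^2 + \lambda\Phi(x) + \frac{\mu}{2}\|z - x\downarrow_s\|^2, \tag{6} $$

where $\mu$ is the penalty parameter; a very large $\mu$ enforces $z$ to approximate $x\downarrow_s$. Usually, $\mu$ varies in a non-descending order during the following iterative solution of Eqn. (6):

$$ z_{k+1} = \arg\min_z \|y - z \otimes k\|^2 + \mu\sigma^2\|z - x_k\downarrow_s\|^2, \tag{7} $$

$$ x_{k+1} = \arg\min_x \frac{\mu}{2}\|z_{k+1} - x\downarrow_s\|^2 + \lambda\Phi(x). \tag{8} $$

It can be seen that Eqn. (7) and Eqn. (8) are alternating minimization problems with respect to $z$ and $x$, respectively. In particular, by assuming the convolution is carried out with circular boundary conditions, Eqn. (7) has a fast closed-form solution

$$ z_{k+1} = \mathcal{F}^{-1}\left( \frac{\overline{\mathcal{F}(k)}\,\mathcal{F}(y) + \mu\sigma^2\,\mathcal{F}(x_k\downarrow_s)}{\overline{\mathcal{F}(k)}\,\mathcal{F}(k) + \mu\sigma^2} \right), \tag{9} $$

where $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ denote the Fast Fourier Transform (FFT) and inverse FFT, and $\overline{\mathcal{F}(\cdot)}$ denotes the complex conjugate of $\mathcal{F}(\cdot)$.

To analyze Eqn. (8) from a Bayesian perspective, we rewrite it as follows:

$$ x_{k+1} = \arg\min_x \frac{1}{2\left(\sqrt{1/\mu}\right)^2}\|z_{k+1} - x\downarrow_s\|^2 + \lambda\Phi(x). \tag{10} $$

Clearly, Eqn. (10) corresponds to super-resolving $z_{k+1}$ with scale factor $s$ by assuming that $z_{k+1}$ is bicubicly downsampled from an HR image $x$ and then corrupted by AWGN with noise level $\sqrt{1/\mu}$. From another viewpoint, Eqn. (10) solves a super-resolution problem with the following simple bicubic degradation model:

$$ y = x\downarrow_s + n. \tag{11} $$

As a result, a DNN-based super-resolver trained on the widely used bicubic degradation can be plugged in to replace Eqn. (10). For brevity, Eqn. (8) and Eqn. (10) can be further rewritten as

$$ x_{k+1} = \mathcal{SR}\left(z_{k+1}, s, \sqrt{1/\mu}\right). \tag{12} $$

Since the prior term $\Phi(x)$ is implicitly defined in $\mathcal{SR}(\cdot)$, we refer to it as the super-resolver prior.

So far, we have seen that the two sub-problems given by Eqn. (7) and Eqn. (8) are relatively easy to solve. In fact, they also have clear interpretations. On the one hand, since the blur kernel $k$ is only involved in the closed-form solution, Eqn. (7) addresses the distortion of blur. In other words, it pulls the current estimate towards a less blurry one. On the other hand, Eqn. (8) maps the less blurry image to a cleaner HR image. After several alternating iterations, the final reconstructed HR image is expected to contain no blur and noise.

3.3. Deep super-resolver prior

In order to take advantage of the merits of DNN, we need to specify the super-resolver network, which should take the noise level as input according to Eqn. (12). Inspired by [23, 64], most existing DNN-based super-resolvers only need to be modified to take an additional noise level map as input. Alternatively, one can directly adopt SRMD as the super-resolver prior because its input already contains the noise level map.

Since SRResNet [34] is a well-known DNN-based super-resolver, in this paper we propose a modified SRResNet, namely SRResNet+, to plug into the proposed DPSR framework. SRResNet+ differs from SRResNet in several aspects. First, SRResNet+ additionally takes a noise level map M as input. Second, SRResNet+ increases the number of feature maps from 64 to 96. Third, SRResNet+ removes the batch normalization layer [27], as suggested in [58].

Before training a separate SRResNet+ model for each scale factor, we need to synthesize the LR image and its noise level map from a given HR image. According to the degradation model given by Eqn. (11), the LR image is bicubicly downsampled from an HR image and then corrupted by AWGN with a noise level σ drawn from a predefined noise level range. The corresponding noise level map has the same spatial size as the LR image, and all of its elements are σ. Following [65], we set the noise level range to [0, 50]. For the HR images, we choose the 800 training images from the DIV2K dataset [2].

We adopt the Adam algorithm [30] to optimize SRResNet+ by minimizing the ℓ1 loss function. The learning rate starts from 10^-4, is halved every 5 × 10^5 iterations, and training ends once the learning rate is smaller than 10^-7. The mini-batch size is set to 16. The patch size of the LR input is set to 48×48. Rotation and flip based data augmentation is performed during training. We train the models with PyTorch on a single GTX 1080 Ti GPU.

Since this work mainly focuses on SISR with arbitrary blur kernels, we omit the comparison between SRResNet+ and other methods on bicubic degradation. As a simple comparison, SRResNet+ outperforms SRResNet [34] by an average PSNR gain of 0.15 dB on Set5 [5].

3.4. Comparison with related methods

In this section, we emphasize the fundamental differences between the proposed DPSR and several closely related DNN-based methods.

1) Cascaded deblurring and SISR. To super-resolve an LR image with arbitrary blur kernels, a heuristic method is to perform deblurring first and then super-resolve the deblurred LR image. However, such a cascaded two-step method suffers from the drawback that the perturbation error of the first step is amplified at the second step. On the contrary, DPSR optimizes the energy function given by Eqn. (4) in an iterative manner. Thus, DPSR tends to deliver better performance.

2) Fine-tuned SISR model with more training data. Perhaps the most straightforward way is to fine-tune existing bicubic degradation based SISR models with more training data generated by the new degradation model (i.e., Eqn. (3)), resulting in the so-called blind SISR. However, the performance of such methods deteriorates seriously, especially when large complex blur kernels are considered, possibly because the distortion of blur further aggravates the pixel-wise average problem [34]. As for DPSR, it takes the blur kernel as input and can effectively handle the distortion of blur via Eqn. (9).

3) Extended SRMD or DPSR with end-to-end training. Inspired by SRMD [65], one may attempt to extend it by considering arbitrary blur kernels. However, it is difficult to sample enough blur kernels to cover the large kernel space. In addition, it would require a large amount of time to train a reliable model. By contrast, DPSR only needs to train the models on the bicubic degradation, so it involves much less training time. Furthermore, while SRMD can effectively handle simple Gaussian kernels of size 15×15 with many successive convolutional layers, it loses effectiveness when dealing with large complex blur kernels. Instead, DPSR adopts a more concise and specialized FFT-based module via Eqn. (9) to eliminate the distortion of blur. Alternatively, one may take advantage of the structural benefits of DPSR and resort to jointly training DPSR in an end-to-end manner. However, we leave this to future work.

From the above discussions, we can conclude that DPSR is well-principled, structurally simple, highly interpretable, and involves less training.

4. Experiments

4.1. Synthetic LR images

Following the common setting in most of the image restoration literature, we use synthetic data with ground truth to quantitatively analyze the proposed DPSR and to make a relatively fair comparison with other competing methods.

Blur kernels. To thoroughly evaluate the effectiveness of the proposed DPSR for arbitrary blur kernels, we consider three types of widely used blur kernels, including Gaussian blur kernels, motion blur kernels and disk (out-of-focus) blur kernels [12, 59]. The specifications of the blur kernels are given in Table 1. Some kernel examples are shown in Figure 1. Note that the kernel sizes range from 5×5 to 35×35. As shown in Table 2, we further consider Gaussian noise with two different noise levels, i.e., 2.55 (1%) and 7.65 (3%), for scale factor 3.

Table 1. Three different types of blur kernels.

| Type | # | Specification |
|---|---|---|
| Gaussian | 16 | 8 isotropic Gaussian kernels with standard deviations uniformly sampled from the interval [0.6, 2], and 8 anisotropic Gaussian blur kernels selected from [65]. |
| Motion | 32 | 8 blur kernels from [35] and 8 further kernels obtained from them by random rotation and flip; and 16 realistic-looking motion blur kernels generated by the released code of [7]. |
| Disk | 8 | 8 disk kernels with radius uniformly sampled from the interval [1.8, 6]. They are generated by the Matlab function fspecial('disk', r), where r is the radius. |

Figure 1. Examples of (a) Gaussian blur kernels, (b) motion blur kernels and (c) disk blur kernels.

Parameter setting. In the alternating iterations between Eqn. (7) and Eqn. (8), we need to set λ and tune µ to obtain a satisfying performance. Setting such parameters has been considered a non-trivial task [46]. However, the parameter setting of DPSR is generally easy with the following two principles. First, since λ is fixed and can be absorbed into σ, we can instead multiply σ by a scalar $\sqrt{\lambda}$ and therefore ignore the λ in Eqn. (8). Second, since µ has a non-descending order during iterations, we can instead set $\sqrt{1/\mu}$ from Eqn. (12) in a non-ascending order to indirectly determine µ in each iteration. Empirically, a good rule of thumb is to set λ to 1/3 and exponentially decrease $\sqrt{1/\mu}$ from 49 to a small σ-dependent value (e.g., max(2.55, σ)) for a total of 15 iterations.

Compared methods. We compare the proposed DPSR with six methods, including two representative DNN-based methods for bicubic degradation (i.e., VDSR [29] and RCAN [67]), two cascaded deblurring and SISR methods (i.e., IRCNN+RCAN and DeblurGAN+RCAN), and two methods specially designed for blurry LR images (i.e., GFN [66] and ZSSR [51]). Specifically, VDSR is the first very deep network for SISR; RCAN consists of more than 400 layers and achieves state-of-the-art performance for bicubic degradation; IRCNN is a plug-and-play method with a deep denoiser prior that can handle non-blind image deblurring; DeblurGAN [32] is a deep blind deblurring method based on a generative adversarial network (GAN) [24]; GFN is a DNN-based method for joint blind motion deblurring and super-resolution; ZSSR is an unsupervised DNN-based method that can super-resolve blurry and noisy LR images. Note that IRCNN, ZSSR and DPSR can take the blur kernel and noise level as input. For a fair comparison, we modify ZSSR to our new degradation model.

Quantitative results. The PSNR and SSIM results of different methods for different degradation settings on the color BSD68 dataset [39, 48, 62] are shown in Table 2, from which we have several observations. First, although RCAN outperforms VDSR by a large margin on bicubic degradation (see [67]), its performance is comparable to VDSR and even bicubic interpolation on the complex degradation settings. This phenomenon has also been reported in [51, 65]. Second, after a deblurring step by IRCNN, IRCNN+RCAN significantly improves the PSNR and SSIM values. Third, DeblurGAN+RCAN and GFN give rise to poor performance, which may be attributed to the limited capability of successive convolutional layers in handling the distortion of large complex blur. Fourth, ZSSR is less effective for large complex blur kernels due to the lack of recurrence property in the blurry LR image. Last, our DPSR achieves the best performance, since it directly optimizes the energy function for the given degradation and can effectively handle the distortion of blur via Eqn. (9).

Table 2. Average PSNR and SSIM results (PSNR/SSIM) of different methods for different degradation settings on the color BSD68 dataset [39, 48, 62]. The best two results are highlighted in red and blue colors, respectively.

| Scale factor | Kernel type | Noise level | Bicubic | VDSR | RCAN | IRCNN+RCAN | DeblurGAN+RCAN | GFN | ZSSR | DPSR |
|---|---|---|---|---|---|---|---|---|---|---|
| ×2 | Gaussian | 0 | 23.47/0.596 | 23.36/0.589 | 23.59/0.603 | 24.79/0.629 | 19.36/0.400 | - | 21.44/0.542 | 27.28/0.763 |
| ×2 | Motion | 0 | 19.84/0.449 | 19.86/0.451 | 19.82/0.448 | 28.54/0.806 | 17.46/0.268 | - | 17.99/0.367 | 30.05/0.869 |
| ×2 | Disk | 0 | 21.85/0.507 | 21.86/0.508 | 21.85/0.507 | 25.48/0.671 | 19.33/0.370 | - | 21.25/0.490 | 28.61/0.816 |
| ×3 | Gaussian | 0 | 22.11/0.526 | 22.03/0.520 | 22.20/0.532 | 23.52/0.566 | 18.18/0.347 | - | 18.97/0.442 | 25.22/0.665 |
| ×3 | Motion | 0 | 18.83/0.424 | 18.84/0.424 | 18.81/0.422 | 25.88/0.699 | 16.25/0.228 | - | 16.80/0.348 | 27.22/0.769 |
| ×3 | Disk | 0 | 20.70/0.464 | 20.70/0.465 | 20.69/0.464 | 23.82/0.594 | 18.28/0.336 | - | 19.05/0.430 | 26.19/0.716 |
| ×3 | Gaussian | 2.55 | 22.05/0.513 | 21.95/0. | | | | | | |

Further Table 2 entries whose row and column labels are not recoverable: 17.87/0.510, 23.13/0.693, 15.44/0.280, 17.43/0.493, 17.27/0.474, 23.75/0.739.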
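The closed-form update of Eqn. (9) can be illustrated with a minimal NumPy sketch for a single image channel. It evaluates z = F^-1((conj(F(k)) F(y) + µσ² F(x_k↓s)) / (|F(k)|² + µσ²)) under the circular boundary assumption stated in Section 3.2. The function names (`psf_to_otf`, `blur_circular`, `z_update`) are illustrative choices made here, not names taken from the released DPSR code, and the sketch assumes µσ² > 0 so that the denominator never vanishes.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a blur kernel to `shape` and circularly shift its center to
    index (0, 0), so that its FFT acts as a circular-convolution operator."""
    kh, kw = kernel.shape
    padded = np.zeros(shape, dtype=np.float64)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def blur_circular(image, kernel):
    """Blur a 2-D image with `kernel` under circular boundary conditions."""
    otf = psf_to_otf(kernel, image.shape)
    return np.real(np.fft.ifft2(otf * np.fft.fft2(image)))

def z_update(y, kernel, x_down, mu, sigma):
    """One evaluation of Eqn. (9) for a single channel.

    y      : blurry LR image
    x_down : current HR estimate after bicubic downsampling (same size as y)
    mu     : penalty parameter; sigma : noise level (mu * sigma**2 must be > 0)
    """
    alpha = mu * sigma ** 2
    otf = psf_to_otf(kernel, y.shape)
    numerator = np.conj(otf) * np.fft.fft2(y) + alpha * np.fft.fft2(x_down)
    denominator = np.abs(otf) ** 2 + alpha
    return np.real(np.fft.ifft2(numerator / denominator))
```

Two limiting cases make the behavior easy to check: if `x_down` already equals the sharp image that produced `y`, the update returns it unchanged for any µ, and as µσ² grows the update approaches `x_down`, in line with the remark that a very large µ forces z towards x↓s.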

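The parameter-setting rule of thumb (λ = 1/3, and a noise level √(1/µ) that decreases exponentially from 49 to max(2.55, σ) over 15 iterations) can be sketched in plain Python as follows. Reading "exponentially decrease" as log-uniform spacing is an assumption made here, as are the function names; noise levels are on the 0 to 255 intensity scale used in the text. The last helper gives the weight µσ² that enters Eqn. (9) once λ has been absorbed into σ by replacing σ with √λ·σ.

```python
import math

def noise_level_schedule(sigma, num_iters=15, start=49.0, floor=2.55):
    """Values of sqrt(1/mu) fed to the super-resolver, spaced log-uniformly
    from `start` down to max(floor, sigma) (assumed interpretation)."""
    end = max(floor, sigma)
    log_start, log_end = math.log(start), math.log(end)
    return [
        math.exp(log_start + (log_end - log_start) * i / (num_iters - 1))
        for i in range(num_iters)
    ]

def penalty_from_noise_level(rho):
    """Invert rho = sqrt(1/mu) to recover the penalty parameter mu."""
    return 1.0 / rho ** 2

def fft_weight(sigma, rho, lam=1.0 / 3.0):
    """Weight mu * sigma**2 of Eqn. (9), with sigma replaced by
    sqrt(lam) * sigma so that lambda can be dropped from Eqn. (8)."""
    return penalty_from_noise_level(rho) * lam * sigma ** 2

# Example: schedule for an LR image with noise level 7.65 (3%).
levels = noise_level_schedule(7.65)
weights = [fft_weight(7.65, rho) for rho in levels]
```

Because the noise levels are non-ascending, the recovered µ values are non-descending, which matches the requirement stated for the HQS iterations.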