Learning for Video Super-Resolution through HR Optical Flow Estimation
Longguang Wang, Yulan Guo, Zaiping Lin, Xinpu Deng, and Wei An
School of Electronic Science, National University of Defense Technology
Changsha 410073, China
{wanglongguang15, yulan.guo, linzaiping, dengxinpu, anwei}@nudt.edu.cn
Abstract
Video super-resolution (SR) aims to generate a sequence
of high-resolution (HR) frames with plausible and tempo-
rally consistent details from their low-resolution (LR) coun-
terparts. The generation of accurate correspondence plays
a significant role in video SR. Traditional video SR methods have demonstrated that super-resolving images and optical flows simultaneously yields more accurate correspondences and better SR results. However, existing deep learning based methods use only LR optical flows for correspondence generation. In this paper, we propose an end-
to-end trainable video SR framework to super-resolve both
images and optical flows. Specifically, we first propose
an optical flow reconstruction network (OFRnet) to infer
HR optical flows in a coarse-to-fine manner. Then, mo-
tion compensation is performed according to the HR optical
flows. Finally, compensated LR inputs are fed to a super-
resolution network (SRnet) to generate the SR results. Ex-
tensive experiments demonstrate that HR optical flows pro-
vide more accurate correspondences than their LR coun-
terparts and improve both accuracy and consistency per-
formance. Comparative results on the Vid4 and DAVIS-
10 datasets show that our framework achieves the state-
of-the-art performance. The code will be released soon at:
https://github.com/LongguangWang/SOF-VSR-Super-Resolving-Optical-Flow-for-Video-Super-Resolution-.
1. Introduction
Super-resolution (SR) aims to generate high-resolution
(HR) images or videos from their low-resolution (LR) coun-
terparts. As a typical low-level computer vision problem,
SR has been widely investigated for decades [23, 5, 7]. Recently, the prevalence of high-definition displays has further advanced the development of SR. For single image SR, image
details are recovered using the spatial correlation in a sin-
gle frame. In contrast, inter-frame temporal correlation can
further be exploited for video SR.
Since temporal correlation is crucial to video SR, the key to success lies in accurate correspondence generation.
Figure 1. Temporal profiles under ×4 configuration for VSRnet [13], TDVSR [20] and our SOF-VSR on Calendar and City. Purple boxes represent corresponding temporal profiles. Our SOF-VSR produces finer details in temporal profiles, which are more consistent with the groundtruth.
Numerous methods [6, 19, 22] have demonstrated that the
correspondence generation and SR problems are closely in-
terrelated and can boost each other’s accuracy. Therefore,
these methods integrate the SR of both images and opti-
cal flows in a unified framework. However, current deep
learning based methods [18, 13, 35, 2, 20, 21] mainly focus
on the SR of images, and use LR optical flows to provide
correspondences. Although LR optical flows can provide
sub-pixel correspondences in LR images, their limited ac-
curacy hinders the performance improvement for video SR,
especially for scenarios with large upscaling factors.
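To make the notion of flow-based correspondence concrete, the standard compensation step is backward warping: each pixel of the reference view samples the neighboring frame at its flow-displaced position, with bilinear interpolation providing sub-pixel reads. The sketch below is our own illustration of this generic operation (function name and layout are our assumptions, not the paper's implementation):

```python
import numpy as np

def backward_warp(frame, flow):
    """Warp a grayscale `frame` (H x W) toward the reference view using a
    dense optical flow `flow` (H x W x 2, in pixels), with bilinear
    sampling. Sub-pixel flow values read between integer pixel positions,
    which is why flow accuracy directly bounds correspondence accuracy."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Flow-displaced sampling positions, clamped to the image border.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Bilinear blend of the four neighboring pixels.
    top = frame[y0, x0] * (1 - wx) + frame[y0, x1] * wx
    bot = frame[y1, x0] * (1 - wx) + frame[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

With an LR flow, these sampling positions are only as precise as the LR grid allows; an HR flow supplies finer displacements, which motivates super-resolving the flow itself.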
In this paper, we propose an end-to-end trainable video
SR framework to generate both HR images and optical
flows. The SR of optical flows provides accurate correspon-
dences, which not only improves the accuracy of each HR
image, but also achieves better temporal consistency. We
first introduce an optical flow reconstruction net (OFRnet)
to reconstruct HR optical flows in a coarse-to-fine manner.
These HR optical flows are then used to perform motion
compensation on LR frames. A space-to-depth transforma-
tion is therefore used to bridge the resolution gap between
HR optical flows and LR frames. Finally, the compensated
LR frames are fed to a super-resolution net (SRnet) to gen-
erate each HR frame. Extensive evaluation is conducted
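The space-to-depth transformation mentioned above is a generic rearrangement that folds each r × r spatial block into the channel dimension, so an HR-grid tensor matches the LR spatial size. A minimal sketch of this operation (our own illustration, not the authors' code; assumes a channels-last array with spatial sizes divisible by r):

```python
import numpy as np

def space_to_depth(x, r):
    """Rearrange an (H, W, C) array into (H/r, W/r, C*r*r) by folding
    each r x r spatial block into channels. This lets HR-resolution
    quantities be stacked with LR frames of the same spatial size."""
    h, w, c = x.shape
    x = x.reshape(h // r, r, w // r, r, c)       # split into r x r blocks
    x = x.transpose(0, 2, 1, 3, 4)               # group block offsets last
    return x.reshape(h // r, w // r, c * r * r)  # fold offsets into channels
```

The operation is lossless and invertible (its inverse is the depth-to-space shuffle commonly used in SR upsampling layers).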
arXiv:1809.08573v2 [cs.CV] 25 Oct 2018