Video Frame Interpolation

Papers with Code: video-frame-interpolation

interpolation error (IE)

\[IE=\sqrt{\frac{1}{N} \sum_{x,y}\big(I(x,y) - I_{GT}(x,y)\big)^2}\]

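from A Database and Evaluation Methodology for Optical Flow, where I is the interpolated frame, I_{GT} is the ground-truth frame, and N is the number of pixels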
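The IE formula above is a root-mean-square difference, so it can be sketched in a few lines of NumPy. The function name `interpolation_error` is made up for this note, and averaging over colour channels as well as pixels is an assumption for multi-channel input, not something the formula specifies.

```python
import numpy as np

def interpolation_error(pred, gt):
    """RMS difference between an interpolated frame and its ground truth.

    Assumption: for colour input the mean also runs over the channel axis.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("frames must have the same shape")
    # sqrt( (1/N) * sum of squared differences )
    return float(np.sqrt(np.mean((pred - gt) ** 2)))
```

Casting to float64 first avoids wrap-around when the frames are stored as uint8.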

Learning Image Matching by Simply Watching Video

Learning Image Matching by Simply Watching Video (ECCV 2016)
convolutional encoder-decoder

Deep Voxel Flow

Video Frame Synthesis using Deep Voxel Flow (ICCV 2017)

voxel flow layer: a per-pixel, 3D optical flow vector across space and time in the input video. The final pixel is generated by trilinear interpolation across the input video volume (which is typically just two frames). Thus, for video interpolation, the final output pixel can be a blend of pixels from the previous and next frames. This voxel flow layer is similar to an optical flow field. However, it is only an intermediate layer, and its correctness is never directly evaluated. Thus, our method requires no optical flow supervision, which is challenging to produce at scale.
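A minimal NumPy sketch of the sampling step described above, for two grayscale frames. It assumes the voxel flow is given as per-pixel arrays `dx`, `dy` (spatial offset) and `dt` (temporal blend weight in [0, 1]), and that the previous frame is sampled at (x - dx, y - dy) and the next frame at (x + dx, y + dy). The function names and border clamping are choices made for this note, not the paper's code.

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Sample a 2D image at float coordinates, clamping to the border."""
    h, w = img.shape
    xs = np.clip(xs, 0, w - 1)
    ys = np.clip(ys, 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = xs - x0
    wy = ys - y0
    top = (1 - wx) * img[y0, x0] + wx * img[y0, x1]
    bottom = (1 - wx) * img[y1, x0] + wx * img[y1, x1]
    return (1 - wy) * top + wy * bottom

def voxel_flow_blend(frame0, frame1, dx, dy, dt):
    """Trilinear blend: bilinear in space within each frame, linear in time."""
    h, w = frame0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    prev_px = bilinear_sample(frame0, xs - dx, ys - dy)  # look back along the flow
    next_px = bilinear_sample(frame1, xs + dx, ys + dy)  # look forward along the flow
    return (1 - dt) * prev_px + dt * next_px
```

Every operation here is differentiable with respect to `dx`, `dy` and `dt`, which is why such a layer can be trained from a reconstruction loss alone, without flow supervision.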

ASC

Video Frame Interpolation via Adaptive Separable Convolution (ICCV 2017)
The REDS dataset uses ASC to synthesize motion blur

DAIN

Depth-Aware Video Frame Interpolation (CVPR 2019) from Shanghai Jiao Tong University
PyTorch code | Papers with Code
based on MEMC-Net, with pre-trained PWC-Net, MegaDepth
new layer: Depth-Aware flow projection
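As a rough illustration of what a depth-aware flow projection does, the NumPy sketch below projects the flow from frame 0 to frame 1 onto an intermediate time t, weighting each contributing pixel by its inverse depth so that closer objects dominate when several pixels land on the same location. This is a simplified reading of the idea, not DAIN's layer: the function name is made up, target positions are rounded to the nearest pixel, and locations that receive no flow are left at zero instead of being filled from neighbours.

```python
import numpy as np

def depth_aware_flow_projection(flow01, depth0, t):
    """Project flow 0->1 to an approximate flow t->0.

    flow01: (H, W, 2) array of (dx, dy) per pixel of frame 0.
    depth0: (H, W) array of positive depths for frame 0.
    t:      intermediate time in (0, 1).
    """
    h, w, _ = flow01.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each frame-0 pixel has moved to at time t (nearest pixel).
    tx = np.rint(xs + t * flow01[..., 0]).astype(int)
    ty = np.rint(ys + t * flow01[..., 1]).astype(int)
    inside = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)

    weight = 1.0 / depth0  # closer pixels (small depth) get larger weight
    acc = np.zeros((h, w, 2))
    wsum = np.zeros((h, w))
    # np.add.at accumulates correctly when several pixels hit the same target.
    np.add.at(acc, (ty[inside], tx[inside]),
              weight[inside][:, None] * flow01[inside])
    np.add.at(wsum, (ty[inside], tx[inside]), weight[inside])

    out = np.zeros((h, w, 2))
    hit = wsum > 0
    out[hit] = -t * acc[hit] / wsum[hit][:, None]
    return out
```

With uniform depth this reduces to a plain average of the flows that pass through each location.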

module architecture
flow estimation: PWC-Net
depth estimation: hourglass network, MegaDepth
context extraction: one 7x7 convolution layer followed by 2 residual blocks, with features concatenated
kernel estimation: U-Net
adaptive warping layer: from MEMC-Net

testing pre-trained model

GTX 1080 Ti, 1280x720: about 2 s per frame
issue: drifting inwards
jmspiewak attributes this to an anomaly in the pre-trained PWC-Net model. jmspiewak's workaround seems to fix the issue. Need to study PWC-Net further.

Zooming-Slow-Mo

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution (CVPR-2020)
PyTorch
Video frame interpolation (VFI) and video super-resolution (VSR), i.e. temporal interpolation and spatial super-resolution, are intra-related. This paper proposes a unified one-stage space-time video super-resolution (STVSR) framework that handles both tasks simultaneously.

  1. temporally interpolate LR frame features for the missing LR video frames, capturing local temporal contexts with the proposed feature temporal interpolation network
  2. use a deformable ConvLSTM to align and aggregate temporal information simultaneously, to better leverage global temporal contexts. ref: DCNv2
  3. use a deep reconstruction network to predict the HR slow-motion video frames

framework diagram: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020/raw/master/dump/framework.png

testing pre-trained model

took about 6 min to process 360x640, 120 frames -> 1440x2560, 238 frames on a GTX 1080 Ti

BIN

Blurry Video Frame Interpolation (CVPR 2020)
PyTorch 1.3 | result video
frame deblurring + interpolation with an inter-pyramid recurrent module that adopts ConvLSTM units