Video Frame Interpolation¶

Optical Flow

Papers with Code: video-frame-interpolation

interpolation error (IE)¶

\[IE=\sqrt{\frac{1}{N} \sum_{x,y}\big(I(x,y) - I_{GT}(x,y)\big)^2}\]

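IE is the root-mean-square difference between the interpolated image I and the ground-truth image I_GT over all N pixels. From A Database and Evaluation Methodology for Optical Flow.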
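A minimal sketch of the IE formula above, for single-channel frames stored as nested lists. The function name `interpolation_error` and the toy frames are illustrative assumptions, not taken from the benchmark's code; plain lists keep the example dependency-free.

```python
import math


def interpolation_error(pred, gt):
    """Root-mean-square difference between an interpolated frame and ground truth.

    `pred` and `gt` are 2D lists (rows of grayscale intensities) of equal shape.
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("frames must have the same shape")
    n = sum(len(row) for row in pred)  # N: total number of pixels
    sq_sum = sum(
        (p - g) ** 2
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )
    return math.sqrt(sq_sum / n)


# Toy 2x2 frames (hypothetical values): squared differences are 0, 4, 0, 16.
pred = [[10, 20], [30, 40]]
gt = [[10, 22], [30, 44]]
ie = interpolation_error(pred, gt)  # sqrt(20 / 4) = sqrt(5), about 2.236
print(ie)
```

A lower IE means the interpolated frame is closer to the ground truth; identical frames give 0.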

Learning Image Matching by Simply Watching Video¶

Learning Image Matching by Simply Watching Video (ECCV 2016)
convolutional encoder-decoder

Deep Voxel Flow¶

Video Frame Synthesis using Deep Voxel Flow (ICCV 2017)

voxel flow layer: a per-pixel, 3D optical flow vector across space and time in the input video. The final pixel is generated by trilinear interpolation across the input video volume (which is typically just two frames). Thus, for video interpolation, the final output pixel can be a blend of pixels from the previous and next frames. This voxel flow layer is similar to an optical flow field. However, it is only an intermediate layer, and its correctness is never directly evaluated. Thus, our method requires no optical flow supervision, which is challenging to produce at scale.
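The sampling step described above can be sketched for a single output pixel. This is a simplified illustration, not the paper's implementation: it assumes the flow is symmetric around the target pixel (frame 0 is sampled at (x - dx, y - dy), frame 1 at (x + dx, y + dy)), that `dt` in [0, 1] is the temporal blend weight, and that out-of-range coordinates are clamped to the border. The helper names `bilinear_sample` and `voxel_flow_pixel` are hypothetical.

```python
import math


def bilinear_sample(frame, x, y):
    """Sample a 2D list `frame` at real-valued (x, y), clamping to the border."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0  # fractional offsets inside the pixel cell
    top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
    bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
    return (1 - ay) * top + ay * bottom


def voxel_flow_pixel(frame0, frame1, x, y, dx, dy, dt):
    """Trilinear blend: bilinear in space within each frame, linear in time."""
    p0 = bilinear_sample(frame0, x - dx, y - dy)  # look back along the flow
    p1 = bilinear_sample(frame1, x + dx, y + dy)  # look forward along the flow
    return (1 - dt) * p0 + dt * p1


# Toy frames (hypothetical): a horizontal intensity ramp that gets brighter.
frame0 = [[0, 10, 20]] * 3
frame1 = [[20, 30, 40]] * 3
mid_pixel = voxel_flow_pixel(frame0, frame1, x=1, y=1, dx=0.5, dy=0.0, dt=0.5)
print(mid_pixel)  # 0.5 * 5 + 0.5 * 35 = 20.0
```

Because every operation is a weighted sum, the blend is differentiable with respect to (dx, dy, dt), which is what lets the flow be learned from the reconstruction loss alone.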

ASC¶

Video Frame Interpolation via Adaptive Separable Convolution (ICCV 2017)
The REDS dataset uses ASC to synthesize motion blur.

MEMC-Net¶

MEMC-Net (TPAMI 2018)

DAIN¶

Depth-Aware Video Frame Interpolation (CVPR 2019) from Shanghai Jiao Tong University
PyTorch code | Papers with Code
based on MEMC-Net, with pre-trained PWC-Net, MegaDepth
new layer: Depth-Aware flow projection

| Module | Architecture |
| --- | --- |
| Flow estimation | PWC-Net |
| Depth estimation | hourglass, MegaDepth |
| Context extraction | one 7x7 convolution layer followed by 2 residual blocks, with their features concatenated |
| Kernel estimation | U-Net |
| Adaptive warping layer | MEMC-Net/Adaptive Warping Layer |

testing pre-trained model¶

On a GTX 1080 Ti, a 1280x720 input takes about 2 s per frame.
Issue: the output drifts inwards.
jmspiewak attributes this to an anomaly in the pretrained PWC-Net model, and jmspiewak's workaround seems to fix the issue. Need to study PWC-Net further.

Zooming-Slow-Mo¶

Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution (CVPR 2020)
PyTorch
Video frame interpolation (VFI) and video super-resolution (VSR), i.e. temporal interpolation and spatial super-resolution, are intra-related. This paper proposes a unified one-stage space-time video super-resolution (STVSR) framework that handles both tasks simultaneously.

Temporally interpolate the LR frame features of the missing LR video frames with the proposed feature temporal interpolation network, which captures local temporal contexts.
Propose a deformable ConvLSTM that aligns and aggregates temporal information simultaneously to better leverage global temporal contexts. ref: DCNv2
Adopt a deep reconstruction network to predict the HR slow-motion video frames.

testing pre-trained model¶

On a GTX 1080 Ti, it took about 6 minutes to process 120 frames at 360x640 into 238 frames at 1440x2560.
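A quick back-of-the-envelope check of what these numbers imply, assuming "6m" means 6 minutes of wall-clock time. The variable names are illustrative only.

```python
# Figures from the test run above; "6m" is read as 6 minutes (assumption).
elapsed_s = 6 * 60
in_dims, in_frames = (360, 640), 120
out_dims, out_frames = (1440, 2560), 238

# Spatial upscaling factor along each dimension.
spatial_scale = (out_dims[0] / in_dims[0], out_dims[1] / in_dims[1])

# Average cost per output frame and temporal expansion ratio.
seconds_per_output_frame = elapsed_s / out_frames
temporal_ratio = out_frames / in_frames

print(spatial_scale)                       # (4.0, 4.0)
print(round(seconds_per_output_frame, 2))  # 1.51
print(round(temporal_ratio, 2))            # 1.98
```

So the run corresponds to 4x spatial upscaling and roughly 2x temporal upsampling at about 1.5 s per output frame.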

BIN¶

Blurry Video Frame Interpolation (CVPR 2020)
PyTorch 1.3 | result video
Joint video frame deblurring and interpolation, with an inter-pyramid recurrent module that adopts ConvLSTM units.