MEMC-Net¶
Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement from Shanghai Jiao Tong University
- exploit motion estimation and motion compensation in a neural network
- propose an adaptive warping layer based on optical flow and compensation filters for synthesizing new pixels. This novel warping layer is fully differentiable such that the gradients can be back-propagated to both the ME and MC networks.
- To account for occlusions, occlusion masks are estimated to adaptively blend the warped frames. Furthermore, missing pixels in holes and unreliable pixels of the warped frames are processed by a post-processing CNN.
- simultaneously estimate the flow and compensation kernels with respect to the original reference frames, then combine them with the adaptive warping layer
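The occlusion-mask blending step can be sketched as follows. The two-channel mask output is normalized here with a softmax so the two weights sum to one per pixel; this normalization is an assumption for illustration, and the paper's exact formulation may differ:

```python
import numpy as np

def blend_warped(w_prev, w_next, mask_logits):
    """Blend two warped frames with a 2-channel occlusion mask (sketch).

    w_prev, w_next: (H, W, C) frames warped from I_{t-1} and I_{t+1}.
    mask_logits:    (H, W, 2) raw mask-estimation output; normalized here
                    with a per-pixel softmax (an assumption, not necessarily
                    the paper's exact normalization).
    """
    e = np.exp(mask_logits - mask_logits.max(axis=-1, keepdims=True))
    m = e / e.sum(axis=-1, keepdims=True)
    return m[..., 0:1] * w_prev + m[..., 1:2] * w_next
```

With equal logits the two warped frames are simply averaged, which is the expected behavior in non-occluded regions.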
Motion Estimation and Motion Compensation Driven Neural Network¶
3.1 MEMC-Net Framework¶
3.2 Adaptive Warping Layer¶
warps images or features based on the given optical flow and local convolution kernels
Forward pass
\(I(x) : \mathbb{Z}^2 \to \mathbb{R}^3\) denotes the RGB image (a map from 2D integer coordinates to RGB color values)
\(f(x) := (u(x), v(x))\) represents the optical flow field, where u(x) and v(x) denote the horizontal and vertical components of the 2D flow vector
\( k^l(x) = [k^l_\mathbf{r}(x)]_{H \times W},\ \mathbf{r} \in [-R+1, R]^2 \) indicates the interpolation kernels, where R determines the kernel size (the sampling window is \(2R \times 2R\); R = 2 gives 4×4)
For each output pixel, the image is sampled at the flow-shifted location and filtered by the combined kernel: \(\hat{I}(x) = \sum_{\mathbf{r}} k^l_\mathbf{r}(x)\, k^d_\mathbf{r}(x)\, I(x + \lfloor f(x) \rfloor + \mathbf{r})\)
\(k^l_\mathbf{r}\): 16-channel (reshaped to 4×4) interpolation kernel, learned by the kernel estimation network
\(k^d_\mathbf{r}\): 4×4 coefficients computed from the fractional part of \(f(x) = (u(x), v(x))\)
i.e., the learned 4×4 kernel \(k^l_\mathbf{r}\) is modulated element-wise by the flow-derived bilinear coefficients \(k^d_\mathbf{r}\)
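The forward pass can be sketched in NumPy: the output is \(\sum_\mathbf{r} k^l_\mathbf{r}\, k^d_\mathbf{r}\, I(x + \lfloor f(x) \rfloor + \mathbf{r})\). This sketch assumes \(k^d\) carries the standard bilinear weights on the inner 2×2 taps of the 4×4 window and zeros elsewhere; the paper's exact \(k^d\) layout may differ:

```python
import numpy as np

def adaptive_warp(img, flow, kl):
    """Adaptive warping layer, forward pass (illustrative sketch).

    img:  (H, W, C) input frame I.
    flow: (H, W, 2) optical flow f = (u, v), horizontal and vertical.
    kl:   (H, W, 4, 4) learned kernels k^l (R = 2, so r spans [-1, 2]^2).
    """
    H, W, C = img.shape
    out = np.zeros_like(img, dtype=float)
    offsets = range(-1, 3)  # r in [-R+1, R] with R = 2
    for i in range(H):
        for j in range(W):
            u, v = flow[i, j]
            fu, fv = int(np.floor(u)), int(np.floor(v))
            tu, tv = u - fu, v - fv  # fractional parts of the flow
            # 1-D bilinear weights over the 4 taps (assumption: outer taps zero)
            wu = np.array([0.0, 1.0 - tu, tu, 0.0])
            wv = np.array([0.0, 1.0 - tv, tv, 0.0])
            for a, rv in enumerate(offsets):      # vertical offset
                for b, ru in enumerate(offsets):  # horizontal offset
                    kd = wv[a] * wu[b]
                    if kd == 0.0:
                        continue
                    ii = np.clip(i + fv + rv, 0, H - 1)  # clamp at borders
                    jj = np.clip(j + fu + ru, 0, W - 1)
                    out[i, j] += kl[i, j, a, b] * kd * img[ii, jj]
    return out
```

With all-ones learned kernels the combined kernel reduces to plain bilinear warping, which is a useful sanity check.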
Backward pass
compute the gradients with respect to the optical flow and the interpolation kernels respectively. The term \(I(x + \lfloor f(x) \rfloor + \mathbf{r})\) itself is not differentiated: the integer offset is treated as constant, and the flow gradient passes through the coefficients \(k^d_\mathbf{r}\), which are piecewise-linear functions of the fractional part of \(f(x)\). Note also that the flow estimator is not trained from scratch: per the hyper-parameter settings, a pre-trained model is used and only fine-tuned with a low learning rate.
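The backward pass can be written out explicitly (a sketch derived from the forward definition, not quoted from the paper). Writing \(\hat{I}(x) = \sum_{\mathbf{r}} k^l_\mathbf{r}(x)\, k^d_\mathbf{r}(x)\, I(x + \lfloor f(x) \rfloor + \mathbf{r})\) for the warped output and treating the integer offset \(\lfloor f(x) \rfloor + \mathbf{r}\) as constant:

\[
\frac{\partial \hat{I}(x)}{\partial k^l_\mathbf{r}(x)} = k^d_\mathbf{r}(x)\, I(x + \lfloor f(x) \rfloor + \mathbf{r}),
\qquad
\frac{\partial \hat{I}(x)}{\partial u(x)} = \sum_{\mathbf{r}} k^l_\mathbf{r}(x)\, \frac{\partial k^d_\mathbf{r}(x)}{\partial u(x)}\, I(x + \lfloor f(x) \rfloor + \mathbf{r})
\]

(and analogously for \(v(x)\)). Since \(k^d_\mathbf{r}\) is piecewise-linear in the fractional parts of \(u(x)\) and \(v(x)\), these derivatives exist almost everywhere, which is what lets gradients reach both the ME and MC networks.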
3.3 Flow Projection Layer¶
Let \(f_{t_x \to t_y}(x)\) be the motion vector at coordinate x from frame \(I_{t_x}\) to \(I_{t_y}\). Given \(f_{t-1 \to t+1}(y)\), find \(f_{t \to t-1}(x)\) and \(f_{t \to t+1}(x)\) by projecting each vector onto the pixel it passes through at time t; holes (pixels no vector passes through) are filled with an outside-in strategy.
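The projection can be sketched as follows: each vector \(f_{t-1 \to t+1}(y)\) is halved and deposited at the pixel it crosses at time t, colliding vectors are averaged, and holes are filled. The hole filling below is a simplified neighbour-averaging loop, not the paper's exact outside-in rule:

```python
import numpy as np

def project_flow(flow_tm1_tp1):
    """Project f_{t-1 -> t+1} to f_{t -> t+1} (flow projection sketch).

    flow_tm1_tp1: (H, W, 2) array, channels (u, v) = (horizontal, vertical).
    A pixel y moving with flow f passes through x ~ y + f/2 at time t,
    so that location inherits flow f/2 toward t+1 (and -f/2 toward t-1).
    """
    H, W, _ = flow_tm1_tp1.shape
    proj = np.zeros((H, W, 2))
    count = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            u, v = flow_tm1_tp1[i, j]
            ti = int(round(i + v / 2.0))  # vertical target at time t
            tj = int(round(j + u / 2.0))  # horizontal target at time t
            if 0 <= ti < H and 0 <= tj < W:
                proj[ti, tj] += flow_tm1_tp1[i, j] / 2.0  # sum colliding vectors
                count[ti, tj] += 1
    valid = count > 0
    proj[valid] /= count[valid][:, None]  # average where vectors collided
    # Simplified hole filling: average of already-valid 4-neighbours.
    while not valid.all():
        for i in range(H):
            for j in range(W):
                if valid[i, j]:
                    continue
                nb = [(a, b) for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                      if 0 <= a < H and 0 <= b < W and valid[a, b]]
                if nb:
                    proj[i, j] = np.mean([proj[p] for p in nb], axis=0)
                    valid[i, j] = True
    return proj
```

The flow toward the other reference frame is then simply the negated projection, \(f_{t \to t-1}(x) = -f_{t \to t+1}(x)\), under the linear-motion assumption.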
4 Video Frame Interpolation¶
All 4 branches take \(I_{t-1}, I_{t+1}\) as input
module | function/output | architecture |
---|---|---|
Motion estimation | estimate optical flow | FlowNetS |
Kernel estimation | two \(R^2\)-channel coefficient maps (one R×R kernel per pixel, per reference frame) | U-Net |
Mask estimation | 2-channel occlusion mask | U-Net |
Context extraction | context features, warped by the adaptive warping layer and fed to post-processing | ResNet18 (for MEMC-Net*) |
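The four branches can be tied together in a sketch of the interpolation forward pass. All branch networks here (`motion_net`, `kernel_net`, etc.) are hypothetical stand-ins passed as callables, and `warp`/`project`/`blend` stand for the layers described in these notes; this is an orchestration sketch, not the paper's implementation:

```python
import numpy as np

def interpolate_frame(I_prev, I_next, motion_net, kernel_net, mask_net,
                      context_net, post_net, warp, project, blend):
    """Sketch of the MEMC-Net frame interpolation pipeline.

    All network arguments are hypothetical stand-in callables taking the
    two reference frames; warp/project/blend stand for the adaptive
    warping, flow projection, and mask blending layers.
    """
    # 1) motion estimation between the two reference frames
    flow_pn = motion_net(I_prev, I_next)       # f_{t-1 -> t+1}
    # 2) project to flows from the intermediate frame t
    flow_t_next = project(flow_pn)             # f_{t -> t+1}
    flow_t_prev = -flow_t_next                 # f_{t -> t-1}, linear motion
    # 3) per-pixel interpolation kernels and occlusion masks
    k_prev, k_next = kernel_net(I_prev, I_next)
    masks = mask_net(I_prev, I_next)
    # 4) adaptively warp both reference frames
    w_prev = warp(I_prev, flow_t_prev, k_prev)
    w_next = warp(I_next, flow_t_next, k_next)
    ctx = context_net(I_prev, I_next)          # context features (MEMC-Net*)
    # 5) blend with occlusion masks, then refine with the post-processing CNN
    blended = blend(w_prev, w_next, masks)
    return post_net(blended, ctx)
```

With trivial stand-ins (zero flow, identity warp and post-processing, averaging blend) the pipeline degenerates to frame averaging, a quick structural sanity check.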