GAN Metrics¶
Inception Score¶
proposed in Improved Techniques for Training GANs (NIPS 2016), section 4
apply the Inception model to every generated image to get the conditional label distribution \(p(y|x)\)
based on two assumptions:
- Images that contain meaningful objects should have a conditional label distribution \(p(y|x)\) with low entropy.
i.e. a realistic image should be assigned to one class with high probability
- The model should generate varied images, so the marginal \(\int p(y|x=G(z))\,dz\) should have high entropy.
i.e. a good generative model should output images spread evenly across classes
\( IS = e^{\mathbb{E}_{x \sim p_G} D_{KL}(p(y|x) \| p(y))} \)
- \( x \sim p_G \): a sample drawn from the generator
- \( p(y|x) \): the conditional label distribution of generated sample \(x\)
- \( p(y) \): the average conditional label distribution over all generated samples
- \( D_{KL}(P \| Q) \): Kullback–Leibler divergence. A larger IS means a larger KL divergence, i.e. the label distribution of each generated sample differs from the average, which (under the two assumptions above) indicates a better generator.
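Given the conditional label distributions, the score is a few lines of numpy. A minimal sketch (the `inception_score` helper name and the toy inputs are illustrative; in practice each row of `p_yx` is the Inception softmax output for one generated image):

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """IS from conditional label probabilities.

    p_yx: (N, K) array; row i is p(y|x_i), the classifier's softmax
    output for generated image x_i (each row sums to 1).
    """
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal p(y)
    # KL(p(y|x) || p(y)) per sample, then exponentiate the mean
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Sharp and diverse predictions -> best case, IS = number of classes
sharp = np.eye(4)                 # 4 samples, each confidently a distinct class
print(inception_score(sharp))     # ≈ 4.0
# Uniform predictions -> worst case, IS = 1
flat = np.full((4, 4), 0.25)
print(inception_score(flat))      # ≈ 1.0
```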
Disadvantage¶
- IS is computed from generated images alone; it does not compare the statistics of synthetic samples against the statistics of real-world samples
Fréchet Inception Distance (FID)¶
introduced in GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NIPS 2017), section A1
Let \(p(\cdot)\) be the distribution of model samples and \(p_w(\cdot)\) the distribution of samples from the real world.
The Fréchet distance, also known as the Wasserstein-2 distance, \(d(\cdot, \cdot)\) is computed between the Gaussian with mean and covariance \((m, C)\) obtained from \(p(\cdot)\) and the Gaussian \((m_w, C_w)\) obtained from \(p_w(\cdot)\):
\[d^2((m, C), (m_w, C_w)) = \|m - m_w\|^2_2 + \mathrm{Tr}\left(C + C_w - 2\sqrt{C C_w}\right)\]
- the lower the FID, the better the GAN
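The formula above maps directly to numpy/scipy. A minimal sketch (the `fid` helper name and the toy Gaussians are illustrative; in practice the means and covariances are estimated from Inception-v3 activations of real and generated images):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(m, C, m_w, C_w):
    """Squared Fréchet distance between Gaussians (m, C) and (m_w, C_w)."""
    covmean = sqrtm(C @ C_w)            # matrix square root of the product
    if np.iscomplexobj(covmean):        # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((m - m_w) ** 2) + np.trace(C + C_w - 2.0 * covmean))

mu, cov = np.zeros(3), np.eye(3)
print(fid(mu, cov, mu, cov))            # identical Gaussians -> ≈ 0.0
print(fid(mu, cov, np.ones(3), cov))    # mean shifted by 1 in 3 dims -> ≈ 3.0
```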
Kernel Inception Distance (KID)¶
proposed in Demystifying MMD GANs (ICLR 2018), on OpenReview
- the squared MMD between Inception representations, with the polynomial kernel \(k(x, y) = {\left(\frac{1}{d} x^T y + 1\right)}^3\), where \(d\) is the representation dimension
- similar to FID in that it also uses Inception-v3 features, but KID does not assume a parametric form for the distribution of activations and has an unbiased estimator
- the lower the KID, the better the GAN
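The unbiased squared-MMD estimator with this kernel can be sketched in numpy as follows (the `kid` helper name and the random toy features are illustrative; in practice `X` and `Y` are Inception activations of real and generated images):

```python
import numpy as np

def kid(X, Y):
    """Unbiased squared MMD with the polynomial kernel (x.y / d + 1)^3.

    X: (n, d) features of one sample set; Y: (m, d) features of the other.
    """
    d = X.shape[1]
    k = lambda A, B: (A @ B.T / d + 1.0) ** 3
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    # unbiased estimator: exclude the diagonal (self-similarity) terms
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return float(term_xx + term_yy - 2.0 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
Y = rng.normal(size=(100, 8))
print(kid(X, Y))          # same distribution -> near 0
print(kid(X, X + 5.0))    # shifted distribution -> large positive value
```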