GAN Metrics

Inception Score

proposed in Improved Techniques for Training GANs (NIPS 2016), section 4
apply the Inception model to every generated image to get the conditional label distribution \(p(y|x)\)
based on 2 assumptions

  1. Images that contain meaningful objects should have a conditional label distribution \(p(y|x)\) with low entropy.
    i.e. a realistic image should belong to one class with high probability
  2. The model should generate varied images, so the marginal \(\int p(y|x = G(z))\, dz\) should have high entropy.
    i.e. a good generative model should output images covering all classes roughly uniformly

\( IS = \exp\left( \mathbb{E}_{x \sim p_G}\, D_{KL}(p(y|x) \,\|\, p(y)) \right) \)

  • \( x \sim p_G \): a sample drawn from the generator
  • \( p(y|x) \): the conditional label distribution of generated sample x
  • \( p(y) \): average conditional label distribution of all generated samples
  • \( D_{KL}(P\|Q) \): Kullback–Leibler divergence; a larger IS means a larger KL divergence, i.e. the label distribution of each generated sample differs from the average, which (by the two assumptions above) indicates a better generator
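The formula above can be sketched directly from a matrix of class probabilities. In practice \(p(y|x)\) comes from running the Inception model on generated images; here any row-stochastic matrix stands in for it (an assumption for this sketch), and the function name is hypothetical.

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Compute IS from an (N, K) matrix where row i is p(y|x_i).

    IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ), with p(y) estimated
    as the mean of p(y|x) over all samples.
    """
    p_y = p_yx.mean(axis=0)  # marginal label distribution p(y)
    # per-sample KL divergence KL(p(y|x) || p(y)); eps guards log(0)
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

With one-hot rows spread evenly over K classes (confident and diverse), IS approaches K; with identical uniform rows it is 1, matching the two assumptions.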

Disadvantage

  • IS does not use the statistics of real-world samples and compare them to the statistics of synthetic samples

Fréchet Inception Distance (FID)

introduced in GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NIPS 2017), section A1
Let \(p(\cdot)\) be the distribution of model samples and \(p_w(\cdot)\) the distribution of real-world samples. The Fréchet distance (also known as the Wasserstein-2 distance) \(d(\cdot, \cdot)\) is computed between the Gaussian with mean and covariance \((m, C)\) estimated from \(p(\cdot)\) and the Gaussian \((m_w, C_w)\) estimated from \(p_w(\cdot)\):

\[d^2((m, C), (m_w, C_w)) = \|m - m_w\|_2^2 + \mathrm{Tr}\left(C + C_w - 2 (C C_w)^{1/2}\right)\]
  • the lower the FID, the better the GAN
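A minimal NumPy sketch of the formula above, assuming the means and covariances have already been estimated from Inception activations (the function name is hypothetical). \(\mathrm{Tr}((C C_w)^{1/2})\) is computed here via the eigenvalues of \(C C_w\), which are real and non-negative for PSD covariances up to numerical noise.

```python
import numpy as np

def fid(mu, cov, mu_w, cov_w):
    """Squared Fréchet distance between Gaussians (mu, cov) and (mu_w, cov_w).

    d^2 = ||mu - mu_w||^2 + Tr(cov + cov_w - 2 (cov cov_w)^{1/2})
    """
    diff = mu - mu_w
    # Tr((C C_w)^{1/2}) = sum of square roots of the eigenvalues of C C_w
    eigvals = np.linalg.eigvals(cov @ cov_w)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return float(diff @ diff + np.trace(cov) + np.trace(cov_w) - 2 * trace_sqrt)
```

Identical Gaussians give distance 0; shifting one mean while keeping identity covariances adds exactly the squared mean difference.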

Kernel Inception Distance (KID)

Demystifying MMD GANs (ICLR 2018)
OpenReview

  • the squared MMD between Inception representations, using the polynomial kernel \(k(x, y) = {\left(\frac{1}{d} x^T y + 1\right)}^3\), where d is the representation dimension
  • similar to FID in that it also uses Inception-v3 features, but KID does not assume a parametric form for the distribution of activations and has an unbiased estimator
  • the lower the KID, the better the GAN
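The unbiased squared-MMD estimator with the cubic polynomial kernel can be sketched as follows, assuming `X` and `Y` are feature matrices (Inception activations in the paper; any \((n, d)\) arrays here, and the function names are hypothetical).

```python
import numpy as np

def polynomial_kernel(X, Y):
    """Cubic polynomial kernel k(x, y) = (x^T y / d + 1)^3, applied pairwise."""
    d = X.shape[1]
    return (X @ Y.T / d + 1) ** 3

def kid(X, Y):
    """Unbiased squared-MMD estimate between feature sets X (n, d) and Y (m, d).

    The diagonal terms of the within-set kernel matrices are dropped,
    which is what makes the estimator unbiased.
    """
    n, m = len(X), len(Y)
    k_xx = polynomial_kernel(X, X)
    k_yy = polynomial_kernel(Y, Y)
    k_xy = polynomial_kernel(X, Y)
    sum_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    sum_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return float(sum_xx + sum_yy - 2 * k_xy.mean())
```

Unlike FID, this needs no covariance estimate, so it behaves better with few samples; the estimate is 0 in expectation when both sets come from the same distribution.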