GAN Metrics

Inception Score

proposed in Improved Techniques for Training GANs (NIPS 2016), section 4
apply the Inception model to every generated image to get the conditional label distribution \(p(y|x)\)
based on 2 assumptions

  1. Images that contain meaningful objects should have a conditional label distribution \(p(y|x)\) with low entropy.
    i.e. a realistic image should belong to one class with high probability
  2. The model should generate varied images, so the marginal \(\int p(y|x = G(z))\, dz\) should have high entropy.
    i.e. a good generative model should output images covering all classes roughly uniformly

\( IS = \exp\left( \mathbb{E}_{x \sim p_G}\, D_{KL}(p(y|x) \,\|\, p(y)) \right) \)

  • \( x \sim p_G \): a sample drawn from the generator
  • \( p(y|x) \): the conditional label distribution of generated sample x
  • \( p(y) \): average conditional label distribution of all generated samples
  • \( D_{KL}(P\|Q) \): Kullback–Leibler divergence; a larger IS means a larger KL divergence, i.e. the label distribution of each generated sample differs from the average, which (by the two assumptions above) indicates a better generator
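The formula above can be sketched directly from a matrix of class probabilities. In practice \(p(y|x)\) comes from running the Inception model on generated images; here any row-stochastic matrix stands in for it (an assumption for this sketch), and the function name is hypothetical.

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Compute IS from an (N, K) matrix where row i is p(y|x_i).

    IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ), with p(y) estimated
    as the mean of p(y|x) over all samples.
    """
    p_y = p_yx.mean(axis=0)  # marginal label distribution p(y)
    # per-sample KL divergence KL(p(y|x) || p(y)); eps guards log(0)
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))
```

With one-hot rows spread evenly over K classes (confident and diverse), IS approaches K; with identical uniform rows it is 1, matching the two assumptions.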

Disadvantage

  • IS does not use the statistics of real-world samples and compare them to the statistics of synthetic samples

Fréchet Inception Distance (FID)

introduced in GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NIPS 2017), section A1
Let \(p(\cdot)\) be the distribution of model samples and \(p_w(\cdot)\) the distribution of real-world samples. The Fréchet distance (also known as the Wasserstein-2 distance) \(d(\cdot, \cdot)\) is computed between the Gaussian with mean and covariance \((m, C)\) estimated from \(p(\cdot)\) and the Gaussian \((m_w, C_w)\) estimated from \(p_w(\cdot)\):

\[d^2((m, C), (m_w, C_w)) = \|m - m_w\|_2^2 + \mathrm{Tr}\left(C + C_w - 2 (C C_w)^{1/2}\right)\]
  • the lower the FID, the better the GAN
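A minimal NumPy sketch of the formula above, assuming the means and covariances have already been estimated from Inception activations (the function name is hypothetical). \(\mathrm{Tr}((C C_w)^{1/2})\) is computed here via the eigenvalues of \(C C_w\), which are real and non-negative for PSD covariances up to numerical noise.

```python
import numpy as np

def fid(mu, cov, mu_w, cov_w):
    """Squared Fréchet distance between Gaussians (mu, cov) and (mu_w, cov_w).

    d^2 = ||mu - mu_w||^2 + Tr(cov + cov_w - 2 (cov cov_w)^{1/2})
    """
    diff = mu - mu_w
    # Tr((C C_w)^{1/2}) = sum of square roots of the eigenvalues of C C_w
    eigvals = np.linalg.eigvals(cov @ cov_w)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0, None)).sum()
    return float(diff @ diff + np.trace(cov) + np.trace(cov_w) - 2 * trace_sqrt)
```

Identical Gaussians give distance 0; shifting one mean while keeping identity covariances adds exactly the squared mean difference.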

Kernel Inception Distance (KID)

Demystifying MMD GANs (ICLR 2018)
OpenReview

  • the squared MMD between Inception representations, using the polynomial kernel \(k(x, y) = {\left(\frac{1}{d} x^T y + 1\right)}^3\), where d is the representation dimension
  • similar to FID in that it also uses Inception-v3 features, but KID does not assume a parametric form for the distribution of activations and has an unbiased estimator
  • the lower the KID, the better the GAN
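The unbiased squared-MMD estimator with the cubic polynomial kernel can be sketched as follows, assuming `X` and `Y` are feature matrices (Inception activations in the paper; any \((n, d)\) arrays here, and the function names are hypothetical).

```python
import numpy as np

def polynomial_kernel(X, Y):
    """Cubic polynomial kernel k(x, y) = (x^T y / d + 1)^3, applied pairwise."""
    d = X.shape[1]
    return (X @ Y.T / d + 1) ** 3

def kid(X, Y):
    """Unbiased squared-MMD estimate between feature sets X (n, d) and Y (m, d).

    The diagonal terms of the within-set kernel matrices are dropped,
    which is what makes the estimator unbiased.
    """
    n, m = len(X), len(Y)
    k_xx = polynomial_kernel(X, X)
    k_yy = polynomial_kernel(Y, Y)
    k_xy = polynomial_kernel(X, Y)
    sum_xx = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    sum_yy = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return float(sum_xx + sum_yy - 2 * k_xy.mean())
```

Unlike FID, this needs no covariance estimate, so it behaves better with few samples; the estimate is 0 in expectation when both sets come from the same distribution.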