Issues

Overfitting

solutions:

  1. validation set & early stopping
  2. weight regularization
  3. dropout

A minimal training-loop sketch combining the three follows (toy model and random stand-in data are illustrative only):
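```python
import torch
import torch.nn as nn

# Toy MLP with dropout; weight_decay adds L2 weight regularization.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Dropout(p=0.5),            # dropout: randomly zero activations during training
    nn.Linear(64, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)  # L2 regularization
loss_fn = nn.MSELoss()

# random stand-in data: 80% train / 20% validation
x, y = torch.randn(1000, 20), torch.randn(1000, 1)
x_tr, y_tr, x_va, y_va = x[:800], y[:800], x[800:], y[800:]

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(200):
    model.train()
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val = loss_fn(model(x_va), y_va).item()

    # early stopping: stop when validation loss stops improving for `patience` epochs
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```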

Gradient Vanishing

solutions:

  1. reduce number of layers
  2. ReLU instead of sigmoid/tanh
  3. skip connections, e.g. U-Net (see the sketch after this list)
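A minimal sketch of points 2 and 3, assuming a simple fully connected block; the identity path gives gradients a direct route back to earlier layers so they do not have to pass through every nonlinearity:

```python
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),            # ReLU instead of sigmoid/tanh: gradient is 1 where active
            nn.Linear(dim, dim),
        )

    def forward(self, x):
        return x + self.body(x)  # skip (identity) connection

x = torch.randn(4, 32, requires_grad=True)
SkipBlock(32)(x).sum().backward()
print(x.grad.shape)  # gradient reaches x through both the body and the identity path
```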

Gradient Explosion

solutions:

  1. reduce number of layers
  2. gradient clipping
  3. weight regularization
  4. skip connections, e.g. residual blocks (ResNet); see the clipping sketch after this list
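A minimal sketch of point 2 using PyTorch's built-in clipping utility: after `backward()`, rescale the global gradient norm to a maximum value before the optimizer step, so one bad batch cannot blow up the weights (toy model and data are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Tanh(), nn.Linear(10, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-4)  # weight regularization

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
opt.step()
```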

Loss Divergence

Internal Covariate Shift

reason: the inputs to a layer/activation are not i.i.d. over training; whenever the distribution of a layer's inputs shifts, the hidden layers after it must keep re-adapting to the new distribution
resulting problem: unstable learning
good solution: normalization; bad solution: lowering the learning rate
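A minimal sketch of the normalization fix, assuming batch normalization between linear layers: each layer then sees inputs with a roughly stable mean/variance instead of being trained with a smaller learning rate to compensate.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # normalize activations per feature over the batch
    nn.ReLU(),
    nn.Linear(64, 1),
)

x = torch.randn(128, 20)
model.train()             # BatchNorm uses batch statistics in training mode
print(model(x).shape)     # torch.Size([128, 1])
```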

Latent Space

the latent dimensions should ideally be independent and follow a simple, known distribution (e.g. a standard Gaussian) so that the latent space can be manipulated
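A hypothetical sketch of that manipulation: if the latent code follows a standard Gaussian prior (as in a VAE), new points can be sampled from the prior or interpolated between two codes and then decoded. The `decoder` below is an untrained stand-in, purely for illustration.

```python
import torch
import torch.nn as nn

latent_dim = 16
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 784))

z_new = torch.randn(1, latent_dim)                        # sample from the prior
z_a, z_b = torch.randn(1, latent_dim), torch.randn(1, latent_dim)
z_mix = 0.5 * z_a + 0.5 * z_b                             # interpolate between two codes

samples = decoder(torch.cat([z_new, z_mix], dim=0))
print(samples.shape)  # torch.Size([2, 784])
```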