Issues
Overfitting
solutions:
- validation set & early stopping (see the sketch below)
- weight regularization
- dropout
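A minimal early-stopping sketch in PyTorch, assuming `model`, `optimizer`, `loss_fn`, and train/validation `DataLoader`s already exist (all names here are placeholders); weight regularization is usually just the optimizer's `weight_decay` argument, and dropout is an `nn.Dropout` layer inside the model:

```python
import copy
import torch

def train_with_early_stopping(model, train_loader, val_loader,
                              optimizer, loss_fn,
                              max_epochs=100, patience=5):
    """Stop training once validation loss stops improving."""
    best_val, best_state, stale = float("inf"), None, 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()

        model.eval()  # evaluate on the held-out validation set
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item()
                           for x, y in val_loader) / len(val_loader)

        if val_loss < best_val:
            best_val, stale = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())  # remember best weights
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` epochs: stop early

    model.load_state_dict(best_state)  # roll back to the best checkpoint
    return model
```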
Gradient Vanishing
solutions:
- reduce the number of layers
- ReLU instead of sigmoid/tanh
- skip connections, e.g. U-Net (see the sketch after this list)
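A sketch combining two of these fixes, ReLU activations plus an additive skip connection, as a residual-style PyTorch block (the channel count and conv layout are arbitrary choices for illustration):

```python
import torch
from torch import nn

class SkipBlock(nn.Module):
    """Two conv layers with ReLU and an additive skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()  # gradient is exactly 1 for positive inputs

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        # The identity path gives gradients a direct, unattenuated route
        # back to earlier layers, countering vanishing gradients.
        return self.relu(out + x)
```

U-Net applies the same idea with concatenation instead of addition, wiring each encoder stage straight to the matching decoder stage.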
Gradient Explosion
solutions:
- reduce the number of layers
- gradient clipping (see the sketch after this list)
- weight regularization
- skip connections, e.g. residual blocks (ResNet)
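A minimal sketch of gradient clipping in one PyTorch training step, with `weight_decay` standing in for weight regularization (the model, data, and hyperparameters below are placeholders):

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            weight_decay=1e-4)  # L2 weight regularization

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so their global L2 norm is at most 1.0,
# so one oversized gradient cannot blow up the parameter update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```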
Loss Divergence
Internal Covariate Shift
reason: the inputs to a layer/activation are not i.i.d. over training: as earlier layers update, the distribution of inputs to later layers keeps changing, so hidden layers must repeatedly adapt to a new distribution
problem caused: unstable learning
good solution: normalization (see the sketch below)
bad solution: lowering the learning rate (it stabilizes training but only slows the symptom down without fixing the shifting distributions)
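A minimal sketch of the normalization fix: `BatchNorm1d` after each linear layer of a small PyTorch MLP (the layer sizes are arbitrary):

```python
from torch import nn

# BatchNorm re-standardizes each layer's inputs per mini-batch, so later
# layers see a roughly stable distribution even while earlier weights move.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),  # normalize pre-activations across the batch
    nn.ReLU(),
    nn.Linear(256, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
```

Because each layer's inputs stay standardized, training tolerates higher learning rates, which is why normalization beats merely lowering the rate.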
Latent Space
The latent dimensions should ideally be independent and follow a simple known distribution (typically a standard normal) so that the space can be sampled and manipulated.
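A small sketch of why such a latent space is easy to manipulate, assuming some trained `decoder` (a hypothetical name) that maps latent vectors back to data:

```python
import torch

latent_dim = 16  # assumed size of the latent space

# If the prior is an independent standard normal, generating new data
# is just drawing z ~ N(0, I) and decoding it.
z = torch.randn(1, latent_dim)
# x = decoder(z)  # hypothetical decoder call

# Manipulation becomes plain vector arithmetic, e.g. interpolating
# between two latent codes to morph one decoded sample into another.
z1, z2 = torch.randn(1, latent_dim), torch.randn(1, latent_dim)
for alpha in torch.linspace(0, 1, steps=5):
    z_mix = (1 - alpha) * z1 + alpha * z2
    # x_mix = decoder(z_mix)  # decode each blend
```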