Optimizer¶
Momentum¶
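Accumulates an exponential moving average of past gradients and steps along it. A minimal sketch (variable names and hyperparameters are illustrative):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """One classical momentum update: accumulate a velocity, then step along it."""
    v = beta * v + grad      # running accumulation of past gradients
    w = w - lr * v           # move along the accumulated direction
    return w, v
```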
AdaGrad¶
adapts the learning rate per parameter during training, but considers only the accumulated sum of all squared past gradients
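A minimal sketch of the update, assuming a per-parameter accumulator of squared gradients (names are illustrative):

```python
import numpy as np

def adagrad_step(w, g_sum, grad, lr=0.01, eps=1e-8):
    """AdaGrad: divide the step by the root of the accumulated squared gradients."""
    g_sum = g_sum + grad ** 2                      # sum over ALL past gradients, never decays
    w = w - lr * grad / (np.sqrt(g_sum) + eps)
    return w, g_sum
```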
AdaDelta¶
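Extends AdaGrad by replacing the ever-growing sum with an exponentially decaying average of squared gradients and rescaling each step by the RMS of past updates, which removes the global learning rate. A rough sketch (names illustrative):

```python
import numpy as np

def adadelta_step(w, eg2, edx2, grad, rho=0.95, eps=1e-6):
    """AdaDelta: decaying average of squared gradients + RMS of past parameter updates."""
    eg2 = rho * eg2 + (1 - rho) * grad ** 2                  # decaying squared-gradient average
    delta = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad # step scaled by RMS of past updates
    edx2 = rho * edx2 + (1 - rho) * delta ** 2               # track RMS of the updates themselves
    return w + delta, eg2, edx2
```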
RMSprop (Root Mean Square Prop)¶
keeps an exponentially decaying average of squared gradients, so the step size is governed mainly by the magnitude of recent gradients rather than the whole history (contrast with AdaGrad)
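A minimal sketch, assuming the usual decaying average of squared gradients (names illustrative):

```python
import numpy as np

def rmsprop_step(w, eg2, grad, lr=0.001, beta=0.9, eps=1e-8):
    """RMSprop: like AdaGrad, but the squared-gradient accumulator decays over time."""
    eg2 = beta * eg2 + (1 - beta) * grad ** 2      # only recent gradients matter much
    w = w - lr * grad / (np.sqrt(eg2) + eps)
    return w, eg2
```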
Adam¶
Momentum + RMSProp
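A minimal sketch combining the two ideas (first moment = momentum, second moment = RMSprop) with the standard bias correction:

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum on the gradient + RMSprop-style scaling, with bias correction."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (RMSprop)
    m_hat = m / (1 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```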
EVE¶
Adam + further adjusts the learning rate by the relative change of the objective (loss) between iterations
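A rough illustration of the feedback idea only, not the exact Eve algorithm; the clipping thresholds and smoothing factor below are assumptions for the sketch. The Adam step would be divided by the returned d, so large swings in the loss shrink the effective learning rate:

```python
def eve_feedback(d_prev, loss, loss_prev, beta3=0.999, clip_low=0.1, clip_high=10.0):
    """Illustrative feedback term: smoothed, clipped relative change of the loss.
    Assumes a nonnegative loss; the effective learning rate becomes lr / d."""
    r = abs(loss - loss_prev) / (min(loss, loss_prev) + 1e-12)  # relative change of objective
    r = max(clip_low, min(r, clip_high))                        # clip to keep d in a sane range
    d = beta3 * d_prev + (1 - beta3) * r                        # exponential smoothing
    return d
```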
Cyclical Learning Rates¶
Cyclical Learning Rates for Training Neural Networks (WACV 2017)
Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values
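A minimal PyTorch sketch of the triangular policy; the model, boundary values, and step counts are placeholders (the boundaries would come from the LR range test below):

```python
import torch

model = torch.nn.Linear(10, 1)                                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,                       # boundary values from the LR range test
    step_size_up=2000, mode="triangular")

for step in range(10000):
    # ... forward / backward on a real batch would go here ...
    optimizer.step()
    scheduler.step()                                            # LR cycles between base_lr and max_lr
```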
LR Range test¶
run training for a few epochs while the learning rate increases between a low and a high value, then read the useful boundaries off the loss/accuracy curve; this practically eliminates the need to experimentally find the best values and schedule for the global learning rates
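A minimal sketch of the test, assuming hypothetical train_batches and compute_loss helpers:

```python
import torch

def lr_range_test(model, train_batches, compute_loss,
                  lr_min=1e-6, lr_max=1.0, num_steps=100):
    """Grow the LR exponentially each step and record the loss,
    so the useful LR boundaries can be read off the curve."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_min)
    gamma = (lr_max / lr_min) ** (1.0 / num_steps)     # multiplicative LR growth per step
    history = []
    for step, batch in zip(range(num_steps), train_batches):
        lr = lr_min * gamma ** step
        for group in optimizer.param_groups:
            group["lr"] = lr                           # set this step's learning rate
        optimizer.zero_grad()
        loss = compute_loss(model, batch)
        loss.backward()
        optimizer.step()
        history.append((lr, loss.item()))
    return history                                     # plot loss vs lr to pick base_lr / max_lr
```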
Note: even though some report that the gains from cyclical learning rates are not significant, the LR range test concept is reused by other cycle/restart-based optimizers
SGDR¶
SGDR: Stochastic Gradient Descent with Warm Restarts (ICLR 2017)
SGD with cyclical warm restarts of the learning rate
helps escape poor local minima; used in many models, e.g. EDVR
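A minimal PyTorch sketch using the built-in cosine annealing with warm restarts; the model, optimizer, and restart periods are placeholders:

```python
import torch

model = torch.nn.Linear(10, 1)                         # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Restart after T_0 epochs, doubling the period after each restart (T_mult=2).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2, eta_min=1e-5)

for epoch in range(70):
    # ... forward / backward over the training set would go here ...
    optimizer.step()
    scheduler.step()                                   # LR jumps back up at each restart
```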