Activation Functions¶
Introduce non-linearity into the output of a neuron
Step¶
could be used for a hash layer (binary output)
Signum¶
could be used for a hash layer (binary output)
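A minimal NumPy sketch of both; the hash-layer use is only an illustration, and note that both functions have zero gradient almost everywhere, so they are not trained through directly:

```python
import numpy as np

def step(x):
    # Heaviside step: 1 where x >= 0, else 0 (hard binary output)
    return (x >= 0).astype(np.float32)

def signum(x):
    # sign function: -1, 0, or +1
    return np.sign(x)

# e.g. binarising a hash layer's activations into a binary code
code = step(np.array([-0.7, 0.2, 1.5]))   # -> [0., 1., 1.]
```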
Sigmoid¶
a.k.a. logistic
saturated, monotonic
derivative non-monotonic
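A short sketch of sigmoid and its derivative; \(\sigma'(x) = \sigma(x)(1-\sigma(x))\) peaks at \(x = 0\) and vanishes for large \(|x|\), which is the saturation noted above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    # peaks at 0.25 when x = 0 and vanishes as |x| grows (saturation)
    return s * (1.0 - s)
```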
Tanh¶
saturated, monotonic
derivative non-monotonic
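The same idea for tanh; its derivative also vanishes in the tails, but the output is zero-centred in (-1, 1):

```python
import numpy as np

def tanh_grad(x):
    # 1 - tanh(x)^2 also goes to 0 as |x| grows (saturation)
    return 1.0 - np.tanh(x) ** 2

# tanh is a rescaled, zero-centred sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
```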
ReLU¶
Rectified linear unit
non-saturated, monotonic, derivative monotonic
fast to compute and helps with vanishing gradients, but might cause gradient explosion
can be seen as a form of dropout when input < 0 (output and gradient are both zero)
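A sketch of ReLU and its (sub)gradient, showing the dropout-like behaviour for negative inputs:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # (sub)gradient: 1 for x > 0, 0 otherwise; negative inputs are
    # effectively dropped since output and gradient are both zero
    return (x > 0).astype(np.float32)
```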
Leaky ReLU¶
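Leaky ReLU keeps a small fixed slope for negative inputs instead of zeroing them, so units cannot die completely; a minimal sketch, with 0.01 as the commonly used slope:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # alpha is a small fixed slope for the negative part (not learned)
    return np.where(x > 0, x, alpha * x)
```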
PReLU - Parametric ReLU¶
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
The momentum method is adopted when updating \(a_i\):
\(\Delta a_i := \mu \Delta a_i + \epsilon \frac{\partial \mathcal{E}}{\partial a_i}\)
where \(\mu\) is the momentum, \(\epsilon\) is the learning rate and \(\mathcal{E}\) is the objective
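A sketch of the PReLU forward pass, the gradient of the objective with respect to the learnable slope, and a momentum-style step in the spirit of the rule above; the µ and ε values here are illustrative only:

```python
import numpy as np

def prelu(x, a):
    # a: learnable slope for the negative part (shared or per-channel)
    return np.where(x > 0, x, a * x)

def grad_wrt_a(x, grad_out):
    # dE/da = sum of upstream gradient * d(prelu)/da,
    # where d(prelu)/da is x for x <= 0 and 0 otherwise
    return np.sum(grad_out * np.where(x > 0, 0.0, x))

def momentum_step(a, delta_a, grad_a, mu=0.9, eps=0.01):
    # momentum-style update of a; delta_a accumulates the (negative)
    # gradient so the step descends on the objective
    delta_a = mu * delta_a - eps * grad_a
    return a + delta_a, delta_a
```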
ELU - Exponential Linear Units (2015)¶
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
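A minimal sketch of ELU with the common default α = 1; negative inputs decay smoothly toward -α instead of being cut to zero:

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```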
SELU - Scaled Exponential Linear Unit (NIPS 2017)¶
Self-Normalizing Neural Networks
usually used to replace a normalization layer
α and λ are derived from the input statistics; for standard scaled inputs (mean 0, stddev 1) the values are α ≈ 1.6733 and λ ≈ 1.0507
Alexia Jolicoeur-Martineau reported that replacing BatchNorm and ReLUs with SELUs helps when training a high-resolution DCGAN
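A sketch using the fixed constants from the paper for mean-0 / stddev-1 inputs:

```python
import numpy as np

ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    # lambda * ELU(x, alpha); with mean-0 / stddev-1 inputs this keeps
    # activations approximately self-normalizing across layers
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))
```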
Softmax¶
maps the outputs of multiple neurons to [0, 1], summing to 1
usually used in the last layer to output a probability for each label in classification
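A numerically stable sketch; subtracting the row maximum does not change the result because softmax is shift-invariant:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; outputs lie in [0, 1] and sum to 1
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

probs = softmax(np.array([2.0, 1.0, 0.1]))   # e.g. per-class probabilities
```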