# Tech / Tricks
## Data
### Data augmentation

Horizontal flipping, random crops, color jittering.
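A minimal sketch of these three augmentations using torchvision's transforms API (the crop size, padding, and jitter strengths are illustrative choices, not prescriptions from the notes):

```python
import torchvision.transforms as T

# Typical training-time augmentation pipeline:
# random crop (with padding), horizontal flip, color jitter.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),        # random crops
    T.RandomHorizontalFlip(p=0.5),      # horizontal flipping
    T.ColorJitter(brightness=0.4,       # color jittering
                  contrast=0.4,
                  saturation=0.4),
    T.ToTensor(),
])
```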
### PCA Whitening

Decorrelates features and rescales them to unit variance; truncating to the top principal components also reduces dimensionality.
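A NumPy sketch of PCA whitening, assuming rows are samples; passing `k` keeps only the top-k components, which is where the dimensionality reduction comes from (`k` and `eps` are illustrative):

```python
import numpy as np

def pca_whiten(X, k=None, eps=1e-5):
    """Decorrelate features and scale each component to unit variance."""
    X = X - X.mean(axis=0)                   # center the data
    cov = X.T @ X / X.shape[0]               # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if k is not None:                        # optional dimension reduction
        eigvals, eigvecs = eigvals[:k], eigvecs[:, :k]
    # Project onto the eigenvectors, then rescale to unit variance.
    return (X @ eigvecs) / np.sqrt(eigvals + eps)
```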
i.i.d. (independent and identically distributed): the standard assumption that training samples are drawn independently from the same distribution.
## Regularization

Penalize large weights in the loss function (e.g., an L2 penalty, a.k.a. weight decay).

Solves: overfitting.
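A minimal sketch of an explicit L2 penalty added to the task loss; in PyTorch the same effect is usually obtained through the optimizer's `weight_decay` argument (`lam` and the helper name are mine):

```python
import torch

def l2_regularized_loss(loss, model, lam=1e-4):
    """Add an L2 penalty on all model weights to the task loss."""
    l2 = sum(p.pow(2).sum() for p in model.parameters())
    return loss + lam * l2

# Equivalent in spirit:
# torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```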
## Normalization

Normalize layer inputs (e.g., batch normalization).

Solves: internal covariate shift.
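A NumPy sketch of the batch normalization computation at training time; `gamma`, `beta`, and `eps` follow the usual notation, and the running statistics used at inference are omitted:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch per feature, then rescale and shift.

    x: (batch_size, n_features)
    """
    mean = x.mean(axis=0)               # per-feature batch mean
    var = x.var(axis=0)                 # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta         # learnable scale and shift
```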
## Skip Connection
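A skip (residual) connection adds a layer's input directly to its output, y = x + F(x). A minimal PyTorch sketch of a residual block (the layer sizes and layout are illustrative):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the identity path lets gradients flow directly."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))  # skip connection
```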
### Residual scaling
If the number of filters exceeded 1000, the residual variants started to exhibit instabilities and the network simply "died" early in training, meaning that the last layer before the average pooling started to produce only zeros after a few tens of thousands of iterations (observed in the Inception-v4 / Inception-ResNet paper, Szegedy et al., 2016).

Scaling down the residuals before adding them to the previous layer's activation seemed to stabilize the training.
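A minimal sketch of the scaled residual addition; the Inception-v4 paper reports scaling factors between roughly 0.1 and 0.3 worked, and `scale=0.1` here is one such choice:

```python
import torch.nn as nn

class ScaledResidual(nn.Module):
    """Scale the residual branch before adding it to the identity path."""
    def __init__(self, branch, scale=0.1):
        super().__init__()
        self.branch = branch   # any residual function F(x)
        self.scale = scale     # e.g., 0.1-0.3 per Inception-v4

    def forward(self, x):
        return x + self.scale * self.branch(x)
```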
### Visualizing gradient vanishing

Skip connections give gradients a direct path back to earlier layers; plotting per-layer gradient norms makes the vanishing (or its absence) visible.
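A small helper for this inspection, assuming a PyTorch model whose gradients have already been populated by `loss.backward()` (the helper name is mine):

```python
def gradient_norms(model):
    """Collect per-parameter gradient norms after loss.backward().

    Plotting these by depth shows whether early layers still
    receive a usable gradient signal.
    """
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if p.grad is not None}
```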
## Batch size

"Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour" (Goyal et al., 2017) addresses the optimization difficulties of large minibatches in distributed synchronous stochastic gradient descent (SGD), chiefly via a linear learning-rate scaling rule and a gradual warmup phase.
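A minimal sketch of that recipe: scale the learning rate linearly with the batch size, and ramp up to the scaled rate over the first few epochs (the base values and warmup length here are illustrative, not quoted from the paper):

```python
def scaled_lr(epoch, step, steps_per_epoch,
              base_lr=0.1, base_batch=256, batch=8192, warmup_epochs=5):
    """Linear scaling rule with gradual warmup (Goyal et al., 2017).

    Target LR is base_lr * (batch / base_batch); during the first
    warmup_epochs it ramps linearly up from base_lr.
    """
    target_lr = base_lr * batch / base_batch   # linear scaling rule
    if epoch < warmup_epochs:                  # gradual warmup
        progress = ((epoch * steps_per_epoch + step)
                    / (warmup_epochs * steps_per_epoch))
        return base_lr + progress * (target_lr - base_lr)
    return target_lr
```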