Transfer Learning

How transferable are features in deep neural networks? (NIPS 2014)
Example: given labelled grey-scale MNIST and unlabelled colour MNIST, we want to train a classifier for colour MNIST without labelling the colour images.

Embedding

  • word embeddings in NLP, e.g. word2vec
  • face embeddings in vision, e.g. FaceNet
  • etc…

STL

Self-taught Learning: Transfer Learning from Unlabeled Data (ICML 2007)
Learn a feature representation from unlabelled data (e.g. with an auto-encoder), then train the classifier on top of that representation.
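A minimal two-stage sketch in PyTorch, with toy random tensors standing in for the real data (flattened 28x28 images are assumed): learn features on the unlabelled pool with an auto-encoder, then fit a classifier on the frozen encoding of the labelled set.

    import torch
    import torch.nn as nn

    # Toy stand-ins for the real datasets (assumption: flattened 28x28 images).
    x_unlabelled = torch.rand(256, 784)            # large unlabelled pool
    x_labelled = torch.rand(64, 784)               # small labelled set
    y_labelled = torch.randint(0, 10, (64,))

    class AutoEncoder(nn.Module):
        def __init__(self, in_dim=784, latent=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                         nn.Linear(256, latent))
            self.decoder = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                         nn.Linear(256, in_dim))
        def forward(self, x):
            return self.decoder(self.encoder(x))

    # Stage 1: learn features from unlabelled data (reconstruction loss).
    ae = AutoEncoder()
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for _ in range(100):
        loss = nn.functional.mse_loss(ae(x_unlabelled), x_unlabelled)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: train a classifier on the frozen encoded features.
    clf = nn.Linear(64, 10)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    for _ in range(100):
        with torch.no_grad():
            z = ae.encoder(x_labelled)
        loss = nn.functional.cross_entropy(clf(z), y_labelled)
        opt.zero_grad()
        loss.backward()
        opt.step()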

DaNN (PRICAI 2014)

Domain Adaptive Neural Networks for Object Recognition (PRICAI 2014)
Uses Maximum Mean Discrepancy (MMD) as a regularizer to reduce the distribution mismatch between the source and target domains in the latent space.
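A minimal sketch of an MMD^2 estimate in PyTorch (an RBF kernel with bandwidth sigma is assumed here; the kernel choice is a hyper-parameter):

    import torch

    def mmd2(x, y, sigma=1.0):
        # Biased MMD^2 estimate between samples x (n, d) and y (m, d),
        # using an RBF kernel with bandwidth sigma.
        def rbf(a, b):
            return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
        return rbf(x, x).mean() + rbf(y, y).mean() - 2 * rbf(x, y).mean()

    # Used as a regularizer on the latent features:
    # loss = cross_entropy(logits_src, y_src) + lam * mmd2(feat_src, feat_tgt)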

DDC

Deep Domain Confusion: Maximizing for Domain Invariance (2014)
Adds an adaptation layer together with a domain-confusion loss based on MMD.
Uses a deeper network than DaNN.
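Schematically, the DDC objective is the source classification loss plus an MMD penalty measured at the adaptation layer. A toy sketch reusing the mmd2 helper from the DaNN section above (the layer sizes and trade-off weight lam are assumptions):

    import torch
    import torch.nn as nn

    # Toy DDC-style model: shared backbone ending in a small "adaptation layer".
    backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 64))
    classifier = nn.Linear(64, 10)
    lam = 0.25  # trade-off weight (assumed value)

    x_src, y_src = torch.rand(32, 784), torch.randint(0, 10, (32,))
    x_tgt = torch.rand(32, 784)        # unlabelled target batch

    feat_src, feat_tgt = backbone(x_src), backbone(x_tgt)
    loss = (nn.functional.cross_entropy(classifier(feat_src), y_src)
            + lam * mmd2(feat_src, feat_tgt))  # mmd2 from the DaNN sketch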

Distillation

Train a big model first, then use it as a teacher to teach a small model (which has faster inference).

  1. Do Deep Nets Really Need to be Deep? (NIPS 2014): the student learns the teacher's logits (the values before the softmax); unlabelled data can also be used, since the teacher provides the targets.
  2. Distilling the Knowledge in a Neural Network (NIPS 2014 workshop): the student learns soft targets (the teacher's temperature-softened outputs). This works better than training the small model on labelled data directly, probably because distillation prevents overfitting; see the loss sketch below.
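A minimal sketch of the soft-target distillation loss in PyTorch (the temperature T and mixing weight alpha here are typical choices, not values from the paper):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
        # KL divergence between temperature-softened distributions; the T^2
        # factor keeps gradient magnitudes comparable across temperatures,
        # as suggested in the paper.
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)  # ordinary hard-label loss
        return alpha * soft + (1 - alpha) * hard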

DANN

Domain-Adversarial Neural Networks (NIPS 2014 workshop) - Hana Ajakan
Unsupervised Domain Adaptation by Backpropagation (ICML 2015) - Yaroslav Ganin
Domain-Adversarial Training of Neural Networks (JMLR 2016) - Yaroslav Ganin, Hana Ajakan
Figure: the domain-adversarial training architecture (../_images/domain-adversarial_training.png)

The architecture has three modules:

  • feature extractor: the model to be transferred and fine-tuned
  • label predictor: predicts the task output from the extracted features
  • domain classifier: identifies whether an input comes from the source or the target domain; when it can distinguish the domains, the resulting high adversarial loss forces the feature extractor to learn features that mix the two domains
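The trick that makes this adversarial game trainable with ordinary backprop is the gradient reversal layer (GRL): identity in the forward pass, gradient multiplied by -lambda in the backward pass. A minimal PyTorch sketch (the module names in the trailing comments are assumptions):

    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)   # identity in the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            # Reversed, scaled gradient for x; no gradient for lam.
            return -ctx.lam * grad_output, None

    def grad_reverse(x, lam=1.0):
        return GradReverse.apply(x, lam)

    # features = feature_extractor(x)
    # class_logits = label_predictor(features)                   # task loss
    # domain_logits = domain_classifier(grad_reverse(features))  # adversarial loss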

DNI

Deep Network Interpolation for Continuous Imagery Effect Transition (CVPR 2019) - CUHK + SenseTime
Project page | GitHub: PyTorch implementation (see the last section of the README)
The network interpolation strategy is also used in the ablation study of ESRGAN; it is less costly than running ablations by re-tuning loss weights.
Teaser image: https://xinntao.github.io/projects/DNI_src/teaser.jpg
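At its core, DNI is a linear interpolation of the parameters of two networks that share the same architecture; sweeping the coefficient alpha gives a continuous transition between the two effects. A minimal sketch (the file names are placeholders):

    import torch

    def interpolate_state_dicts(sd_a, sd_b, alpha):
        # (1 - alpha) * net_A + alpha * net_B, key by key; assumes identical
        # architectures so the state dicts share keys and shapes.
        return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

    # sd = interpolate_state_dicts(torch.load('net_A.pth'),
    #                              torch.load('net_B.pth'), alpha=0.2)
    # model.load_state_dict(sd)   # vary alpha for a continuous transition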