Loss Functions

L2 & L1 Norm (regularization)

L2 - Square

\[\begin{split}&L_2(x)=x^2\\ &f(y,\hat{y})=\sum^N_{i=1} (y_i-\hat{y}_i)^2\end{split}\]

L1 - Absolute

\[\begin{split}&L_1(x)=|x|\\ &f(y,\hat{y})=\sum^N_{i=1} |y_i-\hat{y}_i|\end{split}\]

Smooth L1

\[\begin{split}\text{smooth L}_1(x)= \begin{cases} 0.5x^2 & \text{if } |x|<1 \\ |x|-0.5 & \text{otherwise} \end{cases}\end{split}\]
\[\begin{split}&f(y,\hat{y})= \begin{cases} 0.5(y-\hat{y})^2 & \text{if } |y-\hat{y}|<1 \\ |y-\hat{y}|-0.5 & \text{otherwise} \end{cases}\end{split}\]
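The piecewise definition above can be sketched in NumPy (a minimal illustration; the function name `smooth_l1` is chosen here, not from the original text):

```python
import numpy as np

def smooth_l1(y, y_hat):
    """Smooth L1 loss: quadratic for residuals below 1, linear above."""
    diff = np.abs(y - y_hat)
    return np.where(diff < 1, 0.5 * diff**2, diff - 0.5)

# Residuals 0.2 and 0.0 use the quadratic branch, 4.0 the linear branch
y = np.array([1.0, 2.0, 5.0])
y_hat = np.array([1.2, 2.0, 1.0])
print(smooth_l1(y, y_hat))
```

The quadratic region keeps gradients small near zero (like L2), while the linear region limits the influence of outliers (like L1).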

Regression Loss Functions

MSE - Mean Squared Error

\[f(y,\hat{y})=\dfrac{1}{N}\sum^N_{i=1} (y_i-\hat{y}_i)^2\]

MAE - Mean Absolute Error

\[f(y,\hat{y})=\dfrac{1}{N}\sum^N_{i=1} |y_i-\hat{y}_i|\]

MSLE - Mean Squared Logarithmic Error

\[f(y,\hat{y})=\dfrac{1}{N}\sum^N_{i=1} (\log(y_i+1)-\log(\hat{y}_i+1))^2\]
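The three regression losses above can be written directly in NumPy (a minimal sketch; function names are illustrative):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def msle(y, y_hat):
    """Mean squared logarithmic error; log1p(x) = log(1 + x)."""
    return np.mean((np.log1p(y) - np.log1p(y_hat)) ** 2)

y = np.array([3.0, 0.5, 2.0])
y_hat = np.array([2.5, 0.0, 2.0])
print(mse(y, y_hat), mae(y, y_hat), msle(y, y_hat))
```

MSE penalizes large errors quadratically, MAE penalizes all errors linearly, and MSLE penalizes relative (ratio) errors, which is useful when targets span several orders of magnitude.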

Cosine Proximity

\[\newcommand{\vect}[1]{\boldsymbol{#1}} f(\vect{y},\hat{\vect{y}})=-\dfrac{\vect{y} \cdot \hat{\vect{y}}}{||\vect{y}||_2 \cdot ||\hat{\vect{y}}||_2}= -\dfrac{\sum^n_{i=1}y_i\cdot\hat{y}_i}{\sqrt{\sum^n_{i=1}y_i^2}\cdot\sqrt{\sum^n_{i=1}\hat{y}_i^2}}\]
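A minimal NumPy sketch of the formula above (the function name `cosine_proximity` is illustrative):

```python
import numpy as np

def cosine_proximity(y, y_hat):
    """Negative cosine similarity: minimizing it aligns the two vectors."""
    return -np.dot(y, y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat))

# Parallel vectors give the minimum value -1
print(cosine_proximity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))
```

Because only the angle between the vectors matters, this loss ignores magnitude and is minimized when the prediction points in the same direction as the target.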

Binary Classification Loss Functions

Binary Cross-Entropy

\(\hat{y}\) is the prediction, \(y\) is the ground truth

\[f(y,\hat{y})=-\dfrac{1}{n}\sum^n_{i=1}[y_i \log(\hat{y}_i)+(1-y_i) \log(1-\hat{y}_i)]\]
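The formula above, sketched in NumPy (the clipping constant is an implementation detail added here to avoid `log(0)`, not part of the definition):

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross-entropy; y in {0, 1}, y_hat in (0, 1)."""
    # Clip predictions away from 0 and 1 so the logs stay finite
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A maximally uncertain prediction (0.5) costs log(2) per sample
print(binary_cross_entropy(np.array([1.0]), np.array([0.5])))
```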

Hinge Loss

max-margin objective

\[f(y,\hat{y})=\dfrac{1}{n}\sum^n_{i=1}\max(0,m-y_i\cdot\hat{y}_i)\]

Squared Hinge Loss

\[f(y,\hat{y})=\dfrac{1}{n}\sum^n_{i=1}(\max(0,m-y_i\cdot\hat{y}_i))^2\]
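Both hinge variants, sketched in NumPy with the margin \(m=1\) as the default (labels in {-1, +1} and raw scores are assumed, as is standard for max-margin losses):

```python
import numpy as np

def hinge(y, y_hat, m=1.0):
    """Hinge loss; y in {-1, +1}, y_hat is the raw (unbounded) score."""
    return np.mean(np.maximum(0.0, m - y * y_hat))

def squared_hinge(y, y_hat, m=1.0):
    """Squared hinge loss: same margin, quadratic penalty."""
    return np.mean(np.maximum(0.0, m - y * y_hat) ** 2)

# Second sample is beyond the margin (y * y_hat = 2 > 1), so it costs 0
y = np.array([1.0, -1.0])
y_hat = np.array([0.8, -2.0])
print(hinge(y, y_hat), squared_hinge(y, y_hat))
```

Samples classified correctly with margin at least \(m\) contribute zero loss; the squared variant penalizes margin violations more smoothly near the boundary.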

Multi-Class Classification Loss Functions

Multi-Class Cross-Entropy Loss

\[f(y,\hat{y})=-\sum^M_{c=1}y_c \log(\hat{y}_c)\]

M: total number of classes

Softmax Loss, Negative Log-Likelihood (NLL)

Cross-entropy loss is the same as log softmax followed by NLL over the probability of each class

\[f(s,\hat{y})=-\sum^M_{c=1}\hat{y}_c \log(s_c) \]

\(\hat{y}\) is a \(1\times M\) one-hot vector: the entry for the true class is 1 and all other entries are 0, hence

\[f(s,\hat{y})=-\sum^M_{c=1}\hat{y}_c \log(s_c) = -\log(s_c)\]

where c is the true label
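The decomposition "cross-entropy = log softmax + NLL" can be verified numerically (a minimal NumPy sketch for a single sample; function names are illustrative):

```python
import numpy as np

def log_softmax(s):
    """Log of the softmax of a score vector, shifted by max(s) for stability."""
    s = s - np.max(s)
    return s - np.log(np.sum(np.exp(s)))

def cross_entropy(s, true_class):
    """NLL of the log-softmax output: -log(softmax(s)[true_class])."""
    return -log_softmax(s)[true_class]

scores = np.array([2.0, 1.0, 0.1])
print(cross_entropy(scores, 0))
```

Subtracting the maximum score before exponentiating does not change the result but prevents overflow for large scores, which is why deep-learning libraries fuse the two steps.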

Kullback Leibler Divergence Loss

Minimizing KL divergence is equivalent to MLE (maximum likelihood estimation) of the probability distribution

\[\begin{split}f(y,\hat{y})& =\dfrac{1}{n}\sum^n_{i=1}D_{KL}(y_i||\hat{y}_i)\\ & =\dfrac{1}{n}\sum^n_{i=1}[y_i\cdot \log(\dfrac{y_i}{\hat{y}_i})]\\ & =\dfrac{1}{n}\sum^n_{i=1}(y_i\cdot \log(y_i))-\dfrac{1}{n}\sum^n_{i=1}(y_i\cdot \log(\hat{y}_i))\end{split}\]
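A minimal NumPy sketch of a single KL term \(D_{KL}(p||q)\) over a probability vector (the clipping constant is an added safeguard against `log(0)`, not part of the definition):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i) for probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))

p = np.array([0.4, 0.6])
q = np.array([0.7, 0.3])
print(kl_divergence(p, q), kl_divergence(q, p))
```

Note that KL divergence is not symmetric: \(D_{KL}(p||q) \ne D_{KL}(q||p)\) in general, and it is zero only when the two distributions match.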

CNN Loss

Loss Functions for Image Restoration with Neural Networks

Content Loss

Compares pixel by pixel (e.g., MAE, MSE). Produces blurry results because the Euclidean distance is minimized by averaging over all plausible outputs.

Detection

Focal Loss

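For reference, the standard form (from Lin et al., "Focal Loss for Dense Object Detection", not stated in these notes) down-weights well-classified examples with a modulating factor:

\[FL(p_t)=-(1-p_t)^\gamma \log(p_t)\]

where \(p_t\) is the model's estimated probability for the true class and \(\gamma \ge 0\) is the focusing parameter; with \(\gamma=0\) it reduces to cross-entropy.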

Divergence loss (on probability)

measuring the similarity between two probability distributions

KL-Divergence

JS-Divergence (Jensen–Shannon divergence)

a smoothed, symmetric version of KL divergence:

\[D_{JS}(p||q)=\dfrac{1}{2}D_{KL}(p||\dfrac{p+q}{2})+\dfrac{1}{2}D_{KL}(q||\dfrac{p+q}{2})\]
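The formula above, sketched in NumPy (a minimal illustration assuming strictly positive probability vectors, so no clipping is needed):

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) for strictly positive probability vectors."""
    return np.sum(p * np.log(p / q))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL to the midpoint distribution."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.4, 0.6])
q = np.array([0.7, 0.3])
print(js_divergence(p, q))
```

Unlike KL divergence, JS divergence is symmetric and always finite, because the midpoint \(m\) is nonzero wherever either \(p\) or \(q\) is.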

Wasserstein distance = Earth-Mover (EM) distance

used in WGAN (Wasserstein GAN)
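For intuition, the 1-D case has a closed form: for two empirical distributions with the same number of samples and uniform weights, the EM distance is the mean absolute difference between the sorted samples (a minimal sketch under that assumption; `scipy.stats.wasserstein_distance` handles the general weighted case):

```python
import numpy as np

def wasserstein_1d(u, v):
    """1-D Earth-Mover distance for equal-size, uniformly weighted samples."""
    # Sorting pairs each quantile of u with the same quantile of v
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

# Shifting a distribution by 1 costs exactly 1 unit of "earth" per sample
print(wasserstein_1d(np.array([0.0, 1.0]), np.array([1.0, 2.0])))
```

Because it measures how much probability mass must be moved and how far, the EM distance stays informative even when the two distributions have disjoint supports, where KL and JS divergences saturate; this is the motivation behind WGAN.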