Linear Regression¶
see also: wiki/Linear_regression
Goal: find the best-fit line or curve for the data.
OLS - Ordinary Least Squares¶
Based on a matrix/vector formulation:
\[ \begin{align}\begin{aligned}\newcommand{\vect}[1]{\boldsymbol{#1}}
y_i = \beta_1x_{i1}+\beta_2x_{i2}+\dots+\beta_px_{ip}+\epsilon_i\\\vect{y} = \vect{X}\vect{\beta}+\vect{\epsilon}\\\hat{\vect{\beta}}=(\vect{X}^T\vect{X})^{-1}\vect{X}^T\vect{y}\end{aligned}\end{align} \]
see also: wiki/Ordinary_least_squares
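The closed-form estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the toy data, the variable names, and the choice to model the intercept as a column of ones (so that \(x_{i1}=1\)) are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy data: y = 1 + 2*x1 + 3*x2, without noise
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 2))

# Design matrix X; the leading column of ones plays the role of an intercept
X = np.column_stack([np.ones(len(features)), features])
true_beta = np.array([1.0, 2.0, 3.0])
y = X @ true_beta

# Normal equations: solve (X^T X) beta = X^T y rather than forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least-squares problem and is usually
# the more numerically robust choice when X^T X is ill-conditioned
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the linear system gives the same result as multiplying by \((\vect{X}^T\vect{X})^{-1}\) but avoids computing the inverse explicitly.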
Coefficient of determination - \(r^2\)¶
\[r^2 = 1-\dfrac{SS_{res}}{SS_{tot}}= 1-\dfrac{\sum_i(y_i-\hat{y}_i)^2}{\sum_i(y_i-\bar{y})^2}\]
Measures how well the model approximates the real data. Perfect match: \(r^2=1\). A model that always returns the mean of y regardless of the inputs: \(r^2=0\).
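As a small sketch of the standard definition \(r^2 = 1 - SS_{res}/SS_{tot}\), the helper below computes it with NumPy and evaluates the two boundary cases. The function name `r_squared` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])

# Predictions identical to the observations
r2_perfect = r_squared(y, y)

# A model that ignores its inputs and always predicts the mean of y
r2_mean = r_squared(y, np.full_like(y, y.mean()))
```

With identical predictions the residual sum is zero, so the score is 1; with the constant mean prediction the two sums are equal, so the score is 0.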
Advantage:¶
- exact solution
- fast (when dataset is small)
Disadvantage:¶
- not scalable: requires a lot of memory when the dataset is large
Gradient Descent¶
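A minimal sketch of batch gradient descent on the mean squared error, assuming the same linear model \(\vect{y} = \vect{X}\vect{\beta}+\vect{\epsilon}\) as in the OLS section. The function name, learning rate, iteration count, and toy data are assumptions for illustration; in practice the step size usually needs tuning and the features are often scaled first.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_iters=2000):
    """Minimise mean((X @ beta - y)**2) by batch gradient descent."""
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    for _ in range(n_iters):
        residual = X @ beta - y
        # Gradient of the mean squared error with respect to beta
        gradient = (2.0 / n_samples) * (X.T @ residual)
        beta -= learning_rate * gradient
    return beta

# Hypothetical toy data: y = 1 + 2*x1 + 3*x2, without noise
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
X = np.column_stack([np.ones(len(features)), features])  # ones column = intercept
true_beta = np.array([1.0, 2.0, 3.0])
y = X @ true_beta

beta_gd = gradient_descent(X, y)
```

Each iteration only needs matrix-vector products with X, which avoids forming and solving the \(\vect{X}^T\vect{X}\) system required by the closed-form solution.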
library implementation¶
scikit-learn model sklearn.linear_model.LinearRegression
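A short usage sketch of `sklearn.linear_model.LinearRegression`, assuming scikit-learn and NumPy are installed. The toy data is hypothetical; `fit`, `coef_`, `intercept_`, and `score` are the estimator's standard interface, and `score` returns the coefficient of determination on the data passed to it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: y = 1 + 2*x1 + 3*x2, without noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# fit_intercept=True is the default, so no column of ones is needed in X
model = LinearRegression()
model.fit(X, y)

intercept = model.intercept_   # estimated constant term
coefficients = model.coef_     # estimated weights for x1 and x2
r2 = model.score(X, y)         # coefficient of determination on (X, y)
```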
// TODO: where should this go?