Linear Regression

Input Space

\mathcal{X}=\mathbb{R}^d

Label Space / Output Space ($y$ takes continuous real values)

\mathcal{Y}=\mathbb{R}

Hypothesis Class $\mathcal{F}$

\mathcal{F}=\{x \mapsto w^\top x+b \mid w\in\mathbb{R}^d,\ b\in\mathbb{R}\}

Loss Function ($\ell_2$-loss / Square Loss)

\ell(f(x),y)=(f(x)-y)^2

Loss Function (Absolute Loss)

\ell(f(x),y)=|f(x)-y|
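
As a quick illustration, here is a minimal NumPy sketch evaluating both losses on a few hypothetical predictions (the numbers are made up purely for the example):

```python
import numpy as np

# Hypothetical predictions f(x) and labels y, purely for illustration
f_x = np.array([2.5, 0.0, 1.8])
y = np.array([3.0, -0.5, 2.0])

square_loss = (f_x - y) ** 2      # l2 / square loss per example
absolute_loss = np.abs(f_x - y)   # absolute loss per example
print(square_loss, absolute_loss)
```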

Empirical Risk Minimization (convex objective)

\hat R(w)=\frac{1}{m}\sum_{i=1}^m(y_i-w^\top x_i)^2

Since the empirical risk is convex, we can find its minimum by taking the gradient and setting it to zero. The ERM solution $\hat w$ is (provided $X^\top X$ is invertible):

\hat w=(X^\top X)^{-1}X^\top Y
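
For reference, the derivation behind this formula, absorbing the bias $b$ into $w$ via a constant feature and writing $X\in\mathbb{R}^{m\times d}$ for the matrix whose rows are $x_i^\top$ and $Y$ for the vector of labels:

\nabla\hat R(w)=-\frac{2}{m}X^\top(Y-Xw)=0 \;\Longrightarrow\; X^\top X\,w=X^\top Y \;\Longrightarrow\; \hat w=(X^\top X)^{-1}X^\top Y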

Regularization

Regularization: when $X^\top X$ is singular or nearly singular, the ERM solution is not unique or can have very large weights. To address this, we add a penalty term to the objective.

\hat G(w)=\hat R(w)+\lambda\,\psi(w)

Here, $\psi(w)$ is commonly chosen to be $\|w\|_1$ or $\|w\|_2^2$.

Ridge Regression | $\psi(w)=\|w\|_2^2$

\hat G(w)=\frac{1}{m}\sum_{i=1}^m(y_i-w^\top x_i)^2+\lambda\|w\|_2^2

\hat w_\lambda=(X^\top X+\lambda m I)^{-1}X^\top Y
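
A minimal NumPy sketch of the ridge closed form above (the data is randomly generated just for illustration; with lam = 0 it reduces to the ordinary least-squares solution, provided $X^\top X$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 5
X = rng.normal(size=(m, d))                # design matrix (bias absorbed into a constant feature if needed)
w_true = rng.normal(size=d)
Y = X @ w_true + 0.1 * rng.normal(size=m)  # noisy labels

lam = 0.1  # regularization strength lambda

# Ridge closed form: w_lambda = (X^T X + lambda * m * I)^{-1} X^T Y
# (np.linalg.solve is preferred over forming the inverse explicitly)
w_ridge = np.linalg.solve(X.T @ X + lam * m * np.eye(d), X.T @ Y)
```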

LASSO Regression | $\psi(w)=\|w\|_1$ | used when we require sparse solutions

\hat G(w)=\frac{1}{m}\sum_{i=1}^m(y_i-w^\top x_i)^2+\lambda\|w\|_1
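
Unlike ridge, the $\ell_1$ penalty has no closed-form minimizer, so $\hat G$ is minimized iteratively. Below is a minimal proximal-gradient (ISTA) sketch for the objective above; the step size and iteration count are arbitrary illustrative choices, and in practice one would usually reach for a library solver (e.g. scikit-learn's Lasso, whose objective is scaled slightly differently).

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, Y, lam, n_iters=500):
    """Minimize (1/m) * ||Y - X w||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    m, d = X.shape
    # Step size 1/L, where L = (2/m) * lambda_max(X^T X) is the gradient's Lipschitz constant
    L = 2.0 / m * np.linalg.eigvalsh(X.T @ X).max()
    eta = 1.0 / L
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = (2.0 / m) * X.T @ (X @ w - Y)   # gradient of the smooth squared-error term
        w = soft_threshold(w - eta * grad, eta * lam)
    return w
```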


Disclaimer: This blog content is intended as class notes and is solely for sharing purposes. Some images and content are sourced from textbooks, teacher presentations, and the internet. If there are any copyright infringements, please contact aursus.blog@gmail.com for removal.