Linear Regression
Input Space
$X = \mathbb{R}^d$
Label Space / Output Space ($y$ is real-valued)
$Y = \mathbb{R}$
Hypothesis Class F
$F = \{x \mapsto w^\top x + b \mid w \in \mathbb{R}^d,\ b \in \mathbb{R}\}$
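To make the definition concrete, here is a minimal Python sketch (the specific weights and input are arbitrary, not from the notes): each choice of $w$ and $b$ picks out one hypothesis $f \in F$.

```python
import numpy as np

def linear_predictor(w, b):
    """One hypothesis f in F: x -> w^T x + b."""
    return lambda x: w @ x + b

f = linear_predictor(np.array([1.0, -0.5]), b=2.0)  # a single element of F
print(f(np.array([3.0, 4.0])))                      # 1*3 - 0.5*4 + 2 = 3.0
```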
Loss Function ($\ell_2$-loss / Square Loss)
$\ell(f(x), y) = (f(x) - y)^2$
Loss Function (Absolute Loss)
$\ell(f(x), y) = |f(x) - y|$
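A minimal NumPy comparison of the two losses on the same (made-up) prediction–label pairs; note how the square loss penalizes larger errors more heavily.

```python
import numpy as np

predictions = np.array([2.5, 0.0, 1.8])   # hypothetical f(x) values
labels      = np.array([3.0, -0.5, 2.0])  # hypothetical true y values

square_loss   = (predictions - labels) ** 2     # l2-loss: (f(x) - y)^2
absolute_loss = np.abs(predictions - labels)    # absolute loss: |f(x) - y|

print(square_loss)    # [0.25 0.25 0.04]
print(absolute_loss)  # [0.5  0.5  0.2 ]
```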
Empirical Risk Minimizer (Convex)
$\hat{R}(w) = \frac{1}{m}\sum_{i=1}^{m}(y_i - w^\top x_i)^2$
Since the above function is convex, we can find the minimum by setting its gradient to zero, which gives the normal equations $X^\top X w = X^\top Y$. If $X^\top X$ is invertible, the ERM solution is
$\hat{w} = (X^\top X)^{-1} X^\top Y$
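As an illustration, here is a minimal NumPy sketch of this closed form on synthetic data (the data and variable names are assumptions, not from the notes); it solves the normal equations with `np.linalg.solve` rather than forming an explicit matrix inverse, which is numerically more stable.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 3
X = rng.normal(size=(m, d))                 # design matrix: m samples, d features
true_w = np.array([1.0, -2.0, 0.5])
Y = X @ true_w + 0.1 * rng.normal(size=m)   # noisy labels

# Solve the normal equations X^T X w = X^T Y (requires X^T X to be invertible)
w_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(w_hat)   # close to true_w
```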
Regularization
Regularization: When $X^\top X$ is singular or ill-conditioned, the ERM solution may not be unique and the weights $w$ can become very large. To prevent this, we add a penalty function to the objective:
$\hat{G}(w) = \hat{R}(w) + \lambda \psi(w)$
Here, $\psi(w)$ is commonly chosen as $\|w\|_1$ or $\|w\|_2^2$.
Ridge Regression | $\psi(w) = \|w\|_2^2$
$\hat{G}(w) = \frac{1}{m}\sum_{i=1}^{m}(y_i - w^\top x_i)^2 + \lambda \|w\|_2^2$
$\hat{w}_\lambda = (X^\top X + \lambda m I)^{-1} X^\top Y$
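A corresponding sketch for the ridge closed form (again on made-up synthetic data, with an assumed $\lambda = 0.1$); the $\lambda m I$ term makes the matrix invertible even when $X^\top X$ is singular.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 100, 3
X = rng.normal(size=(m, d))
Y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=m)

lam = 0.1   # regularization strength (assumed value)
# Ridge closed form: w_lambda = (X^T X + lambda*m*I)^{-1} X^T Y
w_ridge = np.linalg.solve(X.T @ X + lam * m * np.eye(d), X.T @ Y)
print(w_ridge)   # shrunk toward zero relative to the unregularized solution
```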
LASSO Regression | $\psi(w) = \|w\|_1$ | Used when we require sparse solutions
$\hat{G}(w) = \frac{1}{m}\sum_{i=1}^{m}(y_i - w^\top x_i)^2 + \lambda \|w\|_1$
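Unlike ridge, LASSO has no closed-form solution, since $\|w\|_1$ is not differentiable. One common approach is proximal gradient descent (ISTA), where each step is followed by soft-thresholding, which produces exact zeros. This is a hedged sketch under assumed synthetic data and $\lambda = 0.1$, not the notes' prescribed method.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1; sets small entries exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, Y, lam, n_iters=1000):
    """Minimize (1/m)||Y - X w||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    m, d = X.shape
    L = 2.0 / m * np.linalg.eigvalsh(X.T @ X).max()   # Lipschitz constant of the smooth part
    step = 1.0 / L
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = 2.0 / m * X.T @ (X @ w - Y)            # gradient of the squared-loss term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic data with a sparse true weight vector (illustrative values)
rng = np.random.default_rng(0)
m, d = 100, 10
X = rng.normal(size=(m, d))
true_w = np.zeros(d); true_w[:2] = [1.0, -2.0]
Y = X @ true_w + 0.1 * rng.normal(size=m)

w_lasso = lasso_ista(X, Y, lam=0.1)
print(w_lasso)   # most coordinates end up exactly zero (sparse solution)
```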
Disclaimer: This blog content is intended as class notes and is shared for reference only. Some images and content are sourced from textbooks, teacher presentations, and the internet. If any content infringes copyright, please contact aursus.blog@gmail.com for removal.