BME | Machine Learning - Duality
Purpose: Generally, methods such as MLE can only optimize unconstrained expressions, e.g. $\max_\omega \frac{1}{m}\sum_i \ell(\omega; x_i, y_i)$. Problems that come with constraints, however, require duality, LP, and QP; these tools can also handle the unconstrained case.
Duality Problem
Problem Type
$$\min_\omega J(\omega) \quad \text{s.t. } c_i(\omega)\le 0 \text{ for } i\in\mathcal{J}, \quad e_i(\omega)=0 \text{ for } i\in\mathcal{E} \tag{0}$$
 ...
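As a quick illustration (not from the original post), here is a minimal sketch of solving a toy instance of problem (0) with SciPy's SLSQP solver; the objective and both constraints are hypothetical examples:

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of problem (0): J(w) = (w0-1)^2 + (w1-2)^2,
# inequality constraint c(w) = w0 + w1 - 2 <= 0,
# equality constraint   e(w) = w0 - w1 = 0.
J = lambda w: (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2

# SciPy's "ineq" convention is fun(w) >= 0, so we pass -c(w).
cons = [
    {"type": "ineq", "fun": lambda w: -(w[0] + w[1] - 2.0)},
    {"type": "eq",   "fun": lambda w: w[0] - w[1]},
]

res = minimize(J, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)  # approx [1, 1]: both constraints are active at the optimum
```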
BME | Machine Learning - Support Vector Machine SVM
Linear SVM
Set up
Given $n$ data points $(x_1, y_1), \ldots, (x_n, y_n)$, where $y_i$ is $1$ or $-1$, indicating which class $x_i$ belongs to. Each $x_i$ is a $p$-dimensional real vector.
The goal of SVM is to find a maximum-margin hyperplane that separates all points $x_i$ according to $y_i=1$ or $y_i=-1$, ensuring the maximum distance from the hyperplane to the nearest points of both groups.
SVM Objective: Find a hyperplane that maximizes t ...
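For a concrete feel (not part of the original post), here is a minimal sketch using scikit-learn's linear SVM on hypothetical toy data; a large C approximates the hard margin:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data with labels y_i in {-1, +1}.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # large C ~ hard-margin SVM
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print(w, b)                  # maximum-margin hyperplane w^T x + b = 0
print(clf.support_vectors_)  # the points that pin down the margin
```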
BME | Machine Learning - Independent Component Analysis (ICA)
Concept
Purpose: Source separation
input: $x \subset \mathbb{R}^d$, with $n$ $x$'s; $x^{(i)}.\text{size} = (d,1)$
output: $s \subset \mathbb{R}^d$, where $s$ should not be Gaussian distributed, otherwise $s$ cannot be solved for
function: $f(x)=s$, s.t. $x^{(i)}=As^{(i)}$ or $x=As$
Here, $A$ is called the mixing matrix and $W$ the unmixing matrix, with $W=A^{-1}$, $s=Wx$, $s_j^{(i)}=\omega_j^T x^{(i)}$ ...
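As an illustration (not from the original post), a minimal sketch with scikit-learn's FastICA, mixing two hypothetical non-Gaussian sources and estimating the unmixing matrix:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 2000
# Two hypothetical non-Gaussian sources (ICA fails on Gaussian sources).
s = np.c_[np.sign(rng.standard_normal(n)), rng.laplace(size=n)]
A = np.array([[1.0, 0.5], [0.3, 1.0]])  # mixing matrix
x = s @ A.T                             # observed mixtures, x = A s per sample

ica = FastICA(n_components=2, random_state=0)
s_hat = ica.fit_transform(x)  # recovered sources (up to order and scale)
W = ica.components_           # estimated unmixing matrix, W ~ A^{-1} up to permutation/scale
```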
BME | Machine Learning - Principal Component Analysis (PCA)
Concept
Purpose: Dimensionality Reduction
input: $x \subset \mathbb{R}^d$, $\{x^{(i)}; i = 1, ..., n\}$
output: $Z \subset \mathbb{R}^k$
$F: X \rightarrow Z$, $k \ll d$
Here, $i$ indexes the samples (total $n$), and $j$ indexes the features (total $d$).
Preprocessing | Data Preprocessing
Purpose: Normalize all features to data with mean $\mathbb{E}[x_j^{(i)}]=0$ and variance $\mathrm{Var}[x_j^{(i)}]=1$
...
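As a sketch (not from the original post), the normalization step plus the projection onto the top-$k$ eigenvectors of the empirical covariance, in plain NumPy:

```python
import numpy as np

def pca(X, k):
    """Project an (n, d) data matrix onto its top-k principal components."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # mean 0, variance 1 per feature
    cov = X.T @ X / len(X)                    # (d, d) empirical covariance
    vals, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    U = vecs[:, ::-1][:, :k]                  # top-k principal directions
    return X @ U                              # Z, the (n, k) embedding

Z = pca(np.random.default_rng(0).normal(size=(100, 5)), k=2)
```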
hexo - Bilingual hexo + butterfly
Introduction
To create a blog that can be switched between English and Chinese, I researched various tutorials online. Most of them utilize the next theme or the hexo-generator-i18n plugin.
After some searching, I found a solution for bilingual switching by referring to this link.
The general idea involves creating two repositories on GitHub, one for each language, and setting up _config.yml and _config.butterfly.yml for both the Chinese and English versions. Locally, two folders, source-en and s ...
BME | Machine Learning - Gaussian Mixture & EM
GMM(Gaussian Mixture Model)
What is a Gaussian Mixture Model?
First, we have $k$ Gaussian models, denoted $N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2), \ldots, N(\mu_k, \sigma_k^2)$.
We sample data from the Gaussian $N(\mu_j, \sigma_j^2)$ in proportion $\phi_j$.
Since every sample comes from exactly one component, $\sum_{j=1}^{k}\phi_j=1$, which means th ...
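As an illustration (not from the original post), a minimal sketch fitting a two-component GMM by EM with scikit-learn, on hypothetical data drawn from two Gaussians:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical 1-D data: 200 points from N(-2, 1) and 300 from N(3, 0.25).
X = np.r_[rng.normal(-2.0, 1.0, (200, 1)),
          rng.normal(3.0, 0.5, (300, 1))]

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
print(gmm.weights_)        # estimated phi_j; they sum to 1
print(gmm.means_.ravel())  # estimated mu_j, close to -2 and 3
```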
BME | Machine Learning - K means Clustering
K means Clustering
Assumptions
There are $k$ subsets $C_1, C_2, C_3, \ldots, C_k$ and data points $1, \ldots, n$, and all data points satisfy the following conditions:
Each point belongs to a cluster.
Clusters do not overlap.
Defining Distance
$$Z(C_1, \cdots, C_k) = \sum_{l=1}^{k} \frac{1}{2|C_l|} \sum_{i,j \in C_l} \|x_i - x_j\|_2^2$$
 ...
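As a sketch (not from the original post), Lloyd's iterations in plain NumPy; each pass can only decrease the within-cluster objective $Z$ above (assuming no cluster empties out):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # update step: each centroid moves to the mean of its cluster
        # (assumes no cluster ever becomes empty)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers
```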
BME | Machine Learning - Logistic Regression
Logistic Regression
Utilization: Curve fitting and two-class classification; typically applied to point data.
Applicability: Broadly applicable; whenever the data can be separated by a single line (hyperplane), logistic regression can be used.
Input Space
$\mathcal{X}=\mathbb{R}^d$
Label Space/Output Space (y ranges from 0 to 1)
$\mathcal{Y}=[0,1]$
Hypothesis Class $\mathcal{F}$
$\mathcal{F}:=\{x \mapsto \text{sigmoid}(w^\top x+b)\ |\ w\in\mathbb{R}^d, b\in\mathbb{R}\}$, where $\text{sigmoid}(a)=\frac{1}{1+\exp(-a)}$ ...
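As an illustration (not from the original post), a minimal NumPy sketch of this hypothesis class trained by gradient descent on the log loss, with hypothetical toy data:

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))              # hypothetical inputs in R^2
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # labels in {0, 1}

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)            # predicted P(y = 1 | x)
    w -= lr * X.T @ (p - y) / len(X)  # gradient of the mean log loss
    b -= lr * (p - y).mean()
```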
BME | Machine Learning - Linear Regression
Linear Regression
Input Space
$\mathcal{X}=\mathbb{R}^d$
Label Space/Output Space ($y$ ranges over the reals)
$\mathcal{Y}=\mathbb{R}$
Hypothesis Class $\mathcal{F}$
$\mathcal{F}:=\{x \mapsto w^\top x+b\ |\ w\in\mathbb{R}^d, b\in\mathbb{R}\}$
Loss Function ($l_2$-loss): Square Loss
$\ell(f(x),y)=(f(x)-y)^2$
Loss Function (Absolute Loss)
$\ell(f(x),y)=|f(x)-y|$
Empirical Risk Minimizer ...
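As an illustration (not from the original post), the empirical risk minimizer under the square loss has the ordinary-least-squares closed form; a minimal NumPy sketch on hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 + rng.normal(scale=0.1, size=100)

Xb = np.c_[X, np.ones(len(X))]                  # append 1 so b is absorbed into the weights
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # minimizes sum_i (f(x_i) - y_i)^2
w, b = theta[:-1], theta[-1]                    # close to [1, -2, 0.5] and 0.3
```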
BME | Machine Learning - Binary Classification (0-1 Distribution)
Binary Classification
Assumptions
Linearly separable data.
Binary classification, meaning the data can be divided into two classes.
Objective
To categorize the data into two classes based on labels.
Set up
Input space
$\mathcal{X} = \mathbb{R}^d$
Label space / Output space
$\mathcal{Y} = \{-1, 1\}$
Classifier | Hypothesis class $\mathcal{F}$
$\mathcal{F} := \{x \mapsto \text{sign}(\mathbf{w}^\top \mathbf{x} + b)\ |\ \mathbf{w}\in\mathbb{R}^d, b\in\mathbb{R}\}$ ...
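As an illustration (not from the original post), one classic way to fit this hypothesis class on linearly separable data is the perceptron; a minimal NumPy sketch:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Fit sign(w^T x + b); assumes y in {-1, +1} and linearly separable data."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:      # point is misclassified (or on the boundary)
                w, b = w + yi * xi, b + yi  # classic perceptron update
    return w, b
```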