Biomedical Engineering | Machine Learning - Binary Classification (0-1 Distribution)

Binary Classification

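Assumptions

The data are linearly separable.
Binary classification: every data point belongs to one of two classes.

Goal

Split the data into two classes according to their labels.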

Setup

input space

\mathcal{X}=\mathbb{R}^d

label space / output space

\mathcal{Y}=\{-1,1\}

Classifier | hypothesis class $\mathcal{F}$

\mathcal{F}:=\{\mathbf{x} \mapsto \text{sign}(\mathbf{w}^\top \mathbf{x}+b)\ |\ \mathbf{w}\in\mathbb{R}^d, b\in\mathbb{R}\}

where $\mathbf{w}$ is the weight vector, $b$ is the bias, and sign is the sign function, defined as follows:

\text{sign}(a)=\left\{ \begin{aligned} &1 \qquad &&\text{if}\ a\geq0\\ &-1 \qquad &&\text{otherwise} \end{aligned} \right.
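The classifier above can be sketched in a few lines of NumPy. This is only an illustration: the function names `sign` and `predict` and the numeric values are hypothetical, not part of the notes.

```python
import numpy as np

def sign(a):
    """Sign function as defined above: +1 if a >= 0, otherwise -1."""
    return np.where(np.asarray(a) >= 0, 1, -1)

def predict(w, b, X):
    """Apply x -> sign(w^T x + b) to every row of X."""
    return sign(X @ w + b)

# Hypothetical example values, chosen only for illustration.
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[3.0, 1.0],     # score  1.5 -> +1
              [0.0, 1.0],     # score -1.5 -> -1
              [1.0, 0.75]])   # score  0.0 -> +1, because sign(0) = 1
labels = predict(w, b, X)
```

Note that a point lying exactly on the boundary is assigned to the positive class, which follows from the convention $\text{sign}(0)=1$.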

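To simplify later computations, we want the classifier to contain only $w$ and no $b$. To do this we add one dimension: a constant 1 is appended to $x$, and $b$ is appended to $w$ as its last coordinate, as follows: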

x \mapsto \left[ \begin{matrix}x\\1 \end{matrix}\right] \quad w \mapsto \left[ \begin{matrix}w\\b \end{matrix}\right]

The original classifier then simplifies to

\mathcal{F}:=\{\mathbf{x} \mapsto \text{sign}(\mathbf{w}^\top \mathbf{x})\ |\ \mathbf{w}\in\mathbb{R}^{d+1}\}
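As a quick check of this trick, the sketch below verifies numerically that the augmented score equals $w^\top x+b$. The helper name `augment` and the data are assumptions made for illustration.

```python
import numpy as np

def augment(X):
    """Append a constant 1 to every row of X, so that x becomes [x, 1]."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

# Hypothetical values for illustration.
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[3.0, 1.0], [0.0, 1.0], [-1.0, 2.0]])

w_aug = np.append(w, b)                  # w becomes [w, b]
scores_original = X @ w + b              # w^T x + b
scores_augmented = augment(X) @ w_aug    # augmented w^T augmented x
same = np.allclose(scores_original, scores_augmented)
```

Because the two score vectors agree, taking the sign of either gives the same predictions.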

Empirical Risk Minimizer

We use the following loss function to find the empirical risk minimizer (ERM):

\ell(f(x),y)=\left\{ \begin{aligned} &0 \qquad &&\text{if}\ f(x)=y\\ &1 \qquad &&\text{otherwise} \end{aligned} \right.

This is called the 0-1 loss: 1 means the prediction is wrong and 0 means it is correct. It can also be written with the indicator function $\mathbb{I}$:

\mathbb{I}[f(x)\neq y] = \left\{ \begin{aligned} &1 \qquad &&\text{if}\ f(x)\neq y\\ &0 \qquad &&\text{if}\ f(x)= y \end{aligned} \right.
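In code, the 0-1 loss is an elementwise comparison between predictions and labels. A minimal sketch, with the function name `zero_one_loss` and the example arrays chosen here for illustration:

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    """Return 1 where the prediction differs from the label, 0 where it matches."""
    return (np.asarray(y_pred) != np.asarray(y_true)).astype(int)

# Hypothetical predictions and labels.
y_pred = np.array([1, -1, 1, -1])
y_true = np.array([1, 1, 1, -1])
losses = zero_one_loss(y_pred, y_true)
```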

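Substituting the classifier into the loss function gives the ERM problem below. The second line uses the fact that a point is misclassified when $y_i$ and $w^\top x_i$ have opposite signs, i.e. $y_iw^\top x_i<0$ (points with $w^\top x_i=0$ aside):

\begin{aligned} \min_w \widehat{\text{err}}(w)&=\min_w \frac{1}{m}\sum_{i=1}^m \mathbb{I}[\text{sign}(w^\top x_i)\neq y_i]\\ &= \min_w \frac{1}{m}\sum_{i=1}^m \mathbb{I}[y_iw^\top x_i<0] \end{aligned}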

The $w$ that minimizes the empirical risk is denoted $\hat w$:

\begin{aligned} \hat w&=\arg\min_w \frac{1}{m}\sum_{i=1}^m \mathbb{I}[\text{sign}(w^\top x_i)\neq y_i]\\ &=\arg \min_w \frac{1}{m}\sum_{i=1}^m \mathbb{I}[y_iw^\top x_i<0] \end{aligned}
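To make the empirical error concrete, the sketch below evaluates it for a few candidate weight vectors and picks the best one. Searching over a small hand-picked candidate list is purely an illustrative assumption; it is not how ERM is solved in practice, and the data and names are hypothetical.

```python
import numpy as np

def sign(a):
    """Sign function with sign(0) = 1."""
    return np.where(np.asarray(a) >= 0, 1, -1)

def empirical_error(w, X, y):
    """Fraction of points with sign(w^T x_i) != y_i (X is already augmented)."""
    return float(np.mean(sign(X @ w) != y))

# Toy data, already augmented with a constant 1 as the last coordinate.
X = np.array([[ 2.0,  1.0, 1.0],
              [ 1.0,  3.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-3.0,  0.5, 1.0]])
y = np.array([1, 1, -1, -1])

# Hypothetical candidate weight vectors.
candidates = [np.array([ 0.0, 1.0, 0.0]),   # looks only at the second feature
              np.array([ 1.0, 0.0, 0.0]),   # looks only at the first feature
              np.array([-1.0, 0.0, 0.0])]   # first feature with the sign flipped

errors = [empirical_error(w, X, y) for w in candidates]
w_hat = candidates[int(np.argmin(errors))]  # minimizer among the candidates
```

On this toy data the second candidate classifies every point correctly, so it is the minimizer within the candidate list.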

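When the dataset is linearly separable, $\hat w$ is denoted $w_*$. In that case $w_*$ satisfies $y_i=\text{sign}(w_*^\top x_i)$ for every point, and the hyperplane $w_*^\top x=0$ is the decision boundary between the two classes.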

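Perceptron Algorithm

Goal

Update $w$ until every data point is classified correctly.

Scope

The data points must be perfectly linearly separable.
If the data are XOR-like or otherwise not linearly separable, the perceptron cannot be used.
1. XOR-like and other non-linearly separable data can be separated with a kernel SVM.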

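Algorithm

Update rule

If a point is misclassified, i.e. $\text{sign}(w_t^\top x_i)\neq y_i$, or equivalently $y_iw_t^\top x_i<0$, update with the rule $w_{t+1}=w_t+y_ix_i$.

If $y_i=1$, then $w_{t+1}^\top x_i = w_t^\top x_i +\|x_i\|^2 > w_t^\top x_i$, so the update pushes the score of $x_i$ in the positive direction.

If $y_i=-1$, then $w_{t+1}^\top x_i = w_t^\top x_i -\|x_i\|^2 < w_t^\top x_i$, so the update pushes the score of $x_i$ in the negative direction.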
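The update rule can be turned into a short training loop. The sketch below is one possible implementation under stated assumptions: the weights start at zero, the points are visited in order, and a `max_epochs` cap (not part of the notes) guards against data that are not linearly separable.

```python
import numpy as np

def sign(a):
    """Sign function with sign(0) = 1."""
    return np.where(np.asarray(a) >= 0, 1, -1)

def perceptron(X, y, max_epochs=1000):
    """Repeat the update w <- w + y_i x_i until no point is misclassified.

    X is assumed to be augmented already. Returns (w, number_of_updates).
    """
    w = np.zeros(X.shape[1])          # assumption: start from the zero vector
    updates = 0
    for _ in range(max_epochs):       # safety cap, an assumption of this sketch
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if sign(w @ x_i) != y_i:  # misclassified point
                w = w + y_i * x_i     # perceptron update
                updates += 1
                mistakes += 1
        if mistakes == 0:             # a full pass without mistakes: done
            return w, updates
    raise RuntimeError("no separating w found within max_epochs")

# Toy linearly separable data, augmented with a constant 1.
X = np.array([[ 2.0,  1.0, 1.0],
              [ 1.0,  3.0, 1.0],
              [-1.0, -2.0, 1.0],
              [-3.0,  0.5, 1.0]])
y = np.array([1, 1, -1, -1])
w, updates = perceptron(X, y)
all_correct = bool(np.all(sign(X @ w) == y))
```

Raising an error after `max_epochs` is a design choice of this sketch: on data that are not linearly separable the plain loop would never terminate.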

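Margin $\gamma$ and convergence

Assumptions: all points lie in the unit ball, $\|x_i\|_2\leq 1$, and $\|w_*\|_2=1$.

Result: the perceptron algorithm makes at most $1/\gamma^2$ updates before it converges, and it returns a classifier with $\text{sign}(w^\top x_i)=y_i$ for every $i$, so all points are classified correctly.

Proof: see https://machine-learning-upenn.github.io/assets/notes/Lec2.pdf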
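
For intuition, the standard argument can be sketched as follows. It assumes the algorithm starts from $w_0=0$ (an assumption not stated in these notes) and uses two facts: a mistake on $(x_i,y_i)$ implies $y_iw_t^\top x_i\leq 0$, and $y_iw_*^\top x_i\geq\gamma$ for every $i$ because $w_*$ classifies all points correctly. Each update then satisfies

\begin{aligned} w_{t+1}^\top w_* &= w_t^\top w_* + y_i x_i^\top w_* \geq w_t^\top w_* + \gamma\\ \|w_{t+1}\|^2 &= \|w_t\|^2 + 2y_i w_t^\top x_i + \|x_i\|^2 \leq \|w_t\|^2 + 1 \end{aligned}

After $T$ updates this gives $T\gamma \leq w_T^\top w_* \leq \|w_T\|\,\|w_*\| \leq \sqrt{T}$, hence $T\leq 1/\gamma^2$.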

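**margin $\gamma$**: the smallest distance from any data point to the separating hyperplane $w_*^\top x=0$ (the second equality below uses $\|w_*\|=1$)

\gamma=\min_{i\in[m]}\frac{|w_*^\top x_i|}{\|w_*\|}=\min_{i\in[m]}|w_*^\top x_i|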
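
The margin and the resulting bound $1/\gamma^2$ are easy to compute for a given separator. In the sketch below the data, the separator `w_star`, and the function name `margin` are all hypothetical; the points were chosen to lie inside the unit ball and `w_star` to have unit norm, matching the assumptions above.

```python
import numpy as np

def margin(w_star, X):
    """Smallest distance from the rows of X to the hyperplane w_star^T x = 0."""
    return float(np.min(np.abs(X @ w_star)) / np.linalg.norm(w_star))

# Hypothetical data inside the unit ball, with labels in {-1, +1}.
X = np.array([[ 0.6,  0.2],
              [ 0.3,  0.8],
              [-0.5, -0.4],
              [-0.9,  0.1]])
y = np.array([1, 1, -1, -1])

# Hypothetical unit-norm separator.
w_star = np.array([1.0, 1.0]) / np.sqrt(2)

separates = bool(np.all(y * (X @ w_star) > 0))  # w_star classifies every point correctly
gamma = margin(w_star, X)
bound = 1.0 / gamma**2                          # upper bound on the number of updates
```

For this example the closest points have $|w_*^\top x_i| = 0.8/\sqrt{2}$, so the bound evaluates to $2/0.64 = 3.125$, meaning the perceptron would make at most 3 updates on this data.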

References:

https://www.cs.cornell.edu/courses/cs4780/2022sp/notes/LectureNotes06.html

https://machine-learning-upenn.github.io/calendar/

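Disclaimer: this blog post consists of class notes and is shared for learning purposes only. Some images and content are taken from textbooks, the instructor's slides, and the internet. In case of any copyright infringement, please contact aursus.blog@gmail.com to have the material removed.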