BME | Machine Learning - Principal Component Analysis (PCA)
Concept
Purpose: Dimensionality Reduction
Input: samples $x^{(i)} \in \mathbb{R}^{d}$, $i = 1, \dots, n$
Output: lower-dimensional representations $z^{(i)} \in \mathbb{R}^{k}$, with $k < d$
Here, $i$ indexes the samples ($n$ in total) and $j$ indexes the features ($d$ in total).
Data Preprocessing
Purpose: Normalize all features so that each has mean 0 and variance 1.
Method:
1. Compute the mean $\mu = \frac{1}{n}\sum_{i=1}^{n} x^{(i)}$ and replace each $x^{(i)}$ with $x^{(i)} - \mu$.
2. Compute each feature's variance $\sigma_j^{2} = \frac{1}{n}\sum_{i=1}^{n} \big(x_j^{(i)}\big)^{2}$ and replace each $x_j^{(i)}$ with $x_j^{(i)} / \sigma_j$.
If we can ensure that the different features are already on the same scale, the rescaling step can be dropped and the preprocessing reduces to mean-centering: $x^{(i)} \leftarrow x^{(i)} - \mu$.
Note: in the probabilistic formulation of PCA, parameter estimation uses the EM algorithm.
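A minimal NumPy sketch of this standardization step; the data matrix `X` and its values are made up purely for illustration:

```python
import numpy as np

# Toy data matrix: n = 4 samples (rows) x d = 2 features (columns); values are made up.
X = np.array([[2.0, 10.0],
              [4.0, 14.0],
              [6.0, 22.0],
              [8.0, 26.0]])

mu = X.mean(axis=0)            # per-feature mean
sigma = X.std(axis=0)          # per-feature standard deviation
X_norm = (X - mu) / sigma      # every feature now has mean 0 and variance 1

print(X_norm.mean(axis=0))     # ~[0, 0]
print(X_norm.var(axis=0))      # ~[1, 1]
```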
Computing Major Axis of Variation
$u$ is a unit vector and $x^{(i)}$ represents a sample; the samples are drawn as different points in the figure.
When projecting (e.g., reducing 2-D data to 1-D), to retain as much information as possible we choose the direction that maximizes the variance of the projected data, i.e., the direction along which the projected points are most spread out.
Since normalization has already been performed (the data has zero mean), the projection line along $u$ must pass through the origin.
Maximizing Projection Variance
The projection of a point $x^{(i)}$ onto the unit vector $u$ is the scalar $u^\top x^{(i)}$, the signed distance of the projected point from the origin.
Calculation of the projection: $\operatorname{proj}_u\big(x^{(i)}\big) = u^\top x^{(i)}$
Calculation of the variance: because the data has zero mean, the projections also have zero mean, so their variance is $\frac{1}{n}\sum_{i=1}^{n}\big(u^\top x^{(i)}\big)^{2}$.
Variance of all points projected onto a single direction $u$:
$$\frac{1}{n}\sum_{i=1}^{n}\big(u^\top x^{(i)}\big)^{2} = u^\top\Big(\frac{1}{n}\sum_{i=1}^{n} x^{(i)} x^{(i)\top}\Big)u = u^\top \Sigma u$$
If we constrain $\|u\|_2 = 1$, maximizing $u^\top \Sigma u$ can be simplified (via a Lagrange multiplier) to the eigenvalue problem $\Sigma u = \lambda u$, i.e., $\max_{\|u\|_2 = 1} u^\top \Sigma u = \lambda_1$.
Here $\lambda_1$ represents the largest eigenvalue of $\Sigma$ (the principal eigenvalue), and in this case $u_1$ represents the 1st principal eigenvector of $\Sigma$.
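A small NumPy sketch of this result on synthetic zero-mean data (the data and its covariance are invented for illustration): it builds the empirical covariance $\Sigma$, takes the top eigenvector with `np.linalg.eigh`, and checks that the variance of the projections equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data with a clear major axis of variation; values are illustrative only.
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)
X = X - X.mean(axis=0)                   # enforce zero mean exactly

n = X.shape[0]
Sigma = (X.T @ X) / n                    # empirical covariance (1/n) * sum of x x^T

eigvals, eigvecs = np.linalg.eigh(Sigma) # symmetric matrix, eigenvalues in ascending order
u1 = eigvecs[:, -1]                      # eigenvector belonging to the largest eigenvalue

# Variance of the projections u^T x equals u^T Sigma u; it is largest for u = u1.
print(np.var(X @ u1), eigvals[-1])       # the two values agree (up to floating point)
```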
Summary
The best 1-D direction for $x$ is the 1st eigenvector of the covariance matrix $\Sigma$, i.e., $u_1$. PyTorch's torch.lobpcg function can be used to compute it.
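A minimal sketch of how torch.lobpcg could be used here; the data tensor `X` is randomly generated and its shape is only illustrative:

```python
import torch

torch.manual_seed(0)
# Hypothetical standardized data: n = 500 samples, d = 20 features, zero mean.
X = torch.randn(500, 20)
X = X - X.mean(dim=0)

n, d = X.shape
Sigma = (X.T @ X) / n            # d x d empirical covariance matrix

# torch.lobpcg returns the k largest eigenvalues/eigenvectors of a symmetric matrix.
eigvals, eigvecs = torch.lobpcg(Sigma, k=1)
u1 = eigvecs[:, 0]               # 1st principal eigenvector of Sigma, shape (d,)
print(eigvals[0], u1.shape)      # largest eigenvalue and the direction of maximum variance
```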
If we want to project the data to 1-D, select $u$ to be the principal eigenvector of $\Sigma$.
Projected data after reducing dimensions: $z^{(i)} = U_k^\top x^{(i)}$, where $U_k$ holds the first $k$ eigenvectors as its columns, with size $(d, k)$. Here, $x^{(i)}$ goes from $d$ dimensions down to $k$.
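A NumPy sketch of this top-$k$ projection on hypothetical standardized data (np.linalg.eigh is used here instead of torch.lobpcg purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical standardized data: n = 200 samples, d = 5 features, reduced to k = 2.
X = rng.standard_normal((200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)

n, d, k = X.shape[0], X.shape[1], 2
Sigma = (X.T @ X) / n                    # d x d covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma) # eigenvalues in ascending order
U_k = eigvecs[:, ::-1][:, :k]            # top-k eigenvectors as columns, shape (d, k)

Z = X @ U_k                              # projected data z = U_k^T x, shape (n, k)
print(Z.shape)                           # (200, 2): each sample goes from d=5 to k=2 dims
```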
If we aim to project the data into $k$ dimensions:
Method 1: Choose $u_1, \dots, u_k$ to be the top $k$ eigenvectors of $\Sigma$.
Method 2: Choose the basis that minimizes the reconstruction error
$$\sum_{i=1}^{n}\Big\| x^{(i)} - \hat{x}^{(i)} \Big\|^{2}, \qquad \hat{x}^{(i)} = \sum_{j=1}^{k}\big(u_j^\top x^{(i)}\big)\,u_j$$
Here, $\hat{x}^{(i)}$ represents the reconstruction of the example $x^{(i)}$ given by the $k$ basis components; both methods yield the same directions.
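A short sketch of this reconstruction view on synthetic data (the data and the choice of $k$ are invented for illustration): it projects onto the top-$k$ eigenvectors, reconstructs, and shows that the average squared reconstruction error equals the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical zero-mean data: n = 300 samples, d = 4 features, kept components k = 2.
X = rng.standard_normal((300, 4))
X = X - X.mean(axis=0)

n = X.shape[0]
Sigma = (X.T @ X) / n
eigvals, eigvecs = np.linalg.eigh(Sigma)     # ascending eigenvalues
U_k = eigvecs[:, ::-1][:, :2]                # top-2 eigenvectors as columns

Z = X @ U_k                                  # coordinates in the k-dimensional basis
X_hat = Z @ U_k.T                            # reconstruction from the k basis components
recon_error = np.sum((X - X_hat) ** 2)       # total squared reconstruction error
print(recon_error / n, eigvals[:2].sum())    # equals the sum of the discarded eigenvalues
```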
Note: The content in this blog is class notes shared for educational purposes only. Some images and content are sourced from textbooks, teacher materials, and the internet. If there is any infringement, please contact aursus.blog@gmail.com for removal.