Concept

Purpose: Dimensionality Reduction

Input: $X \subset \mathbb{R}^d$, i.e. the samples $\{x^{(i)};\ i = 1, \dots, n\}$

Output: $Z \subset \mathbb{R}^k$

$F: X \rightarrow Z$, with $k \ll d$

Here, $i$ indexes the samples (of which there are $n$), and $j$ indexes the features (of which there are $d$).

Data Preprocessing

Purpose: normalize every feature so that its mean $\mathbb{E}[x_j^{(i)}] = 0$ and its variance $\mathrm{Var}[x_j^{(i)}] = 1$.

Method:

$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{\sigma_j}, \qquad \text{where } \mu_j = \frac{1}{n}\sum_{i=1}^n x_j^{(i)},\quad \sigma_j^2 = \frac{1}{n}\sum_{i=1}^n \left(x_j^{(i)} - \mu_j\right)^2$$
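A minimal PyTorch sketch of this normalization (the tensor `X` and its shape are illustrative, not from the notes):

```python
import torch

# X: (n, d) data matrix; row i is sample x^(i), column j is feature j.
X = torch.randn(100, 5)

mu = X.mean(dim=0)                          # mu_j: per-feature mean
sigma = ((X - mu) ** 2).mean(dim=0).sqrt()  # sigma_j: per-feature std with the 1/n convention above
X_norm = (X - mu) / sigma                   # x_j^(i) <- (x_j^(i) - mu_j) / sigma_j
```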

If we know in advance that the different features are already on the same scale, the variance-normalization step (dividing by $\sigma_j$) can be omitted.

(Aside: in the related factor analysis model, parameter estimation uses the EM algorithm; PCA itself only requires an eigendecomposition.)

Computing the Major Axis of Variation $u$

$u$ is a unit vector; $x$ denotes a sample, drawn as a point in the scatter plot.

When projecting (e.g. reducing 2D data to 1D), we want to retain as much information as possible, so we choose the direction that maximizes the variance of the projected data, i.e. the direction along which the projected points are most spread out.

Since the data have been normalized to zero mean, the direction $u$ necessarily passes through the origin.

Maximizing Projection Variance

The projection of a point $x$ onto the unit vector $u$ is $x^T u$.

Calculation of Projection:

$$\|\mathrm{Proj}_u(x)\|_2 = |x^T u|$$

Calculation of Variance:

$$\mathrm{Var}(\mathrm{Proj}_u(x)) = |x^T u|^2$$

Variance of all points $x$ projected onto a single direction $u$:

$$\frac{1}{n}\sum_{i=1}^n \left(x^{(i)T} u\right)^2 = \frac{1}{n}\sum_{i=1}^n u^T x^{(i)} x^{(i)T} u = u^T \left(\frac{1}{n}\sum_{i=1}^n x^{(i)} x^{(i)T}\right) u = u^T \hat\Sigma u$$

Here $\hat\Sigma = \frac{1}{n}\sum_{i=1}^n x^{(i)} x^{(i)T}$ is the empirical covariance matrix (the data are zero-mean after normalization). Maximizing the projected variance subject to $\|u\|_2 = 1$ gives:

$$\max_{\|u\|_2 = 1} u^T \hat\Sigma u = \lambda_1$$

where $\lambda_1$ is the largest (principal) eigenvalue of $\hat\Sigma$, and the maximizing $u$ is the corresponding first principal eigenvector of $\hat\Sigma$.
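A small sanity-check sketch (assuming zero-mean data in an illustrative tensor `X` of shape (n, d)): build $\hat\Sigma$, take its principal eigenpair with `torch.linalg.eigh`, and verify that the projected variance at that eigenvector equals $\lambda_1$:

```python
import torch

torch.manual_seed(0)
X = torch.randn(200, 3)
X = X - X.mean(dim=0)              # zero-mean, as assumed in the derivation

Sigma_hat = X.T @ X / X.shape[0]   # (1/n) * sum_i x^(i) x^(i)^T

# torch.linalg.eigh returns eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = torch.linalg.eigh(Sigma_hat)
lambda_1 = eigvals[-1]             # largest (principal) eigenvalue
u = eigvecs[:, -1]                 # corresponding principal eigenvector

# The projected variance u^T Sigma_hat u attains lambda_1 at the maximizer u.
print(u @ Sigma_hat @ u, lambda_1)
```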

Summary

The best 1D direction for $x$ is the first eigenvector $v_1$ of the covariance matrix $\hat\Sigma$. PyTorch's `torch.lobpcg` function can be used to compute it.

If we want to project the data to 1D, choose $u$ to be the principal eigenvector of $\Sigma$.

Projected data after dimensionality reduction: `X.mm(V)`, where `V` holds the first $k$ eigenvectors and has shape (d, k). Thus $X$ goes from $n \times d$ to $n \times k$.
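A short sketch of this projection step, using `torch.lobpcg` as mentioned above (the shapes and `k` below are illustrative):

```python
import torch

n, d, k = 500, 20, 3
X = torch.randn(n, d)
X = X - X.mean(dim=0)                      # center the data first

Sigma_hat = X.T @ X / n                    # empirical covariance, shape (d, d)

# torch.lobpcg returns the k largest eigenvalues/eigenvectors of a symmetric matrix.
eigvals, V = torch.lobpcg(Sigma_hat, k=k)  # V: (d, k), top-k eigenvectors
Z = X.mm(V)                                # projected data, shape (n, k)
```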

If we aim to project the data into $k$ dimensions:

Method 1: Choose $u_1, u_2, \dots, u_k$ to be the top $k$ eigenvectors of $\Sigma$, which maximize the total projected variance:

$$\max_{u_1, u_2, \dots, u_k} \sum_{i=1}^n \left\|\begin{bmatrix} u_1^T x^{(i)} \\ u_2^T x^{(i)} \\ \vdots \\ u_k^T x^{(i)} \end{bmatrix}\right\|_2^2$$

The maximizers are $u_1, \dots, u_k = v_1, v_2, \dots, v_k$, the top $k$ eigenvectors of $\hat\Sigma$.

Method 2:

$$\min_{u_1, u_2, \dots, u_k} \sum_{i=1}^n \left\|x_i - \bar x_i\right\|_2^2 = \sum_{i=1}^n \left\|x_i - \sum_{l=1}^k \left(u_l^T x_i\right) u_l\right\|_2^2$$

Here, $\bar x_i$ is the reconstruction of the example $x_i$ from the top $k$ principal components.
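A brief sketch of the Method 2 view under the same illustrative assumptions: reconstruct each $x_i$ from the top $k$ eigenvectors and measure the total squared reconstruction error, the quantity that choosing the top-$k$ eigenvectors minimizes:

```python
import torch

n, d, k = 500, 20, 3
X = torch.randn(n, d)
X = X - X.mean(dim=0)                      # zero-mean data, as in the derivation

Sigma_hat = X.T @ X / n
eigvals, eigvecs = torch.linalg.eigh(Sigma_hat)
V = eigvecs[:, -k:]                        # top-k eigenvectors u_1, ..., u_k, shape (d, k)

# Reconstruction: x_bar_i = sum_l (u_l^T x_i) u_l, i.e. project onto span(V) and map back.
X_bar = X.mm(V).mm(V.T)                    # shape (n, d)
recon_error = ((X - X_bar) ** 2).sum()     # sum_i ||x_i - x_bar_i||_2^2
print(recon_error)
```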


Note: The content in this blog is class notes shared for educational purposes only. Some images and content are sourced from textbooks, teacher materials, and the internet. If there is any infringement, please contact aursus.blog@gmail.com for removal.