Concept

Purpose: blind source separation (recover independent source signals from their linear mixtures)

input: $x \in \mathbb{R}^d$, with $n$ examples $x^{(i)}$. $x^{(i)}.\text{size} = (d,1)$

output: $s \in \mathbb{R}^d$, where $s$ must not be Gaussian distributed; otherwise $s$ cannot be recovered (the Gaussian is rotationally symmetric, so the mixing matrix would only be identifiable up to rotation)

function: $f(x)=s$, s.t. $x^{(i)}=As^{(i)}$ or $x=As$

Here, $A$ is called the mixing matrix and $W$ the unmixing matrix: $W=A^{-1}$, $s=Wx$, $s_j^{(i)}=\omega_j^Tx^{(i)}$

$$W = \left[ \begin{matrix} -\omega_1^T- \\ -\omega_2^T- \\ \vdots \\ -\omega_d^T- \end{matrix} \right]$$
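As a concrete illustration of this setup, here is a minimal sketch; the Laplace sources, the particular mixing matrix, and the sample count are all arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: d = 2 non-Gaussian sources, n = 1000 samples.
d, n = 2, 1000
s = rng.laplace(size=(d, n))       # Laplace sources are non-Gaussian
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])         # mixing matrix (must be invertible)
x = A @ s                          # observations: column i is x^(i)

W_true = np.linalg.inv(A)          # the unmixing matrix W = A^{-1}
assert np.allclose(W_true @ x, s)  # s = Wx recovers the sources exactly
```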

ICA Algorithm

Maximum Likelihood Estimation (MLE) Algorithm

Assuming the sources are independent of each other, then

$$p(s) = \prod_{j=1}^d p_s(s_j)$$

Background Knowledge 1: $p_x(x)$ is the density of $x$ and $p_s$ is the density of $s$; since $x=W^{-1}s$, we have $p_x(x)=p_s(Wx)\,|W|$
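For intuition, here is the one-dimensional version of this change-of-variables fact (with scalar $w>0$):

$$F_x(x_0)=P(x\leq x_0)=P(s\leq wx_0)=F_s(wx_0),\qquad p_x(x_0)=F_x'(x_0)=p_s(wx_0)\,w$$

In the multivariate case, the scalar $w$ becomes the Jacobian factor $|W|$.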

Based on Knowledge 1, substituting $s=Wx$ (equivalently $x=As=W^{-1}s$) into the factorized density gives

$$p_x(x) =\prod_{j=1}^d p_s(\omega_j^Tx)\cdot|W|$$

Background Knowledge 2: the cumulative distribution function (CDF) $F$ is defined as $F(z_0)=P(z\leq z_0)=\int_{-\infty}^{z_0}p_z(z)dz$, and the density is $p_z(z)=F'(z)$

Background Knowledge 3: for the sigmoid function, $g'(s)=g(s)(1-g(s))$

Background Knowledge 4: $\nabla_W|W|=|W|(W^{-1})^T$
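Knowledge 4 can also be checked numerically; a quick finite-difference sketch (reusing `np` and `rng` from the snippet above):

```python
# Numerically verify grad_W |W| = |W| (W^{-1})^T on a random 3x3 matrix.
W = rng.normal(size=(3, 3))
eps = 1e-6
num_grad = np.zeros_like(W)
for r in range(3):
    for c in range(3):
        Wp, Wm = W.copy(), W.copy()
        Wp[r, c] += eps
        Wm[r, c] -= eps
        num_grad[r, c] = (np.linalg.det(Wp) - np.linalg.det(Wm)) / (2 * eps)

analytic = np.linalg.det(W) * np.linalg.inv(W).T
assert np.allclose(num_grad, analytic, atol=1e-4)
```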

To specify the density of $s_j$, we choose a specific CDF, which can be any CDF other than the Gaussian. For computational convenience (per Knowledge 3), we choose the sigmoid function $g(s)=1/(1+e^{-s})$; then, per Knowledge 2, $p_s(s)=g'(s)$, and we obtain the log-likelihood

$$\ell(W;x)=\sum_{i=1}^n\left(\sum_{j=1}^d\log g'(\omega_j^Tx^{(i)})+\log|W|\right)$$

Here, $|W|$ refers to $\det(W)$
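In code, the log-likelihood can be evaluated as follows (a sketch; `slogdet` is used for numerical stability, and `x` is assumed to stack the examples as columns):

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(W, x):
    """Evaluate the log-likelihood for the sigmoid-CDF model; x has shape (d, n)."""
    g = sigmoid(W @ x)                        # g(w_j^T x^(i)) for all i, j
    log_gprime = np.log(g) + np.log(1.0 - g)  # log g' = log g + log(1-g), Knowledge 3
    _, logdet = np.linalg.slogdet(W)          # log|det W|
    return log_gprime.sum() + x.shape[1] * logdet
```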

Iteration

To maximize $\ell(W;x)$, use Knowledge 3 and 4 and iterate on $W$ with stochastic gradient ascent. The update rule (for a single training example $x^{(i)}$) is:

$$W:=W+\alpha\nabla_W\ell(W;x) =W+\alpha\left(\left[\begin{matrix} 1-2g(\omega_1^Tx^{(i)}) \\ 1-2g(\omega_2^Tx^{(i)}) \\ \vdots \\ 1-2g(\omega_d^Tx^{(i)}) \end{matrix}\right] x^{(i)T}+(W^T)^{-1} \right)$$
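The same update in code (a sketch for a single example `x_i` of shape `(d, 1)`; the learning rate is an arbitrary choice):

```python
def ica_update(W, x_i, alpha=0.001):
    """One gradient-ascent step of the update rule above."""
    grad = (1.0 - 2.0 * sigmoid(W @ x_i)) @ x_i.T + np.linalg.inv(W).T
    return W + alpha * grad
```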

Convergence

Iterate until the algorithm converges, i.e., $W$ no longer changes. Then the original sources can be recovered via $s^{(i)}=Wx^{(i)}$
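Putting the pieces together on the toy data from the first snippet (a sketch; the epoch count and learning rate are arbitrary, and note that ICA recovers the sources only up to permutation and scaling of the rows of $W$):

```python
W = np.eye(d)                     # initialize the unmixing matrix
for epoch in range(100):
    for i in rng.permutation(n):  # stochastic pass over the examples
        W = ica_update(W, x[:, i:i+1], alpha=0.001)

s_hat = W @ x                     # recovered sources (up to permutation/scale)
```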



Note: The content in this blog is class notes shared for educational purposes only. Some images and content are sourced from textbooks, teacher materials, and the internet. If there is any infringement, please contact aursus.blog@gmail.com for removal.