# Preface

This post introduces the Kernel Principal Component Analysis (Kernel PCA) algorithm [1].

# Principal Component Analysis

Given $N$ samples $x_i \in \mathbb{R}^p$, assume $\sum_i x_i = 0$. The sample covariance matrix is $\Sigma = \frac{1}{N}\sum_i x_i x_i^T \in \mathbb{R}^{p \times p}$; its eigendecomposition is:

$$\Sigma = U \Lambda U^T = \sum_k \lambda_k u_k u_k^T, \qquad UU^T = I_p$$

This yields the dimensionality-reducing projection $y_i = U_k^T x_i$, where $U_k \in \mathbb{R}^{p \times k}$ is the submatrix formed by the first $k$ eigenvectors as columns. Clearly,

$$\frac{1}{N}\sum_i y_i y_i^T = \frac{1}{N} \sum_i U_k^T x_i x_i^T U_k = U_k^T \Sigma U_k = U_k^T U \Lambda U^T U_k = \Lambda_k$$

where $\Lambda_k \in \mathbb{R}^{k \times k}$ is the diagonal matrix of the $k$ largest eigenvalues. Writing the sample matrix as $X = [x_1, x_2, \dots, x_N] \in \mathbb{R}^{p \times N}$, PCA can be summarized by the following optimization problem:

$$
\begin{aligned}
\min\limits_{U_k^T U_k = I_k} \sum_i \|x_i-{\mathcal{P}}_k(x_i)\|^2
& = \min tr\left[(X-{\mathcal{P}}_k X)^T(X-{\mathcal{P}}_k X)\right]\\
& = \min tr\left[X^T(I-{\mathcal{P}}_k)^T(I-{\mathcal{P}}_k)X\right]\\
& = \min tr\left[XX^T(I-{\mathcal{P}}_k)^2\right]\\
& = \min tr\left[XX^T(I-{\mathcal{P}}_k)\right]\\
& = \min tr(XX^T) - tr(XX^T U_k U_k^T)\\
& = \max tr(U_k^T XX^T U_k)\\
& = \max tr(U_k^T \Sigma U_k) \\
& = \max tr(\Lambda_k) = \sum_{i=1}^k \lambda_i
\end{aligned}
$$

where ${\mathcal{P}}_k = U_k U_k^T$ is the orthogonal projection onto the subspace, and $XX^T = N\Sigma$, so the constant factor $N$ does not affect the maximizer. The optimal $U_k$ is the column-orthonormal matrix formed by the top $k$ eigenvectors.
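
As a numerical sanity check of this derivation, here is a minimal NumPy sketch (the name `pca_topk` is illustrative, not from [1]): it eigendecomposes the covariance matrix and verifies that the projected data has covariance $\Lambda_k$, as derived above.

```python
import numpy as np

def pca_topk(X, k):
    """X: (p, N) column-sample matrix, assumed centered; returns U_k and Lambda_k."""
    Sigma = (X @ X.T) / X.shape[1]          # covariance (1/N) * sum_i x_i x_i^T
    lam, U = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]         # indices of the k largest eigenvalues
    return U[:, idx], np.diag(lam[idx])

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
X -= X.mean(axis=1, keepdims=True)          # enforce sum_i x_i = 0
U_k, Lam_k = pca_topk(X, k=2)
Y = U_k.T @ X                               # projections y_i = U_k^T x_i
print(np.allclose(Y @ Y.T / X.shape[1], Lam_k))  # True: covariance of y is Lambda_k
```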

# From Linear to Nonlinear

# Kernel Principal Component Analysis

With the vector inner product $\langle a, b\rangle = a^T b$, consider the eigenvalue equation of the covariance matrix:

$$\lambda u = \Sigma u = \left(\frac{1}{N}\sum_i x_i x_i^T\right) u = \frac{1}{N}\sum_i \langle x_i, u \rangle x_i$$

- $\lambda \langle x_i, u\rangle = \langle x_i, \Sigma u\rangle$

- $u = \sum_i \frac{\langle x_i,u \rangle}{\lambda N} x_i = \sum_i \alpha_i x_i$
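
The second point is the key observation: every eigenvector with $\lambda \neq 0$ lies in the span of the samples. A small NumPy check (illustrative, using $N < p$ so that the span is a proper subspace):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 4))           # p = 10 dimensions, N = 4 samples
X -= X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / X.shape[1]
lam, U = np.linalg.eigh(Sigma)
u = U[:, -1]                               # eigenvector of the largest eigenvalue
# Solve u = X alpha in the least-squares sense; an exact fit means u is in span(X).
alpha = np.linalg.lstsq(X, u, rcond=None)[0]
print(np.allclose(X @ alpha, u))           # True: u = sum_i alpha_i x_i
```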

Introduce a nonlinear map $\Phi: \mathbb{R}^p \to F,\ x \mapsto \Phi(x)$, and assume $\sum_i \Phi(x_i) = 0$; otherwise, center the data via $\Phi(x_i) \leftarrow \Phi(x_i) - \frac{1}{N}\sum_j \Phi(x_j)$. The corresponding covariance matrix is

$$\bar{\Sigma} = \frac{1}{N}\sum_i \Phi(x_i)\Phi(x_i)^T$$

The corresponding eigenvalue equation is $\lambda U = \bar{\Sigma} U$. Since $\bar{\Sigma} U$ is a linear combination of the $\Phi(x_i)$, every eigenvector with $\lambda \neq 0$ lies in the span of $\Phi(x_1),\cdots,\Phi(x_N)$, giving two conclusions:

- $\lambda \langle \Phi(x_i), U\rangle = \langle\Phi(x_i), \bar{\Sigma} U\rangle$

- $U = \sum_j \alpha_j \Phi(x_j)$

Substituting the second conclusion into the first gives

$$\lambda \sum_j \alpha_j \langle \Phi(x_i), \Phi(x_j)\rangle = \frac{1}{N}\sum_{j,k} \alpha_j\langle\Phi(x_i), \Phi(x_k)\rangle \langle\Phi(x_k), \Phi(x_j)\rangle$$

Defining $K_{ij} = \langle\Phi(x_i), \Phi(x_j)\rangle$ and $\tilde{\lambda} = N \lambda$, this reads $\tilde{\lambda} K \alpha = K^2 \alpha$ in matrix form, and the solutions of interest are obtained from the eigenproblem

$$K \alpha = \tilde{\lambda}\alpha$$

Since the eigenvector is normalized, i.e. $U^T U = 1$, we get

$$\sum_{i,j} \alpha_i \alpha_j \langle\Phi(x_i), \Phi(x_j)\rangle = \alpha^T K \alpha = \tilde{\lambda}\, \alpha^T \alpha = 1$$

i.e. $\alpha$ must be rescaled so that $\|\alpha\|^2 = 1/\tilde{\lambda}$.

For a new sample point $t$, the projection of its image $\Phi(t)$ in feature space can be written as

$$\langle U, \Phi(t)\rangle = \sum_j \alpha_j \langle \Phi(x_j), \Phi(t)\rangle = \sum_j \alpha_j K(x_j,t)$$

# Computation Steps

1. Compute the kernel matrix $K$; if the data are not centered in feature space, apply the centering correction $\tilde{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N$, where $(\mathbf{1}_N)_{ij} = 1/N$
2. Compute its eigenvectors and normalize them so that $\|\alpha\|^2 = 1/\tilde{\lambda}$
3. Compute the projections $\sum_j \alpha_j K(x_j, t)$ for new sample points, as in the sketch below
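
The following is a minimal NumPy sketch of these three steps, using a Gaussian (RBF) kernel; the function names and the `gamma` parameter are illustrative choices, not notation from [1]. Note that all centering is done on $K$ itself, since $\Phi$ is never computed explicitly.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2), for rows of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def kernel_pca_fit(X, n_components, gamma=1.0):
    """Steps 1-2: build K, center it in feature space, solve K alpha = lam~ alpha."""
    N = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one   # centering correction on K
    lam, A = np.linalg.eigh(Kc)                  # eigenvalues lam~ = N * lambda, ascending
    lam = lam[::-1][:n_components]               # keep the largest eigenvalues
    A = A[:, ::-1][:, :n_components]
    A = A / np.sqrt(lam)                         # rescale so ||alpha||^2 = 1 / lam~
    return A, K

def kernel_pca_transform(X_train, A, K, T, gamma=1.0):
    """Step 3: project new points t via sum_j alpha_j K(x_j, t), with matching centering."""
    N, M = X_train.shape[0], T.shape[0]
    k = rbf_kernel(T, X_train, gamma)            # (M, N) kernel values K(t, x_j)
    one_N = np.full((N, N), 1.0 / N)
    one_M = np.full((M, N), 1.0 / N)
    kc = k - one_M @ K - k @ one_N + one_M @ K @ one_N
    return kc @ A                                # row m, column l holds <U_l, Phi(t_m)>

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
A, K = kernel_pca_fit(X, n_components=2, gamma=0.5)
Y = kernel_pca_transform(X, A, K, X)             # (50, 2) nonlinear embedding
```

New points use the same training-set centering correction, which is why `kernel_pca_fit` returns the uncentered $K$ alongside the coefficients.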

# Kernel Functions

Different kernel functions carry different meanings; a dedicated summary of kernel methods and the kernel trick will follow in a separate post.
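
For reference, the kernels considered in [1] include the polynomial, Gaussian (radial basis), and sigmoid kernels:

$$k(x,y) = (x^T y)^d, \qquad k(x,y) = \exp\left(-\frac{\|x-y\|^2}{2\sigma^2}\right), \qquad k(x,y) = \tanh(\kappa\, x^T y + \Theta)$$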

# Properties of Kernel PCA (Inherited from PCA)

- The first $q$ principal components capture more variance than any other $q$ directions
- Reconstruction from the first $q$ principal components has a smaller mean-squared error than reconstruction from any other $q$ directions
- The principal components are mutually uncorrelated
- The first $q$ principal components carry more mutual information than any other $q$ directions

# References


1. Schölkopf B., Smola A. J., Müller K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 1998, 10(5): 1299-1319.