Preface
While reading a survey on singular spectrum analysis, I came across a dedicated section on the Karhunen-Loeve transform, so I looked into the related material more carefully. It turns out that Zhang Xianda's (张贤达) matrix analysis book has a short section on the discrete Karhunen-Loeve transform, which explains it more clearly than the function-based treatment on Wikipedia and in more detail than foreign textbooks, so I have transcribed it here. At heart it is the same as principal component analysis; anyone familiar with eigendecomposition should be able to follow it.
Choice of orthogonal basis
A signal can be represented in a new set of orthogonal basis functions, and we would like this representation to have some desirable properties:
Minimum mean-square error
For a signal $x = [x_1, \dots, x_M]^T$, apply a unitary transform $w = Q^H x$, where $Q^{-1} = Q^H$. We write $Q \in U^{M}$, where $U^{M}$ denotes the set of $M \times M$ unitary matrices. Then $x$ can be expressed as a linear combination with coefficients $w$:
$$x = Qw = \sum_{i=1}^{M} w_i q_i$$
We now approximate the signal $x$ using only the first $m$ coefficients $w_1, \dots, w_m$:
$$\hat{x} = \sum_{i=1}^{m} w_i q_i$$
The corresponding error is
$$e_m = x - \hat{x} = \sum_{i=m+1}^{M} w_i q_i$$
Constrain the basis vectors to unit norm, i.e. $q_i^H q_i = 1$, and denote the autocorrelation matrix by $R_x = E\{x x^H\}$. The mean-square error is then
$$E_m = E\{e_m^H e_m\} = \sum_{i=m+1}^{M} E\{|w_i|^2\}\, q_i^H q_i = \sum_{i=m+1}^{M} E\{|w_i|^2\} = \sum_{i=m+1}^{M} q_i^H R_x q_i$$
where the last equality follows from $w_i = q_i^H x$, so that $E\{|w_i|^2\} = q_i^H E\{x x^H\} q_i = q_i^H R_x q_i$. Minimizing the mean-square error therefore corresponds to the optimization problem
$$\min_{Q \in U^{M}} \sum_{i=m+1}^{M} q_i^H R_x q_i \quad \text{s.t.} \quad q_i^H q_i = 1, \ \forall i$$
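A minimal numerical sketch of the truncation-error identity above, assuming a synthetic zero-mean Gaussian signal with a randomly generated correlation matrix and a random unitary basis (all names and the data model here are illustrative, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
M, m, n_samples = 8, 3, 200_000

# Random correlation matrix R_x = A A^T and a random orthonormal basis Q (via QR).
A = rng.standard_normal((M, M))
Rx = A @ A.T
Q, _ = np.linalg.qr(rng.standard_normal((M, M)))

# Zero-mean samples x with E{x x^T} = R_x, expanded in the basis Q.
L = np.linalg.cholesky(Rx)
X = L @ rng.standard_normal((M, n_samples))          # columns are signal vectors x
W = Q.T @ X                                          # coefficients w = Q^H x

# Keep the first m coefficients and measure the empirical mean-square error.
X_hat = Q[:, :m] @ W[:m, :]
emp_mse = np.mean(np.sum((X - X_hat) ** 2, axis=0))

# Theoretical value from the derivation: sum_{i=m+1}^{M} q_i^H R_x q_i.
theo_mse = sum(Q[:, i] @ Rx @ Q[:, i] for i in range(m, M))
print(emp_mse, theo_mse)   # the two numbers agree up to sampling error
```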
Karhunen-Loeve transform
Construct the Lagrangian
$$L(q_i, \lambda_i) = \sum_{i=m+1}^{M} q_i^H R_x q_i + \sum_{i=m+1}^{M} \lambda_i \left(1 - q_i^H q_i\right)$$
Taking the partial derivative with respect to each $q_i$ and setting it to zero, i.e. $\frac{\partial L}{\partial q_i^*} = 0$, gives
$$R_x q_i = \lambda_i q_i, \quad i = m+1, \dots, M$$
This is the Karhunen-Loeve transform. The eigen-equation says that when the first $m$ orthonormal basis vectors are used to approximate the original signal, the multipliers and basis vectors achieving the minimum error are the last $M-m$ eigenvalues and eigenvectors of the signal autocorrelation matrix $R_x$; consequently, the first $m$ basis vectors should be the eigenvectors of $R_x$ associated with its $m$ largest eigenvalues.
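A small sketch of the eigen-equation above, using an illustrative sample correlation matrix of my own choosing: NumPy's eigendecomposition returns vectors satisfying $R_x q_i = \lambda_i q_i$, and no unit vector can make $q^H R_x q$ smaller than the smallest eigenvalue (a Rayleigh-quotient check), which is why the discarded directions should be the small-eigenvalue eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6
A = rng.standard_normal((M, M))
Rx = A @ A.T                                   # stand-in correlation matrix (illustrative)

lam, Qe = np.linalg.eigh(Rx)                   # ascending eigenvalues, orthonormal eigenvectors
print(np.allclose(Rx @ Qe, Qe @ np.diag(lam))) # True: R_x q_i = lambda_i q_i

# Rayleigh-quotient check: random unit vectors never beat the smallest eigenvalue.
v = rng.standard_normal((M, 1000))
v /= np.linalg.norm(v, axis=0)
quotients = np.einsum('ij,ik,kj->j', v, Rx, v) # q^H R_x q for each random unit vector
print(lam[0] <= quotients.min() + 1e-12)       # True: the minimum is lambda_min
```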
Discrete Karhunen-Loeve transform
Perform the eigenvalue decomposition of the autocorrelation matrix $R_x$:
$$R_x = \sum_{i=1}^{M} \lambda_i u_i u_i^H$$
Since the energy measured by the eigenvalues (or singular values) is usually concentrated in only a few of them, we can keep the $K$ largest eigenvalues and discard the remaining $M-K$ small ones. The corresponding $K$-th order discrete Karhunen-Loeve expansion, i.e. the approximate representation of the signal, is
$$\hat{x} = \sum_{i=1}^{K} w_i u_i$$
Let $U = [u_1, \dots, u_K]$ collect the eigenvectors associated with the $K$ largest eigenvalues; then $w = U^H x$. The mean-square error can be expressed in terms of the eigenvalues:
$$E_K = \sum_{i=K+1}^{M} u_i^H R_x u_i = \sum_{i=K+1}^{M} u_i^H \left( \sum_{j=1}^{M} \lambda_j u_j u_j^H \right) u_i = \sum_{i=K+1}^{M} \lambda_i$$
Because these are the smallest eigenvalues, the mean-square error $E_K$ is guaranteed to be small.
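A minimal sketch of the discrete K-L transform on synthetic data (the data model is my own illustrative choice): estimate $R_x$ from samples, keep the $K$ eigenvectors with the largest eigenvalues, and check that the truncation mean-square error equals the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, n = 10, 3, 100_000

# Zero-mean signal whose energy is concentrated in a few directions (illustrative model).
lam_true = np.array([5.0, 3.0, 2.0] + [0.05] * (M - 3))
B, _ = np.linalg.qr(rng.standard_normal((M, M)))
X = B @ (np.sqrt(lam_true)[:, None] * rng.standard_normal((M, n)))

Rx = X @ X.T / n                               # sample autocorrelation matrix
lam, U = np.linalg.eigh(Rx)
idx = np.argsort(lam)[::-1]                    # eigenvalues in decreasing order
lam, U = lam[idx], U[:, idx]

Uk = U[:, :K]                                  # K dominant eigenvectors
W = Uk.T @ X                                   # KLT coefficients w = U^H x
X_hat = Uk @ W                                 # K-term expansion

mse = np.mean(np.sum((X - X_hat) ** 2, axis=0))
print(mse, lam[K:].sum())                      # both equal the sum of the M-K discarded eigenvalues
```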
Inverse discrete Karhunen-Loeve transform
Once the orthonormal basis $u_1, \dots, u_K$ and the corresponding coefficients $w_1, \dots, w_K$ are available, the signal can be reconstructed as $\hat{x} = \sum_{i=1}^{K} w_i u_i$. This can be applied to signal encoding and decoding: if the transmitter and the receiver both hold the eigenvectors in advance, only the $K$ coefficients need to be transmitted, which both keeps the data safer and shortens the transmission.
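A sketch of this coding use-case, assuming the transmitter and receiver have already agreed on the $K$ dominant eigenvectors (here derived from a toy training set; the model and the `encode`/`decode` helpers are illustrative, not from the text): only $K$ coefficients cross the channel, and the receiver rebuilds $\hat{x} = \sum_i w_i u_i$.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, n_train = 16, 4, 5_000

# Toy training data used by both sides to agree on the shared basis.
basis = np.linalg.qr(rng.standard_normal((M, M)))[0][:, :K]
train = basis @ rng.standard_normal((K, n_train)) + 0.01 * rng.standard_normal((M, n_train))
Rx = train @ train.T / n_train
lam, U = np.linalg.eigh(Rx)
Uk = U[:, np.argsort(lam)[::-1][:K]]           # shared K-dimensional "codebook"

def encode(x, Uk):
    return Uk.T @ x                            # transmit K numbers instead of M

def decode(w, Uk):
    return Uk @ w                              # receiver-side reconstruction

x = train[:, 0]
x_hat = decode(encode(x, Uk), Uk)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))   # small relative reconstruction error
```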
Similarities and differences with principal component analysis
From the Wikipedia article on the Karhunen-Loeve transform:
"The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), Empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform."
This shows that the Karhunen-Loeve transform and principal component analysis are closely related; the difference lies only in the matrix being processed:
When the sample mean is zero, the covariance matrix equals the autocorrelation matrix and the two methods are essentially equivalent. When the sample mean is nonzero, the covariance matrix equals the autocorrelation matrix of the mean-removed signal. As the following passages explain:
PCA depends on the scaling of the variables, and the applicability of PCA is limited by certain assumptions made in its derivation. The claim that the PCA used for dimensionality reduction preserves most of the information of the data is misleading. Indeed, without any assumption on the signal model, PCA cannot help to reduce the amount of information lost during dimensionality reduction, where information is measured using Shannon entropy.
The coefficients in the KLT are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. The KLT adapts to the process in order to produce the best possible basis for its expansion: it minimizes the total mean-square error resulting from its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy. The main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated with the covariance function, which are given by the solutions to the integral equation.
The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.
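A small numerical check of the mean-centering relationship stated above (the data model is my own illustrative choice): with zero-mean data the autocorrelation and covariance matrices agree, so KLT and PCA share the same basis; with a nonzero mean they differ unless the mean is removed first.

```python
import numpy as np

rng = np.random.default_rng(4)
M, n = 5, 50_000
mean = np.array([3.0, 0.0, -1.0, 2.0, 0.5])
X = mean[:, None] + np.sqrt([4.0, 2.0, 1.0, 0.5, 0.1])[:, None] * rng.standard_normal((M, n))

R_x = X @ X.T / n                                    # autocorrelation (what the KLT uses)
C_x = np.cov(X, bias=True)                           # covariance     (what PCA uses)
Xc = X - X.mean(axis=1, keepdims=True)
R_x0 = Xc @ Xc.T / n                                 # autocorrelation of the mean-removed signal

print(np.allclose(R_x, C_x, atol=1e-3))              # False: nonzero mean, they differ by mu mu^T
print(np.allclose(R_x0, C_x))                        # True: after mean removal they coincide
```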
To borrow the framing from a blog post:
The Karhunen-Loeve theorem
Let $X_t$ be a zero-mean, square-integrable stochastic process on a probability space $(\Omega, F, P)$, defined over a closed, bounded interval $[a, b]$, with continuous covariance function $K_X(s, t)$. Let $e_k$ be an orthonormal basis of $L^2([a, b])$ formed by the eigenfunctions of the linear operator $T_{K_X}$, with corresponding eigenvalues $\lambda_k$. Then:
- $K_X(s, t)$ is a Mercer kernel: $K_X(s, t) = \sum_{k=1}^{\infty} \lambda_k e_k(s) e_k(t)$
- $X_t$ admits the expansion $X_t = \sum_{k=1}^{\infty} Z_k e_k(t)$ in terms of the eigenfunctions $e_k(t)$
- The series converges in $L^2$, uniformly in $t$: with $S_N = \sum_{k=1}^{N} Z_k e_k(t)$, we have $E\left[(X_t - S_N)^2\right] \to 0$ as $N \to \infty$, uniformly in $t$
- The coefficients are the projections of $X_t$ onto $e_k(t)$: $Z_k = \int_a^b X_t\, e_k(t)\, dt$
- Zero mean: $E(Z_k) = 0, \ \forall k \in \mathbb{N}$
- Uncorrelatedness: $E(Z_i Z_j) = \delta_{ij} \lambda_j, \ \forall i, j \in \mathbb{N}$
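A numerical illustration of the theorem (the Wiener-process example is my own choice, not from the text above): discretize the covariance kernel $K_X(s, t) = \min(s, t)$ of standard Brownian motion on $[0, 1]$; the integral equation then reduces to a matrix eigenvalue problem, and the known eigenvalues $\lambda_k = 1 / ((k - 1/2)^2 \pi^2)$ emerge.

```python
import numpy as np

N = 1000
t = (np.arange(N) + 0.5) / N                   # midpoint grid on [0, 1]
dt = 1.0 / N
K = np.minimum.outer(t, t)                     # discretized covariance kernel K_X(s, t) = min(s, t)

# Eigenvalues of the integral operator ~ eigenvalues of (K * dt).
lam = np.linalg.eigvalsh(K * dt)[::-1]
analytic = 1.0 / (((np.arange(1, 5) - 0.5) * np.pi) ** 2)
print(lam[:4])      # numerical eigenvalues from the matrix problem
print(analytic)     # ~ [0.405, 0.045, 0.0162, 0.00827], matching up to discretization error
```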
Summary
- What sets the Karhunen-Loeve transform apart from the wavelet and Fourier transforms is that its orthogonal basis functions are adaptive, derived from the signal itself (a small numerical comparison follows this list).
- In signal processing the method is known as the Karhunen-Loeve transform; in machine learning the corresponding method is principal component analysis.
- The linear Karhunen-Loeve transform can be viewed as the continuous version of the discrete Karhunen-Loeve transform; the nonlinear Karhunen-Loeve transform, however, I have not yet managed to understand.
- The Wikipedia material is excellent and recommended for readers with a solid mathematical background; the examples and applications later in the article have an engineering flavor, and I will keep updating this post.
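A sketch of the "adaptive basis" point in the first bullet (the AR(1) process and the comparison against a fixed DFT basis are my own illustrative choices): the KLT basis, built from the signal's own correlation matrix, packs more of the expected energy into the first $K$ coefficients than a fixed basis does.

```python
import numpy as np

M, K, rho = 32, 4, 0.95
# Autocorrelation matrix of a stationary AR(1) process: R[i, j] = rho^|i - j|.
R = rho ** np.abs(np.subtract.outer(np.arange(M), np.arange(M)))

# KLT basis: eigenvectors of R, largest eigenvalues first.
lam, U = np.linalg.eigh(R)
lam, U = lam[::-1], U[:, ::-1]

# Fixed orthonormal DFT basis for comparison.
F = np.exp(-2j * np.pi * np.outer(np.arange(M), np.arange(M)) / M) / np.sqrt(M)

def captured_energy(B):
    """Fraction of total energy carried by the K largest coefficient variances b_k^H R b_k."""
    coeff_var = np.real(np.einsum('ik,ij,jk->k', B.conj(), R, B))
    return np.sort(coeff_var)[::-1][:K].sum() / np.trace(R)

print(captured_energy(U))   # KLT: the largest possible fraction (optimal energy compaction)
print(captured_energy(F))   # DFT: a smaller fraction for this non-circulant process
```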
References