# 写在前面

在大牛Yuejie Chi的主页[1]上找到一个讲谱方法的小册子[2],准备系统地学习下。

# 子空间度量

# 旋转模糊

记子空间 U\mathcal{U}U{\mathcal{U}}^{\star} 的正交基分别为 U\boldsymbol{U}U\boldsymbol{U}^{\star} 。为了测量子空间 U\mathcal{U}U{\mathcal{U}}^{\star} 的距离,一个自然的想法是构造出一个度量UU\vert\vert\vert{\boldsymbol{U}-{\boldsymbol{U}}^{\star}}\vert\vert\vert,其中\vert\vert\vert{\cdot}\vert\vert\vert是感兴趣的一种范式,例如谱范数或Frobenius范数。然而这样的度量通常没有考虑全局的旋转模糊,即对任意旋转矩阵ROr×r\boldsymbol{R}\in\mathcal{O}^{r\times r},矩阵UR\boldsymbol{U}\boldsymbol{R}的列也是子空间U\mathcal{U}的有效正交基。这意味着下面情况可能出现:当子空间 U\mathcal{U}U{\mathcal{U}}^{\star} 相同时,其距离度量不为零,即

UU0\vert\vert\vert{\boldsymbol{U}-{\boldsymbol{U}^{\star}}}\vert\vert\vert \neq 0

这表明,测量两个子空间的接近程度的任何有意义的度量,都应适当考虑旋转模糊。

# 距离度量

  • 最佳旋转距离

计算距离之前找到最佳旋转矩阵R\boldsymbol R,在最佳旋转时测量距离

dist(U,U)minROr×rURU\mathsf{dist}_{\vert\vert\vert{\cdot}\vert\vert\vert}\big(\boldsymbol{U},{\boldsymbol{U}}^{\star}\big) \doteq \min _{\boldsymbol{R}\in \mathcal{O}^{r\times r}} \vert\vert\vert\boldsymbol{U} \boldsymbol{R} - \boldsymbol{U}^{\star}\vert\vert\vert

  • 投影矩阵之间的距离

由于子空间U\mathcal{U}的投影矩阵UU\boldsymbol{U}\boldsymbol{U}^{\top}是唯一的,且不受旋转的影响,即

UU=URRU,ROr×r\boldsymbol{U}\boldsymbol{U}^{\top}=\boldsymbol{U}\boldsymbol{R}\boldsymbol{R}^{\top}\boldsymbol{U}^{\top}, \forall \boldsymbol{R}\in \mathcal{O}^{r\times r}

这种旋转不变性可用来定义子空间 U\mathcal{U}U{\mathcal{U}}^{\star} 之间的距离

distp,(U,U)UUUU\mathsf{dist}_{\mathsf{p},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\boldsymbol{U},{\boldsymbol{U}}^{\star}\big) \doteq \vert\vert\vert\boldsymbol{U} \boldsymbol{U}^{\top}- {\boldsymbol{U}}^{\star} {\boldsymbol{U}}^{\star\top}\vert\vert\vert

  • 通过主角进行几何构造

设矩阵UU\boldsymbol{U}^{\top}{\boldsymbol{U}}^{\star}的奇异值为σ1σ2σr0\sigma_{1}\geq \sigma_{2}\geq \cdots\geq \sigma_{r} \geq 0,由于

UUUU=1\| \boldsymbol{U}^{\top}{\boldsymbol{U}}^{\star} \| \leq \| \boldsymbol{U} \| \, \| {\boldsymbol{U}}^{\star} \|=1

所有的奇异值{σi}i=1r\{\sigma_{i}\}_{i=1}^r在区间[0,1][0,1]内。因此我们可以定义两个子空间之间的主角(或规范角)

θiarccos(σi)for all 1ir,\theta_{i} \doteq \arccos \left(\sigma_{i}\right)\qquad\text{for all }1\leq i\leq r,

这些角度满足

0θ1θrπ/2.0\leq\theta_{1}\leq \cdots\leq\theta_{r}\leq\pi/2.

有了这些角度,就可以测量子空间之间的距离

distsin,(U,U)sinΘ\mathsf{dist}_{\mathsf{sin},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\boldsymbol{U},{\boldsymbol{U}}^{\star}\big) \doteq \vert\vert\vert\sin\boldsymbol{\Theta}\vert\vert\vert

其中Θ=diag(θ1,θ2,,θr),sinΘ=diag(sinθ1,sinθ2,,sinθr)\boldsymbol{\Theta}=\text{diag}(\theta_{1},\theta_{2},\ldots,\theta_{r}),\sin\boldsymbol{\Theta}=\text{diag}(\sin\theta_{1},\sin\theta_{2},\ldots,\sin\theta_{r})

# 度量之间的关系

  • 2rn2r\leq n时,矩阵UUUU\boldsymbol{U}\boldsymbol{U}^{\top}-{\boldsymbol{U}}^{\star}{\boldsymbol{U}}^{\star\top}的奇异值为

sinθr,sinθr,sinθr1,sinθr1,,sinθ1,sinθ12r,0,0,,0n2r.\underbrace{\sin\theta_{r},\, \sin\theta_{r},\, \sin\theta_{r-1},\, \sin\theta_{r-1},\,\cdots,\,\sin\theta_{1},\,\sin\theta_{1}}_{2r},\;\underbrace{0,\,0,\,\cdots,\,0}_{n-2r}.

这意味着投影矩阵的差和子空间之间的主角具有明确的联系。

  • 1rn1\le r \le n时,度量distp,(,)\mathsf{dist}_{\mathsf{p},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big)distsin,(,)\mathsf{dist}_{\mathsf{sin},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big)具有如下等式关系

UUUU=sinΘ=UU=UU12UUUUF=sinΘF=UUF=UUF\begin{aligned} \big\Vert \boldsymbol{U}\boldsymbol{U}^{\top}-{\boldsymbol{U}}^{\star}{\boldsymbol{U}}^{\star\top}\big\Vert & =\left\Vert \sin\boldsymbol{\Theta}\right\Vert = \big\Vert \boldsymbol{U}_{\perp}^{\top}\boldsymbol{U}^{\star}\big\Vert = \big\Vert \boldsymbol{U}^{\top}\boldsymbol{U}_{\perp}^{\star}\big\Vert\\ \tfrac{1}{\sqrt{2}} \big\Vert \boldsymbol{U}\boldsymbol{U}^{\top}-{\boldsymbol{U}}^{\star}{\boldsymbol{U}}^{\star\top}\big\Vert _{\mathrm{F}} & =\left\Vert \sin\boldsymbol{\Theta}\right\Vert _{\mathrm{F}} = \big\Vert \boldsymbol{U}_{\perp}^{\top}\boldsymbol{U}^{\star}\big\Vert_{\mathrm{F}} = \big\Vert \boldsymbol{U}^{\top}\boldsymbol{U}_{\perp}^{\star} \big\Vert_{\mathrm{F}} \end{aligned}

  • 1rn1\le r \le n时,度量dist(,)\mathsf{dist}_{\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big)distp,(,)\mathsf{dist}_{\mathsf{p},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big)具有如下不等式关系

UUUUminROr×rURU2UUUU12UUUUFminROr×rURUFUUUUF\begin{aligned} \|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \| &\leq \min_{\boldsymbol{R}\in \mathcal{O}^{r\times r}}\big\|\boldsymbol{U}\boldsymbol{R}-\boldsymbol{U}^{\star}\big\| \leq \sqrt{2} \|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \| \\ \tfrac{1}{\sqrt{2}} \|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \|_{\mathrm{F}} &\leq \min_{\boldsymbol{R}\in\mathcal{O}^{r\times r}}\left\Vert \boldsymbol{U}\boldsymbol{R}-\boldsymbol{U}^{\star}\right\Vert _{\mathrm{F}} \leq \|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \|_{\mathrm{F}} \end{aligned}

# 特征子空间的扰动分析

n×nn\times n的实对称矩阵M\boldsymbol{M}^{\star}M=M+E\boldsymbol{M}=\boldsymbol{M}^{\star}+\boldsymbol{E} ,对其进行特征分解

M=i=1nλiuiui=[UU][Λ00Λ][UU]M=i=1nλiuiui=[UU][Λ00Λ][UU]\begin{aligned} \boldsymbol{M}^{\star} &=\sum_{i=1}^{n}\lambda_{i}^{\star}\boldsymbol{u}_{i}^{\star}\boldsymbol{u}_{i}^{\star\top} =\left[\begin{array}{cc} \boldsymbol{U}^{\star} & \boldsymbol{U}_{\perp}^{\star}\end{array}\right]\left[\begin{array}{cc} \boldsymbol{\Lambda}^{\star} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{\Lambda}_{\perp}^{\star} \end{array}\right]\left[\begin{array}{c} \boldsymbol{U}^{\star\top}\\ \boldsymbol{U}_{\perp}^{\star\top} \end{array}\right]\\ \boldsymbol{M} &=\sum_{i=1}^{n}\lambda_{i}\boldsymbol{u}_{i}\boldsymbol{u}_{i}^{\top} =\left[\begin{array}{cc} \boldsymbol{U} & \boldsymbol{U}_{\perp}\end{array}\right]\left[\begin{array}{cc} \boldsymbol{\Lambda} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{\Lambda}_{\perp} \end{array}\right]\left[\begin{array}{c} \boldsymbol{U}^{\top}\\ \boldsymbol{U}_{\perp}^{\top} \end{array}\right] \end{aligned}

其中分块矩阵按秩rr来选择。

U=[u1,,ur]Rn×r,U=[ur+1,,un]Rn×(nr),Λ=diag([λ1,,λr]),Λ=diag([λr+1,,λn]).\begin{aligned} \boldsymbol{U} &=[\boldsymbol{u}_{1},\cdots,\boldsymbol{u}_{r}]\in\mathbb{R}^{n\times r}, \\ \boldsymbol{U}_{\perp} &=[\boldsymbol{u}_{r+1},\cdots,\boldsymbol{u}_{n}]\in\mathbb{R}^{n\times (n-r)},\\ \boldsymbol{\Lambda}&=\text{diag}\big([\lambda_{1},\cdots,\lambda_{r}]\big),\\ \boldsymbol{\Lambda}_{\perp} &=\text{diag}\big([\lambda_{r+1},\cdots,\lambda_{n}]\big). \end{aligned}

矩阵 U,U,Λ,Λ\boldsymbol{U}^{\star}, \boldsymbol{U}_{\perp}^{\star}, \boldsymbol{\Lambda}^{\star}, \boldsymbol{\Lambda}_{\perp}^{\star} 的定义类似。

# Davis-Kahan sinΘ\sin\Theta定理

对常数α,βR\alpha,\beta\in \mathbb{R},eigengapΔ>0\Delta>0,假设下面条件成立

eigenvalues(Λ)[α,β],eigenvalues(Λ)(,αΔ][β+Δ,)\begin{aligned} \mathsf{eigenvalues}(\boldsymbol{\Lambda}^{\star}) &\subseteq [\alpha,\beta] ,\\ \mathsf{eigenvalues}(\boldsymbol{\Lambda}_{\perp}) &\subseteq (-\infty, \alpha-\Delta] \cup [\beta+\Delta, \infty) \end{aligned}

dist(U,U)2sinΘ2EUΔ2EΔ;distF(U,U)2sinΘF2EUFΔ2rEΔ.\begin{aligned} \mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|\leq\frac{\sqrt{2}\big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|}{\Delta}\leq\frac{\sqrt{2}\|\boldsymbol{E}\|}{\Delta};\\ \mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|_{\mathrm{F}}\leq\frac{\sqrt{2}\big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|_{\mathrm{F}}}{\Delta}\leq\frac{\sqrt{2r}\|\boldsymbol{E}\|}{\Delta}. \end{aligned}

当条件替换如下时,定理仍然成立

eigenvalues(Λ)(,αΔ][β+Δ,);eigenvalues(Λ)[α,β].\begin{aligned} \mathsf{eigenvalues}(\boldsymbol{\Lambda}^{\star}) &\subseteq (-\infty, \alpha-\Delta] \cup [\beta+\Delta, \infty) ; \\ \mathsf{eigenvalues}(\boldsymbol{\Lambda}_{\perp}) &\subseteq [\alpha,\beta]. \end{aligned}

在如下意义下,该定理可被推广至酉不变范数\vert\vert\vert{\cdot}\vert\vert\vert

sinΘEUΔ\vert\vert\vert\sin\boldsymbol{\Theta}\vert\vert\vert \leq\frac{\vert\vert\vert\boldsymbol{E}\boldsymbol{U}^\star\vert\vert\vert}{\Delta}

如果E<(11/2)(λrλr+1)\|\boldsymbol{E}\|< (1-1/\sqrt{2}) (|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|),有

dist(U,U)2sinΘ2EUλrλr+12Eλrλr+1;distF(U,U)2sinΘF2EUFλrλr+12rEλrλr+1.\begin{aligned} \mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|\leq\frac{2 \big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|}\leq\frac{2\|\boldsymbol{E}\|}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|};\\ \mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|_{\mathrm{F}}\leq\frac{2 \big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|_{\mathrm{F}}}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|}\leq\frac{2\sqrt{r}\|\boldsymbol{E}\|}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|}. \end{aligned}

# 奇异子空间的扰动分析

现在不再限制矩阵的对称性和方阵结构,考虑一般的矩阵M=M+ERn1×n2\boldsymbol{M}=\boldsymbol{M}^{\star}+\boldsymbol{E}\in\mathbb{R}^{n_1 \times n_2},不妨设n1n2n_1\leq n_2,对矩阵进行奇异值分解

M=i=1n1σiuivi=[UU][Σ000Σ0][VV]M=i=1n1σiuivi=[UU][Σ000Σ0][VV]\begin{aligned} \boldsymbol{M}^{\star} & =\sum_{i=1}^{n_1}\sigma_{i}^{\star}\boldsymbol{u}_{i}^{\star}\boldsymbol{v}_{i}^{\star\top} =\left[\begin{array}{cc} \boldsymbol{U}^{\star} & \boldsymbol{U}_{\perp}^{\star}\end{array}\right]\left[\begin{array}{ccc} \boldsymbol{\Sigma}^{\star} & \boldsymbol{0} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{\Sigma}_{\perp}^{\star} & \boldsymbol{0} \end{array}\right]\left[\begin{array}{c} \boldsymbol{V}^{\star\top}\\ \boldsymbol{V}_{\perp}^{\star\top} \end{array}\right]\\ \boldsymbol{M} & =\sum_{i=1}^{n_1}\sigma_{i}\boldsymbol{u}_{i}\boldsymbol{v}_{i}^{\top} =\left[\begin{array}{cc} \boldsymbol{U} & \boldsymbol{U}_{\perp}\end{array}\right]\left[\begin{array}{ccc} \boldsymbol{\Sigma} & \boldsymbol{0} & \boldsymbol{0}\\ \boldsymbol{0} & \boldsymbol{\Sigma}_{\perp} & \boldsymbol{0} \end{array}\right]\left[\begin{array}{c} \boldsymbol{V}^{\top}\\ \boldsymbol{V}_{\perp}^{\top} \end{array}\right] \end{aligned}

其中分块矩阵按秩rr来选择。

Σ=diag([σ1,,σr]),Σ=diag([σr+1,,σn1]),U=[u1,,ur]Rn1×r,U=[ur+1,,un1]Rn1×(n1r),V=[v1,,vr]Rn2×r,V=[vr+1,,vn2]Rn2×(n2r).\begin{aligned} \boldsymbol{\Sigma} &=\mathsf{diag}\big([\sigma_{1},\cdots,\sigma_{r}]\big),\\ \boldsymbol{\Sigma}_{\perp} &=\mathsf{diag}\big([\sigma_{r+1},\cdots,\sigma_{n_1}]\big),\\ \boldsymbol{U} &=[\boldsymbol{u}_{1},\cdots,\boldsymbol{u}_{r}]\in \mathbb{R}^{n_1\times r},\\ \boldsymbol{U}_{\perp} &=[\boldsymbol{u}_{r+1},\cdots,\boldsymbol{u}_{n_1}]\in \mathbb{R}^{n_1\times (n_1-r)},\\ \boldsymbol{V} &=[\boldsymbol{v}_{1},\cdots,\boldsymbol{v}_{r}]\in \mathbb{R}^{n_2\times r},\\ \boldsymbol{V}_{\perp} &=[\boldsymbol{v}_{r+1},\cdots,\boldsymbol{v}_{n_2}] \in \mathbb{R}^{n_2\times (n_2-r)}. \end{aligned}

矩阵Σ,Σ,U,U,V,V\boldsymbol{\Sigma}^{\star},\boldsymbol{\Sigma}_{\perp}^{\star},\boldsymbol{U}^{\star},\boldsymbol{U}_{\perp}^{\star},\boldsymbol{V}^{\star},\boldsymbol{V}_{\perp}^{\star}的定义类似。

# Wedin's sinΘ\sin\Theta定理

如果E<σrσr+1\|\boldsymbol{E}\|<\sigma_{r}^{\star}-\sigma_{r+1}^{\star},则

max{dist(U,U),dist(V,V)}2max{EU,EV}σrσr+1E;max{distF(U,U),distF(V,V)}2max{EUF,EVF}σrσr+1E.\begin{aligned} \max\left\{ \mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\} & \leq\frac{ \sqrt{2} \max\big\{ \|\boldsymbol{E}^{\top}\boldsymbol{U}^{\star}\|,\|\boldsymbol{E}\boldsymbol{V}^{\star}\|\big\} }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}-\|\boldsymbol{E}\|};\\ \max\left\{ \mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\} & \leq\frac{\sqrt{2}\max\big\{ \|\boldsymbol{E}^{\top}\boldsymbol{U}^{\star}\|_{\mathrm{F}},\|\boldsymbol{E}\boldsymbol{V}^{\star}\|_{\mathrm{F}}\big\} }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}-\|\boldsymbol{E}\|}. \end{aligned}

进一步,如果E<(11/2)(σrσr+1)\|\boldsymbol{E}\|< (1-1/\sqrt{2})(\sigma_{r}^{\star}-\sigma_{r+1}^{\star}),有

max{dist(U,U),dist(V,V)}2Eσrσr+1,max{distF(U,U),distF(V,V)}2rEσrσr+1,\begin{aligned} \max\left\{ \mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\} & \leq\frac{ 2 \|\boldsymbol{E}\| }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}}, \\ \max\left\{ \mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\} & \leq \frac{ 2\sqrt{r}\|\boldsymbol{E}\| }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}}, \end{aligned}

# 小结

后面还有一系列扰动理论及其应用,有时间再看,先接触下。

# References


  1. Yuejie Chi homepage: http://users.ece.cmu.edu/~yuejiec/index.html (opens new window) ↩︎

  2. Y. Chen, Y. Chi, J. Fan, and C. Ma. Spectral Methods for Data Science: A Statistical Perspective (opens new window). Foundation and Trends in Machine Learning, arXiv:2012.08496v1 (opens new window), 2020. ↩︎