写在前面 在大牛Yuejie Chi的主页上找到一个讲谱方法的小册子,准备系统地学习下。
子空间度量 旋转模糊 记子空间 U \mathcal{U} U 和 U ⋆ {\mathcal{U}}^{\star} U ⋆ 的正交基分别为 U \boldsymbol{U} U 和 U ⋆ \boldsymbol{U}^{\star} U ⋆ 。为了测量子空间 U \mathcal{U} U 和 U ⋆ {\mathcal{U}}^{\star} U ⋆ 的距离,一个自然的想法是构造出一个度量∣ ∣ ∣ U − U ⋆ ∣ ∣ ∣ \vert\vert\vert{\boldsymbol{U}-{\boldsymbol{U}}^{\star}}\vert\vert\vert ∣ ∣ ∣ U − U ⋆ ∣ ∣ ∣ ,其中∣ ∣ ∣ ⋅ ∣ ∣ ∣ \vert\vert\vert{\cdot}\vert\vert\vert ∣ ∣ ∣ ⋅ ∣ ∣ ∣ 是感兴趣的一种范式,例如谱范数或Frobenius范数。然而这样的度量通常没有考虑全局的旋转模糊,即对任意旋转矩阵R ∈ O r × r \boldsymbol{R}\in\mathcal{O}^{r\times r} R ∈ O r × r ,矩阵U R \boldsymbol{U}\boldsymbol{R} U R 的列也是子空间U \mathcal{U} U 的有效正交基。这意味着下面情况可能出现:当子空间 U \mathcal{U} U 和 U ⋆ {\mathcal{U}}^{\star} U ⋆ 相同时,其距离度量不为零,即
∣ ∣ ∣ U − U ⋆ ∣ ∣ ∣ ≠ 0 \vert\vert\vert{\boldsymbol{U}-{\boldsymbol{U}^{\star}}}\vert\vert\vert \neq 0
∣ ∣ ∣ U − U ⋆ ∣ ∣ ∣ = 0
这表明,测量两个子空间的接近程度的任何有意义的度量,都应适当考虑旋转模糊。
距离度量 计算距离之前找到最佳旋转矩阵R \boldsymbol R R ,在最佳旋转时测量距离
d i s t ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( U , U ⋆ ) ≐ min R ∈ O r × r ∣ ∣ ∣ U R − U ⋆ ∣ ∣ ∣ \mathsf{dist}_{\vert\vert\vert{\cdot}\vert\vert\vert}\big(\boldsymbol{U},{\boldsymbol{U}}^{\star}\big) \doteq \min _{\boldsymbol{R}\in \mathcal{O}^{r\times r}} \vert\vert\vert\boldsymbol{U} \boldsymbol{R} - \boldsymbol{U}^{\star}\vert\vert\vert
d i s t ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( U , U ⋆ ) ≐ R ∈ O r × r min ∣ ∣ ∣ U R − U ⋆ ∣ ∣ ∣
由于子空间U \mathcal{U} U 的投影矩阵U U ⊤ \boldsymbol{U}\boldsymbol{U}^{\top} U U ⊤ 是唯一的,且不受旋转的影响,即
U U ⊤ = U R R ⊤ U ⊤ , ∀ R ∈ O r × r \boldsymbol{U}\boldsymbol{U}^{\top}=\boldsymbol{U}\boldsymbol{R}\boldsymbol{R}^{\top}\boldsymbol{U}^{\top}, \forall \boldsymbol{R}\in \mathcal{O}^{r\times r}
U U ⊤ = U R R ⊤ U ⊤ , ∀ R ∈ O r × r
这种旋转不变性可用来定义子空间 U \mathcal{U} U 和 U ⋆ {\mathcal{U}}^{\star} U ⋆ 之间的距离
d i s t p , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( U , U ⋆ ) ≐ ∣ ∣ ∣ U U ⊤ − U ⋆ U ⋆ ⊤ ∣ ∣ ∣ \mathsf{dist}_{\mathsf{p},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\boldsymbol{U},{\boldsymbol{U}}^{\star}\big) \doteq \vert\vert\vert\boldsymbol{U} \boldsymbol{U}^{\top}- {\boldsymbol{U}}^{\star} {\boldsymbol{U}}^{\star\top}\vert\vert\vert
d i s t p , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( U , U ⋆ ) ≐ ∣ ∣ ∣ U U ⊤ − U ⋆ U ⋆ ⊤ ∣ ∣ ∣
设矩阵U ⊤ U ⋆ \boldsymbol{U}^{\top}{\boldsymbol{U}}^{\star} U ⊤ U ⋆ 的奇异值为σ 1 ≥ σ 2 ≥ ⋯ ≥ σ r ≥ 0 \sigma_{1}\geq \sigma_{2}\geq \cdots\geq \sigma_{r} \geq 0 σ 1 ≥ σ 2 ≥ ⋯ ≥ σ r ≥ 0 ,由于
∥ U ⊤ U ⋆ ∥ ≤ ∥ U ∥ ∥ U ⋆ ∥ = 1 \| \boldsymbol{U}^{\top}{\boldsymbol{U}}^{\star} \| \leq \| \boldsymbol{U} \| \, \| {\boldsymbol{U}}^{\star} \|=1
∥ U ⊤ U ⋆ ∥ ≤ ∥ U ∥ ∥ U ⋆ ∥ = 1
所有的奇异值{ σ i } i = 1 r \{\sigma_{i}\}_{i=1}^r { σ i } i = 1 r 在区间[ 0 , 1 ] [0,1] [ 0 , 1 ] 内。因此我们可以定义两个子空间之间的主角(或规范角)
θ i ≐ arccos ( σ i ) for all 1 ≤ i ≤ r , \theta_{i} \doteq \arccos \left(\sigma_{i}\right)\qquad\text{for all }1\leq i\leq r,
θ i ≐ arccos ( σ i ) for all 1 ≤ i ≤ r ,
这些角度满足
0 ≤ θ 1 ≤ ⋯ ≤ θ r ≤ π / 2. 0\leq\theta_{1}\leq \cdots\leq\theta_{r}\leq\pi/2.
0 ≤ θ 1 ≤ ⋯ ≤ θ r ≤ π / 2 .
有了这些角度,就可以测量子空间之间的距离
d i s t s i n , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( U , U ⋆ ) ≐ ∣ ∣ ∣ sin Θ ∣ ∣ ∣ \mathsf{dist}_{\mathsf{sin},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\boldsymbol{U},{\boldsymbol{U}}^{\star}\big) \doteq \vert\vert\vert\sin\boldsymbol{\Theta}\vert\vert\vert
d i s t s i n , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( U , U ⋆ ) ≐ ∣ ∣ ∣ sin Θ ∣ ∣ ∣
其中Θ = diag ( θ 1 , θ 2 , … , θ r ) , sin Θ = diag ( sin θ 1 , sin θ 2 , … , sin θ r ) \boldsymbol{\Theta}=\text{diag}(\theta_{1},\theta_{2},\ldots,\theta_{r}),\sin\boldsymbol{\Theta}=\text{diag}(\sin\theta_{1},\sin\theta_{2},\ldots,\sin\theta_{r}) Θ = diag ( θ 1 , θ 2 , … , θ r ) , sin Θ = diag ( sin θ 1 , sin θ 2 , … , sin θ r ) 。
度量之间的关系 当2 r ≤ n 2r\leq n 2 r ≤ n 时,矩阵U U ⊤ − U ⋆ U ⋆ ⊤ \boldsymbol{U}\boldsymbol{U}^{\top}-{\boldsymbol{U}}^{\star}{\boldsymbol{U}}^{\star\top} U U ⊤ − U ⋆ U ⋆ ⊤ 的奇异值为 sin θ r , sin θ r , sin θ r − 1 , sin θ r − 1 , ⋯ , sin θ 1 , sin θ 1 ⏟ 2 r , 0 , 0 , ⋯ , 0 ⏟ n − 2 r . \underbrace{\sin\theta_{r},\, \sin\theta_{r},\, \sin\theta_{r-1},\, \sin\theta_{r-1},\,\cdots,\,\sin\theta_{1},\,\sin\theta_{1}}_{2r},\;\underbrace{0,\,0,\,\cdots,\,0}_{n-2r}.
2 r sin θ r , sin θ r , sin θ r − 1 , sin θ r − 1 , ⋯ , sin θ 1 , sin θ 1 , n − 2 r 0 , 0 , ⋯ , 0 .
这意味着投影矩阵的差和子空间之间的主角具有明确的联系。
当1 ≤ r ≤ n 1\le r \le n 1 ≤ r ≤ n 时,度量d i s t p , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) \mathsf{dist}_{\mathsf{p},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big) d i s t p , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) 和d i s t s i n , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) \mathsf{dist}_{\mathsf{sin},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big) d i s t s i n , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) 具有如下等式关系 ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ = ∥ sin Θ ∥ = ∥ U ⊥ ⊤ U ⋆ ∥ = ∥ U ⊤ U ⊥ ⋆ ∥ 1 2 ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ F = ∥ sin Θ ∥ F = ∥ U ⊥ ⊤ U ⋆ ∥ F = ∥ U ⊤ U ⊥ ⋆ ∥ F \begin{aligned}
\big\Vert \boldsymbol{U}\boldsymbol{U}^{\top}-{\boldsymbol{U}}^{\star}{\boldsymbol{U}}^{\star\top}\big\Vert
& =\left\Vert \sin\boldsymbol{\Theta}\right\Vert = \big\Vert \boldsymbol{U}_{\perp}^{\top}\boldsymbol{U}^{\star}\big\Vert = \big\Vert \boldsymbol{U}^{\top}\boldsymbol{U}_{\perp}^{\star}\big\Vert\\
\tfrac{1}{\sqrt{2}} \big\Vert \boldsymbol{U}\boldsymbol{U}^{\top}-{\boldsymbol{U}}^{\star}{\boldsymbol{U}}^{\star\top}\big\Vert _{\mathrm{F}}
& =\left\Vert \sin\boldsymbol{\Theta}\right\Vert _{\mathrm{F}}
= \big\Vert \boldsymbol{U}_{\perp}^{\top}\boldsymbol{U}^{\star}\big\Vert_{\mathrm{F}} = \big\Vert \boldsymbol{U}^{\top}\boldsymbol{U}_{\perp}^{\star} \big\Vert_{\mathrm{F}}
\end{aligned}
∥ ∥ ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ ∥ ∥ 2 1 ∥ ∥ ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ ∥ ∥ F = ∥ sin Θ ∥ = ∥ ∥ ∥ U ⊥ ⊤ U ⋆ ∥ ∥ ∥ = ∥ ∥ ∥ U ⊤ U ⊥ ⋆ ∥ ∥ ∥ = ∥ sin Θ ∥ F = ∥ ∥ ∥ U ⊥ ⊤ U ⋆ ∥ ∥ ∥ F = ∥ ∥ ∥ U ⊤ U ⊥ ⋆ ∥ ∥ ∥ F
当1 ≤ r ≤ n 1\le r \le n 1 ≤ r ≤ n 时,度量d i s t ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) \mathsf{dist}_{\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big) d i s t ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) 和d i s t p , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) \mathsf{dist}_{\mathsf{p},\vert\vert\vert{\cdot}\vert\vert\vert}\big(\cdot,\cdot\big) d i s t p , ∣ ∣ ∣ ⋅ ∣ ∣ ∣ ( ⋅ , ⋅ ) 具有如下不等式关系 ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ ≤ min R ∈ O r × r ∥ U R − U ⋆ ∥ ≤ 2 ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ 1 2 ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ F ≤ min R ∈ O r × r ∥ U R − U ⋆ ∥ F ≤ ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ F \begin{aligned}
\|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \|
&\leq
\min_{\boldsymbol{R}\in \mathcal{O}^{r\times r}}\big\|\boldsymbol{U}\boldsymbol{R}-\boldsymbol{U}^{\star}\big\|
\leq \sqrt{2} \|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \| \\
\tfrac{1}{\sqrt{2}} \|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \|_{\mathrm{F}}
&\leq
\min_{\boldsymbol{R}\in\mathcal{O}^{r\times r}}\left\Vert \boldsymbol{U}\boldsymbol{R}-\boldsymbol{U}^{\star}\right\Vert _{\mathrm{F}}
\leq \|\boldsymbol{U}\boldsymbol{U}^{\top} - \boldsymbol{U}^{\star}\boldsymbol{U}^{\star\top} \|_{\mathrm{F}}
\end{aligned}
∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ 2 1 ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ F ≤ R ∈ O r × r min ∥ ∥ ∥ U R − U ⋆ ∥ ∥ ∥ ≤ 2 ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ ≤ R ∈ O r × r min ∥ U R − U ⋆ ∥ F ≤ ∥ U U ⊤ − U ⋆ U ⋆ ⊤ ∥ F
特征子空间的扰动分析 设n × n n\times n n × n 的实对称矩阵M ⋆ \boldsymbol{M}^{\star} M ⋆ 和M = M ⋆ + E \boldsymbol{M}=\boldsymbol{M}^{\star}+\boldsymbol{E} M = M ⋆ + E ,对其进行特征分解
M ⋆ = ∑ i = 1 n λ i ⋆ u i ⋆ u i ⋆ ⊤ = [ U ⋆ U ⊥ ⋆ ] [ Λ ⋆ 0 0 Λ ⊥ ⋆ ] [ U ⋆ ⊤ U ⊥ ⋆ ⊤ ] M = ∑ i = 1 n λ i u i u i ⊤ = [ U U ⊥ ] [ Λ 0 0 Λ ⊥ ] [ U ⊤ U ⊥ ⊤ ] \begin{aligned}
\boldsymbol{M}^{\star}
&=\sum_{i=1}^{n}\lambda_{i}^{\star}\boldsymbol{u}_{i}^{\star}\boldsymbol{u}_{i}^{\star\top}
=\left[\begin{array}{cc}
\boldsymbol{U}^{\star} & \boldsymbol{U}_{\perp}^{\star}\end{array}\right]\left[\begin{array}{cc}
\boldsymbol{\Lambda}^{\star} & \boldsymbol{0}\\
\boldsymbol{0} & \boldsymbol{\Lambda}_{\perp}^{\star}
\end{array}\right]\left[\begin{array}{c}
\boldsymbol{U}^{\star\top}\\
\boldsymbol{U}_{\perp}^{\star\top}
\end{array}\right]\\
\boldsymbol{M}
&=\sum_{i=1}^{n}\lambda_{i}\boldsymbol{u}_{i}\boldsymbol{u}_{i}^{\top}
=\left[\begin{array}{cc}
\boldsymbol{U} & \boldsymbol{U}_{\perp}\end{array}\right]\left[\begin{array}{cc}
\boldsymbol{\Lambda} & \boldsymbol{0}\\
\boldsymbol{0} & \boldsymbol{\Lambda}_{\perp}
\end{array}\right]\left[\begin{array}{c}
\boldsymbol{U}^{\top}\\
\boldsymbol{U}_{\perp}^{\top}
\end{array}\right]
\end{aligned}
M ⋆ M = i = 1 ∑ n λ i ⋆ u i ⋆ u i ⋆ ⊤ = [ U ⋆ U ⊥ ⋆ ] [ Λ ⋆ 0 0 Λ ⊥ ⋆ ] [ U ⋆ ⊤ U ⊥ ⋆ ⊤ ] = i = 1 ∑ n λ i u i u i ⊤ = [ U U ⊥ ] [ Λ 0 0 Λ ⊥ ] [ U ⊤ U ⊥ ⊤ ]
其中分块矩阵按秩r r r 来选择。
U = [ u 1 , ⋯ , u r ] ∈ R n × r , U ⊥ = [ u r + 1 , ⋯ , u n ] ∈ R n × ( n − r ) , Λ = diag ( [ λ 1 , ⋯ , λ r ] ) , Λ ⊥ = diag ( [ λ r + 1 , ⋯ , λ n ] ) . \begin{aligned}
\boldsymbol{U} &=[\boldsymbol{u}_{1},\cdots,\boldsymbol{u}_{r}]\in\mathbb{R}^{n\times r}, \\
\boldsymbol{U}_{\perp} &=[\boldsymbol{u}_{r+1},\cdots,\boldsymbol{u}_{n}]\in\mathbb{R}^{n\times (n-r)},\\
\boldsymbol{\Lambda}&=\text{diag}\big([\lambda_{1},\cdots,\lambda_{r}]\big),\\
\boldsymbol{\Lambda}_{\perp} &=\text{diag}\big([\lambda_{r+1},\cdots,\lambda_{n}]\big).
\end{aligned}
U U ⊥ Λ Λ ⊥ = [ u 1 , ⋯ , u r ] ∈ R n × r , = [ u r + 1 , ⋯ , u n ] ∈ R n × ( n − r ) , = diag ( [ λ 1 , ⋯ , λ r ] ) , = diag ( [ λ r + 1 , ⋯ , λ n ] ) .
矩阵 U ⋆ , U ⊥ ⋆ , Λ ⋆ , Λ ⊥ ⋆ \boldsymbol{U}^{\star}, \boldsymbol{U}_{\perp}^{\star}, \boldsymbol{\Lambda}^{\star}, \boldsymbol{\Lambda}_{\perp}^{\star} U ⋆ , U ⊥ ⋆ , Λ ⋆ , Λ ⊥ ⋆ 的定义类似。
Davis-Kahan sin Θ \sin\Theta sin Θ 定理 对常数α , β ∈ R \alpha,\beta\in \mathbb{R} α , β ∈ R ,eigengapΔ > 0 \Delta>0 Δ > 0 ,假设下面条件成立
e i g e n v a l u e s ( Λ ⋆ ) ⊆ [ α , β ] , e i g e n v a l u e s ( Λ ⊥ ) ⊆ ( − ∞ , α − Δ ] ∪ [ β + Δ , ∞ ) \begin{aligned}
\mathsf{eigenvalues}(\boldsymbol{\Lambda}^{\star}) &\subseteq [\alpha,\beta] ,\\
\mathsf{eigenvalues}(\boldsymbol{\Lambda}_{\perp}) &\subseteq (-\infty, \alpha-\Delta] \cup [\beta+\Delta, \infty)
\end{aligned}
e i g e n v a l u e s ( Λ ⋆ ) e i g e n v a l u e s ( Λ ⊥ ) ⊆ [ α , β ] , ⊆ ( − ∞ , α − Δ ] ∪ [ β + Δ , ∞ )
则
d i s t ( U , U ⋆ ) ≤ 2 ∥ sin Θ ∥ ≤ 2 ∥ E U ⋆ ∥ Δ ≤ 2 ∥ E ∥ Δ ; d i s t F ( U , U ⋆ ) ≤ 2 ∥ sin Θ ∥ F ≤ 2 ∥ E U ⋆ ∥ F Δ ≤ 2 r ∥ E ∥ Δ . \begin{aligned}
\mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|\leq\frac{\sqrt{2}\big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|}{\Delta}\leq\frac{\sqrt{2}\|\boldsymbol{E}\|}{\Delta};\\
\mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|_{\mathrm{F}}\leq\frac{\sqrt{2}\big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|_{\mathrm{F}}}{\Delta}\leq\frac{\sqrt{2r}\|\boldsymbol{E}\|}{\Delta}.
\end{aligned}
d i s t ( U , U ⋆ ) d i s t F ( U , U ⋆ ) ≤ 2 ∥ sin Θ ∥ ≤ Δ 2 ∥ ∥ ∥ E U ⋆ ∥ ∥ ∥ ≤ Δ 2 ∥ E ∥ ; ≤ 2 ∥ sin Θ ∥ F ≤ Δ 2 ∥ ∥ ∥ E U ⋆ ∥ ∥ ∥ F ≤ Δ 2 r ∥ E ∥ .
当条件替换如下时,定理仍然成立
e i g e n v a l u e s ( Λ ⋆ ) ⊆ ( − ∞ , α − Δ ] ∪ [ β + Δ , ∞ ) ; e i g e n v a l u e s ( Λ ⊥ ) ⊆ [ α , β ] . \begin{aligned}
\mathsf{eigenvalues}(\boldsymbol{\Lambda}^{\star}) &\subseteq (-\infty, \alpha-\Delta] \cup [\beta+\Delta, \infty) ; \\
\mathsf{eigenvalues}(\boldsymbol{\Lambda}_{\perp}) &\subseteq [\alpha,\beta].
\end{aligned}
e i g e n v a l u e s ( Λ ⋆ ) e i g e n v a l u e s ( Λ ⊥ ) ⊆ ( − ∞ , α − Δ ] ∪ [ β + Δ , ∞ ) ; ⊆ [ α , β ] .
在如下意义下,该定理可被推广至酉不变范数∣ ∣ ∣ ⋅ ∣ ∣ ∣ \vert\vert\vert{\cdot}\vert\vert\vert ∣ ∣ ∣ ⋅ ∣ ∣ ∣
∣ ∣ ∣ sin Θ ∣ ∣ ∣ ≤ ∣ ∣ ∣ E U ⋆ ∣ ∣ ∣ Δ \vert\vert\vert\sin\boldsymbol{\Theta}\vert\vert\vert
\leq\frac{\vert\vert\vert\boldsymbol{E}\boldsymbol{U}^\star\vert\vert\vert}{\Delta}
∣ ∣ ∣ sin Θ ∣ ∣ ∣ ≤ Δ ∣ ∣ ∣ E U ⋆ ∣ ∣ ∣
如果∥ E ∥ < ( 1 − 1 / 2 ) ( ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ ) \|\boldsymbol{E}\|< (1-1/\sqrt{2}) (|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|) ∥ E ∥ < ( 1 − 1 / 2 ) ( ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ ) ,有
d i s t ( U , U ⋆ ) ≤ 2 ∥ sin Θ ∥ ≤ 2 ∥ E U ⋆ ∥ ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ ≤ 2 ∥ E ∥ ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ ; d i s t F ( U , U ⋆ ) ≤ 2 ∥ sin Θ ∥ F ≤ 2 ∥ E U ⋆ ∥ F ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ ≤ 2 r ∥ E ∥ ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ . \begin{aligned}
\mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|\leq\frac{2 \big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|}\leq\frac{2\|\boldsymbol{E}\|}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|};\\
\mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big) & \leq\sqrt{2}\|\sin\boldsymbol{\Theta}\|_{\mathrm{F}}\leq\frac{2 \big\|\boldsymbol{E}\boldsymbol{U}^{\star}\big\|_{\mathrm{F}}}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|}\leq\frac{2\sqrt{r}\|\boldsymbol{E}\|}{|\lambda_{r}^{\star}|-|\lambda_{r+1}^{\star}|}.
\end{aligned}
d i s t ( U , U ⋆ ) d i s t F ( U , U ⋆ ) ≤ 2 ∥ sin Θ ∥ ≤ ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ 2 ∥ ∥ ∥ E U ⋆ ∥ ∥ ∥ ≤ ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ 2 ∥ E ∥ ; ≤ 2 ∥ sin Θ ∥ F ≤ ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ 2 ∥ ∥ ∥ E U ⋆ ∥ ∥ ∥ F ≤ ∣ λ r ⋆ ∣ − ∣ λ r + 1 ⋆ ∣ 2 r ∥ E ∥ .
奇异子空间的扰动分析 现在不再限制矩阵的对称性和方阵结构,考虑一般的矩阵M = M ⋆ + E ∈ R n 1 × n 2 \boldsymbol{M}=\boldsymbol{M}^{\star}+\boldsymbol{E}\in\mathbb{R}^{n_1 \times n_2} M = M ⋆ + E ∈ R n 1 × n 2 ,不妨设n 1 ≤ n 2 n_1\leq n_2 n 1 ≤ n 2 ,对矩阵进行奇异值分解
M ⋆ = ∑ i = 1 n 1 σ i ⋆ u i ⋆ v i ⋆ ⊤ = [ U ⋆ U ⊥ ⋆ ] [ Σ ⋆ 0 0 0 Σ ⊥ ⋆ 0 ] [ V ⋆ ⊤ V ⊥ ⋆ ⊤ ] M = ∑ i = 1 n 1 σ i u i v i ⊤ = [ U U ⊥ ] [ Σ 0 0 0 Σ ⊥ 0 ] [ V ⊤ V ⊥ ⊤ ] \begin{aligned}
\boldsymbol{M}^{\star} & =\sum_{i=1}^{n_1}\sigma_{i}^{\star}\boldsymbol{u}_{i}^{\star}\boldsymbol{v}_{i}^{\star\top} =\left[\begin{array}{cc}
\boldsymbol{U}^{\star} & \boldsymbol{U}_{\perp}^{\star}\end{array}\right]\left[\begin{array}{ccc}
\boldsymbol{\Sigma}^{\star} & \boldsymbol{0} & \boldsymbol{0}\\
\boldsymbol{0} & \boldsymbol{\Sigma}_{\perp}^{\star} & \boldsymbol{0}
\end{array}\right]\left[\begin{array}{c}
\boldsymbol{V}^{\star\top}\\
\boldsymbol{V}_{\perp}^{\star\top}
\end{array}\right]\\
\boldsymbol{M} & =\sum_{i=1}^{n_1}\sigma_{i}\boldsymbol{u}_{i}\boldsymbol{v}_{i}^{\top} =\left[\begin{array}{cc}
\boldsymbol{U} & \boldsymbol{U}_{\perp}\end{array}\right]\left[\begin{array}{ccc}
\boldsymbol{\Sigma} & \boldsymbol{0} & \boldsymbol{0}\\
\boldsymbol{0} & \boldsymbol{\Sigma}_{\perp} & \boldsymbol{0}
\end{array}\right]\left[\begin{array}{c}
\boldsymbol{V}^{\top}\\
\boldsymbol{V}_{\perp}^{\top}
\end{array}\right]
\end{aligned}
M ⋆ M = i = 1 ∑ n 1 σ i ⋆ u i ⋆ v i ⋆ ⊤ = [ U ⋆ U ⊥ ⋆ ] [ Σ ⋆ 0 0 Σ ⊥ ⋆ 0 0 ] [ V ⋆ ⊤ V ⊥ ⋆ ⊤ ] = i = 1 ∑ n 1 σ i u i v i ⊤ = [ U U ⊥ ] [ Σ 0 0 Σ ⊥ 0 0 ] [ V ⊤ V ⊥ ⊤ ]
其中分块矩阵按秩r r r 来选择。
Σ = d i a g ( [ σ 1 , ⋯ , σ r ] ) , Σ ⊥ = d i a g ( [ σ r + 1 , ⋯ , σ n 1 ] ) , U = [ u 1 , ⋯ , u r ] ∈ R n 1 × r , U ⊥ = [ u r + 1 , ⋯ , u n 1 ] ∈ R n 1 × ( n 1 − r ) , V = [ v 1 , ⋯ , v r ] ∈ R n 2 × r , V ⊥ = [ v r + 1 , ⋯ , v n 2 ] ∈ R n 2 × ( n 2 − r ) . \begin{aligned}
\boldsymbol{\Sigma} &=\mathsf{diag}\big([\sigma_{1},\cdots,\sigma_{r}]\big),\\
\boldsymbol{\Sigma}_{\perp} &=\mathsf{diag}\big([\sigma_{r+1},\cdots,\sigma_{n_1}]\big),\\
\boldsymbol{U} &=[\boldsymbol{u}_{1},\cdots,\boldsymbol{u}_{r}]\in \mathbb{R}^{n_1\times r},\\
\boldsymbol{U}_{\perp} &=[\boldsymbol{u}_{r+1},\cdots,\boldsymbol{u}_{n_1}]\in \mathbb{R}^{n_1\times (n_1-r)},\\
\boldsymbol{V} &=[\boldsymbol{v}_{1},\cdots,\boldsymbol{v}_{r}]\in \mathbb{R}^{n_2\times r},\\
\boldsymbol{V}_{\perp} &=[\boldsymbol{v}_{r+1},\cdots,\boldsymbol{v}_{n_2}] \in \mathbb{R}^{n_2\times (n_2-r)}.
\end{aligned}
Σ Σ ⊥ U U ⊥ V V ⊥ = d i a g ( [ σ 1 , ⋯ , σ r ] ) , = d i a g ( [ σ r + 1 , ⋯ , σ n 1 ] ) , = [ u 1 , ⋯ , u r ] ∈ R n 1 × r , = [ u r + 1 , ⋯ , u n 1 ] ∈ R n 1 × ( n 1 − r ) , = [ v 1 , ⋯ , v r ] ∈ R n 2 × r , = [ v r + 1 , ⋯ , v n 2 ] ∈ R n 2 × ( n 2 − r ) .
矩阵Σ ⋆ , Σ ⊥ ⋆ , U ⋆ , U ⊥ ⋆ , V ⋆ , V ⊥ ⋆ \boldsymbol{\Sigma}^{\star},\boldsymbol{\Sigma}_{\perp}^{\star},\boldsymbol{U}^{\star},\boldsymbol{U}_{\perp}^{\star},\boldsymbol{V}^{\star},\boldsymbol{V}_{\perp}^{\star} Σ ⋆ , Σ ⊥ ⋆ , U ⋆ , U ⊥ ⋆ , V ⋆ , V ⊥ ⋆ 的定义类似。
Wedin's sin Θ \sin\Theta sin Θ 定理 如果∥ E ∥ < σ r ⋆ − σ r + 1 ⋆ \|\boldsymbol{E}\|<\sigma_{r}^{\star}-\sigma_{r+1}^{\star} ∥ E ∥ < σ r ⋆ − σ r + 1 ⋆ ,则
max { d i s t ( U , U ⋆ ) , d i s t ( V , V ⋆ ) } ≤ 2 max { ∥ E ⊤ U ⋆ ∥ , ∥ E V ⋆ ∥ } σ r ⋆ − σ r + 1 ⋆ − ∥ E ∥ ; max { d i s t F ( U , U ⋆ ) , d i s t F ( V , V ⋆ ) } ≤ 2 max { ∥ E ⊤ U ⋆ ∥ F , ∥ E V ⋆ ∥ F } σ r ⋆ − σ r + 1 ⋆ − ∥ E ∥ . \begin{aligned}
\max\left\{ \mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\}
& \leq\frac{ \sqrt{2} \max\big\{ \|\boldsymbol{E}^{\top}\boldsymbol{U}^{\star}\|,\|\boldsymbol{E}\boldsymbol{V}^{\star}\|\big\} }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}-\|\boldsymbol{E}\|};\\
\max\left\{ \mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\}
& \leq\frac{\sqrt{2}\max\big\{ \|\boldsymbol{E}^{\top}\boldsymbol{U}^{\star}\|_{\mathrm{F}},\|\boldsymbol{E}\boldsymbol{V}^{\star}\|_{\mathrm{F}}\big\} }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}-\|\boldsymbol{E}\|}.
\end{aligned}
max { d i s t ( U , U ⋆ ) , d i s t ( V , V ⋆ ) } max { d i s t F ( U , U ⋆ ) , d i s t F ( V , V ⋆ ) } ≤ σ r ⋆ − σ r + 1 ⋆ − ∥ E ∥ 2 max { ∥ E ⊤ U ⋆ ∥ , ∥ E V ⋆ ∥ } ; ≤ σ r ⋆ − σ r + 1 ⋆ − ∥ E ∥ 2 max { ∥ E ⊤ U ⋆ ∥ F , ∥ E V ⋆ ∥ F } .
进一步,如果∥ E ∥ < ( 1 − 1 / 2 ) ( σ r ⋆ − σ r + 1 ⋆ ) \|\boldsymbol{E}\|< (1-1/\sqrt{2})(\sigma_{r}^{\star}-\sigma_{r+1}^{\star}) ∥ E ∥ < ( 1 − 1 / 2 ) ( σ r ⋆ − σ r + 1 ⋆ ) ,有
max { d i s t ( U , U ⋆ ) , d i s t ( V , V ⋆ ) } ≤ 2 ∥ E ∥ σ r ⋆ − σ r + 1 ⋆ , max { d i s t F ( U , U ⋆ ) , d i s t F ( V , V ⋆ ) } ≤ 2 r ∥ E ∥ σ r ⋆ − σ r + 1 ⋆ , \begin{aligned}
\max\left\{ \mathsf{dist}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\}
& \leq\frac{ 2 \|\boldsymbol{E}\| }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}}, \\
\max\left\{ \mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{U},\boldsymbol{U}^{\star}\big),\mathsf{dist}_{\mathrm{F}}\big(\boldsymbol{V},\boldsymbol{V}^{\star}\big)\right\}
& \leq \frac{ 2\sqrt{r}\|\boldsymbol{E}\| }{\sigma_{r}^{\star}-\sigma_{r+1}^{\star}},
\end{aligned}
max { d i s t ( U , U ⋆ ) , d i s t ( V , V ⋆ ) } max { d i s t F ( U , U ⋆ ) , d i s t F ( V , V ⋆ ) } ≤ σ r ⋆ − σ r + 1 ⋆ 2 ∥ E ∥ , ≤ σ r ⋆ − σ r + 1 ⋆ 2 r ∥ E ∥ ,
小结 后面还有一系列扰动理论及其应用,有时间再看,先接触下。
References