# Power Iteration

Recall the singular value decomposition (SVD) of a matrix $A = UDV^{T} = \sum_{i=1}^{r}\sigma_i \boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^{T}}}$, where $r = rank(A)$ and $\sigma_{1} \geq \sigma_{2} \geq \ldots \geq \sigma_{r} > 0$.

We showed that $V_{k} = span\{\boldsymbol{v^{(1)}}, ..., \boldsymbol{v^{(k)}}\}$ is the best-fit subspace for the rows $A_{(i)}$ of $A$, so

$V_{k} = \arg\min_{V: dim(V) \leq k}\sum_{i=1}^{m}d(A_{(i)}, V)^2$

Writing $V_{k}$ also for the matrix with columns $\boldsymbol{v^{(1)}}, ..., \boldsymbol{v^{(k)}}$, we have that $A_{(i)}V_{k}V_{k}^{T}$ is the projection of $A_{(i)}$ onto $V_{k}$, and so $AV_{k}V_{k}^{T}$ is the projection of $A$ onto $V_{k}$.

Let $A_k$ be defined as $AV_{k}V_{k}^{T} = \sum_{i=1}^{k}\sigma_i \boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^{T}}}$.
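As a quick numerical illustration (a minimal NumPy sketch of our own; the matrix and dimensions are arbitrary), we can form $A_k$ by truncating the SVD and check that it agrees with the projection form $AV_{k}V_{k}^{T}$:

```python
import numpy as np

# Build the rank-k truncation A_k from the SVD and check it against the
# projection form A V_k V_k^T (illustrative example; shapes are arbitrary).
rng = np.random.default_rng(0)
m, n, k = 8, 5, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]       # sum of top-k terms
V_k = Vt[:k, :].T                                 # columns v^(1), ..., v^(k)

assert np.allclose(A_k, A @ V_k @ V_k.T)          # A_k = A V_k V_k^T
```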

Theorem 1.
(1) $\|A-A_{k}\|_{F} = \min_{B: rank(B) \leq k}\|A-B\|_{F}$
AND
(2) $\|A-A_{k}\|_{2} = \max_{\|\boldsymbol{v}\| = 1}\|(A-A_{k}) \cdot \boldsymbol{v}\| = \sigma_{k+1} = \min_{B: rank(B) \leq k}\|A-B\|_{2}$

Proof.
We proved part (1) last lecture. We will prove part (2) here. Note that

$A-A_{k} = \sum_{i=k+1}^{r}\sigma_{i}\boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^{T}}}$
and
$\|A-A_{k}\|_{2} = \sigma_{k+1}$.

We need to show that $\|A-B\|_{2} \geq \sigma_{k+1}$ for any $B$ with $rank(B) \leq k$. Any such $B$ can be written as $B=WZ^{T}$, where $W$ and $Z$ each have $\leq k$ columns.

Since $Z^{T}$ maps the $(k+1)$-dimensional space $span\{\boldsymbol{v^{(1)}},...,\boldsymbol{v^{(k+1)}}\}$ to a space of dimension at most $k$, there exists a unit vector $\boldsymbol{v}$ in this span with
$Z^{T}\boldsymbol{v} = 0$,
$\boldsymbol{v} = \sum_{i=1}^{k+1}\alpha_{i}\boldsymbol{v^{(i)}}$, and
$\|\boldsymbol{v}\|^{2} = \sum_{i=1}^{k+1}\alpha_{i}^{2} = 1$.

Then $B\boldsymbol{v} = WZ^{T}\boldsymbol{v} = 0$, so
$(A-B) \cdot \boldsymbol{v} = A \cdot \boldsymbol{v} = \sum_{i=1}^{k+1}\sigma_{i}\alpha_{i}\boldsymbol{u^{(i)}}$
and
$\|(A-B) \cdot \boldsymbol{v}\|^{2} = \sum_{i=1}^{k+1}\sigma_{i}^{2}\alpha_{i}^{2} \geq \sigma_{k+1}^{2}\sum_{i=1}^{k+1}\alpha_{i}^{2} = \sigma_{k+1}^{2}$. $\square$
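As a sanity check of part (2) (our own illustration, not part of the proof), the spectral error of $A_k$ equals $\sigma_{k+1}$, and random rank-$k$ matrices of the form $B = WZ^{T}$ never do better:

```python
import numpy as np

# Empirically: ||A - A_k||_2 = sigma_{k+1}, and no random rank-k B beats it.
rng = np.random.default_rng(1)
m, n, k = 8, 5, 2
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])  # sigma_{k+1} (0-indexed)

for _ in range(1000):
    W = rng.standard_normal((m, k))                  # B = W Z^T has rank <= k
    Z = rng.standard_normal((n, k))
    assert np.linalg.norm(A - W @ Z.T, 2) >= s[k] - 1e-12
```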

We’ve seen that the SVD is useful in more than one way. Now we will look at how to compute it, a central question in numerical analysis. A simple method, power iteration, is presented here:

For each $i = 1, 2, \ldots, n$: set $\boldsymbol{x} \leftarrow e_{i}$ and repeat $t$ times:
$\quad \boldsymbol{x} \leftarrow A^{T}A\boldsymbol{x}$
$\quad \boldsymbol{x} \leftarrow \frac{\boldsymbol{x}}{\|\boldsymbol{x}\|}$
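Here is a minimal runnable version of this loop (a NumPy sketch; the function name and test matrix are ours, not from the notes):

```python
import numpy as np

def power_iteration(A, x0, t):
    """Repeat t times: x <- A^T A x, then renormalize x to unit length."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(t):
        x = A.T @ (A @ x)
        x = x / np.linalg.norm(x)
    return x

# Starting from e_1, x typically converges to the top right singular
# vector v^(1), so ||A x|| approaches sigma_1.
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 30))
x = power_iteration(A, np.eye(30)[0], t=100)
sigma1 = np.linalg.svd(A, compute_uv=False)[0]
print(np.linalg.norm(A @ x), sigma1)  # the two values nearly agree
```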

Theorem 2.
Assume the starting vector satisfies $(\boldsymbol{x}\cdot \boldsymbol{v^{(1)}})^2 \ge \frac{1}{n}$. After $t = \frac{\ln n}{2\varepsilon}$ iterations, $\boldsymbol{x}$ satisfies $\|A\boldsymbol{x}\| \geq (1-\varepsilon) \sigma_{1}$.

Proof.
Write the starting vector as $\boldsymbol{x} = \sum_{i=1}^{r}\alpha_{i}\boldsymbol{v^{(i)}}$. Since
$A^{T}A = \sum_{i=1}^{r}\sigma_{i}^{2}\boldsymbol{v^{(i)}}\boldsymbol{v^{(i)^{T}}}$,
we have
$(A^{T}A)^{l} = \sum_{i=1}^{r}\sigma_{i}^{2l}\boldsymbol{v^{(i)}}\boldsymbol{v^{(i)^{T}}}$.

After $l$ iterations the iterate is $\boldsymbol{x^{l}} = (A^{T}A)^{l}\boldsymbol{x}$ up to normalization (which does not affect the ratio below), so

$\frac{\|A\boldsymbol{x^{l}}\|^{2}}{\|\boldsymbol{x^{l}}\|^{2}} = \frac{\boldsymbol{x}^{T}(A^{T}A)^{l}A^{T}A(A^{T}A)^{l}\boldsymbol{x}}{\boldsymbol{x}^{T}(A^{T}A)^{l}(A^{T}A)^{l}\boldsymbol{x}} = \frac{\sum_{i=1}^{r}\sigma_{i}^{4l+2}\alpha_{i}^{2}}{\sum_{i=1}^{r}\sigma_{i}^{4l}\alpha_{i}^{2}}$

Since $\sigma_{1} \geq \sigma_{2} \geq \ldots$, higher powers place relatively more weight on the larger singular values, so this ratio grows with $l$. To bound the denominator:

$\sum \sigma_{i}^{4l}\alpha_{i}^{2} = \sum \sigma_{i}^{4l}(\alpha_{i}^{2})^{\frac{2l}{2l+1}}(\alpha_{i}^{2})^{\frac{1}{2l+1}}$
$\leq (\sum \sigma_{i}^{4l+2}\alpha_{i}^{2})^{\frac{2l}{2l+1}}(\sum \alpha_{i}^{2})^{\frac{1}{2l+1}}$ (by Hölder's inequality with exponents $\frac{2l+1}{2l}$ and $2l+1$)
$= (\sum \sigma_{i}^{4l+2}\alpha_{i}^{2})^{\frac{2l}{2l+1}}$ (since $\sum \alpha_{i}^{2} = \|\boldsymbol{x}\|^{2} = 1$)

$\therefore \frac{(\boldsymbol{x^{l}})^{T}A^{T}A\boldsymbol{x^{l}}}{(\boldsymbol{x^{l}})^{T}\boldsymbol{x^{l}}} \geq (\sum \sigma_{i}^{4l+2}\alpha_{i}^{2})^{\frac{1}{2l+1}} \geq \alpha_{1}^{\frac{2}{2l+1}}\sigma_{1}^{2}$, where the last step keeps only the $i=1$ term.

Therefore, after $l$ iterations, using the assumption $\alpha_{1}^{2} \geq \frac{1}{n}$,
$\frac{\|A\boldsymbol{x^{l}}\|^{2}}{\|\boldsymbol{x^{l}}\|^{2}} \geq (\frac{1}{n})^{\frac{1}{2l+1}}\sigma_{1}^{2}$
$= e^{-\frac{\ln n}{2l+1}}\sigma_{1}^{2}$

Setting $2l + 1 = \frac{\ln n}{\varepsilon}$ gives $\frac{\|A\boldsymbol{x^{l}}\|^{2}}{\|\boldsymbol{x^{l}}\|^{2}} \geq e^{-\varepsilon}\sigma_{1}^{2} \geq (1-\varepsilon)\sigma_{1}^{2}$, and hence $\frac{\|A\boldsymbol{x^{l}}\|}{\|\boldsymbol{x^{l}}\|} \geq \sqrt{1-\varepsilon}\,\sigma_{1} \geq (1-\varepsilon)\sigma_{1}$. $\square$
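As an illustration of this rate (our own experiment, assuming a random unit start so that $\alpha_{1}^{2} \approx \frac{1}{n}$), the bound predicts $\|A\boldsymbol{x^{l}}\| \geq (1-\varepsilon)\sigma_{1}$ once $2l+1 \geq \frac{\ln n}{\varepsilon}$:

```python
import numpy as np

# Run power iteration for l iterations with 2l + 1 = ln(n)/eps and check the
# guarantee ||A x|| >= (1 - eps) sigma_1 (holds for typical random starts).
rng = np.random.default_rng(3)
n, eps = 200, 0.1
A = rng.standard_normal((n, n))
sigma1 = np.linalg.svd(A, compute_uv=False)[0]

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                          # random unit start
l = int(np.ceil((np.log(n) / eps - 1) / 2))     # 2l + 1 >= ln(n)/eps
for _ in range(l):
    x = A.T @ (A @ x)
    x /= np.linalg.norm(x)

print(np.linalg.norm(A @ x) >= (1 - eps) * sigma1)  # True
```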

We note that the algorithm tries each axis vector $e_{i}$ as the initial $\boldsymbol{x}$; since $\sum_{i=1}^{n}(e_{i} \cdot \boldsymbol{v^{(1)}})^2 = \|\boldsymbol{v^{(1)}}\|^2 = 1$, at least one $e_{i}$ satisfies the assumption $(e_{i} \cdot \boldsymbol{v^{(1)}})^2 \geq \frac{1}{n}$. Instead of trying all $n$ basis vectors, we could also start with a random unit vector.

Claim.
For a uniformly random unit vector $\boldsymbol{x}$, writing $\alpha_{1} = \boldsymbol{x} \cdot \boldsymbol{v^{(1)}}$, we have $\mathbb{E}(\alpha_{1}^{2}) = \frac{1}{n}$.

Proof.
We have $\sum x_{i}^{2} = 1$, and the distribution of $\boldsymbol{x}$ is symmetric in the coordinates.

So, $\mathbb{E}(x_{1}^{2}) = \mathbb{E}(x_{2}^{2}) = ... = \mathbb{E}(x_{n}^{2})$

$\implies n\mathbb{E}(x_{1}^{2}) = 1 \implies \mathbb{E}(x_{1}^{2}) = \frac{1}{n}$. By rotational invariance we may take $\boldsymbol{v^{(1)}} = e_{1}$, so $\mathbb{E}(\alpha_{1}^{2}) = \mathbb{E}(x_{1}^{2}) = \frac{1}{n}$. $\square$

Moreover, $\alpha_1^2 \ge \frac{1}{n}$ with constant probability, so a few independent runs of the algorithm suffice with high probability.
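A quick Monte Carlo check of the claim and of the constant-probability remark (our own experiment; a uniform unit vector is sampled as a normalized Gaussian):

```python
import numpy as np

# E[(x . v)^2] = 1/n for a uniformly random unit x and any fixed unit v;
# by rotational invariance we may take v = e_1.
rng = np.random.default_rng(4)
n, trials = 100, 100_000
X = rng.standard_normal((trials, n))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # rows: uniform unit vectors
a1_sq = X[:, 0] ** 2

print(a1_sq.mean(), 1 / n)        # both approximately 0.01
print((a1_sq >= 1 / n).mean())    # constant fraction, roughly 0.3
```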