Last time we saw that the minimum is achieved by a solution that depended on the singular value decomposition of $A$.
This time we will explore singular value decomposition further.
Theorem 1. Any real matrix $A \in \mathbb{R}^{m \times n}$ can be written as $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ have orthonormal columns and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix with entries $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$, where $r = \operatorname{rank}(A)$.
Each column $u_i$ of $U$ is a left singular vector.
Each column $v_i$ of $V$ is a right singular vector.
Proof. (Standard linear algebra; omitted here.)
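As a concrete check, here is a small numpy sketch (ours, not part of the notes) that verifies these properties on a random matrix; note that `np.linalg.svd` returns $V^T$ rather than $V$.

```python
import numpy as np

A = np.random.randn(5, 3)                 # a random real 5x3 matrix (rank 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

assert np.allclose(A, U @ np.diag(s) @ Vt)        # A = U Sigma V^T
assert np.allclose(U.T @ U, np.eye(3))            # U has orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(3))          # V has orthonormal columns
assert np.all(s[:-1] >= s[1:]) and np.all(s > 0)  # sigma_1 >= ... >= sigma_r > 0
```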
From $A = U \Sigma V^T$ we get $A v_i = \sigma_i u_i$ and $A^T u_i = \sigma_i v_i$. So, each $v_i$ is an eigenvector of $A^T A$, since $A^T A v_i = \sigma_i A^T u_i = \sigma_i^2 v_i$,
and each $u_i$ is an eigenvector of $A A^T$, since $A A^T u_i = \sigma_i A v_i = \sigma_i^2 u_i$.
For any unit eigenvector $v$ of $A^T A$ with eigenvalue $\lambda$, $\|Av\|^2 = v^T A^T A v = \lambda$.
So $\arg\max_{\|v\|=1} \|Av\|$ is the top eigenvector of $A^T A$, and hence the top right singular vector $v_1$ of $A$ (Lemma 1), provided the maximum is attained at an eigenvector; the next argument shows that it is.
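Continuing the numpy sketch (same random setup, recomputed so the block stands alone), these relations can be checked directly:

```python
import numpy as np

A = np.random.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
v1, u1, sigma1 = Vt[0], U[:, 0], s[0]

assert np.allclose(A.T @ A @ v1, sigma1**2 * v1)  # v_1 eigenvector of A^T A
assert np.allclose(A @ A.T @ u1, sigma1**2 * u1)  # u_1 eigenvector of A A^T
assert np.isclose(np.linalg.norm(A @ v1)**2, sigma1**2)  # ||Av||^2 = lambda
```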
Consider the objective function $f(v) = \|Av\|^2 = v^T A^T A v$.
Intuitively we can see that if $v$ is a point on the surface of a sphere, defined by $g(v) = \|v\|^2 = 1$, then at any local optimum $v$, the gradient $\nabla f(v) = 2 A^T A v$ is parallel to $\nabla g(v) = 2v$. The converse is also true.
This relationship is represented by $A^T A v = \lambda v$ for some scalar $\lambda$.
By the above, we can see that all local optima of $f$ on the sphere are eigenvectors of $A^T A$. So, $\max_{\|v\|=1} \|Av\|$ is achieved by an eigenvector of $A^T A$, or equivalently a right singular vector of $A$, which proves Lemma 1.
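A quick numerical sanity check of Lemma 1 (again our sketch): no random unit vector attains a larger $\|Av\|$ than $v_1$ does.

```python
import numpy as np

A = np.random.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

vs = np.random.randn(10000, 3)
vs /= np.linalg.norm(vs, axis=1, keepdims=True)     # 10000 random unit vectors
assert np.linalg.norm(A @ vs.T, axis=0).max() <= s[0] + 1e-9
assert np.isclose(np.linalg.norm(A @ Vt[0]), s[0])  # maximum attained at v_1
```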
Can we reason similarly about $v_2 = \arg\max_{\|v\|=1,\, v \perp v_1} \|Av\|$, and in general $v_k = \arg\max_{\|v\|=1,\, v \perp v_1, \dots, v_{k-1}} \|Av\|$?
Yes: the same argument, restricted to vectors orthogonal to the earlier ones, shows that $v_2, \dots, v_r$ are right singular vectors of $A$.
We can view the $m$ rows of $A$ as points $a_1, \dots, a_m$ in $\mathbb{R}^n$. Then, $v_1$ spans the best-fit line (that goes through the origin), or the best-fit one-dimensional subspace. This is represented by,
$$v_1 = \arg\min_{\|v\|=1} \sum_{i=1}^{m} d(a_i, \operatorname{span}(v))^2.$$
The squared distance is given by the Pythagorean Theorem: $d(a_i, \operatorname{span}(v))^2 = \|a_i\|^2 - (a_i \cdot v)^2$. Since $\sum_i \|a_i\|^2$ is fixed, minimizing the total squared distance requires that we correspondingly maximize $\sum_i (a_i \cdot v)^2 = \|Av\|^2$.
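To illustrate (a sketch under the same random setup): for any unit vector $v$, the squared distances from the rows to $\operatorname{span}(v)$ sum to $\sum_i \|a_i\|^2 - \|Av\|^2$.

```python
import numpy as np

A = np.random.randn(5, 3)                    # rows a_1, ..., a_5 in R^3
v = np.random.randn(3)
v /= np.linalg.norm(v)                       # an arbitrary unit vector

proj = np.outer(A @ v, v)                    # projection of each row onto span(v)
dist2 = np.linalg.norm(A - proj, axis=1)**2  # squared distance of each row
assert np.isclose(dist2.sum(),
                  np.linalg.norm(A, 'fro')**2 - np.linalg.norm(A @ v)**2)
```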
Theorem 2. $V_k = \operatorname{span}(v_1, \dots, v_k)$ is the best-fit $k$-dimensional subspace for $A$,
$$V_k = \arg\max_{V :\, \dim(V) = k} \; \sum_{j=1}^{k} \|A w_j\|^2,$$
where $w_1, \dots, w_k$ is any orthonormal basis of $V$.
Proof. We will do an induction on $k$.
$k = 1$: this is exactly Lemma 1.
Suppose Theorem 2 is true for $(k-1)$-dimensional subspaces. So, $V_{k-1} = \operatorname{span}(v_1, \dots, v_{k-1})$ is the best-fit $(k-1)$-dimensional subspace.
Let $W$ be the best-fit $k$-dimensional subspace.
We know that there is an orthonormal basis $w_1, \dots, w_k$ of $W$ such that $w_k \perp v_1, \dots, v_{k-1}$, since the vectors in $W$ orthogonal to $v_1, \dots, v_{k-1}$ form a subspace of dimension at least $k - (k-1) = 1$.
$\sum_{j=1}^{k-1} \|A w_j\|^2 \le \sum_{j=1}^{k-1} \|A v_j\|^2$, since $V_{k-1}$ is the best $(k-1)$-dimensional subspace, by the IH.
$\|A w_k\|^2 \le \|A v_k\|^2$, since $v_k$ is the best among unit vectors orthogonal to $v_1, \dots, v_{k-1}$, and $w_k$ is such a vector. Summing the two bounds, $\sum_{j=1}^{k} \|A w_j\|^2 \le \sum_{j=1}^{k} \|A v_j\|^2$, so $V_k$ is also a best-fit $k$-dimensional subspace.
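A numerical check of Theorem 2 (our sketch, not part of the notes): the objective for a subspace with orthonormal basis matrix $W$ is $\|AW\|_F^2$, and no random $k$-dimensional subspace beats $V_k$.

```python
import numpy as np

A = np.random.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Vk = Vt[:k].T                                   # orthonormal basis of V_k
assert np.isclose(np.linalg.norm(A @ Vk, 'fro')**2, (s[:k]**2).sum())

for _ in range(1000):
    W, _ = np.linalg.qr(np.random.randn(3, k))  # a random k-dim subspace
    assert np.linalg.norm(A @ W, 'fro')**2 <= (s[:k]**2).sum() + 1e-9
```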
Proof of Theorem 1.
We want to show that $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$.
The LHS and RHS are both $m \times n$ matrices. Since $A v_j = \sigma_j u_j$ and $\left( \sum_{i=1}^{r} \sigma_i u_i v_i^T \right) v_j = \sigma_j u_j$ (only the $i = j$ term survives, because $v_i^T v_j = 0$ for $i \ne j$), we get that the two sides agree on every $v_j$. They also agree on every $v$ orthogonal to $v_1, \dots, v_r$, where both sides give $0$. Hence they agree on a basis of $\mathbb{R}^n$ and are equal.
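Finally, a sketch verifying this rank-one expansion numerically (same assumed setup):

```python
import numpy as np

A = np.random.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sum the rank-one terms sigma_i * u_i v_i^T over i = 1..r
B = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, B)
```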