Last time we saw that the best rank-$k$ approximation to a matrix $A$ is achieved by a matrix $A_k$, where $A_k$ depended on the singular value decomposition of $A$.

This time we will explore singular value decomposition further.

*Theorem 1.*

Any real matrix $A \in \mathbb{R}^{n \times d}$ can be written as $A = U \Sigma V^T$, where $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{d \times r}$ have orthonormal columns and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, where $r = \operatorname{rank}(A)$.

*Note.*

$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$, or equivalently, $A v_i = \sigma_i u_i$ and $A^T u_i = \sigma_i v_i$ for each $i$.

$u_i$ is a left singular vector of $A$.

$v_i$ is a right singular vector of $A$.
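
These relations are easy to check numerically. A quick NumPy sketch (the $5 \times 3$ Gaussian matrix is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # an arbitrary 5x3 real matrix

# Compact SVD: U has orthonormal columns, s holds sigma_1 >= ... >= sigma_r,
# and the rows of Vt are the right singular vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A v_i = sigma_i u_i and A^T u_i = sigma_i v_i for each i
for i in range(len(s)):
    assert np.allclose(A @ Vt[i], s[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], s[i] * Vt[i])

# A = U Sigma V^T
assert np.allclose(A, U @ np.diag(s) @ Vt)
```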

*Lemma 1.*

$v_1 = \arg\max_{\|v\|=1} \|Av\|$.

*Proof. (Linear Algebra)*

$A^T A v_i = A^T (\sigma_i u_i) = \sigma_i A^T u_i = \sigma_i^2 v_i$, and similarly $A A^T u_i = \sigma_i^2 u_i$.

So, each $v_i$ is an eigenvector of $A^T A$ with eigenvalue $\sigma_i^2$,

and each $u_i$ is an eigenvector of $A A^T$ with eigenvalue $\sigma_i^2$.

For any unit eigenvector $v$ of $A^T A$ with eigenvalue $\lambda > 0$,

define $\sigma = \sqrt{\lambda}$ and $u = Av / \sigma$.

Then, $Av = \sigma u$ and $A^T u = A^T A v / \sigma = \sigma v$,

and $\|Av\|^2 = v^T A^T A v = \lambda$.

So $\arg\max_{\|v\|=1} \|Av\|$ is the top eigenvector of $A^T A$, and hence, the top right singular vector of $A$.
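
The eigenvector relations used in this proof can be confirmed directly (a NumPy sketch with an arbitrary matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each v_i is an eigenvector of A^T A with eigenvalue sigma_i^2,
# and each u_i is an eigenvector of A A^T with the same eigenvalue.
for i in range(len(s)):
    assert np.allclose(A.T @ A @ Vt[i], s[i] ** 2 * Vt[i])
    assert np.allclose(A @ A.T @ U[:, i], s[i] ** 2 * U[:, i])
```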

*Proof. (Calculus)*

Consider the objective function $f(v) = \|Av\|^2 = v^T A^T A v$.

Then, $\nabla f(v) = 2 A^T A v$.

Intuitively, we can see that if $v$ is a point on the surface of a sphere, defined by $\|v\| = 1$, then at any local optimum $v$, $\nabla f(v)$ is parallel to $v$. The converse also holds: any point where $\nabla f(v)$ is parallel to $v$ is a critical point of $f$ on the sphere.

This relationship is represented by $A^T A v = \lambda v$ for some scalar $\lambda$.

By the above, we can see that all local optima of $f$ on the unit sphere are eigenvectors of $A^T A$. So, $\max_{\|v\|=1} \|Av\|$ is achieved by an eigenvector of $A^T A$, or a right singular vector of $A$.
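
One way to see the lemma concretely: power iteration repeatedly applies $A^T A$ and renormalizes, converging to its top eigenvector. A NumPy sketch (the matrix is constructed with singular values $3, 2, 1, 0.5$, an arbitrary choice that gives a clear spectral gap so the iteration converges fast):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a matrix with known singular values via two random orthonormal factors.
Q1, _ = np.linalg.qr(rng.standard_normal((6, 4)))
Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q1 @ np.diag([3.0, 2.0, 1.0, 0.5]) @ Q2.T

# Power iteration on A^T A converges to its top eigenvector,
# which by Lemma 1 is the top right singular vector v_1.
v = rng.standard_normal(4)
v /= np.linalg.norm(v)
for _ in range(100):
    v = A.T @ (A @ v)
    v /= np.linalg.norm(v)

v1 = np.linalg.svd(A)[2][0]
assert np.allclose(abs(v @ v1), 1.0)           # v equals v_1 up to sign
assert np.isclose(np.linalg.norm(A @ v), 3.0)  # ||A v_1|| = sigma_1
```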

Can we reason similarly about $v_2, \dots, v_r$?

*Lemma 2.*

Define $v_1 = \arg\max_{\|v\|=1} \|Av\|$ and, for $i \ge 2$, $v_i = \arg\max_{\|v\|=1,\; v \perp v_1, \dots, v_{i-1}} \|Av\|$.

Then, $v_1, \dots, v_r$ are right singular vectors of $A$.

We can view the rows $a_1, \dots, a_n$ of $A$ as points in $\mathbb{R}^d$. Then, $v_1$ is the best-fit line (that goes through the origin), or the best-fit one-dimensional subspace. This is represented by,

$v_1 = \arg\min_{\|v\|=1} \sum_{i=1}^{n} d(a_i, \operatorname{span}(v))^2$.

*Proof.*

$\|a_i\|^2 = \langle a_i, v \rangle^2 + d(a_i, \operatorname{span}(v))^2$ is given by the Pythagorean Theorem. So, since $\sum_{i} \|a_i\|^2$ is fixed, minimizing $\sum_{i} d(a_i, \operatorname{span}(v))^2$ requires that we correspondingly maximize $\sum_{i} \langle a_i, v \rangle^2 = \|Av\|^2$.
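
The best-fit-line characterization can be checked numerically: the top right singular vector never loses to any other direction. A NumPy sketch (50 random points in $\mathbb{R}^2$, an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 2))  # 50 points in R^2, the rows a_i

def sq_dist_to_line(A, v):
    # sum_i d(a_i, span(v))^2 for unit v, via the Pythagorean theorem:
    # ||a_i||^2 = <a_i, v>^2 + d(a_i, span(v))^2
    return (A ** 2).sum() - ((A @ v) ** 2).sum()

# Compare the top right singular vector against many random unit directions.
v1 = np.linalg.svd(A)[2][0]
for _ in range(1000):
    v = rng.standard_normal(2)
    v /= np.linalg.norm(v)
    assert sq_dist_to_line(A, v1) <= sq_dist_to_line(A, v) + 1e-9
```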

*Theorem 2.*

Let $V_k = \operatorname{span}(v_1, \dots, v_k)$. Then, $V_k$ is the best-fit $k$-dimensional subspace for $A$,

$V_k = \arg\min_{V:\,\dim(V) = k} \sum_{i=1}^{n} d(a_i, V)^2$.

*Proof.*

We will do an induction on $k$.

$k = 1$: Lemma 1.

Suppose Theorem 2 is true for $(k-1)$-dimensional subspaces. So, $V_{k-1} = \operatorname{span}(v_1, \dots, v_{k-1})$ is the best-fit $(k-1)$-dimensional subspace.

Let $W$ be the best-fit $k$-dimensional subspace.

We know that there is an orthonormal basis $w_1, \dots, w_k$ of $W$ such that $w_k \perp v_1, \dots, v_{k-1}$, since $\dim(W) = k > k - 1 = \dim(V_{k-1})$, so $W \cap V_{k-1}^{\perp} \neq \{0\}$.

So,

$\sum_{i=1}^{k} \|Aw_i\|^2 = \sum_{i=1}^{k-1} \|Aw_i\|^2 + \|Aw_k\|^2$

$\le \sum_{i=1}^{k-1} \|Av_i\|^2 + \|Aw_k\|^2$, since $V_{k-1}$ is the best $(k-1)$-dimensional subspace, by IH

$\le \sum_{i=1}^{k-1} \|Av_i\|^2 + \|Av_k\|^2$, since $v_k$ is the best among vectors orthogonal to $v_1, \dots, v_{k-1}$

$= \sum_{i=1}^{k} \|Av_i\|^2$.

So $V_k$ captures at least as much of $A$ as $W$ does, and hence $V_k$ is also a best-fit $k$-dimensional subspace.
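
Theorem 2 can likewise be checked numerically: the total squared projection captured by $V_k$ is at least that of any other $k$-dimensional subspace. A NumPy sketch (the matrix size and the choice $k = 2$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 6))
k = 2

Vt = np.linalg.svd(A)[2]
Vk = Vt[:k].T  # orthonormal basis of V_k = span(v_1, ..., v_k), shape (6, k)

def captured(A, B):
    # Sum of squared projection lengths of the rows of A onto span(B),
    # where B has orthonormal columns; ||A B||_F^2 computes exactly this.
    return np.linalg.norm(A @ B) ** 2

# V_k captures at least as much as any random k-dimensional subspace,
# i.e. it minimizes the total squared distance.
for _ in range(500):
    Q, _ = np.linalg.qr(rng.standard_normal((6, k)))  # random orthonormal basis
    assert captured(A, Vk) >= captured(A, Q) - 1e-9
```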

*Proof of Theorem 1.*

We want to show that $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$.

The LHS and RHS are both $n \times d$ matrices. Since $A v_j = \sigma_j u_j$ and $\left( \sum_{i=1}^{r} \sigma_i u_i v_i^T \right) v_j = \sigma_j u_j$ (using $v_i^T v_j = 1$ if $i = j$ and $0$ otherwise), we get that for every $j$ the two sides agree on $v_j$. Both sides also send any vector orthogonal to $v_1, \dots, v_r$ to $0$. Since these vectors together span $\mathbb{R}^d$, the two matrices are equal.
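
The rank-one expansion in Theorem 1 is easy to verify term by term (a NumPy sketch, again with an arbitrary matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Accumulate the rank-one terms sigma_i u_i v_i^T; their sum recovers A.
S = np.zeros_like(A)
for i in range(len(s)):
    S += s[i] * np.outer(U[:, i], Vt[i])

assert np.allclose(A, S)
```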