Last time we saw a problem whose optimal solution depended on the singular value decomposition of $A$.
This time we will explore the singular value decomposition further.
Theorem 1.
Any real matrix $A \in \mathbb{R}^{n \times d}$ can be written as
$$A = U \Sigma V^T,$$
where $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{d \times r}$ have orthonormal columns and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix with entries $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, where $r = \operatorname{rank}(A)$.
Note.
$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T,$$
or equivalently, $A v_i = \sigma_i u_i$ and $A^T u_i = \sigma_i v_i$ for $i = 1, \dots, r$.
Each $u_i$ (the $i$-th column of $U$) is a left singular vector, and each $v_i$ (the $i$-th column of $V$) is a right singular vector, with corresponding singular value $\sigma_i$.
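As a quick numerical illustration of Theorem 1 and the Note, here is a small NumPy sketch; the matrix $A$ and its shape are arbitrary choices for the example. It checks $A = U\Sigma V^T = \sum_i \sigma_i u_i v_i^T$ and the relations $Av_i = \sigma_i u_i$, $A^T u_i = \sigma_i v_i$.

```python
# Numerical sanity check of Theorem 1 and the Note (illustrative example only).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # any real matrix

# Thin SVD: U is 6 x r and V is 4 x r with orthonormal columns, sigma has r entries.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T
r = len(sigma)

# A = U diag(sigma) V^T
assert np.allclose(A, U @ np.diag(sigma) @ V.T)

# Equivalently, A = sum_i sigma_i u_i v_i^T
A_sum = sum(sigma[i] * np.outer(U[:, i], V[:, i]) for i in range(r))
assert np.allclose(A, A_sum)

# A v_i = sigma_i u_i  and  A^T u_i = sigma_i v_i
for i in range(r):
    assert np.allclose(A @ V[:, i], sigma[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], sigma[i] * V[:, i])
```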
Lemma 1.
The maximization problem
$$\max_{\|v\|=1} \|Av\|$$
is solved by the top right singular vector $v_1$ of $A$, and the maximum value is $\sigma_1$.
Proof. (Linear Algebra)
By the Note, $A^T A v_i = \sigma_i A^T u_i = \sigma_i^2 v_i$ and $A A^T u_i = \sigma_i A v_i = \sigma_i^2 u_i$.
So, each $v_i$ is an eigenvector of $A^T A$ and each $u_i$ is an eigenvector of $A A^T$, with eigenvalue $\sigma_i^2$ in both cases.
For any unit eigenvector $v$ of $A^T A$, define $\lambda$ to be its eigenvalue. Then,
$$\|Av\|^2 = v^T A^T A v = \lambda \quad \text{and} \quad \lambda \le \sigma_1^2,$$
since $A^T A = V \Sigma^2 V^T$ has eigenvalues exactly $\sigma_1^2 \ge \cdots \ge \sigma_r^2$ together with $0$. Moreover, because $A^T A$ is symmetric, every unit vector is a combination of orthonormal eigenvectors, so no unit vector attains a value larger than the largest eigenvalue.
So $v_1$ is the top eigenvector of $A^T A$, and hence, the top right singular vector of $A$: it achieves $\max_{\|v\|=1} \|Av\| = \sigma_1$. $\square$
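The eigenvector relations used in this proof are easy to spot-check numerically. The sketch below uses an arbitrary random matrix and confirms that each $v_i$ is an eigenvector of $A^T A$ and each $u_i$ an eigenvector of $A A^T$, with eigenvalue $\sigma_i^2$.

```python
# Check: A^T A v_i = sigma_i^2 v_i and A A^T u_i = sigma_i^2 u_i (illustrative example).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

for i in range(len(sigma)):
    v_i, u_i = Vt[i], U[:, i]
    assert np.allclose(A.T @ A @ v_i, sigma[i] ** 2 * v_i)   # eigenvector of A^T A
    assert np.allclose(A @ A.T @ u_i, sigma[i] ** 2 * u_i)   # eigenvector of A A^T

# ||A v_1|| equals sigma_1, the square root of the largest eigenvalue of A^T A.
assert np.allclose(np.linalg.norm(A @ Vt[0]), sigma[0])
```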
Proof. (Calculus)
Consider the objective function $f(v) = \|Av\|^2 = v^T A^T A v$.
Then, $\nabla f(v) = 2 A^T A v$.
Intuitively we can see that if $v$ is a point on the surface of a sphere, defined by $\|v\|^2 = 1$, then at any local optimum $v$ of $f$ on the sphere, $\nabla f(v)$ is parallel to $v$ (the normal direction of the sphere at $v$). The converse also holds: any point where $\nabla f(v)$ is parallel to $v$ is a critical point of $f$ on the sphere.
This relationship is represented by
$$A^T A v = \lambda v \quad \text{for some scalar } \lambda.$$
By the above, we can see that all local optima of $f$ on the sphere are eigenvectors of $A^T A$. So,
$$\max_{\|v\|=1} \|Av\|$$
is achieved by an eigenvector of $A^T A$, or a right singular vector of $A$. $\square$
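As an aside not covered in these notes, the fact that the maximizer is the top eigenvector of $A^T A$ suggests one standard way to compute it: power iteration on $A^T A$. The sketch below is a minimal, illustrative version; the function name, iteration count, and random test matrix are all arbitrary choices, not part of the lecture's argument.

```python
# Minimal power-iteration sketch for finding argmax_{||v||=1} ||Av|| (an aside).
import numpy as np

def top_right_singular_vector(A, iters=5000, seed=0):
    """Power iteration on A^T A; returns an approximate top right singular vector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A.T @ (A @ v)            # apply A^T A
        v = w / np.linalg.norm(w)    # renormalize back onto the unit sphere
    return v

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
v1 = top_right_singular_vector(A)
_, sigma, Vt = np.linalg.svd(A)

# v1 lines up (up to sign) with the top right singular vector and nearly attains sigma_1.
assert abs(v1 @ Vt[0]) > 0.99
assert np.linalg.norm(A @ v1) > 0.99 * sigma[0]
```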
Can we reason similarly about the remaining right singular vectors $v_2, v_3, \dots, v_r$?
Lemma 2.
Define $v_1 = \arg\max_{\|v\|=1} \|Av\|$ and, greedily,
$$v_j = \arg\max_{\substack{\|v\|=1 \\ v \perp v_1, \dots, v_{j-1}}} \|Av\| \quad \text{for } j \ge 2,$$
continuing as long as the maximum value is nonzero; this process stops after $r = \operatorname{rank}(A)$ steps.
Then, $v_1, \dots, v_r$ are right singular vectors of $A$.
We can view the rows $a_1, \dots, a_n$ of $A$ as points in $\mathbb{R}^d$. Then, $v_1$ is the best-fit line (that goes through the origin), or the best-fit one-dimensional subspace. This is represented by,
$$v_1 = \arg\min_{\|v\|=1} \sum_{i=1}^{n} \operatorname{dist}(a_i, \operatorname{span}(v))^2.$$
Proof.
The squared distance
$$\operatorname{dist}(a_i, \operatorname{span}(v))^2 = \|a_i\|^2 - (a_i \cdot v)^2$$
is given by the Pythagorean Theorem. So, since $\sum_i \|a_i\|^2$ does not depend on $v$, minimizing $\sum_i \operatorname{dist}(a_i, \operatorname{span}(v))^2$ is equivalent to maximizing $\sum_i (a_i \cdot v)^2 = \|Av\|^2$, which by Lemma 1 is achieved by $v_1$. $\square$
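The equivalence used in this proof, minimizing summed squared distances versus maximizing $\|Av\|^2$, can be spot-checked numerically. In the sketch below the rows of a random matrix play the role of the points $a_i$; the helper name sq_dist_to_line is just illustrative.

```python
# Check: the direction maximizing ||Av|| also minimizes summed squared distances
# to the line span(v) through the origin (illustrative random example).
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 4))            # rows a_i are 50 points in R^4
_, sigma, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]

def sq_dist_to_line(A, v):
    proj = A @ v                            # components a_i . v (v is a unit vector)
    return np.sum(A ** 2) - np.sum(proj ** 2)   # Pythagorean theorem, summed over rows

# v1 beats every random unit direction.
for _ in range(1000):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert sq_dist_to_line(A, v1) <= sq_dist_to_line(A, v) + 1e-9
```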
Theorem 2.
Let $v_1, \dots, v_k$ be as defined in Lemma 2 and set $V_k = \operatorname{span}(v_1, \dots, v_k)$.
Then, $V_k$ is the best-fit $k$-dimensional subspace for $A$,
$$V_k = \arg\min_{\substack{V \subseteq \mathbb{R}^d \\ \dim(V) = k}} \sum_{i=1}^{n} \operatorname{dist}(a_i, V)^2.$$
Proof.
We will do an induction on $k$.
$k = 1$: this is the best-fit line statement above, which follows from Lemma 1.
Suppose Theorem 2 is true for $(k-1)$-dimensional subspaces. So, $V_{k-1} = \operatorname{span}(v_1, \dots, v_{k-1})$ is the best-fit $(k-1)$-dimensional subspace.
Let $W$ be the best $k$-dimensional subspace.
We know that there is an orthonormal basis $w_1, \dots, w_k$ of $W$ such that $w_k \perp v_1, \dots, v_{k-1}$, since $W \cap V_{k-1}^{\perp}$ has dimension at least $k - (k-1) = 1$, and any unit vector in that intersection can be extended to such a basis.
Recall from the proof of Lemma 2 that minimizing summed squared distances to a subspace is equivalent to maximizing the summed squared projections onto it, which for an orthonormal basis $w_1, \dots, w_k$ equals $\sum_{i=1}^{k} \|Aw_i\|^2$. So,
$$\sum_{i=1}^{k} \|Aw_i\|^2 = \sum_{i=1}^{k-1} \|Aw_i\|^2 + \|Aw_k\|^2 \le \sum_{i=1}^{k-1} \|Av_i\|^2 + \|Aw_k\|^2,$$
since $V_{k-1}$ is the best $(k-1)$-dimensional subspace, by IH, and
$$\sum_{i=1}^{k-1} \|Av_i\|^2 + \|Aw_k\|^2 \le \sum_{i=1}^{k-1} \|Av_i\|^2 + \|Av_k\|^2 = \sum_{i=1}^{k} \|Av_i\|^2,$$
since $v_k$ is the best among unit vectors orthogonal to $v_1, \dots, v_{k-1}$.
Therefore $V_k$ fits $A$ at least as well as $W$, so $V_k$ is a best-fit $k$-dimensional subspace. $\square$
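A quick numerical spot check of Theorem 2: on a random example, the subspace $V_k$ spanned by the top $k$ right singular vectors is compared against random $k$-dimensional subspaces. The helper name and the choice $k = 2$ are arbitrary.

```python
# Check: V_k = span(v_1, ..., v_k) fits the rows of A at least as well as random
# k-dimensional subspaces (illustrative example).
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((60, 6))
_, _, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

def sq_dist_to_subspace(A, Q):
    """Total squared distance of the rows of A to the column span of Q (orthonormal columns)."""
    return np.sum(A ** 2) - np.sum((A @ Q) ** 2)   # Pythagorean theorem again

V_k = Vt[:k].T                                     # d x k, orthonormal columns
for _ in range(500):
    # Random k-dimensional subspace: orthonormalize k random directions.
    Q, _ = np.linalg.qr(rng.standard_normal((6, k)))
    assert sq_dist_to_subspace(A, V_k) <= sq_dist_to_subspace(A, Q) + 1e-9
```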
Proof of Theorem 1.
We want to show that $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$, where $v_1, \dots, v_r$ are the greedy vectors of Lemma 2, $\sigma_i = \|Av_i\|$, and $u_i = Av_i / \sigma_i$.
The LHS and RHS are both $n \times d$ matrices, and two matrices are equal exactly when they agree on every vector. Since
$$\Big(\sum_{j=1}^{r} \sigma_j u_j v_j^T\Big) v_i = \sigma_i u_i = A v_i \quad \text{for each } i,$$
and since both sides send every vector orthogonal to $v_1, \dots, v_r$ to zero (the greedy process of Lemma 2 stops only once $\max_{\|v\|=1,\, v \perp v_1, \dots, v_r} \|Av\| = 0$), the two matrices agree on a basis of $\mathbb{R}^d$, and we get that
$$Av = \Big(\sum_{j=1}^{r} \sigma_j u_j v_j^T\Big) v$$
for every $v \in \mathbb{R}^d$. Hence the two matrices are equal. $\square$
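To close, here is a small sketch mirroring the structure of this proof on a deliberately rank-deficient random matrix: the two sides agree on each $v_i$, both send vectors orthogonal to $v_1, \dots, v_r$ to zero, and hence the matrices are equal. The example matrix and the rank threshold are arbitrary illustrative choices.

```python
# Check: A and sum_i sigma_i u_i v_i^T agree on each v_i and on vectors orthogonal
# to all v_i, hence as matrices (illustrative rank-deficient example).
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # 6 x 5 matrix of rank 2

U, sigma, Vt = np.linalg.svd(A)
r = int(np.sum(sigma > 1e-10))                     # numerical rank (here r = 2)
S = sum(sigma[i] * np.outer(U[:, i], Vt[i]) for i in range(r))

# Agreement on each v_i ...
for i in range(r):
    assert np.allclose(A @ Vt[i], S @ Vt[i])
# ... and both kill a vector orthogonal to v_1, ..., v_r.
w = Vt[r]
assert np.allclose(A @ w, 0) and np.allclose(S @ w, 0)
# Hence the matrices agree on a basis, so they are equal.
assert np.allclose(A, S)
```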