Let’s think about the problem of solving $A\vec{x} = \vec{b}$, where $A$ is an $m \times n$ matrix. Let’s instead look at minimizing the objective function
$$f(\vec{x}) = \|A\vec{x} - \vec{b}\|_2^2.$$
First, we’ll calculate the gradient and set it equal to zero, since we know the gradient must vanish at the global minimum of our (convex) objective:
$$\nabla f(\vec{x}) = 2A^T(A\vec{x} - \vec{b}) = \vec{0} \quad\Longleftrightarrow\quad A^TA\,\vec{x} = A^T\vec{b}.$$
Then, if $A^TA$ has full rank (or, equivalently, $A$ has full column rank), then
$$\vec{x}^* = (A^TA)^{-1}A^T\vec{b}$$
is the unique solution.
The proof that this minimum is unique is left as an exercise.
So, the optimal $\vec{x}^*$ achieves
$$A\vec{x}^* = A(A^TA)^{-1}A^T\vec{b}.$$
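As a quick numerical illustration (a sketch of my own, not part of the derivation; the matrix sizes and random seed are arbitrary), we can solve the normal equations directly and check the answer against NumPy’s built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # tall matrix; full column rank with probability 1
b = rng.standard_normal(8)

# Closed form x* = (A^T A)^{-1} A^T b, computed via a linear solve
# rather than an explicit inverse for numerical stability.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_star, x_ref))   # True
```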
Lemma 1.
$P = A(A^TA)^{-1}A^T$ is the orthogonal projection from $\mathbb{R}^m$ onto $\mathcal{R}(A)$, the column space of $A$.
Proof.
Take any column of $A$, say $\vec{a_i} = A\vec{e_i}$. Then,
$$P\vec{a_i} = A(A^TA)^{-1}A^TA\,\vec{e_i} = A\vec{e_i} = \vec{a_i}.$$
Now consider any $\vec{v} \perp \mathcal{R}(A)$, so $A^T\vec{v} = \vec{0}$.
Then, $P\vec{v} = A(A^TA)^{-1}A^T\vec{v} = \vec{0}$.
So, for any vector $\vec{b}$, $P\vec{b}$ is the orthogonal projection of $\vec{b}$ onto $\mathcal{R}(A)$.
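To convince ourselves numerically (again, an illustrative sketch with arbitrary dimensions), we can check that $P = A(A^TA)^{-1}A^T$ is symmetric, idempotent, fixes the columns of $A$, and annihilates vectors orthogonal to $\mathcal{R}(A)$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))
P = A @ np.linalg.solve(A.T @ A, A.T)   # P = A (A^T A)^{-1} A^T

print(np.allclose(P, P.T))         # symmetric
print(np.allclose(P @ P, P))       # idempotent: P^2 = P
print(np.allclose(P @ A, A))       # fixes every column of A

v = rng.standard_normal(6)
v_perp = v - P @ v                 # component of v orthogonal to R(A)
print(np.allclose(P @ v_perp, 0))  # P annihilates it
```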
What if the columns of $A$ are not linearly independent, and so, $(A^TA)^{-1}$ is not well defined?
Let’s get familiar with what’s known as the Singular Value Decomposition,
$$A = U\Sigma V^T,$$
with the following definitions:
$U \in \mathbb{R}^{m \times r}$ has orthonormal columns $\vec{u_1}, \ldots, \vec{u_r}$ in $\mathbb{R}^m$,
$V \in \mathbb{R}^{n \times r}$ has orthonormal columns $\vec{v_1}, \ldots, \vec{v_r}$ in $\mathbb{R}^n$,
$\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$ with $\sigma_1 \geq \cdots \geq \sigma_r > 0$, where $r = \mathrm{rank}(A)$.
Theorem 1.
For any $A$, the singular values $\sigma_1, \ldots, \sigma_r$ are unique, and the singular vectors are paired with them such that
$$A\vec{v_i} = \sigma_i\vec{u_i} \quad\text{AND}\quad A^T\vec{u_i} = \sigma_i\vec{v_i}.$$
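Here is a small sketch (illustrative sizes only) checking the compact SVD and the pairing relations with NumPy; `full_matrices=False` is what gives the compact factors:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# full_matrices=False gives the compact SVD: U is 5x3, Vt is 3x3 here.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

print(np.allclose(U.T @ U, np.eye(3)))       # orthonormal columns of U
print(np.allclose(V.T @ V, np.eye(3)))       # orthonormal columns of V
print(np.allclose(A, U @ np.diag(s) @ Vt))   # A = U Sigma V^T

# Pairing: A v_i = sigma_i u_i  AND  A^T u_i = sigma_i v_i.
for i in range(len(s)):
    print(np.allclose(A @ V[:, i], s[i] * U[:, i]),
          np.allclose(A.T @ U[:, i], s[i] * V[:, i]))
```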
Fact.
If $A$ is nonsingular, then $A^{-1} = V\Sigma^{-1}U^T$.
Proof.
If $A$ is nonsingular, then $r = m = n$, so $U$ and $V$ are square orthogonal matrices. Therefore, $UU^T = U^TU = I$ and $VV^T = V^TV = I$. We know
$$(V\Sigma^{-1}U^T)(U\Sigma V^T) = V\Sigma^{-1}\Sigma V^T = VV^T = I,$$
so $V\Sigma^{-1}U^T$ is indeed the inverse of $A = U\Sigma V^T$.
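A one-off numerical check of this fact (the $4 \times 4$ example is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))           # square, nonsingular with probability 1

U, s, Vt = np.linalg.svd(A)
A_inv = Vt.T @ np.diag(1.0 / s) @ U.T     # V Sigma^{-1} U^T

print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```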
To continue our search for $\vec{x}^*$, we will need to be familiar with the pseudo-inverse of $A$,
$$A^\dagger = V\Sigma^{-1}U^T.$$
Note the following properties, which we will use in the following:
$$AA^\dagger = UU^T \qquad\text{and}\qquad A^\dagger A = VV^T,$$
i.e., $AA^\dagger$ is the orthogonal projection onto $\mathcal{R}(A)$, and $A^\dagger A$ is the orthogonal projection onto the row space $\mathcal{R}(A^T)$.
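These identities are easy to check numerically. The sketch below (arbitrary sizes, deliberately rank-deficient $A$) forms $A^\dagger$ from the compact SVD and compares against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((6, 2))
A = B @ rng.standard_normal((2, 4))      # 6x4 matrix of rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-10)                    # numerical rank
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]    # keep the compact (rank-r) factors

A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # A^dagger = V Sigma^{-1} U^T
print(np.allclose(A_pinv, np.linalg.pinv(A)))   # matches NumPy's pinv

print(np.allclose(A @ A_pinv, U @ U.T))    # A A^dagger = U U^T
print(np.allclose(A_pinv @ A, Vt.T @ Vt))  # A^dagger A = V V^T
```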
Theorem 2.
$\vec{x}^* = A^\dagger\vec{b}$ minimizes $\|A\vec{x} - \vec{b}\|_2^2$.
Proof.
We know that the solution must satisfy the normal equations $A^TA\vec{x} = A^T\vec{b}$. Let’s verify that $\vec{x}^* = A^\dagger\vec{b}$ satisfies this condition:
$$A^TA(A^\dagger\vec{b}) = (V\Sigma U^T)(U\Sigma V^T)(V\Sigma^{-1}U^T)\vec{b} = V\Sigma U^T\vec{b} = A^T\vec{b}.$$
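A quick numerical confirmation (with an arbitrary rank-deficient $A$, precisely the case where the plain inverse $(A^TA)^{-1}$ would fail):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2
b = rng.standard_normal(6)

x_star = np.linalg.pinv(A) @ b
# x* satisfies the normal equations A^T A x = A^T b:
print(np.allclose(A.T @ A @ x_star, A.T @ b))   # True
```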
But, is this solution unique?
Remember, $U$ has orthonormal columns; so, its columns form a basis of $\mathcal{R}(A)$. The inner products $\vec{u_i}^T\vec{b}$ are the coordinates of $\vec{b}$ along $\vec{u_i}$.
The projection of $\vec{b}$ onto $\mathcal{R}(A)$ is exactly
$$UU^T\vec{b} = AA^\dagger\vec{b} = A\vec{x}^*,$$
so $\vec{x}^*$ attains the minimum of the objective. When $A$ is rank-deficient, adding any null-space vector of $A$ to $\vec{x}^*$ gives another minimizer; but $\vec{x}^* = A^\dagger\vec{b}$ lies in $\mathcal{R}(V) = \mathcal{R}(A^T)$, which is orthogonal to the null space of $A$.
So, $\vec{x}^* = A^\dagger\vec{b}$ is the unique minimum-norm solution.
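To see the minimum-norm property in action, here is a sketch (arbitrary rank-deficient example) that perturbs $\vec{x}^*$ by a null-space vector: the fit is unchanged, but the norm grows:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2, so the
b = rng.standard_normal(6)                                     # minimizer is not unique

x_star = np.linalg.pinv(A) @ b

U, s, Vt = np.linalg.svd(A)        # full SVD: Vt is 4x4
null_vecs = Vt[2:, :].T            # v_3, v_4 span the null space of A

z = x_star + null_vecs @ rng.standard_normal(2)    # another least-squares minimizer
print(np.allclose(A @ z, A @ x_star))              # same fit, hence same residual
print(np.linalg.norm(x_star) < np.linalg.norm(z))  # but x* has strictly smaller norm
```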