Let’s think about the problem of solving $Ax = b$, where $A$ is an $m \times n$ matrix. Let’s instead look at minimizing the objective function $f(x) = \|Ax - b\|_2^2$.

First, we’ll calculate the gradient and set it equal to zero, since we know that any stationary point must be the global minimum of our (convex) objective,

$$\nabla f(x) = 2A^\top(Ax - b) = 0 \implies A^\top A\, x = A^\top b.$$

If $A^\top A$ has full rank (or, equivalently, $A$ has full column rank), then $x^\star = (A^\top A)^{-1}A^\top b$ is the unique solution.
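We can sanity-check this numerically. Here is a quick NumPy sketch (the random $A$ and $b$ are just illustrative): solving the normal equations directly agrees with NumPy’s least-squares solver when $A$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall matrix with full column rank (m > n), and a target b.
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Solve the normal equations  A^T A x = A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_star, x_lstsq)
```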

The proof that this minimum is unique is left as an exercise.

So, the optimal $x^\star$ achieves $Ax^\star = A(A^\top A)^{-1}A^\top b$.

*Lemma 1.*

$A(A^\top A)^{-1}A^\top$ is the orthogonal projection from $\mathbb{R}^m$ to $\mathcal{R}(A)$, the column space of $A$.

*Proof.*

Take any column of $A$, say $a_i = Ae_i$. Then,

$$A(A^\top A)^{-1}A^\top a_i = A(A^\top A)^{-1}(A^\top A)e_i = Ae_i = a_i.$$

Now consider any $w \perp \mathcal{R}(A)$, so $A^\top w = 0$.

Then, $A(A^\top A)^{-1}A^\top w = A(A^\top A)^{-1}\, 0 = 0$.

So, for any vector $v \in \mathbb{R}^m$, decomposed as $v = v_{\mathcal{R}} + v_\perp$ with $v_{\mathcal{R}} \in \mathcal{R}(A)$ and $v_\perp \perp \mathcal{R}(A)$,

$$A(A^\top A)^{-1}A^\top v = v_{\mathcal{R}}$$

is the projection of $v$ onto $\mathcal{R}(A)$.
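The lemma can be checked numerically. A quick illustrative sketch in NumPy: the matrix $A(A^\top A)^{-1}A^\top$ fixes the columns of $A$, is symmetric and idempotent (the defining properties of an orthogonal projection), and annihilates vectors orthogonal to $\mathcal{R}(A)$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))

# The candidate orthogonal projection onto R(A).
P = A @ np.linalg.inv(A.T @ A) @ A.T

# P fixes the columns of A ...
assert np.allclose(P @ A, A)
# ... is symmetric and idempotent ...
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
# ... and annihilates vectors orthogonal to R(A):
w = rng.standard_normal(m)
w_perp = w - P @ w          # component of w outside R(A)
assert np.allclose(P @ w_perp, 0)
```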

What if the columns of $A$ are not linearly independent, so that $(A^\top A)^{-1}$ is not well defined?

Let’s get familiar with what’s known as the Singular Value Decomposition,

$$A = U\Sigma V^\top,$$

with the following definitions, where $r = \mathrm{rank}(A)$,

$U \in \mathbb{R}^{m \times r}$, orthonormal columns in $\mathbb{R}^m$

$V \in \mathbb{R}^{n \times r}$, orthonormal columns in $\mathbb{R}^n$

$\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r)$, with $\sigma_1 \ge \cdots \ge \sigma_r > 0$
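Here is an illustrative NumPy sketch of the compact (rank-$r$) SVD. NumPy’s `np.linalg.svd` returns the reduced decomposition, so we trim away the zero singular values ourselves; the rank-2 test matrix is just an example.

```python
import numpy as np

rng = np.random.default_rng(2)

# A rank-2 matrix in R^{5x4}: the sum of two rank-one terms.
A = rng.standard_normal((5, 1)) @ rng.standard_normal((1, 4)) \
  + rng.standard_normal((5, 1)) @ rng.standard_normal((1, 4))

# Reduced SVD from NumPy, then trim to the compact (rank-r) form.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

assert r == 2
assert np.allclose(U.T @ U, np.eye(r))      # orthonormal columns of U
assert np.allclose(Vt @ Vt.T, np.eye(r))    # orthonormal columns of V
assert np.allclose(U @ np.diag(s) @ Vt, A)  # A = U Sigma V^T
```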

*Theorem 1.*

For any $A$, the singular values $\sigma_1, \dots, \sigma_r$ are unique. The singular values are paired such that $A\vec{v_i} = \sigma_i \vec{u_i}$,

AND

$A^\top \vec{u_i} = \sigma_i \vec{v_i}$.
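The pairing of singular values with left and right singular vectors can be verified directly (a quick illustrative check in NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Each pair (u_i, v_i) satisfies A v_i = sigma_i u_i  AND  A^T u_i = sigma_i v_i.
for i in range(len(s)):
    assert np.allclose(A @ V[:, i], s[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], s[i] * V[:, i])
```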

*Fact.*

If $A$ is nonsingular, then $A^{-1} = V\Sigma^{-1}U^\top$.

*Proof.*

If $A$ is nonsingular, $r = m = n$. Therefore, $A^{-1} = (U\Sigma V^\top)^{-1} = (V^\top)^{-1}\Sigma^{-1}U^{-1} = V\Sigma^{-1}U^\top$. We know $U^{-1} = U^\top$ and $(V^\top)^{-1} = V$, since $U$ and $V$ are square with orthonormal columns.
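A one-line numerical check of this fact (illustrative; a random square matrix is nonsingular with probability one):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))   # square, nonsingular w.p. 1

# Invert A by inverting its SVD factors: A^{-1} = V Sigma^{-1} U^T.
U, s, Vt = np.linalg.svd(A)
A_inv = Vt.T @ np.diag(1 / s) @ U.T
assert np.allclose(A_inv, np.linalg.inv(A))
```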

To continue our search for $x^\star$, we will need to be familiar with the pseudo-inverse of $A$, $A^\dagger = V\Sigma^{-1}U^\top$. Note the following properties,

$$A^\dagger A = VV^\top, \qquad AA^\dagger = UU^\top,$$

which we will use in the following,
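A quick NumPy sketch of these properties on a rank-deficient example (the matrix and the `1e-10` rank cutoff are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# A rank-deficient matrix: 3 columns, rank 2.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 3))

# Compact SVD, then the pseudo-inverse A† = V Sigma^{-1} U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
A_pinv = Vt.T @ np.diag(1 / s) @ U.T

assert np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-10))
assert np.allclose(A_pinv @ A, Vt.T @ Vt)   # A†A = V V^T
assert np.allclose(A @ A_pinv, U @ U.T)     # AA† = U U^T
```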

*Theorem 2.*

$x^\star = A^\dagger b$ minimizes $\|Ax - b\|_2^2$.

*Proof.*

We know that the solution must satisfy $Ax^\star = \operatorname{proj}_{\mathcal{R}(A)}(b)$. Let’s verify that $x^\star = A^\dagger b$ satisfies this condition,

$$Ax^\star = AA^\dagger b = UU^\top b.$$

But, is this solution unique?

Remember, $U$ has orthonormal columns; so, its columns form a basis of $\mathcal{R}(A)$.

$u_i^\top b$ are the coordinates of $b$ along $\vec{u_i}$.

The projection of $b$ onto $\mathcal{R}(A)$ is exactly $UU^\top b = \sum_{i=1}^{r} (u_i^\top b)\,\vec{u_i}$.

So, $x^\star = A^\dagger b$ is the unique minimum-norm solution.
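We can see both halves of this claim numerically. An illustrative NumPy sketch: on a rank-deficient $A$, shifting $A^\dagger b$ by any null-space direction leaves the residual unchanged, but $A^\dagger b$ has the smallest norm among all such minimizers.

```python
import numpy as np

rng = np.random.default_rng(6)

# Rank-deficient A: the least-squares minimizer is not unique.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal(6)

x_star = np.linalg.pinv(A, rcond=1e-10) @ b

# Any null-space direction shifts x_star without changing the residual ...
z = rng.standard_normal(4)
null_dir = z - np.linalg.pinv(A, rcond=1e-10) @ (A @ z)  # component of z in N(A)
x_other = x_star + null_dir
assert np.allclose(np.linalg.norm(A @ x_star - b),
                   np.linalg.norm(A @ x_other - b))
# ... but x_star has the smaller norm (it is the minimum-norm minimizer).
assert np.linalg.norm(x_star) <= np.linalg.norm(x_other)
```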