# Regression

Let’s think about the problem of solving $\min||Ax-b||$ where $A$ is an $m \times n$ matrix. Let’s instead look at minimizing the objective function $f(x) = \|Ax-b\|^2$.

First, we’ll calculate the gradient and set it equal to zero, since we know that must be the global minimum of our objective,
$\nabla f = 2A^{T}(Ax-b) = 0$
$\implies A^{T}Ax = A^{T}b$

Then, if $A^{T}A$ has full rank (or, equivalently, $A$ has full column rank), then $x = (A^{T}A)^{-1}A^{T}b$ is the unique solution.

$\min \|Ax-b\| = dist(b,S)$

The proof that this minimum is unique is left as an exercise.

So, the optimal $\hat{x}$ achieves $A\hat{x} = \hat{b}$.

Lemma 1.
$A(A^{T}A)^{-1}A^{T}$ is the orthogonal projection from $\mathbb{R}^m$ to $S = span\{\vec{a_1} \dots \vec{a_n}\}$.

Proof.
Take any column of $A$, say $\vec{a_i} = A\vec{e_i}$. Then, $A(A^{T}A)^{-1}A^{T}\vec{a_i} = A\vec{e_i} = \vec{a_i}$
Now consider any $y \bot \{\vec{a_i} \dots \vec{a_n}\}$, so $y \cdot \vec{a_i} = 0\ \forall i = 0 \dots n$.
Then, $A(A^{T}A)^{-1}A^{T}y = 0$.
So, for any vector $y \in \mathbb{R}^m$, $y = \sum_{i}\alpha_{i}\vec{a_i} + y'$
$A(A^{T}A)^{-1}A^{T}y = A(A^{T}A)^{-1}A^{T}(A\sum_{i}\alpha_{i}\vec{e_i})$
$= \sum_{i}\alpha_{i}\vec{a_i}$ is the projection of $y$ onto $span\{\vec{a_1} \dots \vec{a_n}\}$. $\square$

What if the columns of $A$ are not linearly independent, and so, $(A^{T}A)^{-1}$ is not well defined?

Let’s get familiar with what’s known as the Singular Value Decomposition,
$A = UDV^T$, with the following definitions,
$r = rank(A)$
$\sigma_1 \geq \dots \geq \sigma_r > 0$
$U = \{\vec{u_1} \dots \vec{u_r}\}$ orthonormal columns in $\mathbb{R}^m$
$D = Diag(\sigma_1 \dots \sigma_r)$
$V = \{\vec{v_1} \dots \vec{v_r}\}$ orthonormal columns in $\mathbb{R}^n$

Theorem 1.
For any $A \in \mathbb{R}^{m \times n}$, the singular values are unique. The singular values are paired $u_{i}, v_{i}$ such that ,
$Av_i = \sigma_{i}\vec{u_i}$
AND
$\vec{u_i}^{T}A = \sigma_{i}\vec{v_i}$

Fact.
If $A$ is nonsingular, then $A^{-1} = VD^{-1}U^{T}$.

Proof.
If $A$ is nonsingular, $m = n = r$. Therefore, $AA^{-1} = UDV^{T}VD^{-1}U^{T} = UU^{T}$. We know $U^{T}U = I \implies U^T = U^{-1} \implies UU^T = I$. $\square$

To continue our search for $\arg\min_{x}\|Ax-b\|$, we will need to be familiar with the pseudo-inverse of $A$, $A^+ = VD^{-1}U^{T}$. Note the following properties,
$U^{T}U = V^{T}V = D^{-1}D = I$ which we will use in the following,
$AA^{+}A = UDV^{T}VD^{-1}U^{T}UDV^{T} = UDV^{T} = A$
$A^{+}AA^{+} = A^{+}$

Theorem 2.
$\arg\min_{x}\|Ax-b\| = A^{+}b$

Proof.
We know that the solution must satisfy $A^{T}Ax = A^{T}b$. Let’s verify that $x = A^{+}b$ satisfies this condition.
$A^{T}Ax = VDU^{T}UDV^{T}VD^{-1}U^{T}b = VDU^{T}b = A^{T}b$
But, is this solution unique?
$Ax = AA^{+}b = UDV^{T}VD^{-1}U^{T}b = UU^{T}b$
Remember, $U$ has orthonormal columns; so, it’s columns form a basis of $span\{\vec{a_1} \dots \vec{a_n}\}$.
$\vec{u_i} \cdot b$ are the coordinates of $b$ along $\vec{u_i}$.
The projection of $b$ onto $span\{\vec{a_1} \dots \vec{a_n}\}$ is exactly $UU^{T}b$.
So, $A^{+}b$ is the unique minimum solution. $\square$