# Regression

Let’s think about the problem of solving $\min||Ax-b||$ where $A$ is an $m \times n$ matrix. Let’s instead look at minimizing the objective function $f(x) = \|Ax-b\|^2$.

First, we’ll calculate the gradient and set it equal to zero, since we know that must be the global minimum of our objective, $\nabla f = 2A^{T}(Ax-b) = 0$ $\implies A^{T}Ax = A^{T}b$

Then, if $A^{T}A$ has full rank (or, equivalently, $A$ has full column rank), then $x = (A^{T}A)^{-1}A^{T}b$ is the unique solution.  $\min \|Ax-b\| = dist(b,S)$

The proof that this minimum is unique is left as an exercise.

So, the optimal $\hat{x}$ achieves $A\hat{x} = \hat{b}$.

Lemma 1. $A(A^{T}A)^{-1}A^{T}$ is the orthogonal projection from $\mathbb{R}^m$ to $S = span\{\vec{a_1} \dots \vec{a_n}\}$.

Proof.
Take any column of $A$, say $\vec{a_i} = A\vec{e_i}$. Then, $A(A^{T}A)^{-1}A^{T}\vec{a_i} = A\vec{e_i} = \vec{a_i}$
Now consider any $y \bot \{\vec{a_i} \dots \vec{a_n}\}$, so $y \cdot \vec{a_i} = 0\ \forall i = 0 \dots n$.
Then, $A(A^{T}A)^{-1}A^{T}y = 0$.
So, for any vector $y \in \mathbb{R}^m$, $y = \sum_{i}\alpha_{i}\vec{a_i} + y'$ $A(A^{T}A)^{-1}A^{T}y = A(A^{T}A)^{-1}A^{T}(A\sum_{i}\alpha_{i}\vec{e_i})$ $= \sum_{i}\alpha_{i}\vec{a_i}$ is the projection of $y$ onto $span\{\vec{a_1} \dots \vec{a_n}\}$. $\square$

What if the columns of $A$ are not linearly independent, and so, $(A^{T}A)^{-1}$ is not well defined?

Let’s get familiar with what’s known as the Singular Value Decomposition, $A = UDV^T$, with the following definitions, $r = rank(A)$ $\sigma_1 \geq \dots \geq \sigma_r > 0$ $U = \{\vec{u_1} \dots \vec{u_r}\}$ orthonormal columns in $\mathbb{R}^m$ $D = Diag(\sigma_1 \dots \sigma_r)$ $V = \{\vec{v_1} \dots \vec{v_r}\}$ orthonormal columns in $\mathbb{R}^n$

Theorem 1.
For any $A \in \mathbb{R}^{m \times n}$, the singular values are unique. The singular values are paired $u_{i}, v_{i}$ such that , $Av_i = \sigma_{i}\vec{u_i}$
AND $\vec{u_i}^{T}A = \sigma_{i}\vec{v_i}$

Fact.
If $A$ is nonsingular, then $A^{-1} = VD^{-1}U^{T}$.

Proof.
If $A$ is nonsingular, $m = n = r$. Therefore, $AA^{-1} = UDV^{T}VD^{-1}U^{T} = UU^{T}$. We know $U^{T}U = I \implies U^T = U^{-1} \implies UU^T = I$. $\square$

To continue our search for $\arg\min_{x}\|Ax-b\|$, we will need to be familiar with the pseudo-inverse of $A$, $A^+ = VD^{-1}U^{T}$. Note the following properties, $U^{T}U = V^{T}V = D^{-1}D = I$ which we will use in the following, $AA^{+}A = UDV^{T}VD^{-1}U^{T}UDV^{T} = UDV^{T} = A$ $A^{+}AA^{+} = A^{+}$

Theorem 2. $\arg\min_{x}\|Ax-b\| = A^{+}b$

Proof.
We know that the solution must satisfy $A^{T}Ax = A^{T}b$. Let’s verify that $x = A^{+}b$ satisfies this condition. $A^{T}Ax = VDU^{T}UDV^{T}VD^{-1}U^{T}b = VDU^{T}b = A^{T}b$
But, is this solution unique? $Ax = AA^{+}b = UDV^{T}VD^{-1}U^{T}b = UU^{T}b$
Remember, $U$ has orthonormal columns; so, it’s columns form a basis of $span\{\vec{a_1} \dots \vec{a_n}\}$. $\vec{u_i} \cdot b$ are the coordinates of $b$ along $\vec{u_i}$.
The projection of $b$ onto $span\{\vec{a_1} \dots \vec{a_n}\}$ is exactly $UU^{T}b$.
So, $A^{+}b$ is the unique minimum solution. $\square$