Interior Point Method

We study another class of optimization algorithms: interior point methods.  These require a more specific structure than general convex optimization, but can still be used beyond linear programming.  These algorithms also work by following an interior path, determined by successive minima of a modified objective function. We will study in complete detail, one of the simplest variants, the primal-dual interior-point method.

Consider the primal program (P)

\min  \boldsymbol{c^T}\boldsymbol{x}
s.t.  A\boldsymbol{x} = \boldsymbol{b}, x \geq 0

with corresponding dual (D)

\max  \boldsymbol{b^T}\boldsymbol{y}
s.t.  A^T\boldsymbol{y} + \boldsymbol{s}= \boldsymbol{c}, s \geq 0

Note that (P) \geq (D).  Then, 0 \leq \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{b}^T\boldsymbol{y} = \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{x}^TA^T\boldsymbol{y} = \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{x}^T(\boldsymbol{c}-\boldsymbol{s}) = \boldsymbol{x}^T\boldsymbol{s} .  If the complimentary slackness conditions hold, this is achieved with equality, indicating that either each primal variable is 0, or it’s corresponding dual constraint is tight (zero slack).

Modify the primal objective to P(t) = \min \boldsymbol{c^T}\boldsymbol{x} - t \sum_{j=1}^n \ln(x_j) , with the same constraints.  This modification effectively punishes low x_j very severely, with x_j=0 having value of \infty , and is relatively easy to optimize.  As t approaches 0, this modification becomes the original, so we can solve the original by successively reducing t .

We have the constraints X\boldsymbol{s} = t\boldsymbol{1}, A\boldsymbol{x}=\boldsymbol{b}, \boldsymbol{x}\geq 0, A^T\boldsymbol{y}+\boldsymbol{s}=\boldsymbol{c} , where X=diag(\boldsymbol{x}) .  We can start with a solution for some t such that \boldsymbol{x} is feasible in (P), \boldsymbol{y}/\boldsymbol{s} are feasible in (D), and \boldsymbol{x}^T\boldsymbol{s}=tn .  Our goal is to iteratively find new solutions with lower t until eventually we get t to 0.  In doing this, we can’t quite maintain X\boldsymbol{s}=t\boldsymbol{1} , but we can keep the values close, with an \epsilon -approximate solution \|X\boldsymbol{s}-t\boldsymbol{1}\| \leq \epsilon t .

Now, suppose we have a valid x,y,z,t , and we want to make t smaller.  We fix some \alpha < 1 and let \hat{t} = \alpha t .  Then, solve the following system of linear equations:

A \Delta x = 0

A^T \Delta y + \Delta s = 0

S \Delta x + X \Delta s =  t 1 - Xs

We then update x\leftarrow x + \Delta x y \leftarrow  y + \Delta y , s \leftarrow s + \Delta s and t \leftarrow \alpha t.

Lemma 1.
Let $\varepsilon < 1/2$. If (x, y , s) is a solution of P(t) with \| Xs - t \| \leq \varepsilon t, then (\hat{x}, \hat{y}, \hat{s}) = (x, y, s) + (\Delta x, \Delta y, \Delta s) is a \frac{1+\varepsilon}{(1-\varepsilon)^2}{\varepsilon}^2-approximate solution with t.

Lemma 2.
Let \alpha \geq 1 - \frac{c}{\sqrt{n}}, latex (x, y, s)$ is a \frac{1}{5}-solution, then (\hat{x}, \hat{y}, \hat{s}) is an \varepsilon = \frac{1}{5}-approximation with \alpha t.

Theorem 1.
This interior point method reaches a solution with duality gap at most \delta in O(\sqrt{n} \log \frac{x_0^T s_0}{n \delta}) iterations, where each iteration is the solution of an (m+n) \times (m+n) linear system.

Proof. (Theorem 1)
Start at t_0 = 5 \frac{x_0^T s_0}{n}, \varepsilon = \frac{1}{5}. After each iteration, t_{i+1} \leq \alpha t_i , \varepsilon = \frac{1}{5}. To reach t = 5 \delta, we need {\alpha}^k \leq \frac{\delta n}{x_0^T s_0} \implies k \geq \log_\alpha \frac{x_0^T s_0}{n \delta} \geq c\sqrt{n} \log \frac{x_0^T s_0}{n \delta} suffices. \square

Proof. (Lemma 1)
Let (X, Y, s) satisfy
Ax = b
A^T y + s = c
\| \bar{X}s - t1 \| \leq \varepsilon t
x \geq 0

and let (\Delta x, \Delta y, \Delta s) satisfy
A \Delta x = 0
A^T \Delta y + \Delta s = 0
S \Delta x + X \Delta s = t1 - Xs

Note the following:
(i) (\Delta x)^T \Delta s = (\Delta x)^T (-A^T \Delta y) = -(A^T \Delta x)^T \Delta y = 0
(ii) (1-\varepsilon)t \leq x_j s_j \leq (1+\varepsilon)t \implies x_j \geq \frac{(1-\varepsilon)t}{s_j} \implies s_j \geq \frac{(1-\varepsilon)t}{x_j}

So (1-\varepsilon)t {\| X^{-1}\Delta x \|}^2 = (1-\varepsilon)t(\Delta x)^T X^{-1} X^{-1} \Delta x
= (\Delta x)^T X^{-1} \begin{pmatrix} \ddots & 0 & 0 \\ 0 & \frac{(1-\varepsilon)t}{x_j} & 0 \\ 0 & 0 & \ddots \end{pmatrix} \Delta x
\leq \Delta x^T X^{-1} S \Delta x
= \Delta x^T X^{-1} (t1 - Xs - X \Delta s)
= \Delta x^T X^{-1} (t1 - Xs)
\leq \| \Delta x^T X^{-1} \| \| t1 - Xs \|
\leq \varepsilon t \| \Delta x^T X^{-1} \|
\implies \| X^{-1} \Delta x \| \leq \frac{\varepsilon}{1-\varepsilon} < 1

Similarly, \| S^{-1} \Delta s \| \leq \frac{\varepsilon}{1-\varepsilon}.

\hat{s} = s + \Delta s = S(1 + S^{-1}\Delta s) \geq 0
\hat{x} \geq 0
(\hat{x}, \hat{y}, \hat{s}) is feasible.

\hat{x}_j \hat{s}_j = (x_j + \Delta x_j)(s_j + \Delta s_j)
= x_j s_j + x_j \Delta s_j + s_j \Delta x_j + \Delta x_j \Delta s_j
= x_j s_j + t - x_j s_j + \Delta x_j \Delta s_j
= t + \Delta x_j \Delta s_j

\| t1 - \hat{x}\hat{s} \| \leq \sum_j | t - \hat{x}_j \hat{s}_j |
= \sum_j |\Delta x_j \Delta s_j|
= \sum_j \frac{|\Delta x_j|}{x_j} \frac{|\Delta s_j|}{s_j}x_j s_j
\leq \sum_j \frac{|\Delta x_j|}{x_j} \frac{|\Delta s_j|}{s_j} (1+\varepsilon)t
\leq (1+\varepsilon)t (\sum_j (\frac{\Delta x_j}{x_j})^2)^{\frac{1}{2}}(\sum_j (\frac{\Delta s_j}{s_j})^2)^{\frac{1}{2}}
= (1+\varepsilon)t \| X^{-1}\Delta x \| \| S^{-1}\Delta s \|
\leq \frac{1+\varepsilon}{(1-\varepsilon)^2}{\varepsilon}^2 t \square

Proof. (Lemma 2)
\| \hat{X}\hat{s} - \hat{t}\boldsymbol{1} \| = \| \hat{X}\hat{s} - t1 + (1-\alpha}) t1 \|
\leq \| \hat{X}\hat{s} - t1 \| + (1-\alpha) t \| 1 \|
\leq {\varepsilon}^2 \frac{1+\varepsilon}{(1-\varepsilon)^2} t + (1-\alpha)t\sqrt{n}

We need {\varepsilon}^2 \frac{1+\varepsilon}{(1-\varepsilon)^2} t + (1-\alpha)t\sqrt{n} \leq \frac{1}{5} \hat{t} = \frac{\alpha}{5} t

Let \alpha = 1- \frac{c}{\sqrt{n}}. Then, we have \frac{1+\varepsilon}{(1-\varepsilon)^2} {\varepsilon}^2 + c \leq \frac{1-\frac{c}{\sqrt{n}}}{5}.

Plugging in \varepsilon = \frac{1}{5}, we get \frac{\frac{6}{5}}{\frac{16}{25}} \frac{1}{25} + c = \frac{3}{40} + c \leq \frac{1}{5} - \frac{c}{5\sqrt{n}}. So solving for c, we get c \leq \frac{\frac{1}{8}}{1 + \frac{1}{5\sqrt{n}}} \leq \frac{1}{16} suffices. \square


Ellipsoid Method

First, let’s look at using the ellipsoid method to solve LP feasibility (recall that we’ve shown that finding the optimal solution to an LP reduces to solving feasibility). Given constraints Ax \leq b, we want to output x : Ax \leq b or indicate that the set of constraints is infeasible.

The algorithm maintains an ellipsoid, which we can describe by

E(z, A) = \{ x : (x-z)^T A^{-1} (x-z) \leq 1\}

Example 1.
A = I, z = 0 \implies E(0, I) = \{x : \|x\|^2 \leq 1\}

Example 2.
A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, z=(1, 1) \implies E(z, A) = \frac{(x_1 - 1)^2}{a^2} + \frac{(x_2 - 1)^2}{b^2} \leq 1


  • Initialize E_0 = (0, R^2 I) for sufficiently large R
  • Repeat:
      – check if z is feasible
      – if not, violated constraint a \cdot x \leq a \cdot z
      – compute minimum volume ellipsoid E_{t+1} containing E_t \cap \{x : a \cdot x \leq a \cdot z\}

Lemma 1.
bits(R) \leq poly(n, bits(A), bits(b))

Lemma 2.
Vol(E_{t+1}) \leq e^{\frac{-1}{2n+2}} Vol(E_t)

Lemma 3.
Given E(z, A), the minimum value ellipsoid containing E(z, A) \cap \{x : a \cdot x \leq a \cdot z\} is E'(z', A') where

z' = z - \frac{1}{n+1} \frac{Aa}{\sqrt{a^T Aa}}
A' = \frac{n^2}{n^2 - 1} (A - \frac{2}{n+1} \frac{Aaa^T A^T}{a^T Aa})

For example, for A = I, z = 0, a = e_1 = (1, 0, ..., 0)^T this is

z' = -\frac{1}{n+1} (1, 0, ..., 0)^T
A' = \frac{n^2}{n^2 - 1} (I - \frac{2}{n+1} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}) = \frac{n^2}{n^2 - 1} \begin{pmatrix} \frac{n-1}{n+1} & 0 & 0 \\ 0 & \ddots & \vdots \\ 0 & \dots & \ddots \end{pmatrix}

Theorem 1.
For a linear program, the ellipsoid algorithm either finds a feasible x or we can go to a lower-dimensional subspace after a polynomial number of iterations.

Proof. (Lemmas 2 and 3)
Consider the equality E(z, A) = z + A^{\frac{1}{2}}E(0, I). It suffices to understand the minimum half-ball and smallest ellipsoid containing it. Without loss of generality, assume a = (1, 0, ..., 0)^T. Then, by symmetry, E' center moves only along e_1 and it has axes lengths (1-z), r, ..., r for some r. By minimality, (0, 1, 0, ..., 0) is on the boundary of E'.

((0, 1) - (z, 0)) \begin{pmatrix} \frac{1}{(1-z)^2} & 0 \\ 0 & \frac{1}{r^2} \end{pmatrix} (\begin{pmatrix}0 \\ 1 \end{pmatrix} - \begin{pmatrix} z \\ 0 \end{pmatrix}) = 1
\frac{z^2}{(1-z)^2} + \frac{1}{r^2} = 1 \implies r = \frac{1-z}{\sqrt{1-2z}}

The volume of E' = Vol(B_n) \cdot \prod_i axislength_i = Vol(B_n) \cdot (1-z) \cdot r^{n-1} = Vol(B_n) \cdot \frac{(1-z)^n}{(1-2z)^{\frac{n-1}{2}}}

Minimizing f(z) = \frac{(1-z)^n}{(1-2z)^{\frac{n-1}{2}}} we get z = \frac{1}{n+1} and therefore the axes lengths are \frac{n}{n+1}, \frac{\frac{n}{n+1}}{\sqrt{\frac{n-1}{n+1}}} = \frac{n}{\sqrt{n^2 - 1}}, ...

and the volume is \frac{n^n}{(n+1)(n^2 - 1)^{\frac{n-1}{2}}} = (\frac{n}{n+1})(\frac{n^2}{n^2 - 1})^{\frac{n-1}{2}} = (1 - \frac{1}{n+1})(1 + \frac{1}{n^2 - 1})^{\frac{n-1}{2}}. Using the bound 1+x \leq e^x, we get volume \leq e^{-\frac{1}{n+1}} \cdot e^{\frac{1}{2(n+1)}} = e^{-\frac{1}{2n+2}}

To prove the theorem, we observe in addition that if P is feasible and bounded, it contains n+1 linearly independent vertices, x^0, ..., x^n. The simplex formed by them has volume

V = \frac{1}{n!} \begin{vmatrix} 1 & 1 & \dots & 1 \\ x^0 & x^1 & \dots & x^n \\ \vdots & \vdots & \vdots & \vdots \end{vmatrix}

Each x_i^j = \frac{\det(C_i^j)}{\det(C^j)} where x^j is the solution of C^j x = d^j for some subset of inequalities.

V = \frac{1}{n!} \begin{vmatrix} 1 & \dots & 1 \\ & \vdots & \\ \dots & \frac{\det(C_i^j)}{\det(C^j)} & \dots \end{vmatrix} = \frac{1}{n!} \det \begin{pmatrix} \det(C^0) & \dots & \det(C^n) \\ \det(C^0_1) & \dots \\ \vdots \\ & & \det(C^n_n) \end{pmatrix} \begin{pmatrix} \frac{1}{\det(C^0)} & & \\ & \ddots & \\ & & \frac{1}{\det(C^n)} \end{pmatrix} \geq \frac{1}{n!} \prod_j \frac{1}{\det(C^j)} \geq 2^{-poly(n, bits(A, b))} \square

Proving Strong Duality with Simplex

Consider the primal program P

\max c^T x
Ax \leq b
x \geq 0

and the dual D

\min b^T y
A^T y \geq c
y \geq 0

where \max c^T x \leq \min b^T y.

We will once again prove strong duality, but this time using the simplex algorithm.

Let z^* = c^T x^* be an optimal, bounded primal solution. Add slack variables x_{n+1} ... x_m such that

\max z = \sum_{i=1}^{n} c_i x_i
\sum_{j=1}^{n} a_{ij}x_{j} + x_{n+i} = b_i
x_1 ... x_{m+n} \geq 0

So, at the end of our simplex algorithm, we reach

z = z^* + \sum_{j=1}^{n+m} \bar{c}_j x_j
\bar{c}_i \leq 0

Let y^*_i = -\bar{c}_{n+i}

We will show that

(1) z^* = b^T y^*
(2) A^T y^* \geq c


We will first prove (1).

z = z^* + \sum_{j=1}^n \bar{c}_j x_j - \sum_{i=1}^m y^*_i x_{n+i}
= z^* + \sum_{j=1}^n \bar{c}_j x_j - \sum_{i=1}^m y^*_i b_i + \sum_{i=1}^m y^*_i (\sum_j a_{ij}x_j)
= z^* - \sum_{i=1}^m y^*_i b_i + \sum_{j=1}^n (\bar{c}_j + \sum_{i=1}^m a_{ij}y^*_i)x_j

holds for all x. So x = 0 \implies z = 0 \implies z^* = b^T y^*.

Now we show (2). We are left with

z = \sum_{j=1}^n (\bar{c}_j + \sum_{i=1}^m a_{ij}y^*_i)x_j = \sum_{j=1}^n c_j x_j

which holds for all x. So,

\bar{c}_j + \sum_{i=1}^m a_{ij}y^*_i = c_j
\bar{c}_j \leq 0

\implies A^T y^* \geq c

At the optimal solution we get

z = z^* + \sum_{j=1}^n \bar{c}_j x^*_j - \sum_{i=1}^m y^*_i x_{n+i}^*

with the second and third terms equal to 0,

\implies x^*_j > 0 \implies \bar{c}_j = 0
\implies y^*_i > 0 \implies x^*_{n+i} = b_i - \sum_{j=1}^n a_{ij}x^*_j = 0

This is what is referred to as “complementary slackness”.

Strong Duality

Last time we showed that \max \{\boldsymbol{c^T}\boldsymbol{x} : A\boldsymbol{x}\leq\boldsymbol{b}, x\geq 0\} \leq \min \{\boldsymbol{b^T}\boldsymbol{y} : A^{T}\boldsymbol{y}\geq\boldsymbol{c}, y\geq 0\}, also known as the Weak Duality Theorem.

Let’s now take a look at other standard forms for LPs. Last time we worked mostly with

\max \boldsymbol{c^{T}x}
A\boldsymbol{x} \leq \boldsymbol{b}
\boldsymbol{x} \geq 0

We have already seen that we can switch the objective between minimizing a function or maximizing a function by taking the negative of the original objective. Now we want to show that the following two forms are equally as expressive,

\max \boldsymbol{c^{T}x}
A\boldsymbol{x} = \boldsymbol{b}
\boldsymbol{x} \geq 0

\max \boldsymbol{c^{T}x}
A\boldsymbol{x} \leq \boldsymbol{b}

We can transform an inequality constraint of the form \boldsymbol{a}\cdot\boldsymbol{x} \leq b into an equality constraint by adding a nonnegative slack variable such that \boldsymbol{a} \cdot \boldsymbol{x} + s = b,\ s \geq 0. An equality constraint \boldsymbol{a}\cdot\boldsymbol{x} = b can easily be transformed into inequality constraints, \boldsymbol{a}\cdot\boldsymbol{x} \leq b and -\boldsymbol{a}\cdot\boldsymbol{x} \leq -b.

The nonnegativity constraints can be imposed as follows. For each original variable x, we have two variables \boldsymbol{x_1, x_2} such that \boldsymbol{x} = \boldsymbol{x_1} - \boldsymbol{x_2}. So constraint \boldsymbol{a}\cdot\boldsymbol{x} \leq b would be replaced with \boldsymbol{a}\cdot(\boldsymbol{x_1}-\boldsymbol{x_2}) \leq b and nonnegativity constraints \boldsymbol{x_1, x_2} \geq 0.

Let’s now look at prelationships between a primal LP that we will call (P) and its dual (D).

Theorem 1. (Strong Duality)
Given a primal LP (P) and its dual (D), one of the following four possibilities must hold:

1. (P) Infeasible; (D) Unbounded
2. (P) Unbounded; (D) Infeasible
3. (P) Infeasible; (D) Infeasible
4. (P) Bounded & Feasible; (D) Bounded & Feasible with equal optimal values

We can rule out the following three possibilities using Weak Duality,

5. (P) Unbounded; (D) Unbounded
6. (P) Unbounded; (D) Bounded & Feasible
7. (P) Bounded & Feasible; (D) Unbounded

Let’s turn our attention to ruling out

8. (P) Bounded & Feasible; (D) Infeasible

and by duality,

9. (P) Infeasible; (D) Bounded & Feasible

We will need the following lemma

Lemma 1. (Farkas’ Lemma)
For any A \in \mathbb{R}^{m \times n},\ \boldsymbol{b} \in \mathbb{R}^m, exactly one of the following holds:

(1) \exists \boldsymbol{x} : \boldsymbol{x}\geq 0,\ A\boldsymbol{x} = \boldsymbol{b}
(2) \exists \boldsymbol{y} : A^{T}\boldsymbol{y} \geq 0,\ \boldsymbol{b^T}\boldsymbol{y} < 0

(1) says that \boldsymbol{b} \in C = cone\{\boldsymbol{a^{(1)}} ... \boldsymbol{a^{(n)}}\}. If \boldsymbol{b} is not, then there exists a hyperplane that separates \boldsymbol{b} from C. We can take this hyperplane to be passing through \boldsymbol{0}, with normal \boldsymbol{y}. (2) then follows. \square

We now use Farkas’ Lemma to prove Strong Duality.

Let primal (P) be the following feasible and bounded LP,

\max \boldsymbol{c^{T}x}
A\boldsymbol{x} = \boldsymbol{b}
\boldsymbol{x} \geq 0

and the dual (D) is

\min \boldsymbol{b^{T}y}
A^{T}\boldsymbol{y} \geq \boldsymbol{c}

and z > OPT(P). Now consider the matrix \begin{pmatrix} A \\ \boldsymbol{c^T} \end{pmatrix} and vector \begin{pmatrix} \boldsymbol{b} \\ z \end{pmatrix}.

Apply Farkas’ Lemma, so either

(1) \exists \boldsymbol{x}\geq 0 : A\boldsymbol{x} = \boldsymbol{b},\ \boldsymbol{c^{T}x} = z — this cannot hold since we assumed \boldsymbol{c}^Tx <z
(2) \exists (\boldsymbol{y}, \alpha) : A^T\boldsymbol{y}+\alpha {\bf c}\ge 0, \boldsymbol{b^{T}y} + \alpha z <0. Multiplying the first with {\bf x}^T, and using Ax=b, we get {\bf b}^Ty + \alpha {\bf c}^T x \ge 0. Comparing the two inequalities with {\bf b}^Ty and using our assumption that {\bf c}^T x < z, we conclude that \alpha  < 0.  Hence,

\boldsymbol{b^T}(\frac{-\boldsymbol{y}}{\alpha}) < z
A^{T}(\frac{-\boldsymbol{y}}{\alpha}) \geq \boldsymbol{c}

which gives us a feasible dual  (-{\bf y}/\alpha)!

Moreover, this gives us OPT(P) \leq \boldsymbol{b^T}(\frac{-\boldsymbol{y}}{\alpha}) < z. Since z is an arbitrary value greater than OPT(P), we know that there exists a feasible dual with OPT(P) = OPT(D). \square


We will now look at a problem that is more general than solving linear systems of the form A\boldsymbol{x} = \boldsymbol{b}.

Linear programs are of the form:
\min \boldsymbol{c^T}\boldsymbol{x}
A\boldsymbol{x} \geq \boldsymbol{b}

Note that linear programs have a linear objective (\boldsymbol{c^T}\boldsymbol{x}) and linear constraints (A\boldsymbol{x} \geq \boldsymbol{b}). We can easily see how to reduce solving a linear system A\boldsymbol{x} = \boldsymbol{b} to a linear program, which would look like

\min \boldsymbol{0}
A\boldsymbol{x} \geq \boldsymbol{b}
A\boldsymbol{x} \leq \boldsymbol{b}

Let’s take a look at a few examples of linear programs.

Example 1. (Max Flow)

Let G(V, E) be a graph with directed edges and capacities c_{ij} \geq 0. Let vertices s, t \in V be the source and the sink respectively. Our goal is to find flows f_{ij} \geq 0 such that f_{ij} \leq c_{ij} and \sum_{i}f_{ij} = \sum_{l}f_{jl} \forall j \neq s,t. We want to maximize our net flow, which is the total flow out of source s, so out linear program is

\max \sum_{j}f_{sj}
\sum_{i}f_{ij} = \sum_{l}f_{jl} \forall j \in V \setminus \{s,t\}
f_{ij} \geq 0 \forall i,j \in V
f_{ij} \leq c_{ij} \forall i,j \in V

Example 2. (Low-cost Diet)

We need 180 units of protein, 100 units of carbs, and 20 units of fat. We can get sushi for 9 units of cost, tomato soup for 2 units of cost, and chicken massaman for 6 units of cost. A single unit of sushi gives 15 protein, 20 carbs, and 5 fat; tomato soup gives 5 protein, 30 carbs, 15 fat; chicken massaman gives 40 protein, 10 carb, 8 fat. What is the minimum cost satisfactory diet? Let’s assign the following variables:
x = sushi
y = tomato soup
z = chicken massaman

Then, we get the following linear program,

\min 9x + 2y + 6z
(1)\ \ \ \ 15x + 5y + 40z \geq 180
(2)\ \ \ \ 20x + 30y + 10z \geq 100
(3)\ \ \ \ 5x + 15y + 8z \geq 20

Let’s note the costs of some solutions to the above program.

x = 12,\ y = z = 0\ \ \ \ cost = 108
x = 2,\ y = 1,\ z = 4\ \ \ \ cost = 44

Is there a better solution? If not, how can we show that there is no better solution? Let’s work towards a lower bound for OPT, the value of the optimal solution to the above program.

We will first try taking a multiple of one of our constraints,

\frac{1}{7}(1) = \frac{15}{7}x + \frac{5}{7}y + \frac{40}{7}z \geq \frac{180}{7}

\implies 9x + 2y + 6z \geq \frac{180}{7} = 25\frac{5}{7}

So, 25\frac{5}{7} \leq OPT \leq 44. Can we tighten this lower bound?

We will explore this question on another example. Consider the following LP,

\max 4x_1 + x_2 + 5x_3 + 3x_4
(1)\ \ \ \ x_1 - x_2 - x_3 + 3x_4 \leq 1
(2)\ \ \ \ 5x_1 + x_2 + 3x_3 + 8x_4 \leq 55
(3)\ \ \ \ -x_1 + 2x_2 + 3x_3 - 5x_4 \leq 3
x_1, x_2, x_3, x_4 \geq 0

Let z^{*} be the value of the optimal solution. Again, let’s try some solutions and note the values

\boldsymbol{x} = (0, 0, 1, 0) \implies z^* \geq 5
\boldsymbol{x} = (2, 1, 1, \frac{1}{3}) \implies z^* \geq 15

How can we tell if this z^* is optimal?

\frac{5}{3}(2) = \frac{25}{3}x_1 + \frac{5}{3}x_2 + 5x_3 + \frac{40}{3}x_4 \leq \frac{275}{3}
\implies 4x_1 + x_2 + 5x_3 + 3x_4 \leq \frac{275}{3} \implies z^* \leq \frac{275}{3}

We can try another linear combination of constraints

(2) + (3) = 4x_1 + 3x_2 + 6x_3 + 3x_4 \leq 58
\implies z^* \leq 58

What is the best possible upper bound? We would like to choose multipliers y_1, y_2, y_3 \geq 0 for constraints (1), (2), (3). Then, we get

(y_1 + 5y_2 - y_1)x_1 + (-y_1 + y_2 + 2y_3)x_2 + (-y_1 + 3y_2 + 3y_3)x_3 + (3y_1 + 8y_2 - 5y_3)x_4 \leq y_1 + 55y_2 + 3y_3

So we’d like,

\min y_1 + 55y_2 + 3y_3
y_1 + 5y_2 - y_1 \geq 4
-y_1 + y_2 + 2y_3 \geq 1
-y_1 + 3y_2 + 3y_3 \geq 5
3y_1 + 8y_2 - 5y_3 \geq 3
y_1, y_2, y_3 \geq 0

Then, this gives a valid upper bound on z^*.

More generally the solution to the linear program

\min \boldsymbol{b^T}\boldsymbol{y}
A^T \boldsymbol{y} \geq \boldsymbol{c}
\boldsymbol{y} \geq 0

gives an upper bound for the linear program

\max \boldsymbol{c^T} \boldsymbol{x}
A\boldsymbol{x} \leq \boldsymbol{b}
\boldsymbol{x} \geq 0

or more concisely,

Theorem 1. (Weak Duality)
\max\{\boldsymbol{c^T}\boldsymbol{x}:A\boldsymbol{x} \leq \boldsymbol{b}, \boldsymbol{x} \geq 0\} \leq \min\{\boldsymbol{b^T}\boldsymbol{y}:A^T \boldsymbol{y} \geq \boldsymbol{c}, \boldsymbol{y} \geq 0\}

Power Iteration

Recall the singular value decomposition (SVD) of a matrix A = UDV^{T} = \sum_{i=1}^{r}\sigma_i \boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^{T}}}.

We showed that V_{k} = span\{\boldsymbol{v^{(1)}}, ..., \boldsymbol{v^{(k)}}\} is the best-fit subspace for A, so

V_{k} = \arg\min_{V: dim(V) \leq k}\sum_{i=1}^{m}d(A_{(i)}, V)^2

We have that A_{(i)}V_{k}V_{k}^{T} is the projection of A_{(i)} to V_{k}, and so AV_{k}V_{k}^{T} is the projection of A to V_{k}.

Let A_k be defined as AV_{k}V_{k}^{T} = \sum_{i=1}^{k}\sigma_i \boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^{T}}}.

Theorem 1.
\|A-A_{k}\|_{F} = \min_{B: rank(B) \leq k}\|A-B\|_{F}
\|A-A_{k}\|_{2} = \max_{\|\boldsymbol{v}\| = 1}\|(A-A_{k}) \cdot \boldsymbol{v}\| = \sigma_{k+1} = \min_{B: rank(B) \leq k}\|A-B\|_{2}

We proved part (1) last lecture. We will prove part (2) here. Note that

A-A_{k} = \sum_{i=k+1}^{r}\boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^{T}}}
\|A-A_{k}\|_{2} = \sigma_{k+1}.

We need to show that for any B: rank(B)=k,
B=WZ^{T} where W, Z have \leq k columns and \|A-B\|_{2} \geq \sigma_{k+1}.

Since we know that Z has only k columns, \exists \boldsymbol{v} \in span\{\boldsymbol{v^{(1)}},...,\boldsymbol{v^{(k+1)}}\} such that
Z^{T}\boldsymbol{v} = 0,
\boldsymbol{v} = \sum_{i=1}^{k+1}\alpha_{i}\boldsymbol{v^{(i)}}, and

(A-B) \cdot \boldsymbol{v} = A \cdot \boldsymbol{v} = \sum_{i=1}^{k+1}\sigma_{i}\alpha_{i}\boldsymbol{u^{(i)}}
\|(A-B) \cdot \boldsymbol{v}\|^{2} = \sum_{i=1}^{k+1}\sigma_{i}^{2}\alpha_{i}^{2} \geq \sigma_{k+1}^{2}. \square

We’ve seen that the SVD is useful in more than one way. Now we will look at how to compute it, a central question in numerical analysis. A simple method, power iteration, is presented here:

Set \boldsymbol{x} \leftarrow e_i for i=1,2,\ldots,n
Repeat t times:{
\boldsymbol{x} \leftarrow A^{T}A\boldsymbol{x}
\boldsymbol{x} \leftarrow \frac{\boldsymbol{x}}{\|\boldsymbol{x}\|}

Theorem 2.
Assume (x\cdot v^{(1)})^2 \ge \frac{1}{n}. After t = \frac{\log n}{2\varepsilon} iterations, \boldsymbol{x} satisfies \|A\boldsymbol{x}\| \geq (1-\varepsilon) \sigma_{1}.

\boldsymbol{x} = \sum_{i=1}^{r}\alpha_{i}\boldsymbol{v^{(i)}}
A^{T}A = \sum_{i=1}^{r}\sigma_{i}^{2}\boldsymbol{v^{(i)}}\boldsymbol{v^{(i)^{T}}}
(A^{T}A)^l = \sum_{i=1}^{r}\sigma_{i}^{2l}\boldsymbol{v^{(i)}}\boldsymbol{v^{(i)^{T}}}

\frac{\boldsymbol{x}^{T}(A^{T}A)^{l}A^{T}A(A^{T}A)^{l}\boldsymbol{x}}{\boldsymbol{x}^{T}(A^{T}A)^{l}(A^{T}A)^{l}\boldsymbol{x}} = \frac{\sum_{i=1}^{r}\sigma_{i}^{4l+2}\alpha_{i}^{2}}{\sum_{i=1}^{r}\sigma_{i}^{4l}\alpha_{i}^{2}}

Since \sigma_{1} \geq \sigma_{2} \geq ..., the weight on higher \sigma_{i} is increasing.

\sum \sigma_{i}^{4l}\alpha_{i}^{2} = \sum \sigma_{i}^{4l}(\alpha_{i}^{2})^{\frac{2l}{2l+1}}(\alpha_{i}^{2})^{\frac{1}{2l+1}}
\leq (\sum \sigma_{i}^{4l+2}\alpha_{i}^{2})^{\frac{2l}{2l+1}}(\sum \alpha_{i}^{2})^{\frac{1}{2l+1}}
= (\sum \sigma_{i}^{4l+2}\alpha_{i}^{2})^{\frac{2l}{2l+1}}

\therefore \frac{\boldsymbol{x^{l^{T}}}A^{T}A\boldsymbol{x^{l}}}{\boldsymbol{x^{l^{T}}}\boldsymbol{x^{l}}} \geq (\sum \sigma_{i}^{4l+2}\alpha_{i}^{2})^{\frac{1}{2l+1}} \geq \alpha_{1}^{\frac{2}{2l+1}}\sigma_{1}^{2}.

Therefore, after l iterations,
\frac{\|A\boldsymbol{x^{l}}\|^{2}}{\|\boldsymbol{x^{l}}\|^{2}} \geq (\frac{1}{n})^{\frac{1}{2l+1}}\sigma_{1}^{2}
= e^{-\frac{\ln n}{2l+1}}\sigma_{1}^{2}

Setting 2l + 1 = \frac{\ln n}{\varepsilon} \geq e^{-\varepsilon}\sigma_{1}^{2} \geq (1-\varepsilon)\sigma_{1}^{2}. \square

We note that the choice of the initial x in the algorithm, one of the axis vectors e_i, must satisfy the assumption that (e_i \cdot v^{(1)})^2 \ge \frac{1}{n}. Instead of trying all n basis vectors, we could also start with a random unit vector.

\mathbb{E}(\alpha_{1}^{2}) = \frac{1}{n}

\sum x_{i}^{2} = 1 symmetric.

So, \mathbb{E}(x_{1}^{2}) = \mathbb{E}(x_{2}^{2}) = ... = \mathbb{E}(x_{n}^{2}).

\implies n\mathbb{E}(x_{1}^{2}) = 1 \implies \mathbb{E}(x_{1}^{2}) = \alpha_{1}^{2} = \frac{1}{n}. \square

Moreover, \alpha_1^2 \ge \frac{1}{n} with some constant probability. So a few runs of the algorithm will suffice with high probability.


Singular Value Decomposition

Last time we saw that \min \|Ax-b\| is achieved by x = A^{+}b where A^{+} = VD^{-1}U^T depended on the singular value decomposition of A.

This time we will explore singular value decomposition further.

Theorem 1.
Any m \times n real matrix A can be written as A = UDV^T where U,V have orthonormal columns and D is a diagonal matrix with entries \sigma_1 \geq \dots \geq \sigma_r > 0 where r = rank(A).

AV = UDV^{T}V = UD
A^{T}U = VD
or equivalently,
A\boldsymbol{v^{(i)}} = \sigma_i \boldsymbol{u^{(i)}}
A^{T}\boldsymbol{u^{(i)}} = \sigma_i \boldsymbol{v^{(i)}}

\boldsymbol{u^{(i)}} is a left singular vector
\boldsymbol{v^{(i)}} is a right singular vector

Lemma 1.
\boldsymbol{v^{(1)}} = \arg\max_{\|\boldsymbol{v}\|=1}\|A\boldsymbol{v}\|^2

Proof. (Linear Algebra)
A^{T}A\boldsymbol{v} = \sigma A^{T}\boldsymbol{u} = \sigma^2 \boldsymbol{v}

So, each \boldsymbol{v^{(i)}} is an eigenvector of A^{T}A
and each \boldsymbol{u^{(i)}} is an eigenvector of AA^{T}

For any eigenvector \boldsymbol{v} of A^{T}A,
A^{T}A\boldsymbol{v} = \lambda\boldsymbol{v}
define \boldsymbol{u} = \frac{1}{\sqrt{\lambda}}A\boldsymbol{v}

Then, A\boldsymbol{v} = \sqrt{\lambda}u
and A^{T}\boldsymbol{u} = \sqrt{\lambda}v.

So \arg\max_{\|\boldsymbol{v}\|\leq 1}\|A\boldsymbol{v}\|^2 is the top eigenvector of A^{T}A, and hence, the top right singular vector of A. \square

Proof. (Calculus)
Consider the objective function f(\boldsymbol{v}) = \|A\boldsymbol{v}\|^2 = \boldsymbol{v^{T}}A^{T}A\boldsymbol{v}.

Then, \nabla f(\boldsymbol{v}) = 2A^{T}A\boldsymbol{v}.

Intuitively we can see that if \boldsymbol{v} is a point on the surface of a sphere, defined by \|v\| \leq 1, then at any local optimum \boldsymbol{u'}, \nabla f(\boldsymbol{u'}) is parallel to \boldsymbol{u'}. The inverse is also true.


This relationship is represented by 2A^{T}A\boldsymbol{v} = \lambda \boldsymbol{v}.

By the above, we can see that all local optima of \|A\boldsymbol{v}\|^2 are eigenvectors. So, \max \|A\boldsymbol{v}\|^2 is achieved by an eigenvector of A^{T}A, or a right singular vector of A. \square

Can we reason similarly about \boldsymbol{v^{(i)}}, \boldsymbol{u^{(i)}}\ \forall\ i=2 \dots r?

Lemma 2.
\boldsymbol{v^{(1)}} = \arg\max_{\|\boldsymbol{v}\|\leq 1}\|A\boldsymbol{v}\|^2
\boldsymbol{v^{(2)}} = \arg\max_{\|\boldsymbol{v}\|\leq 1 \atop \boldsymbol{v} \bot \boldsymbol{v^{(1)}}}\|A\boldsymbol{v}\|^2

\boldsymbol{v^{(k)}} = \arg\max_{\|\boldsymbol{v}\|\leq 1 \atop \boldsymbol{v} \bot \boldsymbol{v^{(1)}} ... \boldsymbol{v^{(k-1)}}}\|A\boldsymbol{v}\|^2

Then, \boldsymbol{v^{(i)}} \ \forall\ i=2 \dots r are right singular vectors of A.

We can view the rows A_{(i)} of A as points in \mathbb{R}^n. Then, \boldsymbol{v^{(1)}} is the best-fit line (that goes through the origin), or the best-fit one-dimensional subspace. This is represented by,
\boldsymbol{v^{(1)}} = \arg\min_{\|v\|=1}\sum_{i+1}^{m}d(A_{(i)},\boldsymbol{v})^2


\sum_{i}\|A_{(i)}\|^2 = \sum_{i}d(A_{(i)},\boldsymbol{v})^2 + \sum_{i}\|A_{(i)} \cdot \boldsymbol{v}\|^2 is given by the Pythagorean Theorem. So, minimizing \sum_{i}d(A_{(i)},\boldsymbol{v})^2 requires that we proportionally maximize \sum_{i}\|A_{(i)} \cdot \boldsymbol{v}\|^2 = \|A\boldsymbol{v}\|^2. \square


Theorem 2.
V_k = span\{\boldsymbol{v^{(1)}} ... \boldsymbol{v^{(k)}}\}
Then, V_k is the best-fit k-dimensional subspace for A,
V_k = \arg\min_{V:dim(V) \leq k}\sum_{i}d(A_{(i)}, V)^2

We will do an induction on k.
k = 1: Lemma 1
Suppose Theorem 2 is true for (k-1)-dimensional subspace. So, V_{k-1} is the best-fit (k-1)-dimensional subspace.
Let W be the best k-dimensional subspace.
We know that \exists \boldsymbol{w_k} \in W such that \boldsymbol{w_k} \bot V_{k-1} since dim(W \cap V_{k-1}) \leq k-1.
So, W = span\{\boldsymbol{w_k}, W_{k-1}\}
d(A, W)^2 = d(A, \boldsymbol{w_k})^2 + d(A, W_{k-1})^2
\geq d(A, \boldsymbol{w_k})^2 + d(A, V_{k-1})^2 since V_{k-1} is best (k-1)-dimensional subspace, by IH
> d(A, \boldsymbol{v^{(k)}})^2 + d(A, V_{k-1})^2 since \boldsymbol{v^{(k)}} is the best among vectors orthogonal to V_{k-1}
= d(A, V_k)^2. \square

Proof of Theorem 1.
We want to show that A = \sum_{i=1}^{n}\sigma_{i} \boldsymbol{u^{(i)}}\boldsymbol{v^{(i)}}^T.

The LHS and RHS are both matrices. Since LHS \cdot \boldsymbol{v^{(i)}} = A\boldsymbol{v^{(i)}} = \sigma_{i}\boldsymbol{u^{(i)}} and RHS \cdot \boldsymbol{v^{(i)}} = \sum_{i=1}^{n}\sigma_{i}\boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^T}}\boldsymbol{v^{(i)}} = \sum_{i=1}^{n}\sigma_{i}\boldsymbol{u^{(i)}}, we get that A\boldsymbol{v} = (\sum_{i=1}^{n}\sigma_{i}\boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^T}})\boldsymbol{v} for every \boldsymbol{v} \in \mathbb{R}^n \implies A = \sum_{i=1}^{n}\sigma_{i}\boldsymbol{u^{(i)}}\boldsymbol{v^{(i)^T}}

Rational Approximation

Given a real number or a rational with large integers r, how can we approximate this by a rational with small integers?

Let’s recall the Euclidean algorithm to find \gcd(a,b) via the following example:

\gcd(209, 171)
209 = 171 \cdot 1 + 38
\gcd(171, 38)
171 = 38 \cdot 4 + 19
\gcd(38, 19) = 19

In general, for a = b \cdot q + r, we use \gcd(a,b) = \gcd(b,r), which is much faster than factorization!

To see this algorithm in another setting, we recall continued fractions. For this, we will consider the example of approximating e by a rational.

e = 2 + 0.71828\dots = 2 + (e-2)
= 2 + \frac{1}{1 + \frac{3-e}{e-2}}
= 2 + \frac{1}{1 + \frac{1}{2 + \frac{3e-8}{3-e}}}
= 2, 2 + \frac{1}{1}, 2 + \frac{1}{1 + \frac{1}{2}}, \dots
= 2, 3, 2\frac{2}{3}, \dots

Theorem 1.
For any integer B>0, the best rational approximation of any real r using denominator at most B is given by a continued fraction approximation of r.

r = a_0 \frac{1}{a_1 + \frac{1}{a_2 + \frac{1}{a_3 + \frac{1}{a_4 + \dots}}}}

The theorem will follow from the next lemma.

Lemma 2.
p_{k+1} = a_{k}p_{k} + p_{k-1}
q_{k+1} = a_{k}q_{k} + q_{k-1}
p_{-1} = 0, p_{0} = 1, q_{-1} = 1, q_{0} = 0

p_{k}q_{k+1}-p_{k+1}q_{k} = (-1)^{k} \implies |\frac{p_k}{q_k} - \frac{p_{k+1}}{q_{k+1}}| = \frac{1}{q_{k}q_{k+1}}

\forall r, \forall B \geq b, \exists unique \frac{a}{b} such that |r - \frac{a}{b}| < \frac{1}{2bB}

\frac{a}{b} from (3) is given by some \frac{p_k}{q_k}

(1) is left as homework.

(2) is by substitution. Note that r - \frac{p_k}{q_k} alternates in sign and converges.

(3)\ \exists \frac{a_1}{b_1}, \frac{a_2}{b_2} with b_1, b_2 \leq B
|\frac{a_1}{b_1} - \frac{a_2}{b_2}| \leq |\frac{a_1}{b_1}-r| + |\frac{a_2}{b_2}-r| < \frac{1}{2b_{1}B} + \frac{1}{2b_{2}B} \leq \frac{1}{b_{1}b_{2}}
But for any two \frac{a_1}{b_1} \neq \frac{a_2}{b_2},
|\frac{a_1}{b_1} - \frac{a_2}{b_2}| \geq |\frac{a_{1}b{2} - a_{2}b{1}}{b_{1}b_{2}}| \geq \frac{1}{b_{1}b_{2}}, where we reach a contradiction.

(4) Suppose \exists \frac{a}{b} \neq \frac{p_k}{q_k} such that |\frac{a}{b}-r| < \frac{1}{2bB}
For $latex q_k \leq B |\frac{a}{b}-r| \geq \min(|\frac{a}{b} – \frac{p_k}{q_k}|, |\frac{a}{b}-\frac{p_{k+1}}{q_{k+1}}|)$
\frac{1}{2bB} \geq \frac{1}{bq_{k+1}}
\implies q_{k+1} > 2B
|r-\frac{p_k}{q_k}| \leq \frac{1}{q_{k}q_{k+1}} < \frac{1}{2q_{k}B}
But such a solution is unique!

We will now use lemma 2 to complete our algorithm for solving Ax=b exactly. Recall we have \bar{x} where A\bar{x} = b\ (mod\ p^m). For each x_i, our goal is to find c_i, d such that
dx_i = c_{i}\ (mod\ p^m)
Then, A\frac{c}{d} = b since \|Ac\|, \|b\| << p^m.

Lemma 3.
Suppose \exists integers f,g such that |f|,|g|<\frac{1}{2}\sqrt{h} and gs = f\ (mod\ h).
Let the continued fraction approximations of \frac{s}{h} be \frac{w_1}{v_1} \dots \frac{w_k}{v_k} \dots and define u_i = v_{i}s - w_{i}h.
Let k be the smallest integer such that |u_k| \leq \sqrt{h}. Then, \frac{f}{g} = \frac{u_k}{v_k}.

This lemma solves our problem. We can quickly convert \bar{x} to a rational solution via the continued fraction approximations of \frac{x_i}{p^m} and finding the corresponding \frac{u_k}{v_k}.

Note that \{w_i\},\{v_i\} are increasing while |u_i| is decreasing (why?). Let f = gs - th. Then,
|\frac{s}{h} - \frac{t}{g}| = |\frac{f}{hg}| = |\frac{fg}{hg^2}| < \frac{1}{2g^2}

By Lemma 2, \frac{t}{g} must be some \frac{w_j}{v_j}.
\implies |u_j| \leq |f| \leq \frac{1}{2}\sqrt{h}, so j \geq k, since u_j is decreasing.

But u_{j}v_{k}-u_{k}v_{j} = (v_{j}s-w_{j}h)v_k - (v_{k}s-w_{k}h)v_j = (-w_{j}v_k + w_{k}v_j)h = 0\ (mod\ h) and |u_{j}v_{k}-u_{k}v_{j}| \leq (|u_j|+|u_k|)|v_j| \leq (\frac{1}{2}\sqrt{h} + \sqrt{h})\frac{1}{2}\sqrt{h} < h.

\implies u_{j}v_{k}=u_{k}v_{j} i.e. j=k and \frac{f}{g} = \frac{u_k}{v_k}.


Let’s think about the problem of solving \min||Ax-b|| where A is an m \times n matrix. Let’s instead look at minimizing the objective function f(x) = \|Ax-b\|^2.

First, we’ll calculate the gradient and set it equal to zero, since we know that must be the global minimum of our objective,
\nabla f = 2A^{T}(Ax-b) = 0
\implies A^{T}Ax = A^{T}b

Then, if A^{T}A has full rank (or, equivalently, A has full column rank), then x = (A^{T}A)^{-1}A^{T}b is the unique solution.


\min \|Ax-b\| = dist(b,S)

The proof that this minimum is unique is left as an exercise.

So, the optimal \hat{x} achieves A\hat{x} = \hat{b}.

Lemma 1.
A(A^{T}A)^{-1}A^{T} is the orthogonal projection from \mathbb{R}^m to S = span\{\vec{a_1} \dots \vec{a_n}\}.

Take any column of A, say \vec{a_i} = A\vec{e_i}. Then, A(A^{T}A)^{-1}A^{T}\vec{a_i} = A\vec{e_i} = \vec{a_i}
Now consider any y \bot \{\vec{a_i} \dots \vec{a_n}\}, so y \cdot \vec{a_i} = 0\ \forall i = 0 \dots n.
Then, A(A^{T}A)^{-1}A^{T}y = 0.
So, for any vector y \in \mathbb{R}^m, y = \sum_{i}\alpha_{i}\vec{a_i} + y'
A(A^{T}A)^{-1}A^{T}y = A(A^{T}A)^{-1}A^{T}(A\sum_{i}\alpha_{i}\vec{e_i})
= \sum_{i}\alpha_{i}\vec{a_i} is the projection of y onto span\{\vec{a_1} \dots \vec{a_n}\}. \square

What if the columns of A are not linearly independent, and so, (A^{T}A)^{-1} is not well defined?

Let’s get familiar with what’s known as the Singular Value Decomposition,
A = UDV^T, with the following definitions,
r = rank(A)
\sigma_1 \geq \dots \geq \sigma_r > 0
U = \{\vec{u_1} \dots \vec{u_r}\} orthonormal columns in \mathbb{R}^m
D = Diag(\sigma_1 \dots \sigma_r)
V = \{\vec{v_1} \dots \vec{v_r}\} orthonormal columns in \mathbb{R}^n

Theorem 1.
For any A \in \mathbb{R}^{m \times n}, the singular values are unique. The singular values are paired u_{i}, v_{i} such that ,
Av_i = \sigma_{i}\vec{u_i}
\vec{u_i}^{T}A = \sigma_{i}\vec{v_i}

If A is nonsingular, then A^{-1} = VD^{-1}U^{T}.

If A is nonsingular, m = n = r. Therefore, AA^{-1} = UDV^{T}VD^{-1}U^{T} = UU^{T}. We know U^{T}U = I \implies U^T = U^{-1} \implies UU^T = I. \square

To continue our search for \arg\min_{x}\|Ax-b\|, we will need to be familiar with the pseudo-inverse of A, A^+ = VD^{-1}U^{T}. Note the following properties,
U^{T}U = V^{T}V = D^{-1}D = I which we will use in the following,
AA^{+}A = UDV^{T}VD^{-1}U^{T}UDV^{T} = UDV^{T} = A
A^{+}AA^{+} = A^{+}

Theorem 2.
\arg\min_{x}\|Ax-b\| = A^{+}b

We know that the solution must satisfy A^{T}Ax = A^{T}b. Let’s verify that x = A^{+}b satisfies this condition.
A^{T}Ax = VDU^{T}UDV^{T}VD^{-1}U^{T}b = VDU^{T}b = A^{T}b
But, is this solution unique?
Ax = AA^{+}b = UDV^{T}VD^{-1}U^{T}b = UU^{T}b
Remember, U has orthonormal columns; so, it’s columns form a basis of span\{\vec{a_1} \dots \vec{a_n}\}.
\vec{u_i} \cdot b are the coordinates of b along $\vec{u_i}$.
The projection of b onto span\{\vec{a_1} \dots \vec{a_n}\} is exactly UU^{T}b.
So, A^{+}b is the unique minimum solution. \square

Faster Gaussian Elimination

Last time we saw how to solve Ax = b with a complexity of O(n^{4}(\log n + b)). We now want to see if we can solve the same question with O(n^{3}) complexity.

Consider solving Ax = b\ (mod\ p) where p is a prime. We can still use Gaussian Elimination and, without loss of generality, assume that a_{ij} \in \{0, \dots, p-1\}.

Let’s work through an example:

\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} x = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}

= \left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 \end{array}\right] =_{(3)-(1)} \left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 \end{array}\right] =_{(3)-(2)} \left[\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{array}\right]

We then solve for x to get

x = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}

In general, we will need to be familiar with the multiplicative inverse which is

b^{-1} = a \in \{1, \dots, p-1\} s.t. ab = 1\ (mod\ p)

For example,

5^{-1} = 3\ (mod\ 7)
4^{-1} = 2\ (mod\ 7)
6^{-1} = 6\ (mod\ 7)

So, coefficients of A remain in \{0, 1, \dots, p-1\} and complexity O(n^{3}\log p).

Can we use this to solve Ax = b faster?

It turns out, YES, we can!

1. First, solve A\bar{x}=b\ (mod\ p^m) for a “large enough” m.
2. Convert \bar{x} to rational x with “not too large” numerator and denominator.

We want m to be large enough such that the resulting \bar{x} the unique solution to A\bar{x}=b\ (mod\ p^m). How large, exactly, is that?

We know that A has entries with b bits. So,

|det(A)| \leq (2^{2b}n)^n = 2^{(2b+\log{n})n}

This implies that every numerator/denominator of coordinates of x is an integer q \leq 2^{(2b+\log{n})n}.

Assume p < 2^{2b} and p does not divide det(A).

Now, our high level approach to solving Ax = b will be to take the following steps:

1. Solve A\bar{x} = b\ (mod\ p^m) in \tilde{O}(n^3 + mn^2).

2. Set m = O(n\log{n}) and recover x from \bar{x} such that Ax = b.

First, we will try to find \frac{c_{i}}{d} = x_{i} where c_i is a vector and d is an integer with |c_i|, d \leq q. Why?

Ac = dA\bar{x}\ (mod\ p^m) = db\ (mod\ p^m)
A\frac{c}{d} = b\ (mod\ p^m)

|A_{i}c| \leq ||A_i|| \max_{j}|c_j| \leq 2^{2b+\log{n}}q  2^{(2b + \log{n})(n+1)}, so m > \frac{(2b + \log{n})(n+1)}{\log{p}} will suffice.

We now present the algorithm to solve the first step:

1. Find C

AC = I\ (mod\ p)

2. Set b_0 = b. For i = 0 \dots m-1

x_i = cb_i\ (mod\ p)
b_{i+1} = \frac{1}{p}(b_i - Ax_i)

3. Calculate \bar{x} and A\bar{x}.

\bar{x} = \sum_{i=0}^{m-1} x_{i}p^{i}
A\bar{x} = \sum_{i=0}^{m-1} Ax_{i}p^{i}
= \sum_{i=0}^{m-1} (b_{i} - pb_{i+1})p^{i}
= \sum_{i=0}^{m-1} b_{i}p^{i} - b_{i+1}p^{i+1}
= b_0 - b_{m}p^m
= b_0\ (mod\ p^m)

Next time we will look at how to approximate a given real/rational by a rational with small integers using the Euclidean algorithm for greatest common divisor. We will use this to complete the solution to Ax=b.