Interior Point Method

We study another class of optimization algorithms: interior point methods.  These require a more specific structure than general convex optimization, but can still be used beyond linear programming.  These algorithms also work by following an interior path, determined by successive minima of a modified objective function. We will study in complete detail, one of the simplest variants, the primal-dual interior-point method.

Consider the primal program (P)

\min  \boldsymbol{c^T}\boldsymbol{x}
s.t.  A\boldsymbol{x} = \boldsymbol{b}, x \geq 0

with corresponding dual (D)

\max  \boldsymbol{b^T}\boldsymbol{y}
s.t.  A^T\boldsymbol{y} + \boldsymbol{s}= \boldsymbol{c}, s \geq 0

Note that (P) \geq (D).  Then, 0 \leq \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{b}^T\boldsymbol{y} = \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{x}^TA^T\boldsymbol{y} = \boldsymbol{c}^T\boldsymbol{x} - \boldsymbol{x}^T(\boldsymbol{c}-\boldsymbol{s}) = \boldsymbol{x}^T\boldsymbol{s} .  If the complimentary slackness conditions hold, this is achieved with equality, indicating that either each primal variable is 0, or it’s corresponding dual constraint is tight (zero slack).

Modify the primal objective to P(t) = \min \boldsymbol{c^T}\boldsymbol{x} - t \sum_{j=1}^n \ln(x_j) , with the same constraints.  This modification effectively punishes low x_j very severely, with x_j=0 having value of \infty , and is relatively easy to optimize.  As t approaches 0, this modification becomes the original, so we can solve the original by successively reducing t .

We have the constraints X\boldsymbol{s} = t\boldsymbol{1}, A\boldsymbol{x}=\boldsymbol{b}, \boldsymbol{x}\geq 0, A^T\boldsymbol{y}+\boldsymbol{s}=\boldsymbol{c} , where X=diag(\boldsymbol{x}) .  We can start with a solution for some t such that \boldsymbol{x} is feasible in (P), \boldsymbol{y}/\boldsymbol{s} are feasible in (D), and \boldsymbol{x}^T\boldsymbol{s}=tn .  Our goal is to iteratively find new solutions with lower t until eventually we get t to 0.  In doing this, we can’t quite maintain X\boldsymbol{s}=t\boldsymbol{1} , but we can keep the values close, with an \epsilon -approximate solution \|X\boldsymbol{s}-t\boldsymbol{1}\| \leq \epsilon t .

Now, suppose we have a valid x,y,z,t , and we want to make t smaller.  We fix some \alpha < 1 and let \hat{t} = \alpha t .  Then, solve the following system of linear equations:

A \Delta x = 0

A^T \Delta y + \Delta s = 0

S \Delta x + X \Delta s =  t 1 - Xs

We then update x\leftarrow x + \Delta x y \leftarrow  y + \Delta y , s \leftarrow s + \Delta s and t \leftarrow \alpha t.

Lemma 1.
Let $\varepsilon < 1/2$. If (x, y , s) is a solution of P(t) with \| Xs - t \| \leq \varepsilon t, then (\hat{x}, \hat{y}, \hat{s}) = (x, y, s) + (\Delta x, \Delta y, \Delta s) is a \frac{1+\varepsilon}{(1-\varepsilon)^2}{\varepsilon}^2-approximate solution with t.

Lemma 2.
Let \alpha \geq 1 - \frac{c}{\sqrt{n}}, latex (x, y, s)$ is a \frac{1}{5}-solution, then (\hat{x}, \hat{y}, \hat{s}) is an \varepsilon = \frac{1}{5}-approximation with \alpha t.

Theorem 1.
This interior point method reaches a solution with duality gap at most \delta in O(\sqrt{n} \log \frac{x_0^T s_0}{n \delta}) iterations, where each iteration is the solution of an (m+n) \times (m+n) linear system.

Proof. (Theorem 1)
Start at t_0 = 5 \frac{x_0^T s_0}{n}, \varepsilon = \frac{1}{5}. After each iteration, t_{i+1} \leq \alpha t_i , \varepsilon = \frac{1}{5}. To reach t = 5 \delta, we need {\alpha}^k \leq \frac{\delta n}{x_0^T s_0} \implies k \geq \log_\alpha \frac{x_0^T s_0}{n \delta} \geq c\sqrt{n} \log \frac{x_0^T s_0}{n \delta} suffices. \square

Proof. (Lemma 1)
Let (X, Y, s) satisfy
Ax = b
A^T y + s = c
\| \bar{X}s - t1 \| \leq \varepsilon t
x \geq 0

and let (\Delta x, \Delta y, \Delta s) satisfy
A \Delta x = 0
A^T \Delta y + \Delta s = 0
S \Delta x + X \Delta s = t1 - Xs

Note the following:
(i) (\Delta x)^T \Delta s = (\Delta x)^T (-A^T \Delta y) = -(A^T \Delta x)^T \Delta y = 0
(ii) (1-\varepsilon)t \leq x_j s_j \leq (1+\varepsilon)t \implies x_j \geq \frac{(1-\varepsilon)t}{s_j} \implies s_j \geq \frac{(1-\varepsilon)t}{x_j}

So (1-\varepsilon)t {\| X^{-1}\Delta x \|}^2 = (1-\varepsilon)t(\Delta x)^T X^{-1} X^{-1} \Delta x
= (\Delta x)^T X^{-1} \begin{pmatrix} \ddots & 0 & 0 \\ 0 & \frac{(1-\varepsilon)t}{x_j} & 0 \\ 0 & 0 & \ddots \end{pmatrix} \Delta x
\leq \Delta x^T X^{-1} S \Delta x
= \Delta x^T X^{-1} (t1 - Xs - X \Delta s)
= \Delta x^T X^{-1} (t1 - Xs)
\leq \| \Delta x^T X^{-1} \| \| t1 - Xs \|
\leq \varepsilon t \| \Delta x^T X^{-1} \|
\implies \| X^{-1} \Delta x \| \leq \frac{\varepsilon}{1-\varepsilon} < 1

Similarly, \| S^{-1} \Delta s \| \leq \frac{\varepsilon}{1-\varepsilon}.

\hat{s} = s + \Delta s = S(1 + S^{-1}\Delta s) \geq 0
\hat{x} \geq 0
(\hat{x}, \hat{y}, \hat{s}) is feasible.

\hat{x}_j \hat{s}_j = (x_j + \Delta x_j)(s_j + \Delta s_j)
= x_j s_j + x_j \Delta s_j + s_j \Delta x_j + \Delta x_j \Delta s_j
= x_j s_j + t - x_j s_j + \Delta x_j \Delta s_j
= t + \Delta x_j \Delta s_j

\| t1 - \hat{x}\hat{s} \| \leq \sum_j | t - \hat{x}_j \hat{s}_j |
= \sum_j |\Delta x_j \Delta s_j|
= \sum_j \frac{|\Delta x_j|}{x_j} \frac{|\Delta s_j|}{s_j}x_j s_j
\leq \sum_j \frac{|\Delta x_j|}{x_j} \frac{|\Delta s_j|}{s_j} (1+\varepsilon)t
\leq (1+\varepsilon)t (\sum_j (\frac{\Delta x_j}{x_j})^2)^{\frac{1}{2}}(\sum_j (\frac{\Delta s_j}{s_j})^2)^{\frac{1}{2}}
= (1+\varepsilon)t \| X^{-1}\Delta x \| \| S^{-1}\Delta s \|
\leq \frac{1+\varepsilon}{(1-\varepsilon)^2}{\varepsilon}^2 t \square

Proof. (Lemma 2)
\| \hat{X}\hat{s} - \hat{t}\boldsymbol{1} \| = \| \hat{X}\hat{s} - t1 + (1-\alpha}) t1 \|
\leq \| \hat{X}\hat{s} - t1 \| + (1-\alpha) t \| 1 \|
\leq {\varepsilon}^2 \frac{1+\varepsilon}{(1-\varepsilon)^2} t + (1-\alpha)t\sqrt{n}

We need {\varepsilon}^2 \frac{1+\varepsilon}{(1-\varepsilon)^2} t + (1-\alpha)t\sqrt{n} \leq \frac{1}{5} \hat{t} = \frac{\alpha}{5} t

Let \alpha = 1- \frac{c}{\sqrt{n}}. Then, we have \frac{1+\varepsilon}{(1-\varepsilon)^2} {\varepsilon}^2 + c \leq \frac{1-\frac{c}{\sqrt{n}}}{5}.

Plugging in \varepsilon = \frac{1}{5}, we get \frac{\frac{6}{5}}{\frac{16}{25}} \frac{1}{25} + c = \frac{3}{40} + c \leq \frac{1}{5} - \frac{c}{5\sqrt{n}}. So solving for c, we get c \leq \frac{\frac{1}{8}}{1 + \frac{1}{5\sqrt{n}}} \leq \frac{1}{16} suffices. \square