# 9-13 Lecture Notes

Consider how we would solve $Ax = b$ using fast gradient descent.  We would apply fast GD to $\|Ax-b\|^2$ and try to minimize this.  From results in previous classes, this would take at most $O\left(\frac{\lambda_{max}(A)}{\lambda_{min}(A)} \log\frac{n}{\epsilon}\right)$ iterations, each taking $O(mn)$, for a total complexity of

$O\left(\frac{\lambda_{max}(A)}{\lambda_{min}(A)} m n \log\frac{n}{\epsilon}\right)$.

Unfortunately, $\frac{\lambda_{max}(A)}{\lambda_{min}(A)}$ can be exponential in the number of bits in the input, so fast gradient descent is not a polynomial time algorithm here.
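As a rough illustration of the approach (a minimal sketch, assuming NumPy; we use plain rather than accelerated gradient descent, and all names here are ours):

```python
import numpy as np

def grad_descent_lstsq(A, b, iters=500):
    """Minimize ||Ax - b||^2 by plain gradient descent.

    (The notes use the accelerated variant, which has a better
    iteration bound; only the iterate update would change.)
    """
    # The gradient is 2 A^T (Ax - b); step size 1/L with
    # L = 2 * lambda_max(A^T A) guarantees convergence.
    L = 2 * np.linalg.eigvalsh(A.T @ A).max()
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x -= (1 / L) * 2 * (A.T @ (A @ x - b))
    return x

# A well-conditioned system converges quickly; as the condition
# number lambda_max / lambda_min grows, far more iterations are needed.
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
b = rng.standard_normal(3)
x = grad_descent_lstsq(A, b)
print(np.linalg.norm(A @ x - b))  # near-zero residual
```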

Alternatively, we can use Gaussian elimination.  Each iteration of Gaussian elimination consists of $O(n)$ arithmetic operations on each of $O(n)$ rows, and there are $n-1$ iterations total, so Gaussian elimination requires $O(n^3)$ operations.  This seems like it would be polynomial time, but each multiplication could potentially double the number of bits required to represent the entries, which in turn would cause the runtime to become exponential.

We will modify Gaussian elimination slightly to achieve polynomial runtime, but first, some background.  The determinant of a matrix, which equals the product of the eigenvalues of the matrix, can be thought of as the (signed) volume of the parallelepiped defined by the rows (or, equivalently, the columns) of the matrix.

Lemma 1 (Hadamard’s Inequality): $|Det(A)| \leq \prod_i \|A_{(i)}\|$, where $A_{(i)}$ denotes the $i$th row of $A$.

The bound is the volume of a rectangular box with side lengths equal to the norms of the rows of the matrix.  For a given set of side lengths, a rectangular box maximizes the volume, so the parallelepiped corresponding to the determinant has volume at most that of the box.
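A quick numerical sanity check of the lemma (a sketch assuming NumPy; `hadamard_bound` is our name):

```python
import numpy as np

def hadamard_bound(A):
    """Product of the row norms: Hadamard's upper bound on |det(A)|."""
    return float(np.prod(np.linalg.norm(A, axis=1)))

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
assert abs(np.linalg.det(A)) <= hadamard_bound(A)

# Equality holds exactly when the rows are orthogonal, e.g. the identity:
assert hadamard_bound(np.eye(4)) == abs(np.linalg.det(np.eye(4))) == 1.0
```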

Lemma 2: $Det(AB) = Det(A)Det(B)$

This implies that scaling a row or column by $c$ scales the determinant by $c$ (this is multiplication by a diagonal matrix with determinant $c$), and that adding a multiple of one row to another leaves the determinant unchanged (multiplication by an elementary matrix with determinant 1).
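Both consequences are easy to check numerically (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Lemma 2: the determinant is multiplicative.
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Adding 2.5 times row 0 to row 1 is multiplication by a matrix E with
# det(E) = 1, so by Lemma 2 the determinant is unchanged.
E = np.eye(3)
E[1, 0] = 2.5
assert np.isclose(np.linalg.det(E @ A), np.linalg.det(A))

# Scaling row 0 by c is multiplication by diag(c, 1, 1), which has
# determinant c, so the determinant scales by c.
D = np.diag([3.0, 1.0, 1.0])
assert np.isclose(np.linalg.det(D @ A), 3.0 * np.linalg.det(A))
```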

Also, we note that the sum of $n$ numbers, each of bit-length $2w$, has bit-length at most $2w + \log n$.  As a consequence (via Hadamard’s inequality), the determinant of an $n \times n$ matrix whose entries each fit in $w$ bits can be represented in $O(n(w + \log n))$ bits.
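Concretely, each entry has absolute value at most $2^w$, so each row norm is at most $\sqrt{n}\, 2^w$, and Hadamard’s inequality gives

$$|Det(A)| \leq \prod_{i=1}^n \|A_{(i)}\| \leq \left(\sqrt{n}\, 2^w\right)^n, \qquad \text{so} \qquad \log_2 |Det(A)| \leq n\left(w + \tfrac{1}{2}\log_2 n\right) = O(n(w + \log n)).$$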

Now we can give a slightly modified Gaussian elimination algorithm that runs in polynomial time.  At its core, after the $k$th step, $k$ columns have been “eliminated” and $k$ rows are fixed.  More precisely, the first $k$ rows of $A^k$ are the same as those of $A^{k-1}$, the first $k$ columns of $A^k$ are zero below the diagonal, and the remaining entries are defined as follows (for $i,j > k$):

$a_{ij}^k = \frac{a_{ij}^{k-1}a_{kk}^{k-1} - a_{ik}^{k-1}a_{kj}^{k-1}}{a_{k-1,k-1}^{k-1}}$,

with $a_{00}^{0} =1$ (essentially, don’t divide by anything in the first iteration).
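A minimal sketch of this update rule in Python (the rule is known as the Bareiss algorithm); it assumes all pivots $a_{kk}^{k-1}$ are nonzero — a full implementation would add row swaps — and the function name is ours:

```python
def bareiss(A):
    """Fraction-free Gaussian elimination via the update rule above.

    A is a list of lists of ints; returns the eliminated matrix, whose
    last entry is det(A).  All intermediate entries stay integral.
    """
    n = len(A)
    M = [row[:] for row in A]
    prev = 1  # a_{k-1,k-1}^{k-1}, with a_00^0 = 1 in the first iteration
    for k in range(n - 1):
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # Exact integer division: the numerator is divisible by prev.
                M[i][j] = (M[i][j] * M[k][k] - M[i][k] * M[k][j]) // prev
            M[i][k] = 0  # column k is eliminated below the diagonal
        prev = M[k][k]
    return M

A = [[2, 3, 1], [4, 1, 5], [6, 2, 3]]
M = bareiss(A)
print(M[-1][-1])  # → 42, which is det(A)
```

The key point, proved below, is that the integer division `//` is always exact, so every intermediate entry stays an integer of polynomially bounded size.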

Theorem: Let $A^k$ be the matrix after the $k$th iteration, so that its first $k$ columns are zero below the diagonal.  Then, if $A^0 = A$ is integral, so is $A^k$, and the number of bits needed to represent each entry of $A^k$ is $O(n(b + \log n))$, where $b$ is the maximum number of bits used for any entry of $A$.

Proof: After the $k$th iteration, the pivot entry $a_{kk}^k = a_{kk}^{k-1}$ is fixed for the rest of the algorithm; we denote it simply by $a_{kk}^*$.

Consider the determinant of a $(k+\ell)\times (k+\ell)$ minor of $A^k$ containing the first $k$ rows and columns.  In going from $A^{k-1}$ to $A^k$, each of the last $\ell$ rows of this minor is multiplied by $\frac{a_{kk}^*}{a_{k-1,k-1}^*}$, which scales the determinant by that factor $\ell$ times; subtracting multiples of the $k$th row from the others does not change the determinant. Then,

$Det(B^k) = \left(\frac{a_{kk}^*}{a_{k-1,k-1}^*}\right)^\ell Det(B^{k-1})$

for any such minor of size $(k+\ell)$ by $(k + \ell)$.

Now, take the minor $B^k$ to be the minor of $A^k$ induced by rows $\{1,2,\ldots,k, i\}$ and columns $\{1,2,\ldots, k, j\}$ for some $i,j >k$. Then,

$$\begin{aligned} Det(B^k) &= \left(\frac{a_{kk}^*}{a_{k-1,k-1}^*}\right) Det(B^{k-1}) \\ &= \left(\frac{a_{kk}^*}{a_{k-1,k-1}^*}\right) \left(\frac{a_{k-1,k-1}^*}{a_{k-2,k-2}^*}\right)^2 Det(B^{k-2}) \\ &= \left(\frac{a_{kk}^*}{a_{k-1,k-1}^*}\right) \left(\frac{a_{k-1,k-1}^*}{a_{k-2,k-2}^*}\right)^2 \cdots \left(\frac{a_{11}^*}{a_{00}^*}\right)^k Det(B^{0}) \\ &= a_{kk}^* a_{k-1,k-1}^* \cdots a_{11}^* \, Det(B^0). \end{aligned}$$

Since $B^k$ is upper triangular with diagonal entries $a_{11}^*, \ldots, a_{kk}^*, a_{ij}^k$, we have $Det(B^k) = a_{kk}^* a_{k-1,k-1}^* \cdots a_{11}^*\, a_{ij}^k$.  Comparing with the product above gives $a_{ij}^k = Det(B^0)$, which is integral.  Moreover, $B^0$ is a minor of the original matrix $A$, so by Hadamard’s inequality $|a_{ij}^k| = |Det(B^0)|$ can be represented in $O(n(b + \log n))$ bits.

The complexity is now $O(n^3)$ arithmetic operations, each on numbers of size at most $O(n(b+\log n))$ bits, leading to an overall complexity of $O(n^4(b+\log n))$ bit operations, which is polynomial in the size of the input, as desired.