Consider how we would solve $Ax = b$ using fast gradient descent. We would apply fast GD to $f(x) = \frac{1}{2}\|Ax - b\|_2^2$ and try to minimize this. From results in previous classes, this would take at most $O(\sqrt{\kappa}\,\log(1/\epsilon))$ iterations, where $\kappa$ is the condition number of the problem, each taking $O(n^2)$ time for a matrix-vector product, for a total complexity of $O(n^2\sqrt{\kappa}\,\log(1/\epsilon))$.
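As a concrete sketch of what this would look like (assuming the least-squares objective above and standard Nesterov momentum updates; the function name and step-size choice are illustrative, not from the notes):

```python
import numpy as np

def fast_gd(A, b, num_iters=1000):
    """Nesterov's accelerated gradient descent on f(x) = 0.5 * ||Ax - b||^2."""
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2            # smoothness constant: sigma_max(A)^2
    x = np.zeros(n)
    y = x.copy()
    t = 1.0
    for _ in range(num_iters):
        grad = A.T @ (A @ y - b)             # gradient at y: two O(n^2) products
        x_next = y - grad / L                # gradient step
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_next + ((t - 1) / t_next) * (x_next - x)   # momentum step
        x, t = x_next, t_next
    return x
```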
Unfortunately, $\kappa$ can be exponential in the number of bits in the input, so fast gradient descent is not a polynomial-time algorithm here.
We can use Gaussian elimination instead. Each iteration of Gaussian elimination consists of $O(n)$ arithmetic operations on each of the $n$ rows, and there are $n$ iterations total, so Gaussian elimination requires $O(n^3)$ arithmetic operations. This seems like it would be polynomial time, but each multiplication could potentially double the number of bits required to represent the entries, which in turn would cause the run time to become exponential.
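To see the blow-up concretely, here is a sketch of a division-free variant of elimination (row $i$ becomes $a_{kk} \cdot \mathrm{row}_i - a_{ik} \cdot \mathrm{row}_k$, keeping everything integral but never dividing); the function name and the specific matrix are illustrative:

```python
import random

def division_free_elimination(M):
    """Eliminate below the diagonal without ever dividing.

    Entries stay integral, but each update multiplies two entries,
    so bit-lengths can roughly double at every one of the n iterations.
    """
    n = len(M)
    for k in range(n):
        for i in range(k + 1, n):
            pivot, factor = M[k][k], M[i][k]
            for j in range(n):
                M[i][j] = pivot * M[i][j] - factor * M[k][j]
    return M

random.seed(0)
n = 12
M = [[random.randint(-9, 9) for _ in range(n)] for _ in range(n)]
M = division_free_elimination(M)
print(max(abs(x).bit_length() for row in M for x in row))
# one-digit inputs, yet entries several thousand bits long come out
```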
We will modify Gaussian elimination slightly to achieve polynomial runtime, but first, some background. The determinant of a matrix, conventionally defined as the product of its eigenvalues, can be thought of as the signed volume of the parallelepiped defined by the rows (or, equivalently, the columns) of the matrix.
Lemma 1 (Hadamard’s Inequality): $|\det(A)| \le \prod_{i=1}^{n} \|a_i\|_2$, where $a_i$ denotes the $i$th row of $A$.
The bound is the volume of a rectangular prism with side lengths equal to the norms of the rows of the matrix. Among all parallelepipeds with a given set of side lengths, the rectangular prism maximizes the volume, so the parallelepiped corresponding to the determinant must have volume at most that of the rectangular prism.
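A quick numerical sanity check of the inequality (a random-matrix sketch, not part of the notes' argument):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-9, 10, size=(5, 5)).astype(float)

lhs = abs(np.linalg.det(A))
rhs = np.prod(np.linalg.norm(A, axis=1))   # product of the row norms
assert lhs <= rhs
print(f"|det(A)| = {lhs:.1f} <= {rhs:.1f}")
```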
Lemma 2: $\det(AB) = \det(A)\,\det(B)$.
This implies that scaling a row or column scales the determinant by the same factor (multiply by the corresponding elementary matrix, whose determinant is the scale factor), and that adding a multiple of one row to another does not change the determinant (the corresponding elementary matrix has determinant $1$).
Also, we note that the sum of $n$ numbers, each of $b$ bits, can be represented in $b + \log_2 n$ bits. As a consequence, the size of the determinant of an $n$ by $n$ matrix, with each entry in the matrix represented in $b$ bits, is at most $O(n(b + \log n))$ bits.
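To spell out the arithmetic behind this bound: every entry has magnitude at most $2^b$, so every row norm satisfies $\|a_i\|_2 \le \sqrt{n}\,2^b$, and Hadamard’s inequality gives
$$|\det(A)| \;\le\; \prod_{i=1}^{n} \|a_i\|_2 \;\le\; \left(\sqrt{n}\,2^b\right)^{n} \;=\; n^{n/2}\,2^{nb},$$
so $\log_2 |\det(A)| \le nb + \frac{n}{2}\log_2 n = O(n(b + \log n))$.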
Now we can give a slightly modified Gaussian elimination algorithm that runs in polynomial time. At its core, in the $k$th step, the first $k$ columns are “eliminated”, and the first $k$ rows are fixed. More precisely, the first $k$ rows of $A^{(k)}$ are the same as those of $A^{(k-1)}$, and the first $k$ columns of $A^{(k)}$ are zero below the diagonal. The remaining entries are defined as follows (for $i, j > k$):
$$a^{(k)}_{ij} = \frac{a^{(k-1)}_{kk}\,a^{(k-1)}_{ij} - a^{(k-1)}_{ik}\,a^{(k-1)}_{kj}}{a^{(k-2)}_{k-1,\,k-1}},$$
with $a^{(-1)}_{00} = 1$ (essentially, don’t divide by anything in the first iteration).
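A sketch of this modified elimination in Python, using exact integer arithmetic throughout; the function name, the `steps` parameter (used for a check later), and the no-pivoting assumption (all pivots nonzero) are mine, not the notes':

```python
def fraction_free_elimination(M, steps=None):
    """Fraction-free Gaussian elimination on an n x n list of integer lists.

    After step k the first k columns are zero below the diagonal and every
    entry is still an integer; by the theorem below, each entry is the
    determinant of a small minor of the original matrix, so bit-lengths
    stay polynomially bounded.  Assumes all pivots are nonzero.
    """
    n = len(M)
    steps = n if steps is None else steps
    prev_pivot = 1                       # "don't divide" in the first iteration
    for k in range(steps):
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # a_ij <- (a_kk * a_ij - a_ik * a_kj) / (previous pivot);
                # the division is exact whenever the input is integral.
                M[i][j] = (M[k][k] * M[i][j] - M[i][k] * M[k][j]) // prev_pivot
            M[i][k] = 0                  # column k is eliminated below row k
        prev_pivot = M[k][k]
    return M
```

For example, `fraction_free_elimination([[2, 1, 1], [1, 3, 2], [1, 0, 0]])` returns `[[2, 1, 1], [0, 5, 3], [0, 0, -1]]`: every intermediate entry is an integer, and the final pivot $-1$ is exactly the determinant of the input, as the theorem below explains.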
Theorem: Let $A^{(k)}$ be the matrix after the $k$th iteration, so that the first $k$ rows/columns form an upper triangular matrix. Then, if $A$ is integral, so is $A^{(k)}$, and the number of bits needed to represent each entry of $A^{(k)}$ is $O(n(b + \log n))$, where $b$ is the maximum number of bits used for any entry of $A$.
Proof: Once the pivot is chosen at step $k$, the entry $a^{(k)}_{kk}$ does not change and is fixed for the rest of the algorithm. We now call it simply $a_{kk}$, with the convention $a_{00} = 1$.

Consider the determinant of some minor of $A^{(k)}$ containing the first $k$ rows and columns. The change here from the corresponding minor of $A$ is that, at each step $l \le k$, every not-yet-fixed row has been multiplied by $a_{ll}/a_{l-1,\,l-1}$, which changes the determinant by that factor once per affected row; the remaining part, subtracting multiples of one row from another, does not change the determinant. At step $l$, the affected rows of the minor are the $m - l$ rows below row $l$, so
$$\det\!\left(\text{minor of } A^{(k)}\right) = \det\!\left(\text{minor of } A\right) \cdot \prod_{l=1}^{k} \left(\frac{a_{ll}}{a_{l-1,\,l-1}}\right)^{m-l}$$
for any such minor of size $m$ by $m$.
Now, take the minor to be the minor of $A^{(k)}$ induced by rows $\{1, \dots, k, i\}$ and columns $\{1, \dots, k, j\}$ for some $i, j > k$. Then $m = k + 1$, and the product above telescopes to $a_{11} a_{22} \cdots a_{kk}$. Since this minor is upper triangular in its first $k$ columns, its determinant is $a_{11} a_{22} \cdots a_{kk} \cdot a^{(k)}_{ij}$, so
$$a^{(k)}_{ij} = \det\!\left(\text{minor of } A\right),$$
which is integral. Moreover, this is the determinant of a $(k+1) \times (k+1)$ submatrix of $A$, so by the Hadamard bound above it can be represented in $O(n(b + \log n))$ bits, proving the claim.
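As a sanity check of this identity (reusing the `fraction_free_elimination` sketch from above with a partial run; the specific matrix and indices are illustrative):

```python
# Check a^{(k)}_{ij} = det(minor of A) on a small example (0-indexed).
A = [[2, 1, 0, 3],
     [1, 3, 2, 1],
     [0, 1, 4, 2],
     [3, 0, 1, 5]]

k, i, j = 2, 2, 3                       # stop after k elimination steps
B = fraction_free_elimination([row[:] for row in A], steps=k)

rows = list(range(k)) + [i]             # first k rows of A, plus row i
cols = list(range(k)) + [j]             # first k columns of A, plus column j
m = [[A[r][c] for c in cols] for r in rows]
det_minor = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
             - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
             + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
assert B[i][j] == det_minor             # both equal 11 here
```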
The complexity is now $O(n^3)$ arithmetic operations, each on numbers of size at most $O(n(b + \log n))$ bits; even with naive multiplication this is $O\!\left(n^5 (b + \log n)^2\right)$ bit operations in total, which is polynomial in terms of the size of the input, as desired.