Due to the decoherence of the state-of-the-art physical implementations of

quantum computers, it is essential to parallelize the quantum circuits to

reduce their depth. Two decades ago, Moore et al. demonstrated that additional

qubits (or ancillae) could be used to design "shallow" parallel circuits for

quantum operators. They proved that any $n$-qubit CNOT circuit could be

parallelized to $O(\log n)$ depth, with $O(n^2)$ ancillae. However, the

near-term quantum technologies can only support limited amount of qubits,

making space-depth trade-off a fundamental research subject for quantum-circuit

synthesis.

In this work, we establish an asymptotically optimal space-depth trade-off

for the design of CNOT circuits. We prove that for any $m\geq0$, any $n$-qubit

CNOT circuit can be parallelized to $O\left(\max \left\{\log n,

\frac{n^{2}}{(n+m)\log (n+m)}\right\} \right)$ depth, with $O(m)$ ancillae. We

show that this bound is tight by a counting argument, and further show that

even with arbitrary two-qubit quantum gates to approximate CNOT circuits, the

depth lower bound still meets our construction, illustrating the robustness of

our result. Our work improves upon two previous results, one by Moore et al.

for $O(\log n)$-depth quantum synthesis, and one by Patel et al. for $m = 0$:

for the former, we reduce the need of ancillae by a factor of $\log^2 n$ by

showing that $m=O(n^2/\log^2 n)$ additional qubits suffice to build $O(\log

n)$-depth, $O(n^2/\log n)$ size --- which is asymptotically optimal --- CNOT

circuits; for the later, we reduce the depth by a factor of $n$ to the

asymptotically optimal bound $O(n/\log n)$. Our results can be directly

extended to stabilizer circuits using an earlier result by Aaronson et al. In

addition, we provide relevant hardness evidences for synthesis optimization of

CNOT circuits in term of both size and depth.