Chapter 6

\[ \newcommand{\BB}{\mathbb{B}} \newcommand{\CC}{\mathbb{C}} \newcommand{\HH}{\mathbb{H}} \newcommand{\KK}{\mathbb{K}} \newcommand{\NN}{\mathbb{N}} \newcommand{\QQ}{\mathbb{Q}} \newcommand{\RR}{\mathbb{R}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\ii}{\mathrm{i}} \newcommand{\jj}{\mathrm{j}} \newcommand{\kk}{\mathrm{k}} \newcommand{\dd}{\mathrm{d}} \newcommand{\FT}{\mathcal{F}} % Fourier Transform \newcommand{\tto}{\twoheadrightarrow} \newcommand{\inv}{^{-1}} \newcommand{\RF}{\mathrm{RF}} \newcommand{\sys}{\mathrm{sys}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\cx}{\mathrm{CX}} \newcommand{\cy}{\mathrm{CY}} \newcommand{\cz}{\mathrm{CZ}} \newcommand{\cat}{\ket{\mathrm{cat}}} \newcommand{\catp}[1]{\ket{\mathrm{cat}_{#1}}} \newcommand{\calE}{\mathcal{E}} \newcommand{\calF}{\mathcal{F}} \newcommand{\calH}{\mathcal{H}} \newcommand{\calR}{\mathcal{R}} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\norm}[1]{\|#1\|} \newcommand{\sprod}[2]{\langle#1|#2\rangle} % deprecated, use braket instead. \newcommand{\braket}[2]{\langle#1|#2\rangle} % scalar product \newcommand{\ptrace}[2]{\mathrm{tr}_{#1}\left(#2\right)} \newcommand{\trace}[1]{\mathrm{tr}\left(#1\right)} \newcommand{\rank}[1]{\mathrm{rank}\left(#1\right)} \newcommand{\floor}[1]{\lfloor#1\rfloor} \newcommand{\ceil}[1]{\lceil#1\rceil} \newcommand{\bra}[1]{\langle#1|} \newcommand{\ket}[1]{|#1\rangle} \newcommand{\proj}[1]{\ket{#1}\bra{#1}} \newcommand{\mean}[1]{\langle#1\rangle} \newcommand{\wt}[1]{\mathrm{wt}\left(#1\right)} \newcommand{\prob}[1]{\mathrm{Prob}\left[#1\right]} \newcommand{\orac}{\mathrm{Orac}} \newcommand{\?}{} % sometimes I need just a separator other than whitespace \]


Setup

import sympy as sp
from sympy.vector import CoordSys3D

from chapter_4 import Ry, theta, Z

Set up some variables:

a, b = sp.symbols('a b', real=True)  # used for a, b or alpha, beta from the book
c, s = sp.symbols('c s', real=True)  # used for cosine and sine values

Exercises

Exercise 6.1

Show that the unitary operator corresponding to the phase shift in the Grover iteration is \(2\ket{0}\bra{0}-I\).

Proof

Recall that the phase shift operator in Grover's algorithm does the following to the standard basis states:

\[ \ket{x} \mapsto \begin{cases} \ket{0} & \text{if } x=0 \\ -\ket{x} & \text{otherwise.} \end{cases} \]

Since \(\sprod{0}{x}=\delta_{x,0}\) we have \((2\ket{0}\bra{0}-I)\ket{x}=2\delta_{x,0}\ket{0}-\ket{x}\), which is exactly this action. QED.
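
As a quick sanity check, here is a minimal sympy sketch (the dimension \(N=4\) is an arbitrary choice):

import sympy as sp

# Build 2|0><0| - I for N = 4 and verify its action on the standard basis.
N_dim = 4
ket0 = sp.Matrix([1, 0, 0, 0])
P = 2 * ket0 * ket0.T - sp.eye(N_dim)

assert P * ket0 == ket0  # |0> is fixed
for x in range(1, N_dim):
    ket_x = sp.Matrix([1 if i == x else 0 for i in range(N_dim)])
    assert P * ket_x == -ket_x  # every other basis state picks up a sign flip
"PASSED"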

Exercise 6.2

Let \(\ket{\psi}=N^{-1/2}\sum_{k=0}^{N-1}\ket{k}=H^{\otimes n}\ket{0}\). Show that the operation \(2\ket{\psi}\bra{\psi}-I\) applied to a general state \(\sum_k\alpha_k\ket{k}\) produces

\[ \sum_k \left[ -\alpha_k + 2\mean{\alpha} \right] \ket{k} , \]

where \(\mean{\alpha}=\sum_k\alpha_k/N\) is the mean value of the \(\alpha_k\). For this reason, \(2\ket{\psi}\bra{\psi}-I\) is sometimes referred to as the inversion about mean operation.

Proof

First of all we note that

\[ \ket{\psi}\bra{\psi} = \frac{1}{N} \sum_{ij} \ket{i}\bra{j} . \]

Hence

\begin{align*} (2\ket{\psi}\bra{\psi}-I) \sum_k \alpha_k \ket{k} &= \sum_k -\alpha_k \ket{k} + 2\sum_{ijk} \frac{\alpha_k}{N} \ket{i}\sprod{j}{k} \\ &= \sum_k -\alpha_k \ket{k} + 2\sum_{ik} \frac{\alpha_k}{N} \ket{i} \\ &= \sum_k -\alpha_k \ket{k} + 2\sum_{i} \mean{\alpha} \ket{i} \\ &= \sum_k (-\alpha_k + 2\mean{\alpha}) \ket{k} . \end{align*}

QED.
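
Again a small sympy sanity check, for \(N=4\) with symbolic amplitudes:

import sympy as sp

# Verify (2|psi><psi| - I) sum_k alpha_k |k> = sum_k (-alpha_k + 2<alpha>) |k>.
N_dim = 4
alphas = sp.symbols('alpha0:4')
psi = sp.ones(N_dim, 1) / sp.sqrt(N_dim)  # uniform superposition
mean = sum(alphas) / N_dim

lhs = (2 * psi * psi.T - sp.eye(N_dim)) * sp.Matrix(alphas)
rhs = sp.Matrix([-alpha + 2 * mean for alpha in alphas])
assert sp.simplify(lhs - rhs) == sp.zeros(N_dim, 1)
"PASSED"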

Exercise 6.3

Show that in the \(\ket{\alpha}\), \(\ket{\beta}\) basis, we may write the Grover iteration as a rotation

\[ G = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} , \]

where \(\theta\) is a real number in the range \(0\) to \(\pi/2\) (assuming for simplicity that \(M\leq N/2\); this limitation will be lifted shortly), chosen so that

\[ \sin(\theta) = \frac{2\sqrt{M(N-M)}}{N} . \]

Proof

Recall that \(M\) is the number of solutions of \(f(x)=1\) and that

\[ \ket{\alpha} = \frac{1}{\sqrt{N-M}} \sum_{f(x)=0} \ket{x} , \quad \text{and} \quad \ket{\beta} = \frac{1}{\sqrt{M}} \sum_{f(x)=1} \ket{x} . \]

Moreover

\[ \ket{\psi} = H^{\otimes n} \ket{0} = \underbrace{\sqrt{\frac{N-M}{N}}}_{=:\cos(\theta/2)} \ket{\alpha} + \underbrace{\sqrt{\frac{M}{N}}}_{=:\sin(\theta/2)} \ket{\beta} . \]

This is consistent with the formula for \(\theta\) from the exercise statement, as can be seen from the trigonometric identity \(\sin(2x)=2\sin(x)\cos(x)\). The Grover iteration \(G\) is the product of the inversion about \(\psi\) and the oracle \(\orac\) (an inversion about \(\alpha\)). We ignore the workspace here.

\[ G = \mathrm{inv}(\psi) \cdot \orac \]

By definition of \(\alpha\) and \(\beta\) we have

\[ \orac = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} . \]

Note that this is the Pauli-Z matrix. Let

\[ U(\theta) = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} . \]

Note that \(U(\theta/2)=R_y(\theta)\) (Pauli-Y rotation) and that \(U(\theta/2)(1,0)^T\) is the coordinate representation of \(\ket{\psi}\). Hence

\[ \mathrm{inv}(\psi) = U(\theta/2) \cdot \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \cdot U(-\theta/2) . \]

Only a short calculation remains to show the claim. Let us do it with the help of sympy:

Inv_psi = sp.simplify(Ry * Z * Ry.H)  # Ry = Ry(theta)
Orac = Z

assert Ry.subs(theta, 2*theta) == Inv_psi * Orac
"PASSED"
PASSED

Another way to see this (without much calculation) is to note that \(ZU(-\phi)Z=U(\phi)\). QED.

Exercise 6.4

Give explicit steps for the quantum search algorithm, for the case of multiple solutions (\(1 < M \leq N/2\)).

Solution

The difference from the special case \(M=1\) is essentially the number of repetitions of the Grover iteration. Note also that \(M\) influences the success probability.

We assume that \(N=2^n\), \(f:\{0,1\}^n\to\{0,1\}\), and that \(M\) is the number of solutions of \(f(x)=1\), as in the book. Moreover let (recall that we assume \(M\leq N/2\))

\[ \theta = \arcsin\left(\frac{2\sqrt{M(N-M)}}{N}\right) \]

This formula comes from exercise 6.3. In the solution of that exercise we showed that

\[ \ket{\psi} := H^{\otimes n} \ket{0} = \cos(\theta/2) \ket{\alpha} + \sin(\theta/2) \ket{\beta} . \]

Here \(\alpha\) and \(\beta\) are as in the solution of exercise 6.3. Let us write

\[ \ket{\varphi} := \cos(\varphi) \ket{\alpha} + \sin(\varphi) \ket{\beta} . \]

Hence \(\ket{\psi}=\ket{\theta/2}\), \(\ket{\beta}=\ket{\pi/2}\), and \(G\) acts as \(\ket{\varphi}\mapsto\ket{\varphi+\theta}\). In particular, since we aim to rotate \(\ket{\psi}\) (approximately) to \(\ket{\beta}\), the number of repetitions for the Grover algorithm is:

\[ R = \mathrm{round}\left( \frac{\pi}{2\theta} - \frac{1}{2} \right) = O \left( \sqrt{\frac{N}{M}} \right) , \]

(for \(M\leq N/2\)).

Algorithm (Quantum Search (Grover) for \(0 < M \leq N/2\))
Inputs
  • An \(n\) bit register and a one bit register.
  • A black box oracle \(U_f\) which performs the transformation \(U_f\ket{x,q}=\ket{x,q\oplus f(x)}\).
Outputs
A solution \(x_0\) of \(f(x_0)=1\) with probability at least \(1-\sin(\theta/2)^2=1-M/N\geq1/2\) (since the Grover iterations miss \(\pi/2\) by at most \(\theta/2\)). Every solution is equally likely to be returned.
Runtime
\(R=O(\sqrt{N/M})\) calls to the oracle and to \((2\ket{\psi}\bra{\psi}-I)\), and \(O(n)\) gates/operations for other things.
Procedure
  1. Initialize registers to \(\ket{0}\ket{0}\).
  2. Apply \(H^{\otimes n}\otimes HX\):

    \[ \rightarrow \frac{1}{2^{n/2}} \sum_{x=0}^{2^n-1} \ket{x} \ket{-} = \ket{\psi}\ket{-} = \ket{\theta/2}\ket{-} \]

  3. Apply \(R\) repetitions of \((2\ket{\psi}\bra{\psi}-I)U_f\):

    \[ \rightarrow \ket{(2R+1)\theta/2}\ket{-} \approx \ket{\beta}\ket{-} \]

  4. Measure the first register to obtain some \(x_0\). Return \(x_0\).
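
To make the procedure concrete, here is a minimal classical simulation of the amplitudes (plain Python; the instance \(N=16\) with three solutions is an arbitrary choice, and the action of \(U_f\) on \(\ket{x}\ket{-}\) is folded into a sign flip of the solution amplitudes):

import math

# Example instance (arbitrary choice): N = 16 with three solutions.
N, solutions = 16, {3, 7, 11}
M = len(solutions)

theta = math.asin(2 * math.sqrt(M * (N - M)) / N)
R = round(math.pi / (2 * theta) - 0.5)

state = [1 / math.sqrt(N)] * N  # |psi>, the uniform superposition
for _ in range(R):
    # Oracle: |x>|-> -> (-1)^f(x) |x>|->, i.e. a sign flip on the solutions.
    state = [-amp if x in solutions else amp for x, amp in enumerate(state)]
    # Inversion about the mean: alpha_k -> -alpha_k + 2<alpha> (exercise 6.2).
    mean = sum(state) / N
    state = [-amp + 2 * mean for amp in state]

p_success = sum(state[x] ** 2 for x in solutions)
assert p_success >= 1 - M / N  # the bound claimed under 'Outputs'
"PASSED"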

Exercise 6.5

Show that the augmented oracle \(U_f'\) may be constructed using one application of the original oracle \(U_f\), and elementary quantum gates, using the extra qubit \(\ket{q}\).

  • What strikes me is the fact that it is possible to do this with just one application of the oracle! We will see that it is not possible to do this classically in a reversible way.

Proof

According to the book we have to implement \(U_f'\) as a controlled (by \(q\)) version of \(U_f\) which acts like this:

\[ \ket{q,x,y} \mapsto \ket{q,x,y\oplus \overline{q}f(x)} . \]

Recall that the techniques of chapter 4 would require decomposing \(U_f\) into elementary gates and recursively adding controls. We do not do this for the following reasons:

  • \(U_f\) is a black box. So we are not supposed to assume anything about its internal workings. In particular it might not be implemented by the "usual" elementary gates.
  • Even if it were (and we had access to the circuit during execution of the algorithm), the generic recursive procedure produces a relatively complicated circuit (in general).
  • There is a simple way to get \(U'_f\) using \(U_f\) as a black box!

The key to seeing this is the fact that the oracle \(U_f\) leaves \(\ket{x}\ket{+}\) fixed. Let us introduce a fourth (auxiliary) register holding one qubit, initialized to \(\ket{+}\). We want to implement \(U'_f\) as:

\[ \ket{q,x,y,+} \mapsto \ket{q,x,y\oplus \overline{q}f(x),+} . \]

It is not hard to see that

\[ U'_f=\mathrm{CSWAP}(1,3,4)\cdot U_f \cdot \mathrm{CSWAP}(1,3,4) \]

accomplishes exactly that. Here \(\mathrm{CSWAP}(1,3,4)\) is the controlled swap gate exchanging registers three and four conditioned on register one. QED.
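
Here is a small simulation of this construction for a one-bit \(x\) (the example \(f\) is an arbitrary choice; states are dictionaries mapping basis tuples \((q,x,y,\mathrm{aux})\) to amplitudes):

from itertools import product

def f(x):  # an arbitrary example function; U_f itself is treated as a black box
    return x

def apply(gate, state):
    out = {}
    for basis, amp in state.items():
        for basis2, amp2 in gate(basis).items():
            out[basis2] = out.get(basis2, 0) + amp * amp2
    return out

def Uf(basis):  # oracle on registers two and three: |q,x,y,w> -> |q,x,y+f(x),w>
    q, x, y, w = basis
    return {(q, x, y ^ f(x), w): 1}

def cswap(basis):  # CSWAP(1,3,4): swap y and w iff q = 1
    q, x, y, w = basis
    return {(q, x, w, y) if q else basis: 1}

for q, x, y in product((0, 1), repeat=3):
    state = {(q, x, y, 0): 2 ** -0.5, (q, x, y, 1): 2 ** -0.5}  # aux in |+>
    for gate in (cswap, Uf, cswap):
        state = apply(gate, state)
    want = y ^ ((1 - q) * f(x))  # y + (not q) f(x)
    target = {(q, x, want, 0): 2 ** -0.5, (q, x, want, 1): 2 ** -0.5}
    assert all(abs(state.get(b, 0) - amp) < 1e-12 for b, amp in target.items())
"PASSED"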

Proof of the remark

In the framework of (classical) reversible computation as sketched in section 3.2.5 of the book we need at least two applications of the oracle.

Before we prove this let us first show a classical reversible algorithm which uses two applications of the oracle:

\begin{align*} \ket{q,x,y,0} &\stackrel{U_f(2,4)}{\longmapsto} &\ket{q,x,y,f(x)} \\ &\stackrel{X(1)}{\longmapsto} &\ket{\overline{q},x,y,f(x)} \\ &\stackrel{\mathrm{Toff}(1,4,3)}{\longmapsto} &\ket{\overline{q},x,y\oplus\overline{q}f(x),f(x)} \\ &\stackrel{X(1)}{\longmapsto} &\ket{q,x,y\oplus\overline{q}f(x),f(x)} \\ &\stackrel{U_f(2,4)}{\longmapsto} &\ket{q,x,y\oplus\overline{q}f(x),0} \end{align*}

The second application of the oracle is needed to uncompute the garbage from the auxiliary register. An intuitive way to see the necessity of the second application is to look at the information present in the registers. In case of \(q=1\) the final state does not contain a single bit of information about \(f\). On the other hand, if we apply the oracle just once it leaves one bit of information about \(f\) in the registers (the value of \(f\) at some input, but not necessarily at \(x\)). So the computation after the application of the oracle would have to erase that information, which is not possible with reversible computation (without using the oracle, and hence some information about \(f\), again).

Let us give a (hopefully) more rigorous proof. Assume to the contrary that it was possible with just one application of the oracle and an additional \(m\) qubit auxiliary register (initialized to \(0\)). The circuit (which is just a sequence of reversible gates) can be divided into three parts. The part \(A\) before the oracle, the application of the oracle \(U_f\), and the part \(B\) after the oracle:

\begin{align*} \ket{q,x,y,0} &\stackrel{A}{\longmapsto} \ket{a_1, a_2, a_3, a_4} \\ &\stackrel{U_f(p)}{\longmapsto} \ket{a'_1, a'_2, a'_3, a'_4} \\ &\stackrel{B}{\longmapsto} \ket{q,x,y\oplus\overline{q}f(x),0} \end{align*}

In the second step \(p\) signifies our choice of the \(n+1\) qubits to which the oracle is applied. Let us merge the \(a_i\) into a single vector \(z\) (just for notational simplicity), and similarly \(z'\) for the \(a'_i\).

After the oracle step we have \(z'_{j_0}=z_{j_0}\oplus f(\tilde{x})\) for a certain \(j_0\) (and \(z'_j=z_j\) for all other \(j\)). Here \(\tilde{x}=\tilde{x}(q,x,A,p)\) depends only on \(q\), \(x\), the first part \(A\), and the places \(p\) to which the oracle is applied. Note that for \(q=0\) clearly \(\tilde{x}=x\) (otherwise there is no way to obtain \(f(x)\)), but for \(q=1\) (the interesting case) we cannot conclude this.

Now let us simulate the above circuit on a larger circuit with four zones. A zone is just a group of registers. The first zone is a read-only zone containing \(q,x,y\) all the time. The second and the third zone each contain the same four registers as the above circuit. The last zone contains a single one-qubit register.

Let us denote by \(\mathrm{Sim}(A,i)\) for \(i\in\{2,3\}\) the circuit which executes \(A\) at zone \(i\). The simulations of \(U_f\) and \(B\) are defined similarly.

Consider the circuit which acts as follows:

  1. Start \[ \ket{q,x,y;0;0;0} \]
  2. Copy \(q,x,y\) to zone \(2\): \[ \rightarrow \ket{q,x,y;q,x,y,0;0;0} \]
  3. Apply \(\mathrm{Sim}(A,2)\): \[ \rightarrow \ket{q,x,y;z;0;0} \]
  4. Apply \(\mathrm{Sim}(U_f(p),2)\): \[ \rightarrow \ket{q,x,y;z';0;0} \]
  5. Apply \(\mathrm{Sim}(B,2)\): \[ \rightarrow \ket{q,x,y;q,x,y\oplus \overline{q}f(x),0;0;0} \]

Now let us see how we can extract \(f(\tilde{x})\) from the final state (without using the oracle). Being able to do this yields a contradiction since clearly no trace of \(f\) is left in the final state for \(q=1\).

  1. Uncompute \(\mathrm{Sim}(B,2)\): \[ \rightarrow \ket{q,x,y;z';0;0} \]
  2. Copy \(q,x,y\) to zone \(3\): \[ \rightarrow \ket{q,x,y;z';q,x,y,0;0} \]
  3. Apply \(\mathrm{Sim}(A,3)\): \[ \rightarrow \ket{q,x,y;z';z;0} \]
  4. Add zone three to zone two (using bit-wise addition):

    \[ \rightarrow \ket{q,x,y;z'';z;0} \]

    Note that \(z''\) contains just one non-trivial bit \(z''_{j_0}=f(\tilde{x})\) (the others are trivially zero).

  5. Copy \(z''_{j_0}=f(\tilde{x})\) to the fourth zone and erase it from zone two (by a CNOT controlled by the fourth zone): \[ \rightarrow \ket{q,x,y;0;z;f(\tilde{x})} \]
  6. Uncompute \(\mathrm{Sim}(A,3)\) (not important, just for reasons of feng shui): \[ \rightarrow \ket{q,x,y;0;0;f(\tilde{x})} \]

We have managed to extract a bit of information about \(f\) no matter the value of \(q\). This is a contradiction as already explained above. QED.

Exercise 6.6

Verify that the gates in the dotted box in the second figure of Box 6.1 perform the conditional phase shift operation \(2\ket{00}\bra{00}-I\), up to an unimportant global phase factor.

Proof

The circuit in the dotted region implements the following operator

\[ X \otimes X \cdot \underbrace{I\otimes H \cdot C(X) \cdot I\otimes H}_{C(Z)} \cdot X \otimes X \]

Recall that the controlled Z-gate acts as follows on the standard basis:

\[ C(Z) \ket{x} = \begin{cases} -\ket{11} & \text{for } x=11 \\ \ket{x} & \text{otherwise.} \end{cases} \]

The X-gate just exchanges the roles of \(0\) and \(1\):

\[ X \otimes X \cdot C(Z) \cdot X \otimes X \cdot \ket{x} = \begin{cases} -\ket{00} & \text{for } x=00 \\ \ket{x} & \text{otherwise.} \end{cases} \]

Up to a factor of \(-1\) (the global phase factor) this is precisely what \(2\ket{00}\bra{00}-I\) does. QED.
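
A quick sympy check of this identity (the Kronecker product is built by hand to keep the sketch self-contained):

import sympy as sp

X = sp.Matrix([[0, 1], [1, 0]])
CZ = sp.diag(1, 1, 1, -1)
XX = sp.Matrix(4, 4, lambda i, j: X[i // 2, j // 2] * X[i % 2, j % 2])  # X (x) X
ket00 = sp.Matrix([1, 0, 0, 0])

# The dotted box equals -(2|00><00| - I), i.e. the phase shift up to a global phase.
assert XX * CZ * XX == -(2 * ket00 * ket00.T - sp.eye(4))
"PASSED"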

Exercise 6.7

Verify that the circuits shown in Figures 6.4 and 6.5 implement the operations \(\exp(-\ii\ket{x}\bra{x}\Delta t)\) and \(\exp(-\ii\ket{\psi}\bra{\psi}\Delta t)\), respectively, with \(\ket{\psi}=H^{\otimes n}\ket{0}\).

Remark

There is a typo in the rotation matrix. The matrix should be

\[ \begin{bmatrix} 1 & 0 \\ 0 & e^{-\ii\Delta t} \end{bmatrix} \]

The minus sign is missing in the circuit figures.

Proof for the first circuit

Since \(\proj{x}\) is a projection (that is, an operator for which \(P^2=P\)) the series expansion of the exponential function implies

\[ e^{-\ii\proj{x}\Delta t} = e^{-\ii\Delta t} \proj{x} + (I - \proj{x}) \]

Hence

\[ e^{-\ii\proj{x}\Delta t} \ket{y} = \begin{cases} \ket{y} & \text{if } y\neq x \\ e^{-\ii\Delta t} \ket{x} & \text{if } y=x \end{cases} \; = e^{-\ii\Delta t f(y)} \ket{y} . \]

On the other hand this is exactly what the circuit \(U_f\,(I\otimes\mathrm{diag}(1,e^{-\ii\Delta t}))\,U_f\) does.

\begin{align*} \ket{y,0} &\stackrel{U_f}{\longmapsto} \ket{y,f(y)} \\ &\stackrel{[\ldots]}{\longmapsto} e^{-\ii\Delta t f(y)} \ket{y,f(y)} \\ &\stackrel{U_f}{\longmapsto} e^{-\ii\Delta t f(y)} \ket{y,0} \end{align*}

QED.

Proof for the second circuit

In the same way as in the first part we see that

\[ e^{-\ii\proj{\psi}\Delta t} \ket{\varphi} = \begin{cases} \ket{\varphi} & \text{if } \ket{\varphi}\perp \ket{\psi} \\ e^{-\ii\Delta t} \ket{\psi} & \text{if } \ket{\varphi}=\ket{\psi} \end{cases} \]

Again this is exactly what the circuit does. Let us first consider the case \(\ket{\varphi}=\ket{\psi}\):

\begin{align*} \ket{\psi,0} &\stackrel{H^{\otimes n}}{\longmapsto} \ket{0,0} \\ &\stackrel{C_0^n(X)}{\longmapsto} \ket{0,1} \\ &\stackrel{[\ldots]}{\longmapsto} e^{-\ii\Delta t} \ket{0,1} \\ &\stackrel{C_0^n(X)}{\longmapsto} e^{-\ii\Delta t} \ket{0,0} \\ &\stackrel{H^{\otimes n}}{\longmapsto} e^{-\ii\Delta t} \ket{\psi,0} . \end{align*}

For the case \(\ket{\varphi}\perp\ket{\psi}\) we only have to note that \(H^{\otimes\,n}\ket{\varphi}\) is orthogonal to \(\ket{0}\) and hence the controlled \(X\) gate does nothing. QED.
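
The following sympy sketch checks the second circuit for \(n=1\) (work qubit first, ancilla second; \(C_0^1(X)\) flips the ancilla iff the work qubit is \(\ket{0}\)):

import sympy as sp

dt = sp.symbols('Delta_t', real=True)

def kron(A, B):  # Kronecker product, built by hand to stay self-contained
    return sp.Matrix(A.rows * B.rows, A.cols * B.cols,
                     lambda i, j: A[i // B.rows, j // B.cols] * B[i % B.rows, j % B.cols])

H1 = sp.Matrix([[1, 1], [1, -1]]) / sp.sqrt(2)
I2 = sp.eye(2)
C0X = sp.Matrix([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
R = sp.diag(1, sp.exp(-sp.I * dt))  # the (corrected) rotation on the ancilla

circuit = kron(H1, I2) * C0X * kron(I2, R) * C0X * kron(H1, I2)

# Since |psi><psi| is a projection: e^{-i|psi><psi| dt} = I + (e^{-i dt} - 1)|psi><psi|.
psi = sp.Matrix([1, 1]) / sp.sqrt(2)
E = sp.eye(2) + (sp.exp(-sp.I * dt) - 1) * psi * psi.T

e0 = sp.Matrix([1, 0])
for v in (sp.Matrix([1, 0]), sp.Matrix([0, 1])):
    assert sp.simplify(circuit * kron(v, e0) - kron(E * v, e0)) == sp.zeros(4, 1)
"PASSED"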

Exercise 6.8

Suppose the simulation step is performed to an accuracy \(O(\Delta t^r)\). Show that the number of oracle calls required to simulate \(H\) to reasonable accuracy is \(O(N^{r/(2(r-1))})\). Note that as \(r\) becomes large the exponent of \(N\) approaches \(1/2\).

Proof

From equation (6.23) we know that the simulation time is \(t=\pi/2\alpha=O(\sqrt{N})\). Hence the number of steps is \(t/\Delta t = O(\Delta t\inv \sqrt{N})\). Since the error of a product of unitary operators scales linearly we see that the overall error is

\[ O(\Delta t^{r-1} \sqrt{N}) \]

This must be \(O(1)\) and hence we must choose \(\Delta t = O(N^{-1/(2(r-1))})\). To minimize the number of steps we set \(\Delta t \approx N^{-1/(2(r-1))}\) and obtain \(t/\Delta t = O(N^{1/2+1/(2(r-1))}) = O(N^{r/(2(r-1))})\) oracle calls, which is the claim. QED.
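
The exponent arithmetic of the last step (steps \(= \Delta t\inv\sqrt{N}\)) can be double-checked with sympy:

import sympy as sp

r = sp.symbols('r', positive=True)

# Exponent of N in Delta_t^{-1} sqrt(N): should equal r/(2(r-1)).
exponent = sp.Rational(1, 2) + 1 / (2 * (r - 1))
assert sp.cancel(exponent - r / (2 * (r - 1))) == 0
"PASSED"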

Exercise 6.9

Verify Equation (6.25): Let \(c=\cos(\Delta t/2)\), \(s=\sin(\Delta t/2)\):

\[ U(\Delta t) = \left( c^2 - s^2 \, \vec{\psi}\cdot\hat{z} \right) I - \ii s \left(c \, (\vec{\psi}+\hat{z}) + s \, \vec{\psi}\times\hat{z} \right) \cdot \vec{\sigma} . \]

(Hint: see Exercise 4.15.)

Proof

Let \(\hat{z}=(0,0,1)\) and \(\vec{\psi}=(2\alpha\beta,0,\alpha^2-\beta^2)\). From the paragraph above equation (6.25) we deduce

\[ U(\Delta t) = e^{-\ii \proj{\psi} \Delta t} \cdot e^{-\ii \proj{x} \Delta t} = e^{-\ii \Delta t} \cdot e^{-\ii \vec{\psi}\cdot\vec{\sigma} \Delta t/2} \cdot e^{-\ii \hat{z}\cdot\vec{\sigma} \Delta t/2} . \]

The first of the three terms is the unimportant global phase. The second and the third are rotations around \(\vec{\psi}\) and \(\hat{z}\) and angle \(\Delta t\) (both). Hence we are indeed in the setting of exercise 4.15. From that exercise we deduce that \(U(\Delta t)\) is a rotation around an axis \(\vec{n}\) by an angle \(\theta\)

\[ U(\Delta t) = e^{-\ii \Delta t} \left( \cos(\theta/2) I + \sin(\theta/2) \vec{n}\cdot\vec{\sigma} \right) \]

such that:

\[ \cos(\theta/2) = c^2 - s^2 \, \vec{\psi}\cdot\hat{z} \]

and

\[ \sin(\theta/2) \vec{n} = sc \, (\vec{\psi}+\hat{z}) + s^2 \, \vec{\psi}\times\hat{z} . \]

QED.
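
Let us double-check the whole matrix identity with sympy, using \(e^{-\ii\,\vec{n}\cdot\vec{\sigma}\,\Delta t/2}=cI-\ii s\,(\vec{n}\cdot\vec{\sigma})\) for unit vectors \(\vec{n}\):

import sympy as sp

a, b, c, s = sp.symbols('a b c s', real=True)  # alpha, beta, cos, sin

X = sp.Matrix([[0, 1], [1, 0]])
Y = sp.Matrix([[0, -sp.I], [sp.I, 0]])
Z = sp.Matrix([[1, 0], [0, -1]])

def dot_sigma(v):
    return v[0] * X + v[1] * Y + v[2] * Z

psi = (2 * a * b, 0, a ** 2 - b ** 2)
zhat = (0, 0, 1)
cross = (psi[1] * zhat[2] - psi[2] * zhat[1],
         psi[2] * zhat[0] - psi[0] * zhat[2],
         psi[0] * zhat[1] - psi[1] * zhat[0])

lhs = (c * sp.eye(2) - sp.I * s * dot_sigma(psi)) * (c * sp.eye(2) - sp.I * s * dot_sigma(zhat))
rhs = (c ** 2 - s ** 2 * psi[2]) * sp.eye(2) - sp.I * s * (
    c * dot_sigma([u + v for u, v in zip(psi, zhat)]) + s * dot_sigma(cross))
assert sp.simplify(lhs - rhs) == sp.zeros(2, 2)
"PASSED"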

Exercise 6.10

Show that by choosing \(\Delta t\) appropriately we can obtain a quantum search algorithm which uses \(O(\sqrt{N})\) queries, and for which the final state is \(\ket{x}\) exactly, that is, the algorithm works with probability \(1\), rather than with some smaller probability.

Proof

To avoid obscuring the proof with too many technicalities I only prove the result for large enough \(N\). Let \(c=\cos(\Delta t/2)\) and \(s=\sin(\Delta t/2)\).

From the proof of exercise 6.9 we know that \(U(\Delta t)\) is a rotation of angle \(\theta\) about \(\vec{r}\) where:

\[ \cos(\theta/2) = c^2 - s^2 \, \vec{\psi}\cdot\hat{z} \]

and

\[ \sin(\theta/2) \vec{r} = sc \, (\vec{\psi}+\hat{z}) + s^2 \, \vec{\psi}\times\hat{z} . \]

Also recall that \(\hat{z}=(0,0,1)\) and \(\vec{\psi}=(2\alpha\beta,0,\alpha^2-\beta^2)\) where \(\alpha=1/\sqrt{N}\) and \(\beta=\sqrt{(N-1)/N}\). Let us further elaborate on \(\theta\). Using one of the above formulas we get

\[ \cos(\theta/2) = 1 - \frac{2s^2}{N} . \]

From the Taylor expansion \(\cos(x)=1-x^2/2+O(x^4)\) of the cosine function we obtain

\[ \theta = \frac{4s}{\sqrt{N}} + O(s^3 N^{-3/2}) . \]

Now consider the axis of rotation. Below with \(\sim\) we mean that the LHS is equal to the RHS up to a real factor. Using the axis-formula a short calculation yields

\[ \vec{r} \sim (\beta c, -\beta s, \alpha c) . \]

The squared norm of the RHS is \(1-\alpha^2s^2\). Hence

\[ \vec{r} = (\beta c, -\beta s, \alpha c) / \sqrt{1-\alpha^2s^2} = (\beta c, -\beta s, \alpha c) + O(s^2N\inv) . \]

From the original formula for \(\vec{r}\) we see that \(\vec{r}\cdot\hat{z}=\vec{r}\cdot\vec{\psi}\approx\alpha\,c=O(N^{-1/2})\). This implies that \(\hat{z}\) indeed lies on the same orbit as \(\psi\). Let \(\omega\) be the angle of the rotation about \(\vec{r}\) which is needed to rotate \(\vec{\psi}\) into \(\hat{z}\). Let

\[ \hat{z}' = \hat{z} - (\hat{z}\cdot\vec{r}) \; \vec{r} = \hat{z} - \frac{\alpha c}{\sqrt{1-\alpha^2 s^2}} \; \vec{r} = \hat{z} + O(N^{-1/2}) \; \vec{r} \]

and

\[ \vec{\psi}' = \vec{\psi} - (\vec{\psi}\cdot\vec{r}) \; \vec{r} = \vec{\psi} - \frac{\alpha c}{\sqrt{1-\alpha^2 s^2}} \; \vec{r} = \vec{\psi} + O(N^{-1/2}) \; \vec{r} . \]

Hence

\[ \cos(\omega) = \frac{\vec{\psi}' \cdot \hat{z}'}{\norm{\vec{\psi}'} \norm{\hat{z}'}} = \vec{\psi} \cdot \hat{z} + O(N\inv) = \alpha^2 - \beta^2 + O(N\inv) = -1 + O(N\inv) , \]

and therefore \(\omega=\pi+O(N^{-1/2})\). Using this and the formula for \(\theta\) we see that the number of required \(\theta\) rotations is

\[ \frac{\omega}{\theta} = \frac{\pi \sqrt{N}}{4s} + O(1) . \]

To enable a 100% success probability this number must be an integer. The standard case \(\Delta t=\pi\) means \(s=1\). By the intermediate value theorem (the expression is continuous in \(s\)) there is an \(s\in[1-CN^{-1/2},1]\) (for some constant \(C>0\) depending on the constant in the \(O(1)\) above) such that this is the case. Clearly \(\omega/\theta=O(\sqrt{N})\) for such \(s\). QED.

Appendix

Let us use sympy to do the boring calculations. First let us define \(\hat{z}\) and \(\vec{\psi}\) in a 3D cartesian coordinate system. Moreover we set up our slightly scaled version r1 of \(\vec{r}\).

N = CoordSys3D('N')

z = N.k
psi = 2*a*b*N.i + (a**2 - b**2)*N.k

r_formula_rhs = s*c*(psi + z) + s**2 * psi.cross(z)
r1 = b*c*N.i - b*s*N.j + a*c*N.k

Here we check that our formula for \(\vec{r}\) is valid:

norm2_r1 = r1.dot(r1).subs(b**2, 1 - a**2).subs(c**2, 1 - s**2)

assert r_formula_rhs.subs(b**2, 1 - a**2) == 2*a*s * r1
assert sp.simplify(norm2_r1) == 1 - a**2 * s**2
"PASSED"
PASSED

Exercise 6.11 (Multiple solution continuous quantum search)

Guess a Hamiltonian with which one may solve the continuous time search problem in the case where the search problem has \(M\) solutions.

Solution

My Ansatz looks as follows

\[ H = \proj{\chi} + \proj{\psi} , \]

where

\[ \ket{\chi} = \frac{1}{\sqrt{M}} \sum_{f(x)=1} \ket{x} \]

and we leave \(\ket{\psi}\) unspecified at first (as always we take \(\ket{\psi}=H^{\otimes\,n}\ket{0}\) in the end). It turns out that section 6.2 essentially works out analogously with this Hamiltonian if one replaces \(x\) with \(\chi\).

In particular we arrive at the same formula for the Hamiltonian and formula (6.23) which states that the corresponding unitary group rotates (the still arbitrary) \(\ket{\psi}\) into \(\ket{\chi}\):

\[ \cos(\alpha t) \ket{\psi} - i\sin(\alpha t) \ket{\chi} . \]

Thus after time \(t=\pi/2\alpha\) the system is rotated into a state which, when measured, produces a solution of \(f(x)=1\) (with equal probability for each possibility). So what is the difference? Something must be different, right? I am glad you asked! Actually so far nothing is different!

Essentially the only thing which differs is the value of \(\alpha\) once we fix \(\ket{\psi}\). As in chapter 6.2 we have to choose \(\ket{\psi}\) in a way which allows us to determine \(\alpha\) without knowing \(\ket{\chi}\). This is where we fix \(\ket{\psi}=H^{\otimes\,n}\ket{0}\). This in turn leads to

\[ \alpha=\sqrt{M/N} . \]

Let us finally investigate the implications for the performance of an approximation similar to the one in the book. It is easiest to just follow the proof of exercise 6.10. For simplicity we restrict our short analysis to the case \(M=o(N)\).

The angle \(\theta\) of rotation is determined by (cf. the case \(M=1\) in exercise 6.10)

\[ \cos(\theta/2) = 1 - \frac{2Ms^2}{N} . \]

Hence

\[ \theta = 4s \sqrt{\frac{M}{N}} + O\left(\left(\frac{M}{N}\right)^{3/2}\right) . \]

The same reasoning as in exercise 6.10 allows us to estimate the angle \(\omega\) between \(\vec{\psi}\) and \(\hat{z}\):

\[ \cos(\omega) = -1 + O(MN\inv) . \]

Hence \(\omega\approx\pi\). We conclude that the number of iterations is \(\omega/\theta=\Theta(\sqrt{N/M})\) (if we choose \(\Delta t=\pi\) and thus \(s=1\)). But keep in mind that we used \(M=o(N)\) in this analysis!

Exercise 6.12 (Alternative Hamiltonian for quantum search)

Suppose

\[ H = \ket{x}\bra{\psi} + \ket{\psi}\bra{x} . \]

  1. Show that it takes time \(O(1)\) to rotate from the state \(\ket{\psi}\) to the state \(\ket{x}\), given an evolution according to the Hamiltonian \(H\).
  2. Explain how a quantum simulation of the Hamiltonian \(H\) may be performed, and determine the number of oracle calls your simulation technique requires to obtain the solution with high probability.

Proof of (1)

Let \(\ket{\psi}=\alpha\ket{x}+\beta\ket{y}\). In the basis \((\ket{x},\ket{y})\)

\[ H = \alpha I + \beta X + \alpha Z . \]

Note that the term involving the identity \(I\) is not important since it only corresponds to a global phase in \(e^{-\ii Ht}\). Recall from the book that the Hamiltonian \(H_0=\proj{x}+\proj{\psi}\) is equal to \(I+\alpha(\beta X+\alpha Z)\). Hence

\[ e^{-\ii Ht} \sim e^{-\ii H_0 \alpha \inv t} \]

where \(\sim\) means up to a global phase. We already know from equation (6.23) that \(H_0\) rotates \(\ket{\psi}\) into \(\ket{x}\) after time \(t=\pi/2\alpha\). Hence \(H\) does the same after time \(t=\pi/2=O(1)\). QED.
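
A small sympy check of the matrix identity used above (in the \((\ket{x},\ket{y})\) basis, with real \(\alpha,\beta\), here called a, b as in the setup):

import sympy as sp

a, b = sp.symbols('a b', real=True)  # alpha, beta

ket_x = sp.Matrix([1, 0])
psi = sp.Matrix([a, b])
H = ket_x * psi.T + psi * ket_x.T  # |x><psi| + |psi><x| for real amplitudes

X = sp.Matrix([[0, 1], [1, 0]])
Z = sp.Matrix([[1, 0], [0, -1]])
assert H == a * sp.eye(2) + b * X + a * Z
"PASSED"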

Solution to (2)

First of all note that the two summands of \(H\) (that is, \(\ket{x}\bra{\psi}\) and \(\ket{\psi}\bra{x}\)) are not self-adjoint. Hence their exponentials (e.g. \(e^{-\ii\ket{x}\bra{\psi}}\)) are not unitary. So we cannot simulate the summands by a circuit.

On the other hand from the first part we already know that the exponential of \(H\) is essentially a time-scaled version of the exponential of \(H_0\). So if we can simulate the latter this yields a simulation of the former. The resource requirements are the same of course. Not sure if this is the intended meaning of the exercise though.

Exercise 6.13

Consider a classical algorithm for the counting problem which samples uniformly and independently \(k\) times from the search space, and let \(X_1,\ldots,X_k\) be the results of the oracle calls, that is, \(X_j=1\) if the \(j\)​th oracle call revealed a solution to the problem, and \(X_j=0\) if the \(j\)​th oracle call did not reveal a solution to the problem. This algorithm returns the estimate \(S=\frac{N}{k}\sum_{j=1}^{k}X_j\) for the number of solutions to the search problem. Show that the standard deviation of \(S\) is \(\Delta\,S=\sqrt{M(N-M)/k}\). Prove that to obtain a probability at least \(3/4\) of estimating \(M\) correctly to within an accuracy \(\sqrt{M}\) for all values of \(M\) we must have \(k=\Omega(N)\).

Proof

Computing the standard deviation \(\Delta\,S\)

Let us denote by \(P(\ldots)\) and \(E(\ldots)\) the probability of some event and the expectation of some random variable.

According to the exercise statement the random variables \(X_i\) are iid. Whenever the index \(j\) is irrelevant we denote by \(X\) any of the \(X_j\). We have

\[ p := P(X=1) = \frac{M}{N}, \quad P(X=0) = 1-p = \frac{N-M}{N} . \]

The mean value of the \(X_j\) is

\[ \mu := E(X) = p\cdot 1 + (1-p)\cdot 0 = \frac{M}{N} = p . \]

Hence, by linearity \(E(S)=N\mu=M\). This justifies \(S\) as an estimator for \(M\). The variance is

\begin{align*} \Delta\,S^2 = E((S-M)^2) &= \frac{N^2}{k^2} E\left(\left(\sum_j (X_j-\mu)\right)^2\right) \\ &= \frac{N^2}{k^2} \sum_j E((X-\mu)^2) \\ &= \frac{N^2}{k} E(X-2\mu X+\mu^2) \\ &= \frac{N^2}{k} p(1-p) \\ &= \frac{M(N-M)}{k} . \end{align*}

For the second equality we use that, due to independence, \(E(X_iX_j)=E(X_i)E(X_j)\) for \(i\neq j\). For the third equality we use \(X^2=X\).
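
A quick Monte Carlo sketch of this formula (the instance parameters are arbitrary choices):

import random
import statistics

random.seed(0)
N, M, k, trials = 100, 30, 50, 20000

samples = []
for _ in range(trials):
    hits = sum(random.random() < M / N for _ in range(k))  # sum of the X_j
    samples.append(N * hits / k)  # the estimate S

predicted = (M * (N - M) / k) ** 0.5
assert abs(statistics.stdev(samples) - predicted) / predicted < 0.05
"PASSED"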

Proof of an upper bound for \(k\)

The exercise does not really ask for this but we do it anyway. For reasons that become clear at the end we consider the accuracy \(C\sqrt{M}\) (instead of just \(\sqrt{M}\)) for some \(C>0\). By Chebyshev's inequality for any \(\lambda > 0\):

\[ P(\abs{S-M} \geq \lambda\Delta S) \leq \frac{1}{\lambda^2} . \]

Choosing \(\lambda\) such that \(\lambda\Delta S=C\sqrt{M}\) yields

\[ P(\abs{S-M} \geq C\sqrt{M}) \leq \frac{1}{\lambda^2} = \frac{N-M}{C^2k} . \]

Hence \(k=4N/C^2\) is sufficient for the RHS to be at most \(1-3/4=1/4\). This is the desired upper bound. QED.

Remark
Clearly the upper bound is not really useful for \(C\leq2\) (then \(k\geq N\)). For those \(C\) we should either hope to improve our estimate or just systematically visit all \(N\) points of the search space (which yields the exact count with certainty instead of a merely probable estimate!).
Proof of \(k=\Omega(N)\)

We assume that \(N\) is even. This simplifies the proof and is a reasonable assumption in our context since the quantum counting algorithm even assumes that \(N\) is a power of two.

Assume \(k\) is chosen in a way that

\[ P(\abs{S-M} \leq \sqrt{M}) \geq \frac{3}{4} \]

holds for all \(M\). In particular this must be satisfied for \(M=N/2\). Let \(Y=\sum_jX_j\). Then

\[ \abs{S-M} \leq \sqrt{M} \text{ with } M=N/2 \quad \Longleftrightarrow \quad \frac{k}{2} - \frac{k}{\sqrt{2N}} \leq Y \leq \frac{k}{2} + \frac{k}{\sqrt{2N}} \]

Note that

\[ p_y := P(Y=y) = \binom{k}{y} 2^{-k} \]

Let \(\alpha=k/\sqrt{2N}\). The target probability requirement is equivalent to

\[ \sum_{k/2-\alpha\leq y \leq k/2+\alpha} p_y \geq \frac{3}{4} . \]

Recall that in our analysis we only consider asymptotic estimates for \(N\to\infty\). It is not hard to see that the previous inequality readily implies \(k\to\infty\) as \(N\to\infty\). This is a first step to prove \(k=\Omega(N)\). Now let us analyze the LHS more deeply. From the formula for \(p_y\) together with Stirling's formula we see that for the \(p_y\) occurring in the LHS (abbreviating \(s=y/k\))

\begin{align*} p_y &= \frac{\sqrt{2\pi k}(k/e)^k}{\sqrt{2\pi y}(y/e)^y \sqrt{2\pi (k-y)}((k-y)/e)^{k-y}} (1 + o(1)) 2^{-k} \\ &= \sqrt{\frac{2}{\pi k}} \cdot \frac{2^{-k}}{(1-s)^{k-y} s^y} (1 + o(1)) \end{align*}

Let us define \(\varepsilon\) by \(y=(k+\varepsilon k)/2\). By assumption \(\abs{\varepsilon}\leq\sqrt{2/N}\). Hence

\begin{align*} p_y &= \sqrt{\frac{2}{\pi k}} \cdot \frac{1 + o(1)}{(1-\varepsilon)^{k-y} (1+\varepsilon)^y} \\ &= \sqrt{\frac{2}{\pi k}} \cdot \frac{1 + o(1)}{(1-\varepsilon)^{(1-\varepsilon)k/2} (1+\varepsilon)^{(1+\varepsilon)k/2}} \\ &= \sqrt{\frac{2}{\pi k}} \cdot \frac{1 + o(1)}{(1-\varepsilon^2)^{k/2}} \cdot \left[ \frac{1-\varepsilon}{1+\varepsilon} \right]^{\varepsilon k/2} . \end{align*}

Let us assume that there is a special sequence of values for \(k\) which is \(o(N)\) while still satisfying the probability estimate (the proof is done if we can rule out this possibility). Using \((1\pm\delta)^{o(\delta\inv)}=1+o(1)\) if \(\delta\to0\) (this is related to the well-known formula \(e^x=\lim_{n\to\infty}(1+\frac{x}{n})^n\)) we obtain

\[ p_y = \sqrt{\frac{2}{\pi k}} \cdot (1 + o(1)) \]

for \(k\) from the special sequence. Under these assumptions the LHS of the probability estimate is not large enough:

\[ \sum_{k/2-\alpha\leq y \leq k/2+\alpha} p_y = O\left(k^{-1/2}\cdot \frac{k}{\sqrt{N}}\right) = O\left(\sqrt{\frac{k}{N}}\right) = o(1) \leq \frac{3}{4} \text{ eventually} . \]

This is a contradiction! Hence such a special sequence does not exist. Thus \(k=\Omega(N)\). QED.

Exercise 6.14

Prove that any classical counting algorithm with probability at least \(3/4\) for estimating \(M\) correctly to within an accuracy \(c\sqrt{M}\) for some constant \(c\) and for all values of \(M\) must make \(\Omega(N)\) oracle calls.

Remark
This is vaguely related to the fact that the sample mean is a so-called Uniformly Minimum Variance Unbiased Estimator (UMVUE) for the mean value parameter of the Bernoulli distribution. See libretexts.org.

Proof

The exercise is a little bit vague in my opinion - but that is OK. If we speak about probabilities there must be a probability distribution somewhere. The exercise doesn't explicitly define one. Sometimes the distribution is given implicitly. In this case I think there are at least two options:

  • Postulate a probability distribution on the inputs.
  • Assume that the oracle calls are at randomly selected arguments.

I solved the exercise for both interpretations.

Interpretation 1

For a given \(N\), any such algorithm does essentially the following:

  1. Input: An oracle computing a function \(f:\{0,\ldots,N-1\}\to\{0,1\}\).
  2. Determine \(\vec{X}=(X_1,\ldots,X_k)\), where \(X_j=f(N_j)\) for some \(N_j < N\).
  3. Compute \(h(\vec{X})\) and return the result.

As we are interested in the asymptotics of \(k\) as \(N\to\infty\) we actually have to take into account sequences of algorithms which might be specifically tailored to each value of \(N\). In particular the sequence of algorithms (their "source code") might not be computable. This does not complicate the analysis at all (nothing we do here depends on computability - measure theory is the framework which governs the math here) but it is worth pointing out since the exercise is formulated with computing in mind.

Of course, a concrete algorithm might not actually separate steps 2 and 3 but it is always possible to change the algorithm so that it behaves like that by essentially running it twice: the first time we just remember the \(X_j\) and the second time we plug in the already computed values for \(X_j\). Of course this is a bit dumb but it simplifies our analysis.

The \(N_j\) need not be constant. In fact, any \(N_j\) might depend on anything that was previously computed, including already known \(X_{j'}\). We may assume that the \(N_j\) are all different. If this were not the case we could change the algorithm by inserting a subroutine which checks if \(N_j\) already appeared and just retrieves a copy of the corresponding \(X_{j'}\). In particular \(k\) might also depend on the course of the computation and hence is not necessarily just a function of \(N\). For that reason \(h\) essentially takes a list of \(X_j\) as input and not just a fixed size vector.

As the algorithms are supposed to be arbitrary they might in particular be deterministic (meaning that they compute a function mapping oracles to integers). In that case it wouldn't even make sense to speak about a probability of \(3/4\) that something happens! To introduce randomness to the setting one could postulate a probability distribution over the inputs. Note that under this assumption \(M\) is a random variable.

One possible distribution is the uniform distribution over all \(f\) (all possible \(2^N\) oracles occur with the same probability \(2^{-N}\)).

Assuming a uniform distribution over all possible \(f\) (the input of the algorithm) we may deduce that the \(X_j\) are independent and identically distributed (iid) according to the Bernoulli distribution with parameter \(p=M/N\) (as in exercise 6.13). For the independence it is important that the \(N_j\) are all different.

We now show that in this case the exercise statement is wrong (at least for sufficiently large \(c\))! In fact just estimating \(N/2\) regardless of the input (\(f\)) does the job! The reason for this is that under this distribution most oracles have \(M=N/2+O(\sqrt{N})\).

Theorem
Assuming a uniform distribution over all possible \(f\) the algorithm Return N/2 is accurate to within \(c\sqrt{M}\) with probability \(1-o(1)\) as \(N\to\infty\) and \(c\to\infty\). In particular no oracle calls are needed.

Proof

This statement follows from the central limit theorem. Let us prove it directly - just for fun (it is not difficult).

Let us abbreviate \(R=O(m\inv + (N-m)\inv)\) in the following. The probability that the number \(M\) of solutions to \(f(x)=1\) (for a random \(f\)) is \(m\) is

\[ P(M=m) = \binom{N}{m} 2^{-N} . \]

A straightforward calculation using Stirling's formula \(n!=\sqrt{2\pi\,n}(n/e)^n(1+O(n\inv))\) shows

\[ P(M=m) = \frac{N\inv}{\sqrt{2\pi\sigma^2}} \left[ 2(1-p)^{1-p} p^p \right]^{-N} \; (1 + R) . \]

Here we set \(p=m/N\) and \(\sigma^2=\frac{p(1-p)}{N}\). Let us define the entropy function \(H(p)=-p\ln(p)-(1-p)\ln(1-p)\). In case you do not know what this function looks like, search the internet for a plot; Wikipedia should have one. Plugging in the entropy function we get

\[ P(M=m) = \frac{N\inv}{\sqrt{2\pi\sigma^2}} e^{-(\ln(2) - H(p))N} \; (1 + R) . \]

Observe that \(H\) is maximal at \(p=1/2\). Moreover \(H(1/2)=\ln(2)\), \(H'(1/2)=0\), and \(H''(1/2)=-4\), so that \(\ln(2)-H(1/2+\delta)=2\delta^2+O(\delta^4)\). Hence for \(p=\frac{1}{2}+\delta\) with small \(\delta=O(1/\sqrt{N})\) and \(d\delta=O(N\inv)\) we have, setting \(\sigma_0^2=\frac{1}{4N}=\sigma^2+O(N^{-2})\),

\[ P\left(\frac{1}{2} + \delta - d\delta < \frac{M}{N} < \frac{1}{2} + \delta + d\delta \right) = \frac{2d\delta}{\sqrt{2\pi\sigma_0^2}} e^{-\frac{\delta^2}{2\sigma_0^2}} \; (1 + O(N\inv)) . \]

This implies that the probability distribution of \(M\) for \(N\to\infty\) is close to the normal distribution with mean \(N/2\) and standard deviation \(\sqrt{N}/2\). In particular almost all of the probability mass lies within a few multiples of \(\sqrt{N}\) around \(N/2\). QED.
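
A small numeric sketch of this concentration, sampling \(M\sim\mathrm{Binomial}(N,1/2)\) directly (the parameters are arbitrary choices):

import random

random.seed(1)
N, c, trials = 2500, 4, 1000

hits = 0
for _ in range(trials):
    M = sum(random.getrandbits(1) for _ in range(N))  # M for a uniformly random f
    hits += abs(N / 2 - M) <= c * M ** 0.5  # 'Return N/2' is accurate enough
assert hits / trials > 0.99
"PASSED"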

Let us investigate a different class of probability distributions.

Let \(\rho:[0,1]\to[0,\infty)\) be a (fixed) smooth function (which does not depend on \(N\)) with \(\int_0^1\rho(x)dx=1\). Consider a probability distribution satisfying

\[ P(M=m) = N\inv \rho(m/N) \; (1 + O(N\inv)) \]

and such that oracles having the same number \(m\) of solutions have the same probability (symmetry condition). One possible density function is e.g. \(\rho(p)=1\).

The previous distribution does satisfy the symmetry condition but not the \(N\)​-independence of \(\rho\): the proof above showed that the corresponding \(\rho\) approaches the delta distribution at \(1/2\) (it gets narrower with increasing \(N\)).

For these distributions one can readily see that conditioned on \(M=m\) (for any \(m\)) the \(X_j\) are again iid according to the Bernoulli distribution with parameter \(p=m/N\). The symmetry condition is the reason for this (again).

Theorem
Let \(c>0\) and \(\beta > 1/2\). Consider the just discussed probability distribution on the inputs (the oracles). Assume that for any \(N\) and a randomly chosen \(f\) the algorithms determine \(M\) within an accuracy \(c\sqrt{M}\) with probability at least \(\beta\). Then \(k=\Omega(N)\) as \(N\to\infty\).
Proof

Let us write \(a\pm b\) for the interval \([a-b,a+b)\). Recall that \(M\) is the number of solutions of the input \(f\) and hence is a random variable.

Without loss of generality we assume that \(\rho\) is compactly supported in \((0,1)\), i.e. there is an \(\varepsilon > 0\) such that \(\rho\) is zero outside of \([\varepsilon,1-\varepsilon]\). If this was not already the case we could just replace \(\rho\) by some "trimmed" (and normalized) version \(\rho_\varepsilon\). If we choose \(\varepsilon\) small enough the success probability \(\beta_{\varepsilon}\) of the algorithm is close to \(\beta\) (uniformly for all \(N\)) and hence larger than \(1/2\).

Let

\[ \beta_m = P \left(h(\vec{X}) \in m \pm c\sqrt{m} \; \middle| \; M=m \right) \]

By assumption we have \(\sum_m P(M=m)\,\beta_m \geq \beta > 1/2\). Here and in the following we implicitly assume that \(m\) is in the "positive probability zone" (in particular bounded away from \(0\) and \(N-1\)). For example in the sum \(\sum_m P(M=m)\,\beta_m\) we may assume

\[ \varepsilon N \leq m \leq (1 - \varepsilon) N . \]

For other expressions the "positive probability zone" might be slightly extendend to the left or right (or both). This comes in handy next since it avoids "index overflow". Let

\[ \beta'_m = P \left(h(\vec{X}) \in m \pm c\sqrt{m} \; \middle| \; M = m + c\sqrt{m} \right) \]

Clearly the corresponding events imply the failure of the algorithm (the accuracy is too bad). Hence \(\sum_m P(M=m+c\sqrt{m})\,\beta'_m\leq1-\beta < 1/2\). Let us summarize what we have so far:

\[ \sum_m P(M=m)\,\beta_m \geq \beta > 1/2 \quad \text{and} \quad \sum_m P(M=m+c\sqrt{m})\,\beta'_m \leq 1-\beta < 1/2 . \]

Recall that we have to show \(k=\Omega(N)\). Arguing by contradiction, and possibly restricting to an infinite subset of the values of \(N\), we may assume that \(k=o(N)\) in the following.

Our goal is to show that under this assumption \(\beta'_m\sim\beta_m\) (i.e. \(\beta'_m/\beta_m\to1\)) as \(N\to\infty\) uniformly with respect to \(m\). Since the smoothness of \(\rho\) also gives \(P(M=m+c\sqrt{m})\sim P(M=m)\), the two sums above would then have to be asymptotically equal. This is the desired contradiction to what we know already.

Let \(S=\sum_{j=1}^k X_j\) and \(p=m/N\). By our assumption on the probability distribution of \(f\) we have that the \(X_j\) are Bernoulli distributed with parameter \(p\) and hence

\[ P(S=s|M=m) = \binom{k}{s} p^s (1-p)^{k-s} . \]

(i.e. \(S\) satisfies a binomial distribution if conditioned on fixed \(M\).) Moreover the symmetry condition on the distribution further implies that

\[ P(\vec{X}=x | S=s \cap M=m) \]

is the same for all \(x\) with \(\sum_jx_j=s\) (and \(0\) otherwise). One could argue that this condition means that (for fixed \(M\)) no algorithm has any chance of extracting any information beyond \(s\) from \(\vec{X}\) since it is not able to generate a probability distribution with less entropy.

To prove \(\beta'_m\sim\beta_m\) it is sufficient to prove that \(P(\vec{X}=x|M=m)\,\sim\,P(\vec{X}=x|M=m+c\sqrt{m})\) (both uniformly with respect to \(m\)). To prove the latter, by the just discussed consequences of the symmetry property, it is sufficient to prove

\[ P(S=s|M=m)\,\sim\,P(S=s|M=m+c\sqrt{m}) , \]

uniformly with respect to \(m\). Actually we only prove this for "most" \(s\) (in the sense that their probability sums to a value asymptotically approaching \(1\) (for both values of \(M\))), but this is sufficient. Let \(R=O(1/s)+O(1/(k-s))\), \(q=s/k\), and \(\sigma_q^2=q(1-q)/k\). Let \(H(q)=-q\ln(q)-(1-q)\ln(1-q)\) and define

\[ E(q,p) = H(q) + q\ln(p) + (1-q)\ln(1-p) . \]

A straightforward calculation using Stirling's formula \(n!=\sqrt{2\pi\,n}(n/e)^n(1+O(n\inv))\) shows

\begin{align*} P(S=s|M=m) &= \binom{k}{s} p^s (1-p)^{k-s} \\ &= \frac{k\inv}{\sqrt{2\pi\sigma_q^2}} \cdot \left[q^q(1-q)^{1-q}\right]^{-k} \cdot \left[p^q(1-p)^{1-q}\right]^k \cdot (1 + R) \\ &= \frac{k\inv}{\sqrt{2\pi\sigma_q^2}} \cdot e^{kE(q,p)} \cdot (1 + R) . \end{align*}

Observe that at \(q=p\) we have \(E=\partial_qE=0\) and \(\frac{k}{2}\partial_q^2E=-1/(2\sigma_p^2)\). Recall that we are essentially only interested in \(p\in[\varepsilon,1-\varepsilon]\) (this is not entirely correct but the details do not matter). Note that \(\sigma_q=\sigma_p(1+O(k^{-1/2}))\) for \(q=p+O(k^{-1/2})\). Hence the probability concentrates around \(s=kp\pm O(\sqrt{k})\) and for these \(s\) we have:

\[ P(S=s|M=m) = \frac{k\inv}{\sqrt{2\pi\sigma_p^2}} \cdot e^{-\frac{(q-p)^2}{2\sigma_p^2}} \cdot (1 + o(1)) . \]

Compare this with the normal distribution. We essentially proved the central limit theorem in a special case.

Recall that \(q=s/k\) and \(p=m/N\). Also observe that \(\sigma_p^2=\Theta(k\inv)\) and that \(p':=p+c\sqrt{m}/N=p+o(k^{-1/2})\). Using this in the just derived asymptotic formula for \(P(S=s|M=m)\) yields the claim. QED.

Interpretation 2

Let us offer an interpretation which is analogous to exercise 6.13. Let us again consider an algorithm as sketched in the previous interpretation. But now we assume more specifically that the \(N_j\) are just sampled uniformly at random. In particular \(k\) is deterministic and only depends on \(N\) (not on the oracle).

  1. Input: An oracle computing a function \(f:\{0,\ldots,N-1\}\to\{0,1\}\).
  2. Compute \(k=k(N)\).
  3. Determine \(\vec{X}=(X_1,\ldots,X_k)\) where \(X_j=f(N_j)\) and the \(N_j\) are sampled uniformly at random on the input space.
  4. Compute \(h_k(\vec{X})\) and return the result.

In this case we can consider individual oracles and need not consider probability distributions over them. The \(X_j\) are again iid according to the Bernoulli distribution with parameter \(p=M/N\) where \(M\) is the number of solutions for the oracle under consideration.

We note again that we consider a sequence of algorithms (as we are interested in the asymptotics \(N\to\infty\)).

Theorem
Consider a sequence of algorithms as discussed above. Let \(c>0\) and \(\alpha>0\). Assume that for any oracle the responsible algorithm (the one with the matching value for \(N\)) computes \(M\) correctly within accuracy \(c\sqrt{M}\) with probability at least \(\alpha\). Then the number of oracle calls is \(k=\Omega(N)\).
Proof

First of all we note that if \(M\) is correct within accuracy \(c\sqrt{M}\) it is also correct within \(c\sqrt{N}\). From here on we only assume the latter. Moreover let us write \(a\,\pm\,b\) for the interval \([a-b,a+b)\).

We may assume that \(k=o(N)\) and have to deduce a contradiction from that. Indeed, if \(k=\Omega(N)\) were false we could pass to a subsequence \((N_i)\) such that the corresponding \(k_i=k_i(N_i)\) satisfy \(k_i=o(N_i)\).

Let \(r=r(k,N)\) be such that \(r\to\infty\) in a way such that \(r^2k=o(N)\). This is possible due to \(k=o(N)\). Consider a set \(\{f_1,\ldots,f_r\}\) of oracles with corresponding

\[ p_i = \frac{1}{2} + \frac{2ic_i}{\sqrt{N}} \]

so that \(M_i=Np_i\) is the number of solutions of \(f_i(x)=1\). Here \(c_i\) is the smallest value \(\geq c\) such that \(M_i\) is an integer. Note that \(c_i=c+O(N^{-1/2})\). It is not essential that these \(p_i\) are close to \(1/2\). The important thing is that they are close together (the width is \(o(1/\sqrt{k})\)) and bounded away from \(p=0\) and \(p=1\). Moreover we will see below that it is also not really important that \(r\to\infty\); it is sufficient that we can choose it arbitrarily large, at least for large \(N\).

Observe that if \(f_i\) is the input oracle the \(X_j\) are iid and Bernoulli distributed with parameter \(p_i\). Let us denote by \(P_{p_i}\) the corresponding probability measure. Let us write

\[ I_i = p_i \pm \frac{c}{\sqrt{N}} \]

By definition of the \(p_i\) these intervals are disjoint. By assumptions of the theorem we have

\[ P_{p_i} \left( h(\vec{X})/N \in I_i \right) \geq \alpha > 0 . \]

Lemma

Let \(\beta\in(0,1)\). There is a constant \(\delta\in(0,1]\) such that for all \(N\) the following holds: For each \(k=k(N)\) there is a set \(U_k\subseteq\{0,1\}^k\) such that for all the \(p_i\)

\[ P_{p_i}(\vec{X} \in U_k) \geq \beta . \]

and for any combination \(p_i\), \(p_j\) and any \(x\in U_k\)

\[ P_{p_i}(\vec{X}=x)\geq\delta P_{p_j}(\vec{X}=x) . \]

Proof

Let \(Y=\sum_{j=1}^kX_j\). For any \(f_i\), by the central limit theorem, the probability distribution of \(Y\) approaches the normal distribution with mean \(kp_i=\frac{k}{2}+o(\sqrt{k})\) and variance \(O(k)\) (we have essentially proven this directly in the proof of another theorem).

Since \(r/\sqrt{N}=o(1/\sqrt{k})\) the widths of the corresponding gaussian bell curves are much larger than any of the \(p_i\) are away from each other - at least for large \(N\). Moreover they are roughly the same and the convergence is uniform. Hence

\[ U_k = \left\{x\in\{0,1\}^k \, \middle| \; \sum_j x_j \in \frac{k}{2} \pm c_0 \sqrt{k} \right\} \]

for a suitably chosen \(c_0\) (independent of \(k\) or \(N\)) does the job for all \(k\) large enough (\(\delta\) could even be chosen as close to \(1\) as we like). For all smaller \(k\) we can simply choose \(U_k=\{0,1\}^k\) because the condition involving \(\delta\) is trivially true by finiteness for a sufficiently small \(\delta\). QED.

Let \(U_k\) be as in the lemma for \(\beta\) chosen such that \(\gamma=\alpha+\beta-1\,>\,0\). Moreover let

\[ U_k(j) = \left\{ h(\vec{X})/N \in I_j \right\} \cap \left\{ \vec{X} \in U_k \right\} . \]

Then

\[ P_{p_j}(U_k(j)) \geq \alpha + \beta - 1 = \gamma > 0 . \]

By the condition in the lemma involving \(\delta\) and the disjointness of the \(I_j\) we see that

\[ P_{p_1} \left( \bigcup_{j=1}^r U_k(j) \right) \geq r\gamma\delta . \]

The LHS is bounded by \(1\). On the other hand the RHS tends to \(\infty\) as \(N\to\infty\) since \(r\to\infty\). This is the desired contradiction. QED.

Exercise 6.15

Use the Cauchy–Schwarz inequality to show that for any normalized state vector \(\ket{\psi}\) and set of \(N\) orthonormal basis vectors \(\ket{x}\),

\[ \sum_x \norm{\psi - x}^2 \geq 2N - 2\sqrt{N} . \]

Proof

Let \(\psi_x=\sprod{x}{\psi}\). Observe that

\[ \norm{\psi - x}^2 = 2 - 2\Re(\sprod{x}{\psi}) = 2 - 2\Re(\psi_x) . \]

Moreover by the Cauchy-Schwarz inequality

\[ \sum_x \Re(\psi_x) \leq \sum_x \abs{\psi_x}\cdot 1 \leq \left(\sum_x \abs{\psi_x}^2 \right)^{\frac{1}{2}} \cdot \left(\sum_x 1 \right)^{\frac{1}{2}} = \sqrt{N} . \]

Summing the first observation over all \(x\) and inserting this bound yields the claim. QED.

Exercise 6.16

Suppose we merely require that the probability of an error being made is less than \(1/2\) when averaged uniformly over the possible values for \(x\), instead of for all values of \(x\). Show that \(\Omega(\sqrt{N})\) oracle calls are still required to solve the search problem.

Remark
The original exercise spoke about \(O(\sqrt{N})\) instead of \(\Omega(\sqrt{N})\). I corrected this typo.

Proof

Below we sketch the main steps of the proof as it is presented in the book. For this exercise the only relevant modification happens at step 5. Let the notation be as in the book:

\[ D_k = \sum_x \norm{\psi_k^x - \psi_k}^2, \quad E_k = \sum_x \norm{\psi_k^x - x}^2, \quad F_k = \sum_x \norm{x - \psi_k}^2. \]

with \(\psi_k=(\prod_{i=1}^kU_i)\psi\) and \(\psi_k^x=(\prod_{i=1}^kU_iO_x)\psi\) where the \(U_i\) are arbitrary unitary operators, \(\psi\) is an arbitrary normalized state, and \(O_x=I-2\proj{x}\) is the oracle. Moreover let

\[ p_x = \abs{\sprod{x}{\psi_k^x}}^2 \]

be the success probability for a given value of \(x\). The proof from the book assumed \(p_x\geq1/2\). The exercise weakened this to \(\sum_xp_x\geq N/2\).

  1. For all \(k\)

    \[ D_{k+1} \leq (\sqrt{D_k} + 2)^2 . \]

    This can be proved using \(O_x-I=-2\proj{x}\) and the Cauchy-Schwarz inequality.

  2. For all \(k\)

    \[ D_k\leq4k^2 . \]

    Follows by induction from 1 and \(D_0=0\).

  3. For all \(k\)

    \[ D_k \geq (\sqrt{F_k} - \sqrt{E_k})^2 . \]

    Follows from the triangle inequality and the Cauchy-Schwarz inequality.

  4. For all \(k\)

    \[ F_k \geq 2N - 2\sqrt{N} . \]

    This follows from the Cauchy-Schwarz inequality (see exercise 6.15).

  5. If \(\sum_x p_x\geq N/2\) then for all \(k\)

    \[ E_k \leq N . \]

    Since this is the crucial step for this exercise we give a detailed proof below. (Under the stronger assumption that \(p_x\geq1/2\), for all \(x\), the proof from the book showed the stronger statement \(E_k\leq(2-\sqrt{2})N\).)

  6. For all \(k\)

    \[ D_k = \Omega(N) . \]

    Follows from 3, 4, and 5.

The claim (that \(\Omega(\sqrt{N})\) oracle calls are necessary) follows from 2 and 6. QED.

Proof of 5

As in the main text we may assume that \(\sprod{x}{\psi_k^x} > 0\) (replace \(\ket{x}\) by \(e^{\ii\theta}\ket{x}\) if necessary). Hence

\[ \norm{\psi_k^x - x}^2 = 2 - 2 \abs{\sprod{x}{\psi_k^x}} = 2 - 2 \sqrt{p_x} . \]

Using \(0\leq p_x \leq 1\) (hence \(\sqrt{p_x}\,\geq\,p_x\)) and \(\sum_x p_x\geq N/2\) we obtain

\[ E_k = \sum_x (2 - 2\sqrt{p_x}) \leq \sum_x (2 - 2p_x) \leq N . \]

Note that this inequality is (essentially) sharp since equality holds if half of the \(p_x\) are \(1\) and the other half are \(0\) (at least for even \(N\) this is possible). QED.

Exercise 6.17 (Optimality for multiple solutions)

Suppose the search problem has \(M\) solutions. Show that \(\Omega(\sqrt{N/M})\) oracle applications are required to find a solution.

Remark
The original exercise spoke about \(O(\sqrt{N/M})\) instead of \(\Omega(\sqrt{N/M})\). I corrected this typo.

Proof

We closely follow the structure of the proof of exercise 6.16 (which in turn follows the proof from the book). Without loss of generality we may assume that

\[ M \leq \frac{N}{4} . \]

Otherwise \(\sqrt{N/M}=O(1)\) and the claim is trivial. Let us define, essentially as in the book, \(\psi_k=(\prod_{i=1}^kU_i)\psi\) and \(\psi_k^f=(\prod_{i=1}^kU_iO_f)\psi\) where the \(U_i\) are arbitrary unitary operators, \(\psi\) is an arbitrary normalized state, and \(O_f=I-2\sum_{f(x)=1}\proj{x}\) is the oracle for some \(f\) with exactly \(M\) solutions of the equation \(f(x)=1\). Let us define the projection operator

\[ P_f = \sum_{f(x)=1} \proj{x} = (I - O_f)/2 . \]

Note that there are unique normalized states \(\chi_k^f\) and \(\phi_k^f\) from the subspaces corresponding to \(f(x)=1\) and \(f(x)=0\) respectively, and unique \(a_f,b_f\geq0\) (with \(a_f^2+b_f^2=1\)) such that

\[ \psi_k^f = a_f \chi_k^f + b_f \phi_k^f . \]

Note that the success probability (which is assumed to be at least \(1/2\) for any \(f\)) is given by \(p_f=a_f^2\). Finally consider

\[ D_k = \sum_f \norm{\psi_k^f - \psi_k}^2, \quad E_k = \sum_f \norm{\psi_k^f - \chi_k^f}^2, \quad F_k = \sum_f \norm{\chi_k^f - \psi_k}^2. \]

Here the summation ranges over all \(f\) which have exactly \(M\) solutions for \(f(x)=1\).

  1. For all \(k\)

    \[ \sum_f \norm{P_f\psi_k}^2 = \frac{M}{N} \binom{N}{M} =: S . \]

    Proof
    Let \(\psi_k=\sum_xc_x\ket{x}\). Then \(\norm{P_f\psi_k}^2=\sum_{f(x)=1}\abs{c_x}^2\) is just a sum of squares of \(M\) of the coefficients of \(\psi_k\). By symmetry in the overall sum over all \(f\) each coefficient appears equally often. All in all there are \(M\binom{N}{M}\) summands of the form \(\abs{c_x}^2\). Because \(\sum_x\abs{c_x}^2=1\) (which are exactly \(N\) summands) the claim follows. QED.
  2. For all \(k\)

    \[ D_{k+1} \leq (\sqrt{D_k} + 2\sqrt{S})^2 . \]

    Proof

    This is just a short calculation using the Cauchy-Schwarz inequality (in the second to last step):

    \begin{align*} D_{k+1} &= \sum_f \norm{O_f \psi_k^f - \psi_k}^2 \\ &= \sum_f \norm{O_f(\psi_k^f - \psi_k) + (O_f - I)\psi_k}^2 \\ &\leq \sum_f \left( \norm{\psi_k^f-\psi_k}^2 + 4\norm{\psi_k^f-\psi_k}\norm{P_f\psi_k} + 4\norm{P_f\psi_k}^2 \right) \\ &\leq D_k + 4\sqrt{D_k S} + 4S \\ &= (\sqrt{D_k} + 2\sqrt{S})^2 . \end{align*}

    QED.

  3. For all \(k\)

    \[ D_k \leq 4k^2 \cdot S . \]

    Proof
    This follows by induction from 2 and \(D_0=0\). QED.
  4. For all \(k\)

    \[ D_k \geq (\sqrt{F_k} - \sqrt{E_k})^2 . \]

    Proof
    This follows from the triangle inequality and the Cauchy-Schwarz inequality (the short proof is exactly the same as the analogous inequality from the book).
  5. For all \(k\)

    \[ F_k \geq \left(2 - 2\sqrt{\frac{M}{N}}\right) \binom{N}{M} . \]

    Note that here the bound on \(M\) comes in handy.

    Proof

    By \(P_f\chi_k^f=\chi_k^f\), \(\norm{\chi_k^f}=1\), and the fact that \(P_f\) is hermitian we have

\[ \norm{\chi_k^f - \psi_k}^2 = 2 - 2\Re(\sprod{\chi_k^f}{P_f\psi_k}) \geq 2 - 2\norm{P_f\psi_k} . \]

    Using this together with the definition of \(S\) (see 1) and the Cauchy-Schwarz inequality we get

    \[ F_k \geq 2\binom{N}{M} - 2 \left( \sum_f\norm{P_f\psi_k}^2 \right)^{\frac{1}{2}} \cdot \left( \sum_f 1 \right)^{\frac{1}{2}} = 2\binom{N}{M} - 2 \sqrt{S \binom{N}{M}} . \]

    The claim follows from 1. QED.

  6. If \(p_f\geq1/2\) for all \(f\) then for all \(k\)

    \[ E_k \leq (2 - \sqrt{2}) \binom{N}{M} . \]

    Proof

    From the relation between \(\chi_k^f\) and \(\psi_k^f\) we get

    \[ E_k = \sum_f \left( (1 - a_f)^2 + b_f^2 \right) = \sum_f (2 - 2 a_f) \]

    Now the assumption on the success probability \(a_f^2=p_f\geq1/2\) yields the claim. QED.

  7. For all \(k\)

    \[ D_k = \Omega\left( \binom{N}{M} \right) . \]

    Proof
    This follows from 4, 5, 6, and the bound on \(M\).

The assertion \(k=\Omega(\sqrt{N/M})\) now follows from 1, 3, and 7. QED.

Exercise 6.18

Prove that the minimum degree polynomial representing a Boolean function \(F(X)\) is unique.

Remark
The proof below shows a stronger statement: The multi-linear polynomial representing \(F\) is unique. Here (as in the book) multi-linear means that any variable \(X_i\) occurs only as \(X_i^0=1\) or \(X_i^1\) but not with higher powers. Note that any minimal polynomial is necessarily multi-linear.

Proof

Any multi-linear polynomial representing \(F\) is of the form

\[ P(X) = \sum_{e} c_e \prod_{k=0}^{N-1} X_k^{e_k} \]

where the sum ranges over all \(e:\{0,\ldots,N-1\}\to\{0,1\}\) (binary strings of length \(N\)). In the following we use the notation \(\mathrm{bc}(e)\) (bit count) to denote \(\sum_ke_k\).

Lemma
Let \(P\) and \(Q\) be two multi-linear polynomials representing \(F\) with corresponding coefficients \((c_e)\) and \((d_e)\). Let \(n\in\{0,\ldots,N-1\}\). Assume that \(c_e=d_e\) for all \(e\) with \(\mathrm{bc}(e)\leq\,n\). Then \(c_e=d_e\) for all \(e\) with \(\mathrm{bc}(e)\leq\,n+1\).
Proof

Let \(e\) be such that \(\mathrm{bc}(e)=n+1\). Choose \(X=e\). By assumptions

\[ 0 = F(X) - F(X) = P(X) - Q(X) = c_e - d_e . \]

For the last equality note that at \(X=e\) only monomials with \(e'\leq e\) (componentwise) contribute, and by assumption the coefficients with \(\mathrm{bc}(e')\leq n\) cancel. This implies the claim. QED.

Also note that \(P(0,\ldots,0)=c_{(0,\ldots,0)}\). Hence two multi-linear polynomials representing \(F\) are proven to be equal by mathematical induction (using the lemma as the induction step and the just mentioned fact as the induction start). QED.
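
The induction also yields a way to compute the multilinear coefficients. Here is a small sketch for an arbitrary example function (AND of two bits):

from itertools import product

def F(x):  # example Boolean function (arbitrary choice): AND of two bits
    return x[0] & x[1]

n = 2
coeff = {}
for e in sorted(product((0, 1), repeat=n), key=sum):
    # c_e = F(e) minus the contributions of all smaller monomials at X = e
    lower = sum(c for d, c in coeff.items()
                if all(di <= ei for di, ei in zip(d, e)) and d != e)
    coeff[e] = F(e) - lower

# AND is represented by the single monomial X_0 X_1.
assert coeff == {(0, 0): 0, (1, 0): 0, (0, 1): 0, (1, 1): 1}
"PASSED"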

Exercise 6.19

Show that \(P(X)=1-\prod_{k=0}^{N-1}(1-X_k)\) represents OR.

Proof

This is obvious. Probably one way to make it even more obvious is to note that

\[ P(X) = \begin{cases} 0 & \text{if } X_k = 0 \text{ for all } k, \\ 1 & \text{else.} \end{cases} \]

This is precisely the behaviour of the OR function. QED.
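
For the skeptical reader, a brute-force sympy check for \(N=3\):

import sympy as sp
from itertools import product

Xs = sp.symbols('X0:3')
P = 1 - sp.prod(1 - Xk for Xk in Xs)

for bits in product((0, 1), repeat=3):
    assert P.subs(dict(zip(Xs, bits))) == (1 if any(bits) else 0)
"PASSED"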

Exercise 6.20

Show that \(Q_0(\mathrm{OR})\geq N\) by constructing a polynomial which represents the OR function from the output of a quantum circuit which computes OR with zero error.

Remark
  • Proving \(Q_0(\mathrm{OR})\geq N\) implies \(Q_0(\mathrm{OR})=N\) since \(N\) oracle calls to all possible inputs are clearly sufficient to implement a deterministic algorithm to solve the search problem (OR corresponds to the search problem whether \(X_i=1\) has a solution \(i\)).
  • Exactly this is proved in (Beals, Buhrman, Cleve, Mosca, and de Wolf, 1998). This paper served me well as a nice introduction to the method of polynomials.

Proof

First of all note that \(1-\prod_{k=0}^{N-1}(1-X_k)\) is the minimal polynomial of \(\mathrm{OR}\). This follows from the proof of exercise 6.18 since this polynomial is already multi-linear.

Hence \(\deg(\mathrm{OR})=N\).

Let us briefly recall how an algorithm for the zero-error case works. First a number \(m\) of qubits is prepared in some initial state, without loss of generality \(\ket{0}\). Then a quantum circuit \((\prod_{i=1}^{T}U_iO)U_0\) is applied, where the \(U_i\) are arbitrary unitary operators and \(O\) is the call to the oracle. Overall there are \(T\) calls and we have to prove that \(T\geq N\).

To actually return something we have to measure at least some of the qubits. Without loss of generality we may assume that all \(m\) qubits are measured (in practice one would probably measure only two qubits, which suffices to distinguish up to four answers). Let \(\mathcal{Y}\), \(\mathcal{N}\), \(\mathcal{I}\) be disjoint sets whose union is \(\{0,\ldots,2^m-1\}\). Based on whether the measurement result is in \(\mathcal{Y}\), \(\mathcal{N}\), or \(\mathcal{I}\) we return yes, no, or inconclusive. We may assume that at least \(\mathcal{Y}\) and \(\mathcal{N}\) are non-empty (otherwise the algorithm cannot be useful).

Let

\[ \ket{\psi(X)}=\sum_{k=0}^{2^m-1} c_k(X) \ket{k} \]

be the final state right before measurement. From the book we already know that each \(c_k(X)\) is a complex polynomial of degree at most \(T\). Consider

\[ P_{\mathcal{N}}(X) = \sum_{k\in \mathcal{N}} \abs{c_k(X)}^2 . \]

This is the probability of returning no (meaning that the algorithm claims that \(X\) contains no ones). Since the algorithm is zero-error we have \(P_{\mathcal{N}}(X)=0\) for \(X\neq\vec{0}\). Hence \(c_k(X)=0\) for \(X\neq\vec{0}\) and all \(k\in\mathcal{N}\). On the other hand there must be a \(k'\in\mathcal{N}\) such that \(c_{k'}(\vec{0})\neq0\). Let

\[ P(X) = \Re\left(1 - \frac{c_{k'}(X)}{c_{k'}(\vec{0})}\right) . \]

Clearly this is a real polynomial of degree at most \(T\). By construction it represents OR. Using the fact that \(\deg(\mathrm{OR})=N\) this implies that \(T\geq N\), as desired. QED.

References

Beals, R., Buhrman, H., Cleve, R., Mosca, M., and de Wolf, R. (1998). Quantum Lower Bounds by Polynomials.