Chapter 6

Table of Contents

Setup

import sympy as sp
from sympy.vector import CoordSys3D

from chapter_4 import Ry, theta, Z

Set up some variables:

a, b = sp.symbols('a b', real=True)  # usage for a,b or alpha,beta in the book
c, s = sp.symbols('c s', real=True)  # usage for cosine and sine

Exercises

Exercise 6.1

Show that the unitary operator corresponding to the phase shift in the Grover iteration is 2|00|I.

Proof

Recall that the phase shift operator in Grover's algorithm does the following to the standard basis states:

|x{|0if x=0|xotherwise.

Since 0|x=δx,0 this is the same as what 2|00|I does. QED.

Exercise 6.2

Let |ψ=N1/2i=0N1|k=Hn|0. Show that the operation 2|ψψ|I applied to a general state kαk|k produces

k[αk+2α]|k,

where α=kαk/N is the mean value of the αk. For this reason, 2|ψψ|I is sometimes referred to as the inversion about mean operation.

Proof

First of all we note that

|ψψ|=1Nij|ij|.

Hence

(2|ψψ|I)kαk|k=kαk|k+2ijkαkN|ij|k=kαk|k+2ikαkN|i=kαk|k+2iα|i=k(αk+2α)|k.

QED.

Exercise 6.3

Show that in the |α, |β basis, we may write the Grover iteration as a rotation

G=[cos(θ)sin(θ)sin(θ)cos(θ)],

where θ is a real number in the range 0 to π/2 (assuming for simplicity that MN/2; this limitation will be lifted shortly), chosen so that

sin(θ)=2M(NM)N.

Proof

Recall that M is the number of solutions of f(x)=1 and that

|α=1NMf(x)=0|x,and|β=1Mf(x)=1|x.

Moreover

|ψ=Hn|0=NMN=:cos(θ/2)|α+MN=:sin(θ/2)|β.

This is consistent with the formula for θ from the exercise statement, which can be seen by the trigonometric identity sin(2x)=2sin(x)cos(x). The Grover iteration G is the product of the inversion about ψ, and the oracle Orac (inversion about α). We ignore the workspace here.

G=inv(ψ)Orac

By definition of α and β we have

Orac=[1001].

Note that this is the Pauli-Z matrix. Let

U(θ)=[cos(θ)sin(θ)sin(θ)cos(θ)].

Note that U(θ/2)=Ry(θ) (Pauli-Y rotation) and that U(θ/2)(1,0) is the coordinate representation of |ψ. Hence

inv(ψ)=U(θ/2)[1001]U(θ/2).

To show the claim only a short calculation is left. Let us do it with the help of sympy:

Inv_psi = sp.simplify(Ry * Z * Ry.H)  # Ry = Ry(theta)
Orac = Z

assert Ry.subs(theta, 2*theta) == Inv_psi * Orac
"PASSED"
PASSED

Another way to see this (without much calculation) is to note that ZU(ϕ)Z=U(ϕ). QED.

Exercise 6.4

Give explicit steps for the quantum search algorithm, for the case of multiple solutions (1<MN/2).

Solution

The difference to the special case M=1 is essentially the number of repetitions of the Grover iteration. Also note that M has an influence on the success probability.

We assume that N=2n, f:{0,1}n{0,1}, and that M is the number of solutions of f(x)=1, as in the book. Moreover let (recall that we assume MN/2)

θ=arcsin(2M(NM)N)

This formula comes from exercise 6.3. In the solution of that exercise we showed that

|ψ:=Hn|0=cos(θ/2)|α+sin(θ/2)|β.

Here α and β are as in the solution of exercise 6.3. Let us write

|φ:=cos(φ)|α+sin(φ)|β.

Hence |ψ=|θ/2, β=|π/2 and G acts as |φ|φ+θ. In particular, since we aim to rotate |ψ to |β (approximately), the number of repetitions for the Grover algorithm is:

R=round(π2θ12)=O(NM),

(for MN/2).

Algorithm (Quantum Search (Grover) for 0<MN/2)
Inputs
  • An n bit register and a one bit register.
  • A black box oracle Uf which performs the transformation Uf|x,q=|x,qf(x).
Outputs
A solution x0 of f(x0)=1 with probability at least 1sin(θ/2)2=1M/N1/2 (since the Grover iterations miss π/2 by at most θ/2). Every solution has the same chance to get "chosen".
Runtime
R=O(N/M) calls to the oracle and to (2|ψψ|I), and O(n) gates/operations for other things.
Procedure
  1. Initialize registers to |0|0.
  2. Apply HnHX:

    12n/2x=02n1|x|=|ψ|=|θ/2|

  3. Apply R repetitions of (2|ψψ|I)Uf:

    |(2R+1)θ/2||β|

  4. Measure the first register to obtain some x0. Return x0.

Exercise 6.5

Show that the augmented oracle Uf may be constructed using one application of the original oracle Uf, and elementary quantum gates, using the extra qubit |q.

  • What strikes me is the fact that it is possible to do this with just one application of the oracle! We will see that it is not possible to do this classically in a reversible way.

Proof

According to the book we have to implement Uf as a controlled (by q) version of Uf which acts like this:

|q,x,y|q,x,yqf(x).

Recall that the techniques of chapter 4 require to decompose Uf into elementary gates and recursively add controls. We do not do this for the following reasons:

  • Uf is a black box. So we are not supposed to assume anything about its internal workings. In particular it might not be implemented by the "usual" elementary gates.
  • Even if it was (and we get access to the circuit during execution of the algorithm) the generic standard recursive procedure produces a relatively complicated circuit (in general).
  • There is a simple way to get Uf using Uf as a black box!

The key to see this is the fact that the oracle Uf leaves |x|+ fixed. Let us introduce a fourth (auxiliary) register holding one qubit, initialized to |+. We want to implement Uf as:

|q,x,y,+|q,x,yqf(x),+.

It is not hard to see that

Uf=CSWAP(1,3,4)UfCSWAP(1,3,4)

accomplishes exactly that. Here CSWAP(1,3,4) is the controlled swap gate exchanging registers three and four conditioned on register one. QED.

Proof of the remark

In the framework of (classical) reversible computation as sketched in chapter 3.2.5 in the book we need at least two applications of the oracle.

Before we prove this let us first show a classical reversible algorithm which uses two applications of the oracle:

|q,x,y,0Uf(2,4)|q,x,y,f(x)X(1)|q,x,y,f(x)Toff(1,4,3)|q,x,yqf(x),f(x)X(1)|q,x,yqf(x),f(x)Uf(2,4)|q,x,yqf(x),0

The second application of the oracle is needed to uncompute the garbage from the auxiliary register. An intuitive way to see the necessity of the second application is to look at the information present in the registers. In case of q=1 the final state does not contain a single bit of information about f. On the other hand, if we apply the oracle just once it leaves one bit of information about f (the value of f at some input, but not necessarily at x). So the computation after the application of the oracle would have to erase the information which is not possible with reversible computation (without using the oracle, and hence some information about f, again).

Let us give a (hopefully) more rigorous proof. Assume to the contrary that it was possible with just one application of the oracle and an additional m qubit auxiliary register (initialized to 0). The circuit (which is just a sequence of reversible gates) can be divided into three parts. The part A before the oracle, the application of the oracle Uf, and the part B after the oracle:

|q,x,y,0A|a1,a2,a3,a4Uf(p)|a1,a2,a3,a4B|q,x,yqf(x),0

In the second step the p "signifies" our choice to what n+1 qubits to apply the oracle. Let us merge the ai into a single vector z (just for notational simplicity). Same for z vs the ai.

After the oracle step for a certain j0 we have zj0=zj0f(x~). Here the x~=x~(q,x,A,p) only depends on q, x, the first part A, and the places p to which the oracle is applied. Note that for q=0 clearly x~=x (otherwise, there is no way to obtain f(x)), but for q=1 (the interesting case) we cannot conclude this.

Now let us simulate the above circuit on a larger circuit with four zones. A zone is just a group of registers. The first zone is a read-only zone containing q,x,y all the time. The second and third zone contains the same four registers (each) as the above circuit. The last zone contains only a one-qubit circuit.

Let us denote by Sim(A,i) for i{2,3} the circuit which executes A at zone i. The simulations of Uf and B are defined similarly.

Consider the circuit which acts as follows:

  1. Start |q,x,y;0;0;0
  2. Copy q,x,y to zone 2: |q,x,y;q,x,y,0;0;0
  3. Apply Sim(A,2): |q,x,y;z;0;0
  4. Apply Sim(Uf(p),2): |q,x,y;z;0;0
  5. Apply Sim(B,2): |q,x,y;q,x,yqf(x),0;0;0

Now let us see how we can extract f(x~) from the final state (without using the oracle). The possibility of being able to do this is a contradiction since clearly no traces of f are left in the final state for q=1.

  1. Uncompute Sim(B,2): |q,x,y;z;0;0
  2. Copy q,x,y to zone 3: |q,x,y;z;q,x,y,0;0
  3. Apply Sim(A,3): |q,x,y;z;z;0
  4. Add zone three to zone two (using bit-wise addition):

    |q,x,y;z;z;0

    Note that z contains just one non-trivial bit zj0=f(x~) (the others are trivially zero).

  5. Copy zj0=f(x~) to the fourth zone and erase it from zone two (by a CNOT controlled by the fourth zone): |q,x,y;0;z;f(x~)
  6. Uncompute Sim(A,3) (not important, just for reasons of feng shui): |q,x,y;0;0;f(x~)

We have managed to extract a bit of information about f no matter the value of q. This is a contradiction as already explained above. QED.

Exercise 6.6

Verify that the gates in the dotted box in the second figure of Box 6.1 perform the conditional phase shift operation 2|0000|I, up to an unimportant global phase factor.

Proof

The circuit in the dotted region implements the following operator

XXIHC(X)IHC(Z)XX

Recall that the controlled Z-gate acts as follows on the standard basis:

C(Z)|x={|11for x=11|xotherwise.

The X-gate just exchanges the roles of 0 and 1:

XXC(Z)XX|x={|00for x=00|xotherwise.

Up to a factor of 1 (the global phase factor) this is precisely what 2|0000|I does. QED.

Exercise 6.7

Verify that the circuits shown in Figures 6.4 and 6.5 implement the operations exp(i|xx|Δt) and exp(i|ψψ|Δt), respectively, with |ψ=Hn|0.

Remark

There is a typo in the rotation matrix. The matrix should be

[100eiΔt]

The minus-sign is missing in the figures of the circuits.

Proof for the first circuit

Since |xx| is a projection (that is, an operator for which P2=P) the series expansion of the exponential function implies

ei|xx|Δt=eiΔt|xx|+(I|xx|)

Hence

ei|xx|Δt|y={|yif yxeiΔt|xif y=x=eiΔtf(y)|y.

On the other hand this is exactly what the circuit (Ufdiag(1,exp(iΔt))Uf) does.

|y,0Uf|y,f(y)[]eiΔtf(y)|y,f(y)UfeiΔtf(y)|y,0

QED.

Proof for the second circuit

In the same way as in the first part we see that

ei|ψψ|Δt|φ={|φif |φ|ψeiΔt|ψif |φ=|ψ

Again this is exactly what the circuit does. Let us first consider the case |φ=|ψ:

|ψ,0Hn|0,0C0n(X)|0,1[]eiΔt|0,1C0n(X)eiΔt|0,0HneiΔt|ψ,0.

For the case |φ|ψ we only have to note that Hn|φ is orthogonal to |0 and hence the controlled X gate does nothing. QED.

Exercise 6.8

Suppose the simulation step is performed to an accuracy O(Δtr). Show that the number of oracle calls required to simulate H to reasonable accuracy is O(Nr/2(r1)). Note that as r becomes large the exponent of N approaches 1/2.

Proof

From equation (6.23) we know that the simulation time is t=π/2α=O(N). Hence the number of steps is t/Δt=O(Δt1N). Since the error of a product of unitary operators scales linearly we see that the overall error is

O(Δtr1N)

This must be O(1) and hence we must choose Δt=O(N1/2(r1)). To minimize the number of steps we set ΔtN1/2(r1) and obtain the claim. QED.

Exercise 6.9

Verify Equation (6.25): Let c=cos(Δt/2), s=sin(Δt/2):

U(Δt)=(c2s2ψz^)Iis(c(ψ+z^)+sψ×z^)σ.

(Hint: see Exercise 4.15.)

Proof

Let z^=(0,0,1) and ψ=(2αβ,0,α2β2). From the paragraph above equation (6.25) we deduce

U(Δt)=ei|ψψ|Δtei|xx|Δt=eiΔteiψσΔt/2eiz^σΔt/2.

The first of the three terms is the unimportant global phase. The second and the third are rotations around ψ and z^ and angle Δt (both). Hence we are indeed in the setting of exercise 4.15. From that exercise we deduce that U(Δt) is a rotation around an axis n by an angle θ

U(Δt)=eiΔt(cos(θ/2)I+sin(θ/2)nσ)

such that:

cos(θ/2)=c2s2ψz^

and

sin(θ/2)n=sc(ψ+z^)+s2ψ×z^.

QED.

Exercise 6.10

Show that by choosing Δt appropriately we can obtain a quantum search algorithm which uses O(N) queries, and for which the final state is |x exactly, that is, the algorithm works with probability 1, rather than with some smaller probability.

Proof

To not obscure the proof by too much technicalities I only prove the result for large enough N. Let c=cos(Δt/2) and s=sin(Δt/2).

From the proof of exercise 6.9 we know that U(Δt) is a rotation of angle θ about r where:

cos(θ/2)=c2s2ψz^

and

sin(θ/2)r=sc(ψ+z^)+s2ψ×z^.

Also recall that z^=(0,0,1) and ψ=(2αβ,0,α2β2) where α=1/N and β=(N1)/N. Let us further elaborate on θ. Using one of the above formulas we get

cos(θ/2)=12sN.

From the Taylor expansion cos(x)=1x2/2+O(x4) of the cosine function we obtain

θ=4sN+O(sN1).

Now consider the axis of rotation. Below with we mean that the LHS is equal to the RHS up to a real factor. Using the axis-formula a short calculation yields

r(βc,βs,αc).

The squared norm of the RHS is 1α2s2. Hence

r=(βc,βs,αc)/1α2s2=(βc,βs,αc)+O(s2N1).

From the original formula for r we see that rz^=rψαc=O(N1/2). This implies that z^ indeed lies on the same orbit as ψ. Let ω be the angle of the rotation about r which is needed to rotate ψ into z^. Let

z^=z^(z^r)r=z^αc1α2c2r=z^+O(N1/2)r

and

ψ=ψ(ψr)r=ψαc1α2c2r=ψ+O(N1/2)r.

Hence

cos(ω)=ψz^=ψz^+O(N1)=α2β2+O(N1)=1+O(N1),

and therefore ω=π+O(N1/2). Using this and the formula for θ we see that the number of required θ rotations is

ωθ=π4sN+O(1).

To enable a 100% probability this number must be an integer. The standard case with Δt=π means s=1. By the intermediate value theorem (the expression is continuous) there is an s[1cN1/2,1] (for some constant c>0 depending on the constant in the O(1) above) such that this is true. Clearly ω/θ=O(N) for such s. QED.

Appendix

Let us use sympy to do the boring calculations. First let us define z^ and ψ in a 3D cartesian coordinate system. Moreover we set up our slightly scaled version r1 of r.

N = CoordSys3D('N')

z = N.k
psi = 2*a*b*N.i + (a**2 - b**2)*N.k

r_formula_rhs = s*c*(psi + z) + s**2 * psi.cross(z)
r1 = b*c*N.i - b*s*N.j + a*c*N.k

Here we check that our formula for r is valid:

norm2_r1 = r1.dot(r1).subs(b**2, 1 - a**2).subs(c**2, 1 - s**2)

assert r_formula_rhs.subs(b**2, 1 - a**2) == 2*a*s * r1
assert sp.simplify(norm2_r1) == 1 - a**2 * s**2
"PASSED"
PASSED

Exercise 6.11 (Multiple solution continuous quantum search)

Guess a Hamiltonian with which one may solve the continuous time search problem in the case where the search problem has M solutions.

Solution

My Ansatz looks as follows

H=|χχ|+|ψψ|,

where

χ=1Mf(x)=1|x

and we let |ψ unspecified at first (as always we take |ψ=Hn|0 in the end). It turns out that section 6.2 essentially works out analogously with this Hamiltonian if one replaces x with χ.

In particular we arrive at the same formula for the Hamiltonian and formula (6.23) which states that the corresponding unitary group rotates (the still arbitrary) |ψ into |χ:

cos(αt)|ψisin(αt)|χ.

Thus after time t=π/2α the system is rotated in a state which when measured produces a solution of f(x)=1 (equal probability for each possibility). So what is the difference? Something must be different, right? I am glad you asked! Actually so far nothing is different!

Essentially the only thing which differs is the value of α once we fix |ψ. As in chapter 6.2 we have to choose |ψ in a way which allows us to determine α without knowing |χ. This is where we fix |ψ=Hn|0. This in turn leads to

α=M/N.

Let us finally investigate the implications for the performance of an approximation similar to the one in the book. It is easiest to just follow the proof of exercise 6.10. For simplicity we restrict our short analysis to the case M=o(N).

The angle θ of rotation is determined by (c.f. the case for M=1)

cos(θ/2)=12MsN.

Hence

θ=4sMN+O(sMN1).

The same reasoning as in exercise 6.10 allows us to estimate the angle ω between ψ and z^:

cos(ω)=1+O(MN1).

Hence ωπ. We conclude that the number of iterations is ω/θ=Θ(N/M) (if we choose Δt=π and thus s=1). But keep in mind that we used M=o(N) in this analysis!

Exercise 6.12 (Alternative Hamiltonian for quantum search)

Suppose

H=|xψ|+|ψx|.

  1. Show that it takes time O(1) to rotate from the state |ψ to the state |x, given an evolution according to the Hamiltonian H.
  2. Explain how a quantum simulation of the Hamiltonian H may be performed, and determine the number of oracle calls your simulation technique requires to obtain the solution with high probability.

Proof of (1)

Let |ψ=α|x+β|y. In the basis (|x,|y)

H=αI+βX+αZ.

Note that the term involving the identity I is not important since it only corresponds to a global phase in eiHt. Recall from the book that the Hamiltonian H0=|xx|+|ψψ| is equal to I+α(βX+αZ). Hence

eiHteiH0α1t

where means up to a global phase. We already know from equation (6.23) that H0 rotates |ψ into |x after time t=π/2α. Hence H does the same after time t=π/2=O(1). QED.

Solution to (2)

First of all note that the two summands of H (I mean |xψ| and |ψx|) are not self adjoint. Hence their exponentials (e.g. ei|xψ|) are not unitary. So we cannot simulate the summands by a circuit.

On the other hand from the first part we already know that the exponential of H is essentially a time-scaled version of the exponential of H0. So if we can simulate the latter this yields a simulation of the former. The resource requirements are the same of course. Not sure if this is the intended meaning of the exercise though.

Exercise 6.13

Consider a classical algorithm for the counting problem which samples uniformly and independently k times from the search space, and let X1,,Xk be the results of the oracle calls, that is, Xj=1 if the j​th oracle call revealed a solution to the problem, and Xj=0 if the j​th oracle call did not reveal a solution to the problem. This algorithm returns the estimate S=Nkj=1kXj for the number of solutions to the search problem. Show that the standard deviation of S is ΔS=M(NM)/k. Prove that to obtain a probability at least 3/4 of estimating M correctly to within an accuracy M for all values of M we must have k=Ω(N).

Proof

Computing the standard deviation ΔS

Let us denote by P() and E() the probability of some event and the expectation of some random variable.

According to the exercise statement the random variables Xi are iid. Whenever the index j is irrelevant we denote by X any of the Xj. We have

p:=P(X=1)=MN,P(X=0)=1p=NMN.

The mean value of the Xj is

μ:=E(X)=p1+(1p)0=MN=p.

Hence, by linearity E(S)=Nμ=M. This justifies S as an estimator for M. The variance is

ΔS2=E((SM)2)=N2k2E((jXjμ)2)=N2k2jE((Xμ)2)=N2kE(X2μX+μ2))=N2kp(1p)=M(NM)k.

For the second equality we use that due to independence E(XiXj)=E(Xi)E(Xj) for ij. For the third equality we use X2=X.

Proof of an upper bound for k

The exercise does not really ask for this but we do it anyway. For reasons to become clear at the end we consider the accuracy CM (instead of just M) for some C>0. By Chebychev's inequality for any λ>0:

P(|SM|λΔS)1λ2.

Choosing λ such that λΔS=CM yields

P(|SM|M)1λ2=NMC2k.

Hence k=4N/C2 is sufficient for the LHS to be smaller than 13/4=1/4. This is the desired upper bound. QED.

Remark
Clearly the upper bound for the case C2 is not really useful. For those C we should either hope to improve our estimate or just systematically visit all N points in the search space (which yields the exact count with certainty instead of an estimate just sometimes!).
Proof of k=Ω(N)

We assume that N is even. This simplifies the proof and is a reasonable assumption in our context since the quantum counting algorithm even assumes that N is a power of two.

Assume k is chosen in a way that

P(|SM|M)23

holds for all M. In particular this must be satisfied for M=N/2. Let Y=jXj. Then

|SM|M with M=N/2k2k2NYk2+k2N

Note that

py:=P(Y=y)=(ky)2k

Let α=k/2N. The target probability requirement is equivalent to

k/2αyk/2+αpy34.

Recall that in our analysis we only consider asymptotic estimates for N. It is not hard to see that the previous inequality readily implies k as N. This is a first step to prove k=Ω(N). Now let us analyze the LHS more deeply. From the formula for py together with Stirling's formula we see that for the py occurring in the LHS (abbreviating s=y/k)

py=2πk(k/e)k2πy(y/e)y2π(ky)((ky)/e)ky(1+o(1))2k=2πk2k(1s)kysy(1+o(1))

Let us define ε by y=(k+εk)/2. By assumption ε2/N. Hence

py=2πk1+o(1)(1ε)ky(1+ε)y=2πk1+o(1)(1ε)(1ε)k/2(1+ε)(1+ε)k/2=2πk1+o(1)(1ε2)k/2[1ε1+ε]εk/2.

Let us assume that there is a special sequence of values for k which is o(N) while still satisfying the probability estimate (the proof is done if we can rule out this possibility). Using (1±δ)o(δ1)=1+o(1) if δ0 (this is related to the well-known formula ex=limn(1+xn)n) we obtain

py=2πk(1+o(1))

for k from the special sequence. Under these assumptions the LHS of the probability estimate is not large enough:

k/2αyk/2+αpy=O(k1/2kN)=O(kN)=o(1)34 eventually.

This is a contradiction! Hence such a special sequence does not exist. Thus k=Ω(N). QED.

Exercise 6.14

Prove that any classical counting algorithm with probability at least 3/4 for estimating M correctly to within an accuracy cM for some constant c and for all values of M must make Ω(N) oracle calls.

Remark
This is vaguely related to the fact that the sample mean is a so called Uniformly Minimum Variance Unbiased Estimator (UMVUE) for the mean value parameter of the Bernoulli distribution. See libretexts.org.

Proof

The exercise is a little bit vague in my opinion - but that is OK. If we speak about probabilities there must be a probability distribution somewhere. The exercise doesn't explicitly define one. Sometimes the distribution is given implicitly. In this case I think there are at least two options:

  • Postulate a probability distribution on the inputs.
  • Assume that the oracle calls are at randomly selected arguments.

I solved the exercise for both interpretations.

Interpretation 1

For a given N, any such algorithm does essentially the following:

  1. Input: An oracle computing a function f:{0,,N1}{0,1}.
  2. Determine X=(X1,,Xk), where Xj=f(Nj) for some Nj<N.
  3. Compute h(X) and return the result.

As we are interested in the asymptotics of k as N we actually have to take into account sequences of algorithms which might be specifically tailored to each value of N. In particular the sequence of algorithms (their "source code") might not be computable. This does not complicate the analysis at all (nothing we do here depends on computability - measure theory is the framework which governs the math here) but it is worth pointing out since the exercise is formulated with computing in mind.

Of course, a concrete algorithm might not actually separate steps 2 and 3 but it is always possible to change the algorithm so that it behaves like that by essentially running it twice: the first time we just remember the Xj and the second time we plugin the already computed values for Xj. Of course this is a bit dumb but it simplifies our analysis.

The Nj need not be constant. In fact, any Nj might depend on anything what previously was computed, including already known Xj. We may assume that the Nj are all different. If this was not the case we may change the algorithm by inserting a subroutine which checks if Nj already appeared and just retrieves a copy of the corresponding Xj. In particular k might also depend on the course of computation and hence is not necessarily just a function of N. For that reason h essentially takes a list of Xj as input and not just a fixed size vector.

As the algorithms are supposed to be arbitrary they might in particular be deterministic (meaning that they compute a function mapping oracles to integers). In that case it wouldn't even make sense to speak about a probability of 3/4 that something happens! To introduce randomness to the setting one could postulate a probability distribution over the inputs. Note that under this assumption M is a random variable.

One possible distribution is the uniform distribution over all f (all possible 2N oracles occur with the same probability 2N).

Assuming a uniform distribution over all possible f (the input of the algorithm) we may deduce that the Xj are independent and identically distributed (iid) according the Bernoulli distribution with parameter p=M/N (as in exercise 6.13). For the independence it is important that the Nj are all different.

We show now that in that case the exercise statement is wrong (at least for sufficiently large c)! In fact just estimating N/2 regardless of the input (f) does the job! The reason for this is that most oracles just have M=N/2+O(N1/2) under this distribution.

Theorem
Assuming a uniform distribution over all possible f the algorithm Return N/2 is accurate within cN1/2 with probability 1o(1) as N and c. In particular no oracle calls are needed.

Proof

This statement follows from the central limit theorem. Let us proof it directly - just for fun (it is not difficult).

Let us abbreviate R=O(m1+(Nm)1) in the following. The probability that the number M of solutions to f(x)=1 (for a random f) is m is

P(M=m)=(Nm)2N.

A straightforward calculation using Stirling's formula n!=2πn(n/e)n(1+O(n1)) shows

P(M=m)=N12πσ2[2(1p)1ppp]N(1+R).

Here we set p=m/N and σ2=p(1p)N. Let us define the entropy function H(p)=pln(p)(1p)ln(1p). In case you do not know how this function looks like you should search the internet for a plot. Wikipedia should have one. Plugging in the entropy function we get

P(M=m)=N12πσ2e(H(p)ln(2))N(1+R).

Observe that H is maximal at p=1/2. Moreover H(1/2)=ln(2), H(1/2)=0, and H(1/2)=4. Hence for p=12+δ2 with small δ=O(1/N) and dδ=O(N1) we have, setting σ02=14N=σ+O(N1),

P(12+δ2dδ<MN<12+δ2+dδ)=2dδ2πσ02eδ22σ02(1+O(N1)).

This implies that the probability distribution of M for N is close to the normal distribution with mean N/2 and standard deviation N/2. In particular almost all of the probability mass lies within a few multiplies of N around N/2. QED.

Let us investigate a different class of probability distributions.

Let ρ:[0,1][0,) be a (fixed) smooth function (which does not depend on N) with 01ρ(x)dx=1. Consider a probability distribution satisfying

P(M=m)=N1ρ(m/N)(1+O(N1))

and such that oracles having the same number m of solutions have the same probability (symmetry condition). One possibile density function is e.g. ρ(p)=1.

The previous distribution does satisfy the symmetry condition but not the part about the N​-independence of ρ as the proof above showed that the corresponding ρ approached the delta distribution at 1/2 (got narrower with increasing N).

For these distributions one can readily see that conditioned on M=m (for any m) the Xj are again iid according to the Bernoulli distribution with parameter p=m/N. The symmetry condition is the reason for this (again).

Theorem
Let c>0 and β>1/2. Consider the just discussed probability distribution on the inputs (the oracles). Assume that for any N and a randomly chosen f the algorithms determine M within an accuracy cM with probability at least β. Then k=Ω(N) as N.
Proof

Let us write a±b for the interval [ab,a+b). Recall that M is the number of solutions of the input f and hence is a random variable.

Without loss of generality we assume that ρ is compactly supported in (0,1), i.e. there is an ε>0 such that ρ is zero outside of [ε,1ε]. If this was not already the case we could just replace ρ by some "trimmed" (and normalized) version ρε. If we choose ε small enough the success probability βε of the algorithm is close to β (uniformly for all N) and hence larger than 1/2.

Let

βm=P(h(X)m±cm|M=m)

By assumptions we have mβm=β>1/2. Here and in the following we implicitly assume that m is in the "positive probability zone" (in particular bounded away from 0 and N1). For example in the sum mβm we may assume

εNm(1ε)N.

For other expressions the "positive probability zone" might be slightly extendend to the left or right (or both). This comes in handy next since it avoids "index overflow". Let

βm=P(h(X)m±cm|M=m+cm)

Clearly the corresponding events imply the failure of the algorithm (accuracy is too bad). Hence mβm1β<1/2. Let us summarize what we have so far:

mβm=β>1/2andmβm1β<1/2.

Recall that we have to show k=Ω(N). Using proof by contradiction and possibly restricting to an infinite subset of the values for N of we may assume that k=o(N) in the following.

Our goal is to show that under this assumption βmβm (i.e. βm/βm1) as N uniformly with respect to m. This is the desired contradition to what we know already.

Let S=j=1kXj and p=m/N. By our assumption on the probability distribution of f we have that the Xj are Bernoulli distributed with parameter p and hence

P(S=s|M=m)=(ks)ps(1p)ks.

(i.e. S satisfies a binomial distribution if conditioned on fixed M.) Moreover the symmetry condition on the distribution further implies that

P(X=x|S=sM=m)

is the same for all x with jxj=s (and 0 otherwise). One could argue that this condition means that (for fixed M) no algorithm has any chance of extracting any information beyond s from X since it is not able to generate a probability distribution with less entropy.

To prove βmβm it is sufficient to prove that P(X=x|M=m)P(X=x|M=m+cm) (both uniformly with respect to m). To prove the latter, by the just discussed consequences of the symmetry property, it is sufficient to prove

P(S=s|M=m)P(S=s|M=m+cm),

uniformly with respect to m. Actually we only prove this for "most" s (in the sense that their probability sums to a value asymptotically approaching 1 (for both values of M)), but this is sufficient. Let R=O(1/s)+O(1/(ks)), q=s/k, and σq2=q(1q)/k. Let H(q)=qln(q)(1q)ln(1q) and define

E(q,p)=H(q)+qln(p)+(1q)ln(1p).

A straightforward calculation using Stirling's formula n!=2πn(n/e)n(1+O(n1)) shows

P(S=s|M=m)=(ks)ps(1p)ks=k12πσq2[qq(1q)1q]k[pq(1p)1q]k(1+R)=k12πσq2ekE(q,p)(1+R).

Observe that at q=p we have E=qE=0 and 12q2E=1/(2σp2). Recall that we are essentially only interested in p[ε,1ε] (this is not entirely correct but the details do not matter). Note that σq=σp(1+O(k1/2)) for q=p+O(k1/2). Hence the probability concentrates around s=kp±O(k) and for these s we have:

P(S=s|M=m)=k12πσp2e(qp)22σp2(1+o(1)).

Compare this with the normal distribution. We essentially proved the central limit theorem in a special case.

Recall that q=s/k and p=m/N. Also observe that σp2=Θ(k1) and that p:=p+cm/N=p+o(k1/2). Using this in the just derived asymptotic formula for P(S=s|M=m) yields the claim. QED.

Interpretation 2

Let us offer an interpretation which is analogous to exercise 6.13. Let again consider an algorithm as sketched in the previous interpretation. But now we assume more specifically that the Nj are just sampled uniformly at random. In particular k is deterministic and only depends on N (not on the oracle).

  1. Input: An oracle computing a function f:{0,,N1}{0,1}.
  2. Compute k=k(N).
  3. Determine X=(X1,,Xk) where Xj=f(Nj) and the Nj are sampled uniformly at random on the input space.
  4. Compute hk(X) and return the result.

In this case we can consider individual oracles and need not consider probability distributions over them. The Xj are again iid according to the Bernoulli distribution with parameter p=M/N where M is the number of solutions for the oracle under consideration.

We note again that we consider a sequence of algorithms (as we are interested in the asymptotics N).

Theorem
Consider a sequence of algorithms as discussed above. Let c>0 and α>0. Assume that for any oracle the responsible algorithm (the one with the matching value for N) computes M correctly within accuracy cM with probability at least α. Then the number of oracle calls is k=Ω(N).
Proof

First of all we note that if M is correct within accuracy cM it is also correct within cN. From here on we only assume the latter. Moreover let us write a±b for the interval [ab,a+b).

Without loss of generality we may assume that k=o(N) and have to deduce a contradiction from that. If this were not true we might consider a subsequence (Ni) such that the corresponding ki=ki(Ni) satisfy ki=o(Ni).

Let r=r(k,N) be such that r in a way such that r2k=o(N). This is possible due to k=o(N). Consider a set {f1,,fr} of oracles with corresponding

pi=12+2iciN

so that Mi=Npi is the number of solutions of fi(x)=1. Here ci the smallest cic so that Mi is an integers. Note that ci=c+O(N1/2). It is not essential that these pi are close to 1/2. The important thing is that they are close together (the width is o(1/k)) and bounded away from p=0 and p=1. Moreover we will see below that it is also not really important that r, instead it is sufficient that we can choose it arbitrary large, at least for large N.

Observe that if fi is the input oracle the Xj are iid and Bernoulli distributed with parameter pi. Let us denote by Ppi the corresponding probability measure. Let us write

Ii=pi±cN

By definition of the pi these intervals are disjoint. By assumptions of the theorem we have

Ppi(h(X)Ii)α>0.

Lemma

Let β(0,1). There is a constant δ(0,1] such that for all N the following holds: For each k=k(N) there is a set Uk{0,1}k such that for all the pi

Ppi(XUk)β.

and for any combination pi, pj and any x

Ppi(X=x)δPpj(X=x).

Proof

Let Y=j=1kXj. For any fi by the central limit theorem the probability distribution of Y approaches the normal distribution with mean Mi/2=12+O(N) and variance O(k) (we have essentially proven this directly in the proof of another theorem).

Since r/N=o(1/k) the widths of the corresponding gaussian bell curves are much larger than any of the pi are away from each other - at least for large N. Moreover they are roughly the same and the convergence is uniform. Hence

Uk={x{0,1}k|jxjk2±c0k}

for a suitably chosen c0 (independent of k or N) does the job for all k large enough (δ could even be chosen as close to 1 as we like). For all smaller k we can simply choose Uk={0,1}k because the condition involving δ is trivially true by finiteness for a sufficiently small δ. QED.

Let Uk be as in the lemma for β chosen such that γ=α+β1>0. Moreover let

Uk(j)=h(X)IjXUk.

Then

Ppi(Uk(j))α+β1=γ>0.

By the condition in the lemma involving δ and the disjointness of the Ij we see that

Pp0(j=1rUk(j))rγδ.

The LHS is bounded by 1. On the other hand the RHS tends to as N since r. This is the desired contradiction. QED.

Exercise 6.15

Use the Cauchy–Schwarz inequality to show that for any normalized state vector |ψ and set of N orthonormal basis vectors |x,

xψx22N2N.

Proof

Let ψx=x|ψ. Observe that

ψx2=22(x|ψ)=22(ψx).

Moreover by the Cauchy-Schwarz inequality

x(ψx)x|ψx|1(x|ψx|2)12(x1)12=N.

Using this while doing a summation of the first observation yields the claim.

Exercise 6.16

Suppose we merely require that the probability of an error being made is less than 1/2 when averaged uniformly over the possible values for x, instead of for all values of x. Show that Ω(N) oracle calls are still required to solve the search problem.

Remark
The original exercise spoke about O(N) instead of Ω(N). I corrected this typo.

Proof

Below we sketch the main steps of the proof as it is presented in the book. For this exercise the only relevant modification happens at step 5. Let the notation be as in the book:

Dk=xψkxψk2,Ek=xψkxx2,Fk=xxψk2.

with ψk=(i=1kUi)ψ and ψkx=(i=1kUiOx)ψ where the Ui are arbitrary unitary operators, ψ is an arbitrary normalized state, and Ox=I2|xx| is the oracle. Moreover let

px=|x|ψkx|2

be the success probability for a given value of x. The proof from the book assumed px1/2. The exercise weakened this to xpxN/2.

  1. For all k

    Dk+1(Dk+2)2.

    This can be proved using OxI=2|xx| and the Cauchy-Schwarz inequality.

  2. For all k

    Dk4k2.

    Follows by induction from 1 and D0=0.

  3. For all k

    Dk(FkEk)2.

    Follows from the triangle inequality and the Cauchy-Schwarz inequality.

  4. For all k

    Fk2N2N.

    This follows from the Cauchy-Schwarz inequality (see exercise 6.15).

  5. If xpxN/2 then for all k

    EkN.

    Since this is the crucial step for this exercise we give a detailed proof below. (Under the stronger assumption that px1/2, for all x, the proof from the book showed the stronger statement Ek(22)N.)

  6. For all k

    Dk=Ω(N).

    Follows from 3, 4, and 5.

The claim (that Ω(N) oracle calls are necessary) follows from 2 and 6. QED.

Proof of 5

As in the main text we may assume that x|ψkx>0 (replace |x by eiθ|x if necessary). Hence

ψkxx2=22|x|ψkx|=22px.

Using 0px1 (hence pxpx) and xpxN/2 we obtain

Ek=x(22px)x(22px)N.

Note that this inequality is (essentially) sharp since equality holds if half of the px is 1 and the other half are 0 (at least for even N this is possible). QED.

Exercise 6.17 (Optimality for multiple solutions)

Suppose the search problem has M solutions. Show that Ω(N/M) oracle applications are required to find a solution.

Remark
The original exercise spoke about O(N/M) instead of Ω(N/M). I corrected this typo.

Proof

We closely follow the structure of the proof of exercise 6.16 (which in turn follows the proof from the book). Without loss of generality we may assume that

MN4.

Otherwise O(N/M)=O(1) isn't interesting anyway. Let us define, essentially as in the book, ψk=(i=1kUi)ψ and ψkf=(i=1kUiOf)ψ where the Ui are arbitrary unitary operators, ψ is an arbitrary normalized state, and Of=I2f(x)=1|xx| is the oracle for some f with M solutions for the equation f(x)=1. Let us define the projection operator

Pf=f(x)=1|xx|=(IOf)/2.

Note that there are unique normalized states χkf and ϕkf from the subspaces corresponding to f(x)=1 and f(x)=0 resepectively, and unique af,bf0 (with af2+bf2=1) such that

ψkf=afχkf+bfϕkf.

Note that the success probability (which is assumed to be at least 1/2 for any f) is given by pf=af2. Finally consider

Dk=fψkfψk2,Ek=fψkfχkf2,Fk=fχkfψk2.

Here the summation ranges over all f which have exactly M solutions for f(x)=1.

  1. For all k

    fPfψk2=MN(NM)=:S.

    Proof
    Let ψk=xcx|x. Then Pfψk2=f(x)=1|cx|2 is just a sum of squares of M of the coefficients of ψk. By symmetry in the overall sum over all f each coefficient appears equally often. All in all there are M(NM) summands of the form |cx|2. Because x|cx|2=1 (which are exactly N summands) the claim follows. QED.
  2. For all k

    Dk+1(Dk+2S)2.

    Proof

    This is just a short calculation using the Cauchy-Schwarz inequality (in the second to last step):

    Dk+1=fOfψkfψk2=fOf(ψkfψk)+(OfI)ψk2f(ψkfψk2+4ψkfψkPfψk+4Pfψk2)Dk+4DkS+4S=(Dk+2S)2.

    QED.

  3. For all k

    Dk4k2S.

    Proof
    This follows by induction from 2 and D0=0. QED.
  4. For all k

    Dk(FkEk)2.

    Proof
    This follows from the triangle inequality and the Cauchy-Schwarz inequality (the short proof is exactly the same as the analogous inequality from the book).
  5. For all k

    Fk(22MN)(NM).

    Note that here the bound on M comes in handy.

    Proof

    By Pfχkf=χkf, χkf=1, and the fact that Pf is hermitian we have

    χkfψk=22(χkf|Pfψk)22Pfψk.

    Using this together with the definition of S (see 1) and the Cauchy-Schwarz inequality we get

    Fk2(NM)2(fPfψk2)12(f1)12=2(NM)2S(NM).

    The claim follows from 1. QED.

  6. If pf1/2 for all f then for all k

    Ek(22)(NM).

    Proof

    From the relation between χkf and ψkf we get

    Ek=f((1af)2+bf2)=f(22af)

    Now the assumption on the success probability af2=pf1/2 yields the claim. QED.

  7. For all k

    Dk=Ω((NM)).

    Proof
    This follows from 4, 5, 6, and the bound on M.

The assertion k=Ω(N/M) now follows from 1, 3, and 7. QED.

Exercise 6.18

Prove that the minimum degree polynomial representing a Boolean function F(X) is unique.

Remark
The proof below shows a stronger statement: The multi-linear polynomial representing F is unique. Here (as in the book) multi-linear means that any variable Xi occurs only as Xi0=1 or Xi1 but not with higher powers. Note that any minimal polynomial is necessarily multi-linear.

Proof

Any multi-linear polynomial representing F is of the form

P(X)=ecek=0N1Xkek

where the sum ranges over all e:{0,,N1}{0,1} (binary strings of length N). In the following we use the notation bc(e) (bit count) to denote kek.

Lemma
Let P and Q be two multi-linear polynomials representing F with corresponding coefficients (ce) and (de). Let n{0,,N1}. Assume that ce=de for all e with bc(e)n. Then ce=de for all e with bc(e)n+1.
Proof

Let e be such that bc(e)=n+1. Choose X=e. By assumptions

0=F(X)F(X)=P(X)Q(X)=cede.

This (essentially) implies the claim. QED.

Also note that P(0,,0)=c(0,,0). Hence two multi-linear polynomials representing F are proven to be equal by mathematical induction (using the lemma as the induction step and the just mentioned fact as the induction start). QED.

Exercise 6.19

Show that P(X)=1k=0N1(1Xk) represents OR.

Proof

This is obvious. Probably one way to make it even more obvious is to note that

P(X)={0if Xk=0 for all k,1else.

This is precisely the behaviour of the OR function. QED.

Exercise 6.20

Show that Q0(OR)N by constructing a polynomial which represents the OR function from the output of a quantum circuit which computes OR with zero error.

Remark
  • Proving Q0(OR)N implies Q0(OR)=N since N oracle calls to all possible inputs are clearly sufficient to implement a deterministic algorithm to solve the search problem (OR corresponds to the search problem whether Xi=1 has a solution i).
  • Exactly this is proved in (Robert Beals and Harry Buhrman and Richard Cleve and Michele Mosca and Ronald de Wolf, 1998). This paper served me well as a nice introduction to the method of polynomials.

Proof

First of all note that 1k=0N1(1Xk) is the minimal polynomial of OR. This follows from the proof of exercise 6.18 since this polynomial is already multi-linear.

Hence deg(OR)=N.

Let us briefly recall how an algorithm for the zero-error case works. First a number m of qubits is prepared in some initial state, without loss of generality |0. Then a quantum circuit (i=1TUiO)U0 is applied, where the Ui are arbitrary unitary operators and O is the call to the oracle. Overall there are T calls and we have to prove that TN.

To actually return something we have to measure at least some of the qubits. Without loss of generality we may assume that all m qubits are measured (in practice one probably only measures two qubits (which is sufficient to differentiate up to four answers)). Let Y, N, I be disjoint sets whose union is {0,,2m1}. Based on whether the measurement is in Y, N, or I we return yes, no, and inconclusive. We may assume that at least Y and N are non-empty (otherwise the algorithm cannot be useful).

Let

|ψ(X)=k=02m1ck(X)|k

be the final state right before measurement. From the book we already know that each ck(X) is a complex polynomial of degree at most T. Consider

PN(X)=kN|ck(X)|2.

This is the probability of returning no (meaning that the algorithm claims that X contains no ones). Since the algorithm is zero-error we have PN(X)=0 for X0. Hence ck(X)=0 for X0 and all kN. On the other hand there must be a kN such that ck(0)0. Let

P(X)=(1ck(X)ck(0)).

Clearly this is a real polynomial of degree at most T. By construction it represents OR. Using the fact that deg(OR)=N this implies that TN, as desired. QED.

References

Robert Beals and Harry Buhrman and Richard Cleve and Michele Mosca and Ronald de Wolf (1998). Quantum Lower Bounds by Polynomials.