3
FIG. 2: Swap Test circuits. (A) The canonical Swap Test
circuit. H indicates the Hadamard gate. (B) The Swap Test
circuit adapted for IBM’s 5-qubit quantum computer, con-
structed by decomposing controlled-swap into the Toffoli gate,
via Refs. [34, 35], and then manually eliminating gates that
had no effect on the output. T is the π/8 phase gate. (C)
The structure of a Swap Test circuit, showing the locations of
the one-qubit gates and controlled-Z gates, constructed au-
tomatically by Rigetti’s compiler for their 19-qubit quantum
computer. Appendix A gives the full specification of that
circuit.
algorithms on Rigetti’s and IBM’s quantum computers,
leading to a reduction in the computational error relative
to the Swap Test.
II. MACHINE-LEARNING APPROACH
Our machine-learning approach is summarized in
Fig. 1. The variables are divided up into the hyper-
parameters (i.e. the “resources”) and the optimization
parameters (i.e. the “algorithm”).
A. Resources
The hyperparameters are the quantum resources of the
circuit. At the input, the resources are the number of
ancilla qubits and data qubits that store the input data
for the computation. At the output, the resources are
the locations of the measurements (see Fig. 1). As an
example, in the Swap Test for single-qubit states, we are
allowed access to one ancilla qubit and two data qubits
at the input, and we can measure only the ancilla qubit
at the output.
The input data may be classical or quantum, depend-
ing on the computation of interest. In the case of state
overlap, the input data are quantum states and hence
no encoding is necessary. However, for completeness, we
note that our approach also applies to classical inputs, in
which case the encoding (i.e., storing the classical data
in the quantum state of the data qubits) can be treated
as a hyperparameter that one fixes while optimizing the
algorithm.
B. Algorithm
Our approach searches for an optimal algorithm, where
we consider the algorithm to be a quantum gate sequence
with associated classical post-processing. We parameter-
ize (and hence optimize over) both the gate sequence and
the post-processing.
Let us first consider the gate sequence. We define a
gate set A = {A
j
(θ)}. Here, each gate A
j
is either a
one-qubit or two-qubit gate and may also have an inter-
nal continuous parameter θ. Hence, A is a discrete set,
but each element of A may have a continuous parame-
ter associated with it. The precise choice of A depends
on which hardware one is considering. For example, the
connectivity differs between IBM and Rigetti hardware,
and the former employs CNOT gates while the latter
employs controlled-Z gates. For IBM’s 5-qubit computer
“ibmqx4” we can write out the gate set as
A
ibmqx4
= {CNOT
10
, CNOT
20
, CNOT
21
, CNOT
32
,
CNOT
24
, CNOT
34
, U
0
(θ), U
1
(θ), U
2
(θ),
U
3
(θ), U
4
(θ)}, (1)
where U
j
(θ) is an arbitrary gate on qubit j and CNOT
jk
is a CNOT from control qubit j to target qubit k. An-
gles θ in Eq. (1) may be encoding multiple parameters.
In this article, we treat all one-qubit gates equally in
the sense that all one-qubit gates are equally complex to
implement, although our approach could easily be gener-
alized to account for different complexities for different
one-qubit gates.
We consider a generic sequence of d gates,
G
~
k
(
~
θ) = A
k
d
(θ
d
) · ·· A
k
2
(θ
2
)A
k
1
(θ
1
) , (2)
where
~
k = (k
1
, ..., k
d
) is the vector of indices describ-
ing which gates are employed in the gate sequence and
~
θ = (θ
1
, ..., θ
d
) is the vector of continuous parameters
associated with these gates.
The measurement results give rise to an outcome prob-
ability vector ~p = (p
1
, ..., p
l
, ...). The desired output
might be one of these probabilities p
l
, or it might be
some simple function of these probabilities. Hence, we
allow for some simple classical post-processing of ~p in or-
der to reveal the desired output. While there is enormous
freedom in applying a function to ~p, we consider a simple
linear combination of probabilities:
g(~p) = ~c · ~p =
X
l
c
l
p
l
(3)
where ~c is a vector of coefficients whose elements are cho-
sen according to c
l
∈ {−1, 0, 1}. This post-processing is
sufficient for the application in this paper (state overlap),