Quantum Convolutional Neural Networks

We introduce and analyze a novel quantum machine learning model motivated by convolutional neural networks (CNNs). Our quantum convolutional neural network (QCNN) makes use of only $O(\log(N))$ variational parameters for input sizes of $N$ qubits, allowing for its efficient training and implementation on realistic, near-term quantum devices. We show that QCNN circuits combine the multi-scale entanglement renormalization ansatz and quantum error correction to mimic renormalization-group flow, making them capable of recognizing different quantum phases and associated phase transitions. As an example, we illustrate the power of QCNNs in recognizing a 1D symmetry-protected topological phase, and demonstrate that a QCNN trained on a set of exactly solvable points can reproduce the phase diagram over the entire parameter regime. Finally, generalizations and possible applications of QCNNs are discussed.

Machine learning based on neural networks has recently provided significant advances for many practical applications 1 . Motivated by progress in the realization of quantum information processors [2][3][4][5] , it is intriguing to explore whether quantum computers can be utilized to further enhance the performance of machine learning systems 6 . In particular, one natural application involves the study of quantum many-body systems, where the extreme complexity of many-body states often makes theoretical analysis intractable.
Recently, a number of important connections between machine learning and quantum many-body systems have been pointed out. On the one hand, it is natural to ask whether machine learning techniques can be used for efficient description of different input states of complex quantum systems [7][8][9][10] . On the other hand, it is intriguing to inquire if quantum computers can be used to further enhance machine learning for practical problems 6,11 . For instance, one important class of quantum many-body problems involves quantum phase recognition (QPR). Given a particular quantum phase of matter $P$ and a quantum many-body system in its (unknown) ground state $|\psi_G\rangle$, QPR asks whether $|\psi_G\rangle$ belongs to $P$. While QPR represents a direct quantum analog of common machine learning tasks such as image classification, the direct application of classical machine learning is challenging. First, intrinsically quantum phenomena such as superposition and entanglement hinder a direct application of traditional neural network algorithms. In particular, for a given, limited number of copies of the input state wavefunction encoded in qubits, it is unclear how to efficiently transfer this input to a classical machine without performing quantum state tomography. Furthermore, while recent work has pointed out possible connections between neural networks and physical concepts such as phase transitions and renormalization-group flow 12,13 , a full theoretical understanding of their underlying mechanism is still lacking. This makes it difficult to determine the specific neural network structure that best captures the key properties of many-body states required for QPR. Motivated by these considerations, we develop a quantum circuit model inspired by convolutional neural networks (CNNs) in which the wavefunction $|\psi_G\rangle$ is given directly as input to the quantum circuit.
To demonstrate the success of our model for QPR, we theoretically analyze how our circuit structure mimics techniques used to classify quantum phases 14 , and provide a detailed numerical demonstration on an example problem.
CNNs provide a successful machine learning architecture for classification tasks such as image recognition 1,15,16 . A CNN generally consists of a sequence of different (interleaved) layers of image processing: convolution, pooling, and fully connected layers. In each layer, an intermediate 2D array of pixels, called a feature map, is produced from the previous one (Fig. 1a) 17 . The key properties of a CNN are translationally invariant convolution and pooling layers, each characterized by a constant number of parameters (independent of system size), and sequential data size reduction (i.e., a hierarchical structure). To implement these properties, the convolution layers compute new pixel values $x^{(\ell)}_{ij} = \sum_{a,b} w_{a,b}\, x^{(\ell-1)}_{i+a,j+b}$ from a linear combination of nearby pixels in the preceding map, where the weights $w_{a,b}$ form a $w \times w$ matrix. Pooling layers reduce the feature map size, e.g. by taking the maximum value from a few contiguous pixels, and are often followed by the application of a nonlinear (activation) function. Once the feature map size becomes sufficiently small, the final output is computed from a function that depends on all remaining pixels (the fully connected layer). The weight matrices and the fully connected function are optimized by training on large datasets. In contrast, variables such as the number of convolution and pooling layers and the size $w$ of the weight matrices (known as hyperparameters) are fixed for a specific CNN 1 .
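As a point of reference, the classical convolution and pooling operations described above can be sketched in a few lines of numpy. This is a toy illustration only (the image, kernel, and function names are ours, not from the original work):

```python
import numpy as np

def conv2d(x, w):
    """Valid 2D convolution (cross-correlation): each new pixel is a linear
    combination of nearby pixels in the preceding map, weighted by w."""
    k = w.shape[0]
    rows, cols = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(w * x[i:i + k, j:j + k])
    return out

def max_pool(x, p=2):
    """Non-overlapping p x p max pooling: reduces the feature map size."""
    n, m = x.shape
    x = x[:n - n % p, :m - m % p]
    return x.reshape(x.shape[0] // p, p, x.shape[1] // p, p).max(axis=(1, 3))

image = np.arange(16.0).reshape(4, 4)          # toy 4x4 "image"
kernel = np.array([[0.0, 1.0], [0.0, 0.0]])    # picks the pixel to the right
feature = conv2d(image, kernel)                # 3x3 feature map
pooled = max_pool(np.maximum(feature, 0.0))    # ReLU activation, then pooling
```

Each pass through such a pair of layers shrinks the map, mirroring the hierarchical data reduction described in the text.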

QCNN CIRCUIT MODEL
We introduce a quantum circuit model (QCNN) that extends the key properties of CNNs to the quantum domain (Fig. 1b). The input to the circuit model is an (unknown) quantum state $\rho_{\mathrm{in}}$. In a convolution layer, a single quasi-local unitary $U_i$ is applied in a translationally invariant manner for a finite depth. For pooling, a fraction of the qubits are measured, and their outcomes determine controlled unitary rotations applied to nearby qubits. Hence, nonlinearities in the QCNN arise from reducing the number of degrees of freedom. Convolution and pooling layers are performed until the system size is sufficiently small; then, a fully connected layer is applied as a unitary $F$ on the remaining qubits. Finally, the classification result is obtained by measuring a fixed number of output qubits. As in the classical case, circuit structures (i.e. QCNN hyperparameters) such as the number of convolution and pooling layers are fixed, while the unitaries themselves are learned.
A QCNN to classify $N$-qubit input states is thus characterized by $O(\log(N))$ parameters (a constant number per layer). This corresponds to a doubly exponential reduction compared with a generic quantum circuit-based classifier 11 and allows for efficient learning and experimental implementation. In particular, the learning procedure uses classified training data $\{(|\psi_\alpha\rangle, y_\alpha) : \alpha = 1, \ldots, M\}$, where $|\psi_\alpha\rangle$ are input states and $y_\alpha = 0$ or $1$ are the corresponding binary classification outputs. With these samples, one can quantify the performance of a QCNN through an error function. For example, if the QCNN is characterized by unitaries $\{U_i, V_j, F\}$ and has an expected final measurement value $f_{\{U_i,V_j,F\}}(|\psi_\alpha\rangle)$ for each $|\psi_\alpha\rangle$, we can compute the mean-squared error

$$\mathrm{MSE} = \frac{1}{2M} \sum_{\alpha=1}^{M} \left( y_\alpha - f_{\{U_i,V_j,F\}}(|\psi_\alpha\rangle) \right)^2 . \quad (1)$$

Learning then consists of initializing all the unitaries to certain values and successively optimizing their parameters $c_\mu$ via, for example, gradient descent until convergence,

$$c_\mu \to c_\mu - \eta \, \frac{\partial\, \mathrm{MSE}}{\partial c_\mu},$$

where $\eta$ is the learning rate. More advanced optimization methods can be adapted from traditional machine learning techniques (see Methods).
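The training loop just described (a mean-squared error minimized by gradient descent, with gradients from finite differences as detailed in the Methods) can be sketched generically. Here the circuit output is replaced by a toy black-box function of our own choosing, since simulating an actual QCNN is beyond a few lines:

```python
import numpy as np

def mse(params, f, data):
    """Mean-squared error over training pairs (psi, y)."""
    return sum((y - f(params, psi)) ** 2 for psi, y in data) / (2 * len(data))

def train(params, f, data, eta=0.1, eps=1e-4, steps=200):
    """Plain gradient descent; gradients come from symmetric finite
    differences, matching the scheme described in the Methods."""
    params = params.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(params)
        for mu in range(len(params)):
            shift = np.zeros_like(params)
            shift[mu] = eps
            grad[mu] = (mse(params + shift, f, data)
                        - mse(params - shift, f, data)) / (2 * eps)
        params -= eta * grad          # c_mu -> c_mu - eta * dMSE/dc_mu
    return params

# Toy stand-in for the expectation value f_{U_i,V_j,F}(|psi_alpha>): a
# linear model on 2D feature vectors, labeled 1 and 0.
f = lambda p, psi: float(p @ psi)
data = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 0.0)]
learned = train(np.array([0.5, 0.5]), f, data)
```

The same driver applies unchanged when `f` is replaced by a quantum circuit simulator returning the expected final measurement value.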
To gain physical insight into the mechanism underlying the operation of the QCNN, and to motivate its application to QPR, we now relate our circuit model to two well-known concepts in quantum information theory: the multiscale entanglement renormalization ansatz 18 (MERA) and quantum error correction (QEC). The MERA framework provides an efficient tensor network representation of many classes of interesting many-body wavefunctions, including those associated with so-called critical systems [18][19][20][21] . A MERA can be understood as a quantum state generated by a sequence of unitary disentangling and isometry layers applied to an input state (e.g. $|00\cdots\rangle$). While each isometry layer introduces a set of new qubits in a predetermined state (e.g. $|0\rangle$) before applying unitary gates on nearby ones, disentangling layers simply apply quasi-local unitary gates to the existing qubits (see Fig. 1c). This exponentially growing, hierarchical structure allows for long-range correlations, which are associated with critical systems. The QCNN circuit has a similar structure, but runs in the reverse direction. Hence, for any given state $|\psi\rangle$ with a MERA representation, there is always a QCNN circuit that recognizes $|\psi\rangle$ with deterministic measurement outcomes (e.g. all qubits in $|0\rangle$); one such QCNN is simply the inverse of the MERA circuit.
For input states other than $|\psi\rangle$, however, this QCNN does not generally produce deterministic outcomes, and one must then specify how the circuit operates. The additional degrees of freedom, associated with the results of quantum measurements, distinguish a QCNN from MERA. Specifically, we can identify the measurement outcomes as syndrome measurements in QEC 22 , which determine the error correction unitaries $V_i$ one needs to apply to the remaining qubit(s). Thus, a QCNN circuit with multiple pooling layers can be viewed as a combination of MERA (an important variational ansatz for many-body wavefunctions) and nested QEC (a mechanism to detect and correct quantum errors without collapsing the entire wavefunction). This makes the QCNN a powerful architecture for classifying input quantum states. In particular, for the purpose of QPR, the QCNN can provide a MERA realization of a representative state $|\psi_0\rangle$ in the target phase. The other input states $|\psi\rangle$ within the same phase can be viewed as $|\psi_0\rangle$ with additional local errors, which can be repeatedly corrected by the QCNN over multiple layers. In this sense, the QCNN circuit can mimic renormalization-group (RG) flow, a methodology that successfully classifies many families of quantum phases 14 .

EXAMPLE: DETECTING A 1D SPT PHASE
We demonstrate the potential of QCNNs explicitly by applying one to QPR in a class of one-dimensional many-body systems. Specifically, we consider a $Z_2 \times Z_2$ symmetry-protected topological (SPT) phase $P$ and input states $\{|\psi_G\rangle\}$ that are ground states of a family of Hamiltonians on a spin-1/2 chain with open boundary conditions:

$$H = -J \sum_{i=1}^{N-2} Z_i X_{i+1} Z_{i+2} - h_1 \sum_{i=1}^{N} X_i - h_2 \sum_{i=1}^{N-1} X_i X_{i+1}. \quad (2)$$

Here $X_i$, $Z_i$ are Pauli operators for the spin at site $i$, and the $Z_2 \times Z_2$ symmetry is generated by $X_{\mathrm{even(odd)}} = \prod_{i \in \mathrm{even(odd)}} X_i$. Figure 2a shows the phase diagram of $H$ as a function of $(h_1/J, h_2/J)$, where the SPT phase is adjacent to paramagnetic and antiferromagnetic phases. When $h_2 = 0$, the Hamiltonian is exactly solvable via the Jordan-Wigner transformation 14 , confirming that $P$ is characterized by nonlocal order parameters. When $h_1 = h_2 = 0$, all terms are mutually commuting, and a ground state is the 1D cluster state, which has zero correlation length. Our goal is to identify whether a given ground state (for arbitrary $h_1$, $h_2$) drawn from the phase diagram and encoded in qubits belongs to $P$.
In principle, $P$ can be detected by measuring a nonzero expectation value of a string order parameter 23,24 , such as

$$S_{ab} = \langle \psi_G |\, Z_a X_{a+1} X_{a+3} \cdots X_{b-3} X_{b-1} Z_b \,| \psi_G \rangle .$$

In practice, however, $S_{ab}$ decays with the system's correlation length. This decay is related to the delocalization of edge modes in SPT states near the phase boundary, and makes it difficult to detect a nonzero value. In contrast, we now show that QCNN circuits can be used to extract a nonlocal order parameter that sharply defines the phase even near criticality.
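The statement that the string order parameter is nonzero deep in the phase can be checked directly at the zero-correlation-length point: on the 1D cluster state, a $Z X \cdots X Z$ string is a product of $Z_{i-1}X_iZ_{i+1}$ stabilizers and equals exactly 1. A small statevector check (illustrative only; the helper names and the 5-site example are ours):

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])

def op(sites, n):
    """n-qubit operator with the given single-site Paulis (identity
    elsewhere). Qubit 0 corresponds to the most significant bit."""
    return reduce(np.kron, [sites.get(k, I2) for k in range(n)])

n = 5
basis = np.arange(2 ** n)
bits = (basis[:, None] >> np.arange(n)[::-1]) & 1
# Cluster state = CZ on all neighboring pairs applied to |+>^n; its
# amplitudes are uniform up to a sign (-1)^(sum_i b_i b_{i+1}).
psi = (-1.0) ** np.sum(bits[:, :-1] * bits[:, 1:], axis=1) / np.sqrt(2 ** n)

stabilizer = op({1: Z, 2: X, 3: Z}, n)    # Z_2 X_3 Z_4 (1-indexed sites)
string = op({0: Z, 1: X, 3: X, 4: Z}, n)  # Z_1 X_2 X_4 Z_5 = K_2 * K_4
s_stab = psi @ stabilizer @ psi
s_string = psi @ string @ psi
```

Both expectation values come out to exactly 1, as expected from stabilizer algebra; away from the fixed point the string expectation decays, which is the difficulty the QCNN circumvents.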

Exact QCNN Circuit
We first present an exact, analytical QCNN circuit that recognizes $P$, as shown in Fig. 2b (see Methods). The convolution layers involve controlled-phase gates as well as two-qubit-controlled X gates (Toffoli gates with controls in the X basis), and the pooling layers perform phase flips on the remaining qubits when one of the adjacent measurements yields $X = -1$. The convolution-pooling unit is repeated $d$ times, where $d$ is the depth of the QCNN. The fully connected layer measures the string operator $Z_{i-1} X_i Z_{i+1}$ on the remaining qubits. Figure 2c shows the QCNN output for a system of $N = 135$ spins and $d = 1, \ldots, 4$ along $h_2 = 0.5J$, obtained using matrix product state simulations (see Methods). As $d$ increases, the measurement outcomes show sharper changes around the critical point. In fact, the output of a $d = 2$ circuit already reproduces the phase diagram with high accuracy (Fig. 2a).

Interpretation using String Order Parameters
One intuitive way to understand the success of our circuit is to examine the final measurement operator in the Heisenberg picture. Although a QCNN performs non-unitary measurements in the pooling layers, similar to QEC circuits 22 , it is equivalent to a setup in which all measurements are postponed to the very end and the pooling layers are replaced by unitary controlled gates acting on both the measured and unmeasured qubits. In this way, the QCNN can be viewed as unitary evolution of the $N$ input qubits followed by measurements, and the classification output is equivalent to measuring a nonlocal observable

$$O = U^{(1)\dagger}_{\mathrm{CP}} \cdots U^{(d)\dagger}_{\mathrm{CP}} \left( Z_{i-1} X_i Z_{i+1} \right) U^{(d)}_{\mathrm{CP}} \cdots U^{(1)}_{\mathrm{CP}},$$

where $i$ is the index of the measured qubit in the final layer and $U^{(l)}_{\mathrm{CP}}$ is the unitary corresponding to the convolution-pooling unit at depth $l$. A more explicit expression for $O$ can be obtained by commuting $U_{\mathrm{CP}}$ with the Pauli operators, which yields recursive relations; in these relations, we use $\tilde{i}$ to enumerate every qubit at depth $l - 1$, including those measured in the pooling layer (Fig. 3a). It follows that a string operator $ZXX\cdots XZ$ after depth $l$ of convolution-pooling layers becomes a weighted linear combination of 16 products of string operators at depth $l - 1$. Thus, instead of measuring a single $S_{ab}$, our QCNN circuit measures a sum of products of exponentially many different string operators (Fig. 3b), whose coefficients are computed recursively in $d$ using Eqs. (5,6). This allows the QCNN to produce a sharp classification output even when the correlation length is as long as $3^d$.

Interpretation using MERA and QEC

Additional insights into the QCNN's performance are revealed by interpreting it in terms of MERA and QEC. In particular, our QCNN circuit is designed such that the 1D cluster state $|\psi_0\rangle$ is a fixed point; each convolution-pooling unit produces a 1D cluster state of reduced system size on the unmeasured qubits, while yielding deterministic outcomes ($X = 1$) on the measured qubits.
In other words, our circuit contains a MERA representation of $|\psi_0\rangle$. The fully connected layer then simply measures the string operator for $|\psi_0\rangle$. When an input wavefunction is perturbed away from the fixed point, our QCNN performs QEC. For example, if a single X error occurs at any spin in $|\psi_0\rangle$, the first pooling layer identifies its location, and controlled unitary operations correct the error propagated through the circuit (Fig. 3c). Similarly, if an initial state has multiple, sufficiently separated errors (possibly in coherent superpositions), the error density after several iterations of convolution and pooling layers will be significantly smaller 25 . If the input state converges to the fixed point, our QCNN classifies it into the SPT phase with high fidelity. Clearly, this mechanism resembles the classification of quantum phases based on renormalization-group (RG) flow.
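The deferred-measurement argument used in the string-order analysis above (measuring after a unitary circuit is equivalent to measuring the conjugated observable $O = U^\dagger M U$ on the input) can be verified numerically. This is a generic check with a random two-qubit "circuit", not the specific QCNN gates:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random two-qubit unitary standing in for the circuit (QR of a random
# complex matrix yields a unitary Q).
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(A)

# Observable M = Z on the first qubit (the qubit measured at the end).
M = np.kron(np.diag([1.0, -1.0]), np.eye(2))

# Random normalized input state.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)

schrodinger = np.vdot(U @ psi, M @ (U @ psi)).real  # evolve, then measure M
O = U.conj().T @ M @ U                              # Heisenberg-picture observable
heisenberg = np.vdot(psi, O @ psi).real             # measure O on the raw input
```

The two expectation values agree to machine precision, which is all the Heisenberg-picture rewriting of the QCNN output relies on.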

Obtaining QCNN from Training Procedure
Having analytically illustrated the computational power of the QCNN circuit model, we now demonstrate how a QCNN for $P$ can also be obtained using the learning procedure. In our example, the QCNN's hyperparameters are chosen such that there are four convolution layers and one pooling layer at each depth, followed by a fully connected layer (see Methods). Initially, all the unitaries $\{U_i, V_j, F\}$ are set to random values. Because simulating our training procedure on classical computers requires large amounts of computational resources, we focus on a relatively small system size with $N = 15$ spins and consider a QCNN of depth $d = 1$; in this case, there are a total of 1309 parameters to be learned (see Methods). Our training data consists of ground states along the line $h_2 = 0$, where the Hamiltonian is exactly solvable by the Jordan-Wigner transformation. Specifically, we use 40 evenly spaced points with $h_1 \in [0, 2]$ (e.g. gray dots in Fig. 4). Using gradient descent with the mean-squared error function (1), we then iteratively update the unitaries until convergence (see Methods). The classification output of the resulting QCNN for generic $h_2$ is shown in Fig. 4. Remarkably, this QCNN accurately reproduces the 2D phase diagram over the entire parameter regime, even though the model was trained only on samples from a set of solvable points that does not even cross the lower phase boundary.
This example provides an important illustration of how the QCNN circuit structure avoids overfitting to training data with its exponentially reduced number of parameters. While the training dataset for this particular QPR problem consists of solvable points, more generally, such a dataset can be obtained by using traditional methods (e.g. measuring string order parameters) to identify the phases of representative states that can be efficiently generated either numerically or experimentally 27,28 .

GENERALIZATION
Our interpretation of QCNNs in terms of MERA and QEC motivates their application to recognizing more generic quantum phases. For any quantum phase $P$ whose RG fixed-point wavefunction $|\psi_0(P)\rangle$ has a tensor network representation in isometric or G-isometric form 30 (Fig. 5a), one can systematically construct a corresponding QCNN circuit. This family of quantum phases includes all 1D SPT and 2D string-net phases [30][31][32] . In these cases, one can explicitly construct a commuting parent Hamiltonian for $|\psi_0(P)\rangle$ and a MERA structure in which $|\psi_0(P)\rangle$ is a fixed-point wavefunction (Fig. 5a for 1D systems 33 ); the diagrammatic proof of this fixed-point property is given in Fig. 5b. Furthermore, any "local error" perturbing an input state away from $|\psi_0(P)\rangle$ can be identified by measuring a fraction of terms in the parent Hamiltonian, similar to syndrome measurements in stabilizer-based QEC 34 . Then, a QCNN for $P$ simply consists of the MERA for $|\psi_0(P)\rangle$ and a nested QEC scheme in which an input state with error density below the QEC threshold 35 "flows" to the RG fixed point. Such a QCNN can be optimized via our learning procedure.
While our generic learning protocol begins with completely random unitaries, as in the classical case 1 , this initialization may not be the most efficient for gradient descent. Instead, motivated by deep learning techniques such as pre-training 1 , a better initial parameterization would consist of a MERA representation of $|\psi_0(P)\rangle$ and one choice of nested QEC. With such an initialization, the learning procedure serves to optimize the QEC scheme, expanding its threshold to the target phase boundary (Fig. 5c).

EXPERIMENTAL CONSIDERATIONS
Our QCNN architecture can be efficiently implemented on several state-of-the-art experimental platforms. The key ingredients for realizing QCNNs include the efficient preparation of quantum many-body input states, the application of two-qubit gates at various length scales, and projective measurements 36 . These capabilities have already been demonstrated in multiple programmable quantum simulators consisting of $N \geq 50$ qubits, based on trapped neutral atoms and ions, or superconducting qubits [37][38][39][40] . In particular, recent developments of 1D and 2D arrays of trapped Rydberg atoms 37,41 provide a promising avenue, where long-range dipolar interactions allow high-fidelity entangling gates 42 among distant qubits in a variable geometric arrangement. Note that the limited depth of our circuit, $\sim \log(N)$, may allow its implementation even in near-term experiments with relatively short coherence times.

Figure 5: (a) Given a state with a translationally invariant, isometric matrix product state representation (e.g. a fixed-point state for a 1D SPT phase), we explicitly construct an isometry for the MERA representation of this state. Blue squares are the matrix product state tensors, while black lines are the legs of the tensors. While we have illustrated a 3-to-1 isometry, the generalization to arbitrary n-to-1 isometries is straightforward. (b) Diagrammatic proof showing that a MERA constructed from the above tensor maps the fixed-point state back to a shorter version of itself. The first equality uses the definition of an isometric tensor, and loops in the middle diagram simplify to unity. The generalization of this isometry to higher dimensions is discussed in Ref. 29. (c) One helpful initial parameterization for QPR problems consists of a MERA for the fixed-point state $|\psi_0(P)\rangle$ and a choice of nested QEC, so that states within the QEC threshold flow toward $|\psi_0(P)\rangle$. Training procedures then expand this threshold boundary to the phase boundary.

OUTLOOK
These considerations indicate that QCNNs provide a promising quantum machine learning paradigm. Several interesting generalizations and future directions can be considered. First, while we have only presented the QCNN circuit structure for recognizing 1D phases, it is straightforward to generalize the model to higher dimensions, where phases with intrinsic topological order, such as the toric code, are supported 32,43 . Studying QCNNs in two and higher dimensions could potentially help identify nonlocal order parameters for lesser-understood phases, such as quantum spin liquids 44 or anyonic chains 45 , which have sharp expectation values up to the phase boundaries. To recognize more exotic phases, we could also relax the translation-invariance constraint on convolution and pooling layers (resulting in $O(N)$ parameters for system size $N$), or use ancilla qubits to implement multiple parallel feature maps, following traditional CNN architecture. Another promising application of QCNNs is in the design of fault-tolerant quantum storage and quantum operations, where specific noise models can be used to generate training data to optimize the design of QEC codes and operations on their code spaces. Finally, while we have used a finite-difference scheme to compute gradients in our learning demonstrations, the structural similarity of the QCNN to its classical counterpart motivates the adoption of more efficient gradient computation schemes, e.g. inspired by backpropagation 1 .

Figure 6: Circuit parameterization for training a QCNN to solve QPR. Our circuit involves four different convolution layers ($C_1$ - $C_4$), a pooling layer, and a fully connected layer. The unitaries are initialized to random values and learned via gradient descent.

Phase Diagram and QCNN Circuit Simulations
The phase diagram in the main text (Fig. 2a) was numerically obtained using the infinite-size density-matrix renormalization group (DMRG) algorithm. We generally follow the method outlined in Ref. 46 with a maximum bond dimension of 150. To extract each data point in Fig. 2a, we numerically obtain the ground-state energy density as a function of $h_2$ for fixed $h_1$ and compute its second derivative. The phase boundary points are identified from sharp peaks in the derivative.
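The boundary-extraction step (locating sharp peaks in the second derivative of the energy density) can be illustrated on synthetic data; the functional form below is made up purely for illustration and merely stands in for the DMRG output:

```python
import numpy as np

h2 = np.linspace(0.0, 2.0, 201)
# Toy ground-state energy density with a kink placed at h2 = 1.0 (the
# real transition point depends on h1 and comes from DMRG data).
energy = -1.0 - 0.3 * np.abs(h2 - 1.0) - 0.1 * h2 ** 2

d2 = np.gradient(np.gradient(energy, h2), h2)  # finite-difference 2nd derivative
h2_critical = h2[np.argmax(np.abs(d2))]        # sharp peak marks the boundary
```

Sweeping this procedure over a grid of $h_1$ values traces out the boundary curve in the $(h_1/J, h_2/J)$ plane.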
The simulation of our QCNN in Fig. 2b also utilizes matrix product state representations. We first obtain the input ground-state wavefunction using finite-size DMRG 46 with bond dimension $D = 130$ for a system of $N = 135$ qubits. Then, the circuit operations are performed by sequentially applying swap gates and two-qubit gates on nearest-neighbor qubits 47 . Each three-qubit gate is decomposed into two-qubit unitaries 48 . We find that increasing the bond dimension to $D = 150$ does not lead to any visible changes in our main figures, confirming reasonable convergence of our method. The color plot in Fig. 2a is similarly generated for a system of $N = 45$ qubits.

Demonstration of Learning Procedure
To perform our learning procedure for a QPR problem, we choose the hyperparameters for the QCNN as shown in Fig. 6. This hyperparameter structure can be used for generic (1D) phases and is characterized by a single integer $n$ that determines the reduction of system size in each convolution-pooling layer, $L \to L/n$ (Fig. 6 shows the special case $n = 3$). The first convolution layer involves $(n+1)$-qubit unitaries starting on every $n$th qubit. This is followed by $n$ layers of $n$-qubit unitaries arranged as shown in Fig. 6. The pooling layer measures $n - 1$ out of every contiguous block of $n$ qubits; each of these measurements is associated with a unitary $V_j$ applied to the remaining qubit, depending on the measurement outcome. This set of convolution and pooling layers is repeated $d$ times, where $d$ is the QCNN depth. Finally, the fully connected layer consists of an arbitrary unitary on the remaining $N/n^d$ qubits, and the classification output is given by the measurement output of the middle qubit (or any fixed choice of one of them). For our example, we choose $n = 3$ because the Hamiltonian in Eq. (2) involves three-qubit terms.
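The bookkeeping for this hyperparameter structure (each convolution-pooling unit keeps 1 of every $n$ qubits, leaving $N/n^d$ for the fully connected layer) can be written down directly; the helper name is ours:

```python
def qubits_per_layer(N, n, d):
    """Qubit count at the input and after each of the d convolution-pooling
    units, each of which keeps 1 of every n qubits."""
    if N % n ** d != 0:
        raise ValueError("system size must be divisible by n**d")
    sizes = [N]
    for _ in range(d):
        sizes.append(sizes[-1] // n)
    return sizes

# The training example in the text: N = 15 spins, n = 3, depth d = 1,
# leaving 5 qubits for the fully connected layer.
print(qubits_per_layer(15, 3, 1))
```

The divisibility check makes explicit that the depth $d$ is bounded by how many times $n$ divides $N$.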
In our simulations, we consider only $N = 15$ spins and depth $d = 1$, because simulating quantum circuits on classical computers requires a large amount of resources. We parameterize unitaries as exponentials of generalized $a \times a$ Gell-Mann matrices $\{\Lambda_\mu\}$,

$$U = \exp\!\left( -i \sum_\mu c_\mu \Lambda_\mu \right),$$

where $a = 2^w$, $w$ is the number of qubits involved in the unitary, and $c_\mu$ are real coefficients 49 . This parameterization is used directly for the unitaries in the convolution layers $C_2$ - $C_4$, the pooling layer, and the fully connected layer. For the first convolution layer $C_1$, we restrict $U_1$ to a product of six two-qubit unitaries between each possible pair of the four qubits, $U_1 = u^{(12)} u^{(13)} u^{(14)} u^{(23)} u^{(24)} u^{(34)}$, where $u^{(\alpha\beta)}$ is a two-qubit unitary acting on qubits indexed by $\alpha$ and $\beta$. Such a decomposition is useful for experimental implementation.
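A minimal sketch of this parameterization follows. The basis construction and function names are our own; we diagonalize the Hermitian generator instead of calling a matrix-exponential routine so the example needs only numpy:

```python
import numpy as np

def gell_mann_basis(a):
    """The a**2 - 1 generalized Gell-Mann matrices: traceless Hermitian
    generators (symmetric, antisymmetric, and diagonal families)."""
    mats = []
    for j in range(a):
        for k in range(j + 1, a):
            s = np.zeros((a, a), complex)
            s[j, k] = s[k, j] = 1.0
            mats.append(s)                       # symmetric family
            t = np.zeros((a, a), complex)
            t[j, k], t[k, j] = -1j, 1j
            mats.append(t)                       # antisymmetric family
    for l in range(1, a):
        d = np.zeros((a, a), complex)
        d[np.arange(l), np.arange(l)] = 1.0
        d[l, l] = -float(l)
        mats.append(d * np.sqrt(2.0 / (l * (l + 1))))  # diagonal family
    return mats

def unitary(c, basis):
    """U = exp(-i sum_mu c_mu Lambda_mu), via eigendecomposition of the
    Hermitian generator H = sum_mu c_mu Lambda_mu."""
    H = sum(cm * L for cm, L in zip(c, basis))
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w)) @ V.conj().T

basis = gell_mann_basis(4)  # a = 2**w with w = 2 qubits
coeffs = np.random.default_rng(1).uniform(0.0, 2.0 * np.pi, len(basis))
U = unitary(coeffs, basis)
```

Any real coefficient vector yields a valid unitary, which is what makes this parameterization convenient for unconstrained gradient descent.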
In the QCNN learning procedure, all parameters $c_\mu$ for the unitaries $\{U_i, V_j, F\}$ are initialized to random values between 0 and $2\pi$. In every iteration of gradient descent, we compute the derivative of the mean-squared error function (Eq. (1) in the main text) to first order with respect to each coefficient $c_\mu$ using the finite-difference method:

$$\frac{\partial\, \mathrm{MSE}}{\partial c_\mu} = \frac{1}{2\epsilon} \left( \mathrm{MSE}(c_\mu + \epsilon) - \mathrm{MSE}(c_\mu - \epsilon) \right) + O(\epsilon^2). \quad (8)$$

Each coefficient is then updated as $c_\mu \to c_\mu - \eta\, \partial\,\mathrm{MSE}/\partial c_\mu$, where $\eta$ is the learning rate for that iteration. We compute the learning rate using the bold driver technique from machine learning, in which $\eta$ is increased by 5% if the error has decreased since the previous iteration and decreased by 50% otherwise 50 . We repeat the gradient descent procedure until the error function changes by less than order $10^{-5}$ between successive iterations. In our simulations, we use $\epsilon = 10^{-4}$ for the gradient computation and begin with an initial learning rate of $\eta_0 = 10$.
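The bold-driver schedule described here is easy to state in code; only the update rule is from the text, while the error trace below is a toy sequence:

```python
def bold_driver(eta, prev_err, err):
    """Increase eta by 5% when the error decreased, halve it otherwise."""
    return eta * 1.05 if err < prev_err else eta * 0.5

eta = 10.0                       # initial learning rate eta_0 = 10
errors = [1.0, 0.8, 0.9, 0.7]    # toy per-iteration error values
for prev, cur in zip(errors, errors[1:]):
    eta = bold_driver(eta, prev, cur)
# Final eta = 10 * 1.05 * 0.5 * 1.05
```

The asymmetric response (gentle growth, aggressive shrinkage) keeps the step size just below the scale at which the error starts increasing.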

Construction of QCNN Circuit
To construct the exact QCNN circuit in Fig. 2b, we followed the guidelines discussed in the main text. Specifically, we designed the convolution and pooling layers to satisfy the following two important properties:

1. Fixed-point criterion: If the input is a cluster state $|\psi_0\rangle$ of $L$ spins, the output of the convolution-pooling layers is a cluster state of $L/3$ spins, with all measurements deterministically yielding $|0\rangle$.
2. QEC criterion: If the input is not $|\psi_0\rangle$ but instead differs from $|\psi_0\rangle$ at one site by an error that commutes with the global symmetry, the output should still be a cluster state of $L/3$ spins, but at least one of the measurements will yield the state $|1\rangle$.
These two properties are desirable for any quantum circuit implementation of RG flow for performing QPR.
In the specific case of our Hamiltonian, the ground state (the 1D cluster state) is a graph state, which can be efficiently obtained by applying a sequence of controlled-phase gates to a product state. This significantly simplifies the construction of the MERA representation for the fixed-point criterion. To satisfy the QEC criterion, we treat the ground-state manifold of the unperturbed Hamiltonian $H = -J \sum_i Z_i X_{i+1} Z_{i+2}$ as the code space of a stabilizer code with stabilizers $\{Z_i X_{i+1} Z_{i+2}\}$. The remaining degrees of freedom in the QCNN convolution and pooling layers are then specified such that the circuit detects and corrects the error (i.e. measures at least one $|1\rangle$ and prevents propagation to the next layer) when a single-qubit X error is present.