# A hardware-efficient leakage-reduction scheme for quantum error correction with superconducting transmon qubits

AA hardware-eﬃcient leakage-reduction scheme for quantum error correction withsuperconducting transmon qubits

F. Battistel, B. M. Varbanov, and B. M. Terhal

1, 2 QuTech, Delft University of Technology, P.O. Box 5046, 2600 GA Delft, The Netherlands JARA Institute for Quantum Information, Forschungszentrum Juelich, D-52425 Juelich, Germany (Dated: February 17, 2021)Leakage outside of the qubit computational subspace poses a threatening challenge to quantumerror correction (QEC). We propose a scheme using two leakage-reduction units (LRUs) that mitigatethese issues for a transmon-based surface code, without requiring an overhead in terms of hardwareor QEC-cycle time as in previous proposals. For data qubits we consider a microwave drive totransfer leakage to the readout resonator, where it quickly decays, ensuring that this negligiblyaﬀects the coherence within the computational subspace for realistic system parameters. For ancillaqubits we apply a | (cid:105) ↔ | (cid:105) π pulse conditioned on the measurement outcome. Using density-matrixsimulations of the distance-3 surface code we show that the average leakage lifetime is reduced toalmost 1 QEC cycle, even when the LRUs are implemented with limited ﬁdelity. Furthermore, weshow that this leads to a signiﬁcant reduction of the logical error rate. This LRU scheme opens theprospect for near-term scalable QEC demonstrations. Quantum computing has recently reached the mile-stone of quantum supremacy [1] thanks to a series ofimprovements in qubit count [2, 3], gate ﬁdelities [4–15]and measurement ﬁdelities [16–18]. The next major mile-stones include showing a quantum advantage [19–22] anddemonstrating quantum error correction (QEC) [3, 23–30]. Errors accumulate over time in a quantum computer,leading to an entropy increase which severely hinders theaccuracy of its output. Thus QEC is necessary to correcterrors and remove entropy from the computing system.If the overall physical error rate is below a certain noisethreshold for a given QEC-code family, the logical errorrate decreases exponentially with the code distance d atthe price of a poly( d ) overhead, thus allowing to extendthe computational time. Recently, small-size instancesof error-detecting [29] and error-correcting [3] codes havebeen experimentally realized. To further demonstratefault tolerance it is crucial to scale up these systems andshow that larger distance codes consistently lead to lowerlogical error rates than smaller distance codes [30].Leakage outside of the computational subspace [8–10, 12, 31–36], present in leading quantum-computingplatforms such as superconducting qubits and trappedions, poses a particularly threatening challenge to faulttolerance [23, 37–47]. Leakage can increase entropy bymaking measurement outcomes no longer point to theunderlying errors and can eﬀectively reduce the codedistance [45]. Furthermore, leakage can last for manyQEC cycles [39], making operations on a leaked qubitfail and possibly spread correlated errors through thecode [30, 38, 45]. In particular, leakage falls outside thestabilizer formalism of QEC as it cannot be decomposedin terms of Pauli errors. Stabilizer codes [48, 49] andtheir decoders are thus typically ill-suited to deal withleakage, leading to a signiﬁcant increase of the logicalerror rate [41, 44, 45]. If leakage lasts on average for l L avg = O (1) QEC cycles and l L avg (cid:28) d , then for low-enough error rates a threshold is likely to exist [38] asleakage would have a relatively local eﬀect in space and time. Due to a ﬁnite energy-relaxation time, leakage doesindeed last for l L avg = O (1) QEC cycles. However, inpractice it is important how large l L avg is, since if it is lowthe noise threshold is expected to be higher. Shorteningthe relaxation time to reduce l L avg is not eﬀective as thisincreases the physical error rate as well.A leakage-reduction unit (LRU) [37, 38, 40, 41, 46, 47,50, 51] is an operation introducing a seepage mechanismbesides that of the relaxation channel. A LRU convertsleakage into regular (Pauli) errors and shortens the av-erage leakage lifetime, ideally to 1 QEC cycle. As dis-cussed above, this is expected to lead to a higher noisethreshold, but not as high as for the case without leakage,since the leakage rate eﬀectively adds to the regular errorrate thanks to the LRU. As an alternative to the use ofLRUs, post-selection based on leakage detection has beenadopted [45] as a near-term method to reduce the logicalerror rate. While leakage detection could also be usedto apply LRUs in a targeted way, post-selection is notscalable. By shortening the lifetime to l L avg = O (1) (cid:28) d ,the use of LRUs is instead a scalable approach.In its imperfect experimental implementation a LRUcan either introduce extra Pauli errors or mistakenly in-duce leakage on a non-leaked qubit. Furthermore, inthe context of the surface code the LRUs investigatedso far [40, 41, 46] introduce an overhead in terms ofhardware and QEC-cycle time. Speciﬁcally, these LRUsare variants of the swap-LRU, in which the qubits areswapped at the end of each QEC cycle, taking alterna-tively the role of data and ancilla qubits. In this wayevery qubit is measured every 2 QEC cycles. The coreof the swap-LRU is the fact that the measured qubitsare reset to the computational subspace after the mea-surement. This can be accomplished by a scheme whichunconditionally maps | (cid:105) and | (cid:105) (and possibly | (cid:105) [47])to | (cid:105) [52–54], or conditionally using real-time feed-back [28, 55]. Under the standard assumption that theSWAP gates swap the states of two qubits only if none a r X i v : . [ qu a n t - ph ] F e b of them is leaked (which does not necessarily hold in ex-periment [47]), l L avg is ideally shortened to 2 QEC cy-cles. On the downside, for the pipelined surface-codescheme in [56], the pipeline is disrupted as qubits cannotbe swapped until the measurement and reset operationsare completed, leading overall to an increase up to 50%of the QEC-cycle time depending on the reset time. Theextra CZ gates, needed to implement the SWAPs, cancause additional errors or leakage as the CZ is the majorsource of leakage in transmons [8–10, 12, 31–34]. More-over, in the surface code an extra row of qubits is neededto perform all the SWAPs [40], which is a non-negligibleoverhead in the near term. All these issues increase thephysical error rate by a considerable amount, thus requir-ing to increase the system size to compensate for that(assuming that the error rates are still below threshold).In this work we propose two separate LRUs for dataand ancilla qubits which use resources already avail-able on chip, namely the readout resonator for dataqubits (res-LRU) and a | (cid:105) ↔ | (cid:105) π pulse conditioned onthe measurement outcome for ancilla qubits ( π -LRU).In particular, the use of the res-LRU avoids the neces-sity to swap data and ancilla qubits to be able to resetthe data qubits. The res-LRU is a modiﬁcation of thetwo-drive scheme in [52–54] to a single drive to depleteonly the population in | (cid:105) but not | (cid:105) , making it a LRUrather than a reset scheme. We additionally show thatthis negligibly aﬀects the coherence within the computa-tional subspace in an experimentally accessible regime,allowing to use res-LRU unconditionally in the surfacecode in every QEC cycle. In the pipelined scheme [56]the res-LRU easily ﬁts within the time in which the dataqubits are idling while the ancilla qubits are ﬁnishing tobe measured. As the π -LRU can be executed as a shortpulse at the end of the measurement time with real-timefeedback, our LRU scheme overall does not require anyQEC-cycle time overhead. Using density-matrix simula-tions [45, 49, 57] of the distance-3 surface code (Surface-17), we show that the average leakage lifetime is reducedto almost 1 QEC cycle when res-LRU and π -LRU withrealistic performance are employed. Furthermore, com-pared to the case without LRUs, the logical error rateis reduced by up to 30%. The proposed res-LRU and π -LRU can be straightforwardly adapted to QEC-codeschemes other than [56] and the res-LRU is potentiallyapplicable to superconducting qubits with higher anhar-monicity than transmons. The demonstrated reductionserves as evidence of scalability for our LRU scheme, eventhough we cannot estimate a noise threshold as we havesimulated only one size of the surface code. To explorelarger codes it is necessary to use less computationallyexpensive simulations [23, 38, 41] that use a simpliﬁedversion of our error model at the cost of losing some in-formation contained in the density matrix. Furthermore,to optimize the noise threshold the LRUs can be suppliedwith a leakage-aware decoder [23, 38, 41, 58–60] that usesmeasurement information about leakage to better correctleakage-induced correlated errors. | i| i| i | i| i Ω √ √ g g ˜ g κ a Drive frequency ω d E n e r g y ω ∗ d | i| i b − − − ∆ ω ∗ d / π ( M H z ) c O (Ω /δ q ) O (Ω / ( δ q ) )exact0 125 250 375 500Ω / π (MHz)048 ˜ g ( ω ∗ d ) / π ( M H z ) d O (Ω) O (Ω / ( δ q ) )exact Figure 1: Concept of the readout-resonator LRU. a Thestate | (cid:105) (with the notation | transmon , resonator (cid:105) ) is con-nected to | (cid:105) by two main paths via either | (cid:105) or | (cid:105) , due tothe capacitive coupling g or the transmon-drive amplitude Ω,respectively. This generates an eﬀective coupling ˜ g which canbe used to swap | (cid:105) ↔ | (cid:105) . The latter quickly decays to | (cid:105) due to the typically high coupling κ of the readout resonatorto the transmission-line environment, overall removing leak-age from a leaked transmon. b In the rotating frame of thedrive, | (cid:105) and | (cid:105) show an avoided crossing as a functionof the drive frequency ω d , centered at ω ∗ d . The eﬀective cou-pling ˜ g ( ω ∗ d ) is equal to half the energy separation at that point. c, d ∆ ω ∗ d := ω ∗ d − (2 ω q + α − ω r ) and ˜ g ( ω ∗ d ) are evaluated eitherexactly by full numerical diagonalization of H in Eq. (1), orby approximate analytical formulas (see Section I A and Ap-pendix A) for the parameters in Table I. I. READOUT-RESONATOR LRU

The readout resonator has been used [52–54] to reset atransmon qubit to the | (cid:105) state, depleting the populationsin | (cid:105) and | (cid:105) . Targeting the | (cid:105) ↔ | (cid:105) transition, withthe notation | transmon , resonator (cid:105) , those populations areswapped onto the readout resonator, where they quicklydecay due to the strong coupling to the transmission-lineenvironment. Ref. [52] uses two drives simultaneouslywhile Refs. [53, 54] uses these drives in a three-step pro-cess. Here we adapt these techniques to use a single drivein a single step to deplete the population in | (cid:105) only.A LRU is deﬁned [37] as an operation such that 1) theincoming leakage population is reduced after the appli-cation of the LRU, 2) the induced leakage when appliedto a non-leaked state is ideally 0. We thus ensure belowthat not only leakage is reduced but also that the eﬀectthat the drive has on a non-leaked transmon is as smallas possible. A. Transmon-resonator Hamiltonian

We consider a transmon capacitively coupled to a res-onator and to a dedicated microwave drive line. The res-onator possibly employs a Purcell ﬁlter which we do notinclude explicitly. In a frame rotating at the transmon-drive frequency ω d for both the resonator and the trans-mon, the Hamiltonian is time-independent and is givenby H = H + H c + H d (1) H = δ r a † a + δ q b † b + α b † ) b (2) H c = g ( ab † + a † b ) (3) H d = Ω2 ( e iφ b + e − iφ b † ) (4)where a and b are the creation operators for the res-onator and the transmon, respectively; δ r = ω r − ω d and δ q = ω q − ω d with ω r and ω q the resonator and trans-mon frequencies, respectively; α < g corresponds to the capacitive coupling;Ω and φ are the transmon-drive amplitude and phase,respectively. The phase is not relevant for the results inthis work and we ﬁx it to φ = 0.We can qualitatively understand (see Fig. 1 a ) that H contains an eﬀective coupling ˜ g between | (cid:105) and | (cid:105) .If ω d matches the transition frequency between the“bare” | (cid:105) and | (cid:105) , these two states are degeneratein the rotating frame and they are connected by twopaths (at lowest order) via either | (cid:105) or | (cid:105) . If ∆ := ω q − ω r (cid:29) g, δ q (cid:29) Ω, then | (cid:105) and | (cid:105) are occu-pied only “virtually” and one gets purely an eﬀective | (cid:105) ↔ | (cid:105) coupling. Modulo a constant term, inthe 2D subspace S = span {| (cid:105) , | (cid:105)} we can write H in Eq. (1) as H | S ≡ − η ( ω d ) Z/ g ( ω d ) X for an ap-propriate function η (an approximation can be extractedfrom Eq. (A34)). As a function of ω d this Hamiltoniangives rise to a | (cid:105) ↔ | (cid:105) avoided crossing centered ata frequency ω ∗ d (see Fig. 1 b ) where η ( ω ∗ d ) = 0. The en-ergy separation at the center of the avoided crossing isthen 2˜ g ( ω ∗ d ).In order to quantitatively study the action of H , weunitarily transform it using a Schrieﬀer-Wolﬀ transfor-mation e S [61–64]. Let {| ij (cid:105) D } be the basis of eigen-vectors of H + H c (the transmon-resonator “dressed”basis). In the dispersive regime ( g (cid:28) ∆), with respect toa 1st-order Schrieﬀer-Wolﬀ transformation S in the per-turbation parameter g/ ∆, such that e − S | ml (cid:105) ≈ | ml (cid:105) D ,we get (see Appendix A) H D := e S He − S ≈ e S He − S (5)= H D + H Dd + H Dd (6) with H D = (cid:16) δ r − ∞ (cid:88) m =0 g ∆ − ∆ m ∆ m − | m (cid:105) (cid:104) m | (cid:17) a † a + ∞ (cid:88) m =1 (cid:16) mδ q + α m ( m −

1) + g m ∆ m − (cid:17) | m (cid:105) (cid:104) m | (7) H Dd = Ω e iφ b + h.c. (8) H Dd = Ω e iφ (cid:32) a ∞ (cid:88) m =0 g ∆ − ∆ m ∆ m − | m (cid:105) (cid:104) m | + a † ∞ (cid:88) m =0 gα √ m + 1 √ m + 2∆ m ∆ m +1 | m (cid:105) (cid:104) m + 2 | (cid:33) + h.c. , (9)where ∆ m := ∆ + αm and {| m (cid:105)} are transmon states. H D is diagonal and contains the dispersive shifts, H Dd is the transmon drive now in the unitarily transformedframe, H Dd contains an indirect resonator drive and cou-plings of the kind a † | m (cid:105) (cid:104) m + 2 | . In particular, for m = 0in Eq. (9) we get a lowest order approximation of ˜ g :˜ g ≈ Ω gα √ α ) . (10)Notice that at this order there is no dependence on ω d .Furthermore, ˜ g would vanish for α = 0, since the twopaths in Fig. 1 a fully destructively interfere in that case.Since α is low for transmons, one can expect that Ω needsto be relatively large for ˜ g to be substantial.For the drive to be most eﬀective it is importantthat ω d matches ω ∗ d . If g = 0 = Ω, there is no avoidedcrossing but | (cid:105) and | (cid:105) simply cross at ω ∗ d, ≡ ω q + α − ω r as can be straightforwardly computed from H in Eq. (2). This value is shifted due to the capacitivecoupling (as can be seen from Eq. (7)), as well as dueto the possibly strong drive. For g (cid:54) = 0 , Ω (cid:54) = 0 onecan either compute ω ∗ d by full numerical diagonalizationof H and ﬁnd the avoided crossing as a function of ω d , orone can ﬁnd an (approximate) analytical expression. Forthe latter we use another Schrieﬀer-Wolﬀ transformation(rather than the resolvent method in [53], which doesnot give the full Hamiltonian) to account for the eﬀectof the transmon drive H Dd and to compute ω ∗ d up to or-der Ω / ( δ q ) , see Appendix A. We also use this transfor-mation to compute ˜ g up to order Ω / ( δ q ) . Figures 1 c, d compare the analytical approach with the exact numeri-cal results for ∆ ω ∗ d = ω ∗ d − ω ∗ d, and ˜ g ( ω ∗ d ), respectively,given the parameters in Table I. We consider 6 energylevels for the transmon and 3 for the resonator as we seethat the exact curves converge for such choice. In c wesee that the two approximations are both pretty good,while in d we see that Eq. (10) starts to deviate signiﬁ-cantly from the exact value at high Ω, whereas the higherorder approximation stays closer. We expect that the re-maining gap would be ﬁlled by considering higher ordersin both Schrieﬀer-Wolﬀ transformations we have used. Parameter

Transmon Readout resonatorFrequency ω/ π . . α/ π −

300 MHz n.a.Coupling g/ π

135 MHzAvg. photon number ¯ n n.a. 0.005Relaxation time T µ s 100 ns( κ/ π = 10 MHz)Dephasing time T µ s 200 ns(ﬂux noise)Table I: Parameters used both in the analysis and Lind-blad simulations of the readout-resonator LRU, similar to theexperimental ones in [27]. The transmon parameters corre-spond to the target parameters of a high-frequency data qubitin Section II. B. Performance of the readout-resonator LRU

Given the theoretical understanding of the transmon-resonator system, we devise a pulse to minimize the pop-ulation in | (cid:105) on a leaked transmon. We consider thepulse shapeΩ( t ) = Ω sin ( π t t rise ) for 0 ≤ t ≤ t rise Ω for t rise ≤ t ≤ t p − t rise Ω sin ( π t p − t t rise ) for t p − t rise ≤ t ≤ t p (11)similarly to [53], where t p is the total pulse duration, ata ﬁxed frequency ω d ( t ) = ω d . Hence, there are four pa-rameters to optimize over, i.e. Ω , ω d , t p and t rise . Weﬁx t rise = 30 ns since we observe that this stronglysuppresses non-adiabatic transitions out of the mani-fold of interest: for t rise (cid:46)

10 ns there appear ripplesin e.g. the | (cid:105) or | (cid:105) populations when the drive isturned on and oﬀ, leading to a reduction in performance.We expect that an improved pulse shape can shorten t rise .However, we do not explore this given the long maxi-mum t p allowed in our surface-code scheme ( t p ≤ T slot =440 ns, see Section II A).We use Lindblad simulations of the transmon-resonator system to optimize over Ω , ω d and t p . TheLindblad equation is given by˙ ρ = − i (cid:2) H D , ρ (cid:3) + (cid:88) j (cid:0) K j ρK † j − { K † j K j , ρ } (cid:1) (12)with { K j } the quantum jump operators. We express andsolve this equation in the unitarily transformed frame. Inthe numerics we use the exact H D (of which Eq. (6) isa 1st-order approximation in g/ ∆) by ﬁnding the basisthat exactly diagonalizes H + H c . The Hamiltonian pa-rameters are the same as in Section I A and are reportedin Table I, including the noise parameters. In particular,while we neglect the transmon thermal population, weinclude it for the resonator since it determines the leak-age that the pulse induces when the transmon was notleaked, as we show below. The resonator thermal state / π (MHz)5 . . . . ω d / π ( G H z ) ρ in = | ih | ⊗ σ th a t (ns)0 . . . . . P o pu l a t i o n t p b | i| i| i| i / π (MHz) ρ in = | ih | ⊗ σ th c t (ns)0 . . . t p d t rise t p Ω ( t ) . . . p | i .

000 0 .

005 0 . p | i Figure 2: Lindblad simulations of the transmon-resonatorsystem for the readout-resonator LRU. In a, b the initialstate is | (cid:105) (cid:104) | ⊗ σ th , while in c, d it is | (cid:105) (cid:104) | ⊗ σ th , where σ th is the resonator thermal state. a, c Transmon leakage pop-ulation p | (cid:105) = (cid:104) | Tr r ( ρ ( T slot )) | (cid:105) at the end of the time slotof T slot = 440 ns. For each choice of (Ω , ω d ) we optimizethe total pulse duration t p ≤ T slot to minimize p | (cid:105) given theinitial state | (cid:105) (cid:104) | ⊗ σ th , for ﬁxed t rise = 30 ns. The whitestar indicates the optimal point (Ω / π ≈

204 MHz, ω d / π ≈ .

247 GHz, t p = 108 . p opt . | (cid:105) ≈ .

7% in a .The induced leakage in c is p | (cid:105) ≈ .

48% at the optimal point.The purple line corresponds to the higher order estimate of theoptimal drive frequency ω ∗ d as a function of Ω (see Fig. 1 c ).The heatmaps are sampled using the adaptive package [65]. b, d Time evolution of the populations in a few selected statesfor the optimal point. The vertical dashed line indicates theused t p . The inset in d shows a schematic of the pulse Ω( t ). is given by [66] σ th ≈ (cid:16) − ¯ n n (cid:17) | (cid:105) (cid:104) | + ¯ n n | (cid:105) (cid:104) | (13)for low average photon number ¯ n . We consider dressedrelaxation and dephasing, as given below, assuming thatthis is a good model in the dispersive regime. In the uni-tarily rotated frame, the employed jump operators { K j } are explicitly given by1 (cid:112) T r a = (cid:114) κ π a, (cid:114) ¯ n n (cid:114) κ π a † , (cid:115) T rφ a † a, (14)1 (cid:112) T q b, (cid:115) T qφ b † b, (15)where T φ = (1 /T − / T ) − . Note that e.g. for a , go-ing back to the original frame it holds that e − S ae S = (cid:80) ∞ l =0 √ l + 1 | l (cid:105) D (cid:104) l + 1 | D = a D by deﬁnition of e S , cor-responding indeed to relaxation in the dressed basis. Byconsidering dressed relaxation and dephasing, the eﬀec-tive relaxation time of the transmon is not shortened bythe fact that it is coupled to a lossy resonator (Purcelleﬀect). We thus mimic the use of a Purcell ﬁlter butwithout including it in the simulations since that wouldincrease the Hilbert-space dimension in a computation-ally expensive way.For each choice of (Ω , ω d ) we optimize t p such that,given the initial state | (cid:105) (cid:104) | ⊗ σ th , the leakage popula-tion p | (cid:105) = (cid:104) | Tr r ( ρ ( T slot )) | (cid:105) at the end of the avail-able time slot is minimized (see Fig. 2 a ). Since thisoptimization has many local minima as a function of t p ,corresponding to the minima of the | (cid:105) ↔ | (cid:105) oscil-lations induced by the drive, we choose to target theﬁrst minimum as in [53, 54]. This minimum is expectedto occur around t p ∼ t rise + π/ g , so we use appro-priate bounds for the optimization (using the boundedBrent method in scipy ). While using a longer t p (possi-bly even greater than the allotted T slot ) would eventuallylead to an even lower leakage population [52], it is notnecessarily desirable as a longer t p may mean that thedisturbance to a non-leaked transmon is greater as well(see Appendix B 2). In Fig. 2 a one can observe a bandwith low p | (cid:105) as desired. This band occurs at drive fre-quencies slightly above the ones that one would expectto be optimal based on Section I A. We attribute thisto the fact that a signiﬁcant share of the time is takenby the rise and fall of the pulse, where Ω( t ) is smallerthan the maximum. In the following we refer to thepoint marked by a star in Fig. 2 as the optimal point(Ω / π ≈

204 MHz, ω d / π ≈ .

247 GHz, t p = 108 . p opt . | (cid:105) ≈ .

7% and corresponds tothe minimum of p | (cid:105) in Fig. 2 a for Ω / π (cid:46)

200 MHz(higher Ωs give practically no advantage at the price ofstronger experimental requirements on the drive). Weattribute the fact that this minimum does not reach 0 totransmon decoherence (resonator pure dephasing wouldcontribute as well but here T rφ = ∞ ). We note thatin Fig. 2 a we ﬁnd good p | (cid:105) (cid:46)

5% up to Ω / π (cid:38)

70 MHz,which could be used to further ease the requirements onthe drive (see Section II).The time evolution for a few selected states is shownin Fig. 2 b for the optimal point, given the initialstate | (cid:105) (cid:104) | ⊗ σ th . The ﬁrst few ns make | (cid:105) rotateinto | (cid:105) , while the latter decays relatively fast to | (cid:105) dueto the large relaxation rate κ of the readout resonator.After T slot = 440 ns the remaining | (cid:105) population isslightly larger than in the thermal state. In the surfacecode we do not expect that this would lead to heating ofthe resonator since it is unlikely that a transmon leaksfor two or more QEC cycles in a row. As a consequence,the resonator has further time to thermalize.We now evaluate the eﬀect of the pulse on a non-leaked transmon (see Fig. 2 c, d ). There should ideallybe no eﬀect, except for an acquired single-qubit phasewhich can easily be determined and corrected by either a real or virtual Z rotation. First, if the transmon isin | (cid:105) and there is some thermal population in the res-onator, part of the state is supported on | (cid:105) , whichrotates into | (cid:105) in the same way as the opposite pro-cess by unitarity. Figure 2 c shows that indeed the in-duced leakage is greater where p | (cid:105) is lower in Fig. 2 a .However, due to the low ¯ n = 0 . p | (cid:105) ≈ .

48% in Fig. 2 c at theoptimal point, which is comparable to state-of-the-artCZ leakage rates, see Section II B) and can be madeeven lower by engineering colder resonators. If the initialstate is | (cid:105) (cid:104) | ⊗ σ th there is basically no induced leak-age as the drive is oﬀ-resonant with transitions from thisstate. Second, the pulse might aﬀect the coherence timesof the transmon by driving transitions within or outsidethe computational subspace (and back), as the small butnon-negligible transitory population in | (cid:105) in Fig. 2 b, d seems to suggest. However, we ﬁnd that both the eﬀec-tive T q and T q are barely aﬀected as a function of Ω(see Appendix B 1). This is because stronger pulses causea somewhat stronger disturbance to the qubit, but theyare shorter so that in total the eﬀect is small. II. SURFACE CODE WITH LRUSA. Layout and operation scheduling

We study the distance-3 rotated surface code(see Fig. 3 a ), nicknamed Surface-17, in the presence ofleakage and LRUs. We follow the frequency and pipelinedscheme in [56], in which the 9 data qubits are subdi-vided into 3 high- and 6 low-frequency ones. The 4 X and the 4 Z ancilla qubits have an intermediate fre-quency. We consider the ﬂux-pulse implementation ofthe CZs [9, 10, 31–33] for tunable-frequency transmons,in which the transmon with the greater frequency is low-ered towards the other one with a ﬂux pulse. With thistechnique ﬂuxed transmons are prone to leakage. Thismeans that the high-frequency data qubits and all theancilla qubits can leak. As shown in [45], leakage canlast for many QEC cycles and be quite detrimental to thelogical performance of the code. Here we address theseissues with the res-LRU for high-frequency data qubitsand with the π -LRU for ancilla qubits, as described be-low. If due to a diﬀerent implementation of the CZs alsothe low-frequency data qubits can leak, one can applythe res-LRU to them as well but we do not explore thishere.The circuit executed for each QEC cycle is shownin Fig. 3 b . The X -type and Z -type parity-check unitsare implemented in an interleaved way, with the CZs forone unit being applied while the other ancilla-qubit typeis measured. The duration of each operation is summa-rized in Appendix C 1, with a total QEC-cycle durationof 800 ns. The data qubits are idling for a considerableamount of time, namely T slot = 440 ns, while the an-cilla qubits are measured. We choose this time slot as D D D D D D D D D X X X X Z Z Z Z leakage prone a highmidlow X Z D D D D X Z H HH HH HH HH HH HH HH H π - LRU π - LRU π - LRU π - LRU

HHHHHH b res-LRUres-LRU Figure 3: a Schematic overview of the Surface-17 lay-out [45, 56]. Pink (resp. red) circles with D labels representlow- (high-) frequency data qubits, while blue (resp. green)circles with X ( Z ) labels represent ancilla qubits, which havean intermediate frequency. Ancilla qubits and high-frequencydata qubits are prone to leakage during the CZ gates. b Thequantum circuit for a single QEC cycle employed in simula-tion, for the unit-cell scheduling deﬁned in [56], in which weinsert the LRUs. The res-LRUs (orange) are applied uncon-ditionally on the high-frequency data qubits after the CZs,while the π -LRUs (teal) are applied on the ancilla qubits de-pending on the measurement outcome. Gray elements corre-spond to operations belonging to the previous or the followingQEC cycle. The duration of each operation is given in Ap-pendix C 1. The arrow at the bottom indicates the repetitionof QEC cycles. the ideal place to apply the res-LRUs, introduced in Sec-tion I, to the high-frequency data qubits. Notice thatthe optimal pulse selected in Section I B, which was sim-ulated for the target parameters of the high-frequencydata qubits, takes about 100 ns. These LRUs can thusbe easily applied even if the measurement time becomesshorter in future devices (up to ∼

250 ns).For the ancilla qubits there is no available time slotto apply the res-LRU. A possibility would be to makethe QEC-cycle time longer by inserting these LRUs when the measurement is completed. However, this approachwould lower the logical error rate of the code by a non-negligible amount. On the other hand, ancilla qubits aremeasured and the (analog) measurement outcome con-tains information about leakage [45]. We choose to usea diﬀerent type of LRU altogether which uses this in-formation. Speciﬁcally, we consider a | (cid:105) ↔ | (cid:105) π pulse,conditioned on the measurement outcome reporting a | (cid:105) .Below we discuss further details of the implementation ofthis π -LRU. B. Implementation of the LRUs in thedensity-matrix simulations

We use density-matrix simulations [49] using the open-source package quantumsim [57] to study Surface-17 withres-LRUs and π -LRUs. We include relaxation and de-phasing ( T and T ), as well as ﬂux-dependent T andleakage rate L during the CZs, following the same er-ror model as in [45]. L is deﬁned as the average leak-age from the computational to the leakage subspace [67].The state of the art is L ≈ .

1% [9, 10], althoughthe actual L is expected to be higher when operatinga multi-transmon processor [68], thus here we considerup to L = 0 . | (cid:105) to | (cid:105) in subsequent CZ gates [45] as we expectit to be negligible when LRUs make | (cid:105) short-lived.

1. res-LRU for data qubits

In the simulations, leakage-prone transmons are mod-eled as 3-level systems and non-leakage-prone ones as 2-level systems, leading to an already computationally ex-pensive size for the density matrix. As a consequence, wedo not include the readout resonator explicitly in thesesimulations. The resonator is initially in the ground stateand is returned to it at the end of the time slot, ap-proximately. We can thus trace the resonator out andmodel the res-LRU on the transmon qubit as an incoher-ent | (cid:105) (cid:55)→ | (cid:105) relaxation (see Section II B for details). Fur-thermore, in Section I B we have observed that the res-LRU can also cause a non-leaked transmon to partiallyleak, so we include that as an incoherent | (cid:105) (cid:55)→ | (cid:105) exci-tation.Calling p i | j (cid:105) , p f | j (cid:105) the populations before and after theres-LRU, we deﬁne the leakage-reduction rate 0 ≤ R ≤ R = 1 − p f | (cid:105) conditioned on an initially fully leakedtransmon, i.e. for p i | (cid:105) = 1. Furthermore, we deﬁne theaverage res-LRU leakage rate L LRU1 as the average of theinduced leakage starting from either | (cid:105) or | (cid:105) (consis-tently with the deﬁnition for CZ [67]), with probabil-ity 1 / | (cid:105) (see Section I B), this means that p f | (cid:105) ≈ p i | (cid:105) = 1 and that p f | (cid:105) ≈ L LRU1 for p i | (cid:105) = 1 (ne-glecting relaxation eﬀects as the used T = 30 µ s is rel-atively long). Combining these two deﬁnitions one getsthe expression p f | (cid:105) ≈ (1 − R ) p i | (cid:105) + 2 L LRU1 p i | (cid:105) (16)for an arbitrary incoming state. Notice that, given thesedeﬁnitions, Fig. 2 a, c respectively show a heatmapof 1 − R and 2 L LRU1 for the considered transmon-resonator parameters. In particular, the optimal pointachieves R ≈ .

3% and L LRU1 ≈ . T slot , namely R T = 1 − e − T slot / ( T / = 2 . π -LRU for ancilla qubits The dispersive readout of a transmon qubit is in gen-eral performed by sending a pulse to the readout res-onator, integrating the reﬂected signal to obtain a pointin the IQ plane and depleting the photons in the res-onator (either passively by relaxation or actively withanother pulse) [16–18]. The measured point is comparedto one or more thresholds to declare the measurementoutcome. These thresholds are determined as to op-timally separate the distributions for the diﬀerent out-comes, which have a Gaussian(-like) form. Here we as-sume that the distribution for | (cid:105) is suﬃciently separatedfrom | (cid:105) and | (cid:105) [16]. This is generally expected to bepossible thanks to the diﬀerent dispersive shift. Thenone uses three thresholds in the IQ plane to distinguishbetween | (cid:105) , | (cid:105) and | (cid:105) (or two if | (cid:105) is well-separatedfrom e.g. | (cid:105) ). We also assume that an outcome canbe declared during photon depletion, thus enabling real-time conditional feedback. This is challenging to per-form in 200-300 ns in experiment due to the classical-postprocessing requirements, but it has been previouslyachieved [28, 55]. We can then apply the π -LRU right atthe end of the depletion time. The | (cid:105) ↔ | (cid:105) π pulse is ex-pected to be implementable as a simple pulse in the sameway and time as single-qubit gates (20 ns) and with com-parable, coherence-limited ﬁdelity. If conditional feed-back is not possible in the allotted time, one can eitherincrease the QEC cycle duration or postpone the condi-tional gate to the next QEC cycle, at the cost of someperformance loss. We do not explore such variants in thiswork. Readout-declaration errors are expected to aﬀect theperformance of the π -LRU. On one hand, an incorrectdeclaration of | (cid:105) as a | (cid:105) makes the π pulse induce leak-age. On the other hand, declaring a | (cid:105) as a | (cid:105) would leadto leakage not being corrected and lasting for at least oneextra QEC cycle. We deﬁne the readout matrix M withentries M ij =: p M ( i | j ) being the probability that the ac-tual state | j (cid:105) resulting from the projective measurementis declared as an | i (cid:105) . In the simulations we use M = p M (1 |

1) 1 − p M (1 | − p M (2 | p M (2 | . (17)In particular, this means that we do not consider decla-ration errors within the computational subspace. Whilethat would change the value of the logical error rate, itis not relevant for evaluating the performance of the π -LRU. Furthermore, we assume that a | (cid:105) cannot be mis-taken as a | (cid:105) since their readout signals are often muchmore separated than the signals of | (cid:105) and | (cid:105) . We notethat this is the worst-case scenario for the π -LRU since ifa | (cid:105) (rather than a | (cid:105) ) could be mistakenly declared asa | (cid:105) , then a | (cid:105) ↔ | (cid:105) π pulse does not induce leakage. C. Average leakage lifetime and steady state

Once a qubit leaks, it tends to remain leaked for asigniﬁcant amount of time, up to 10-15 QEC cycles onaverage [45]. Starting from an initial state with no leak-age, the probability that a qubit is in the leaked statetends towards a steady state within a few QEC cycles. Itwas shown in [45] that this evolution is well captured bya classical Markov process with leakage (resp. seepage)rate Γ

C→L (Γ L→C ) per QEC cycle, where C (resp. L )is the computational (leakage) subspace. In our errormodel, without accounting for LRUs, these rates are ap-proximately given byΓ C→L ≈ N ﬂux L , (18)Γ L→C ≈ N ﬂux L + (1 − e − t c T / ) , (19)where N ﬂux is in how many CZ gates the transmonis ﬂuxed during a QEC cycle, t c is the duration of aQEC cycle and L (resp. L ) is the average leakage (seep-age) probability of a CZ [67]. Thus the two native mech-anisms that generate seepage are the CZs themselves andrelaxation.The major eﬀect of a LRU is to eﬀectively in-crease Γ L→C in Eq. (19) by introducing an extra seepagemechanism. Hence we expect that Γ

LRU

L→C ∼ Γ L→C + R for data qubits and Γ LRU

L→C ∼ Γ L→C + p M (2 |

2) for ancillaqubits, preventing leakage from accumulating and last-ing long for large R or p M (2 | l L avg is the average dura-tion of leakage and for a Markov process it is calculatedas l L avg = ∞ (cid:88) n =1 n P (stay in L for n QEC cycles) (20)= ∞ (cid:88) n =1 n (1 − Γ L→C ) n − Γ L→C = 1Γ

L→C , (21)thus assuming that the qubit starts in L . The evolutionof the leakage probability ¯ p | (cid:105) ( n ), averaged over surface-code runs, as a function of the QEC-cycle number n iswell-approximated by [45]¯ p | (cid:105) ( n ) = Γ C→L Γ C→L + Γ

L→C (1 − e − (Γ C→L +Γ L→C ) n ) . (22)The steady state is the long-time limit and is given by¯ p ss | (cid:105) = lim n →∞ ¯ p | (cid:105) ( n ) = Γ C→L Γ C→L + Γ

L→C . (23)For ancilla qubits ¯ p | (cid:105) ( n ) can be computed directly fromthe “true” measurement outcomes (i.e. without declara-tion errors on top). For data qubits it can be computedfrom the density matrix. Speciﬁcally, for data qubits weevaluate ¯ p | (cid:105) ( n ) right after the CZs.Figure 4 shows l L avg and ¯ p ss | (cid:105) extracted from the Surface-17 simulations by ﬁtting ¯ p | (cid:105) ( n ) to Eq. (22) for eachqubit. We can indeed observe that these quantities dropsubstantially for both data and ancilla qubits. The de-cays follows an inverse proportionality as e.g. for dataqubits l L avg = 1Γ LRU

L→C ∼ L→C + R ∼ R (24)¯ p ss | (cid:105) = Γ LRU

C→L Γ LRU

C→L + Γ

LRU

L→C ∼ Γ LRU

C→L Γ LRU

L→C ∼ Γ LRU

C→L R (25)for suﬃciently large R and small Γ LRU

C→L . For ancilla qubitswe expect, similarly, a 1 /p M (2 |

2) dependence. The life-time drops from values (cid:38)

10 to ≈

1, which is the min-imum value it can achieve (some points drop below 1within error bars as it is diﬃcult for the ﬁt to estimatesuch a short lifetime). As of course the LRUs do not pre-vent leakage from occurring during the CZs in the ﬁrstplace, one cannot expect the steady state to reach 0 evenfor a perfect LRU ( R = 1), but rather ¯ p ss | (cid:105) ∼ Γ LRU

C→L ≈ N ﬂux L (+ L LRU1 if the LRU can mistakenly induce leak-age). Figures 4 b, d show that this is indeed the case.Figure 4 also demonstrates that both l L avg and ¯ p ss | (cid:105) getclose to their minimum values already for R, p M (2 | (cid:38) π -LRU may notnecessarily need to be perfect to provide a good logicalperformance in Surface-17. This means that one coulduse e.g. a weaker pulse to implement the res-LRU or thatthe readout of | (cid:105) may not need to be particularly opti-mized in practice. l L a v g ( Q E C c y c l e s ) a D D D R (%)0481216 ¯ p ss | i ( % ) b L L c X X X X Z Z Z Z p M (2 |

2) (%) d L L Figure 4: Average leakage lifetime l L avg ( a, c ) and leakagesteady state ¯ p ss | (cid:105) ( b, d ) as a function of the leakage-reductionrate R for data qubits ( a, b ) and as a function of the read-out probability p M (2 |

2) for ancilla qubits ( c, d ). Here we ﬁxthe CZ leakage rate to L = 0 . b, d showthat ¯ p ss | (cid:105) tends to ≈ N ﬂux L ( N ﬂux = 4 for D , 3 for D , D ,1 for Z , Z and 2 for the remaining ancilla qubits). The verti-cal dashed lines correspond to the values used in Section II D.These results are extracted from 2 × runs of 20 QEC cy-cles each per choice of parameters. Error bars are estimatedusing bootstrapping and are mostly smaller than the symbolsize. D. Logical performance

In the simulations the logical qubit is initialized in | (cid:105) L and the logical ﬁdelity F L ( n ) is computed at the end ofeach QEC cycle as the probability that the decoder cor-rectly determines the current logical state. Then thelogical error rate ε L per QEC cycle can be extractedby ﬁtting F L ( n ) = [1 + (1 − ε L ) n − n ] /

2, where n isa ﬁtting parameter (usually close to 0) [49]. We evalu-ate ε L for the “upper bound” decoder (UB) which usesthe complete density-matrix information to infer a logi-cal error, and for the minimum-weight perfect-matchingdecoder (MWPM). The weights are extracted with theadaptive algorithm in [69] from a simulation withoutleakage and an otherwise identical error model. For de-coding we assume that a | (cid:105) is declared as a | (cid:105) since thestandard MWPM does not use information about leak-age.By mapping a leaked qubit back to the computationalsubspace, a LRU does not fully remove a leakage errorbut can at most convert it into a regular (Pauli) error.Hence, it is not to be expected that ε L in the pres-ence of leakage can be restored to the value at L = 0.We consider realistic parameters for the LRUs. Specif-ically, we use R = 95%, L LRU1 = 0 . p M (2 |

2) =90% and p M (1 |

1) = 99 . .

000 0 .

125 0 .

250 0 .

375 0 . L per CZ (%)1 . . . . . . . L og i c a l e rr o rr a t e (cid:15) L ( % ) R = 95% L LRU1 = 0 . p M (2 |

2) = 90% p M (1 |

1) = 99 . π -LRUUB, both MWPM, no LRUsMWPM, res-LRUMWPM, π -LRUMWPM, both Figure 5: Logical error rate ε L per QEC cycle forthe upper bound (UB, red) and minimum-weight perfect-matching (MWPM, green) decoders versus the CZ leakagerate L , in the cases with: no LRUs, only res-LRU, only π -LRU and both LRUs (the point without leakage at L = 0is always without LRUs as well). These results are extractedfrom 2 × runs of 20 QEC cycles each per choice of param-eters. Error bars are estimated using bootstrapping and aresmaller than the symbol size. tem, while the last two are close to be achievable in ex-periment [14, 52]. In particular, while the optimal pointhas R = 99 . R = 95% here.Notice that p M (1 |

1) = 99 .

5% is quite high. We arguethat the state of the art can be squeezed as the thresh-old to distinguish between | (cid:105) and | (cid:105) in the IQ planecould be moved towards | (cid:105) , rather than placing it in themiddle as is common practice. In this way one wouldslightly reduce p M (2 |

2) in favor of p M (1 |

1) if p M (1 |

1) isnot high enough. A broader study of the logical perfor-mance as a function of the LRU parameters can be foundin Appendix C 2.Figure 5 shows the reduction in ε L as a function of theCZ leakage rate L when LRUs with the given parametersare employed. Using only the res-LRU or the π -LRU low-ers ε MWPML by basically the same amount, while ε UBL islower for the π -LRU than for the res-LRU. We attributethis to the fact that UB directly uses the information inthe density matrix, while MWPM relies on the measuredsyndrome, thus being more susceptible to ancilla-qubitleakage. When both LRUs are used, we see that ε L isreduced by an amount which is close to the sum of thereductions when only one kind of LRU is used. As ex-pected, ε L is not restored to the value at L = 0, but thereduction is overall signiﬁcant and can reach up to 30%for both MWPM and UB compared to the case withoutLRUs. III. DISCUSSION

In this work we have introduced a leakage-reductionscheme using res-LRUs and π -LRUs which does not re-quire any additional hardware or a longer QEC cycle.Furthermore, while the scheme in [47] is applicable onlyto ancilla qubits, our combination of res-LRU for dataqubits and π -LRU for ancilla qubits enables to signiﬁ-cantly reduce leakage in the whole transmon processor.We have shown with detailed simulations using realis-tic parameters that the reset scheme in [52–54] can beadapted to be a LRU without signiﬁcantly aﬀecting thecoherence within the computational subspace, allowingto unconditionally apply the res-LRU in the surface code.The use of the res-LRU for data qubits, as well as theuse of the π -LRU for ancilla qubits, leads to a substan-tial reduction of the average leakage lifetime and leak-age steady state, preventing leakage from lasting morethan ≈ .

7, 6 . . . . | (cid:105) ↔ | (cid:105) transition around 5.1-5 . ≈ .

25 GHz), which can lead to an indirectancilla-qubit drive mediated by the bus resonator, al-beit weaker. The diﬃculty of precise frequency targetingin fabrication can further lead to undesired frequencycollisions. These issues can be alleviated by choosingslightly diﬀerent transmon/resonator frequencies and an-harmonicities to make the drive more oﬀ-resonant withthat transition (combined with better frequency target-ing [70]), or they can be mitigated altogether by usingtunable couplers [1, 11, 14]. The res-LRU is compatiblewith tunable-coupler schemes and their possibly diﬀerentoperation scheduling than in [56], as well as potentiallyapplicable to superconducting qubits which use a res-onator for dispersive readout other than the transmon.Furthermore, if the low-frequency data qubits can leak ina diﬀerent implementation of the CZ, the res-LRU canbe applied to them in the same time slot as the high-frequency ones.0The demonstrated reduction in the average leakage life-time and in the logical error rate is expected to lead toa higher noise threshold for the surface code in the pres-ence of leakage, compared to the case without LRUs. Tostudy such a noise threshold it is necessary to implementsimulations of large code sizes which use a simpliﬁed er-ror model, such as a stochastic error model for leakageand Pauli errors [23, 38, 41]. We expect that the demon-strated MWPM logical error rate can be further loweredby the use of decoders [23, 38, 41, 58–60] that use in-formation about leakage extracted directly or indirectly(e.g. with hidden Markov models [45]) from the measure-ment outcomes.

Acknowledgments

We thank L. DiCarlo for his comments on themanuscript. F.B., B.M.V. and B.M.T. are supportedby ERC grant EQEC No. 682726. Most simulationswere performed with computing resources granted byRWTH Aachen University under projects rwth0566 andrwth0669. The data and code are available upon requestfrom the corresponding author ([email protected]). [1] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C.Bardin, R. Barends, R. Biswas, S. Boixo, F. G. S. L.Brandao, D. A. Buell, B. Burkett, Y. Chen, Z. Chen,B. Chiaro, R. Collins, W. Courtney, A. Dunsworth,E. Farhi, B. Foxen, A. Fowler, C. Gidney, M. Giustina,R. Graﬀ, K. Guerin, S. Habegger, M. P. Harrigan,M. J. Hartmann, A. Ho, M. Hoﬀmann, T. Huang,T. S. Humble, S. V. Isakov, E. Jeﬀrey, Z. Jiang,D. Kafri, K. Kechedzhi, J. Kelly, P. V. Klimov, S. Knysh,A. Korotkov, F. Kostritsa, D. Landhuis, M. Lind-mark, E. Lucero, D. Lyakh, S. Mandr`a, J. R. Mc-Clean, M. McEwen, A. Megrant, X. Mi, K. Michielsen,M. Mohseni, J. Mutus, O. Naaman, M. Neeley, C. Neill,M. Y. Niu, E. Ostby, A. Petukhov, J. C. Platt, C. Quin-tana, E. G. Rieﬀel, P. Roushan, N. C. Rubin, D. Sank,K. J. Satzinger, V. Smelyanskiy, K. J. Sung, M. D. Tre-vithick, A. Vainsencher, B. Villalonga, T. White, Z. J.Yao, P. Yeh, A. Zalcman, H. Neven, and J. M. Martinis,Nature , 505–510 (2019).[2] P. Jurcevic, A. Javadi-Abhari, L. S. Bishop, I. Lauer,D. F. Bogorin, M. Brink, L. Capelluto, O. G¨unl¨uk,T. Itoko, N. Kanazawa, A. Kandala, G. A. Keefe, K. Kr-sulich, W. Landers, E. P. Lewandowski, D. T. McClure,G. Nannicini, A. Narasgond, H. M. Nayfeh, E. Pritchett,M. B. Rothwell, S. Srinivasan, N. Sundaresan, C. Wang,K. X. Wei, C. J. Wood, J.-B. Yau, E. J. Zhang, O. E.Dial, J. M. Chow, and J. M. Gambetta, (2020),arXiv:2008.08571 [quant-ph] .[3] L. Egan, D. M. Debroy, C. Noel, A. Risinger, D. Zhu,D. Biswas, M. Newman, M. Li, K. R. Brown, M. Cetina,and C. Monroe, (2020), arXiv:2009.11482 [quant-ph] .[4] M. A. Rol, C. C. Bultink, T. E. O’Brien, S. R. de Jong,L. S. Theis, X. Fu, F. Luthi, R. F. L. Vermeulen, J. C.de Sterke, A. Bruno, D. Deurloo, R. N. Schouten, F. K.Wilhelm, and L. DiCarlo, Phys. Rev. Applied , 041001(2017).[5] Z. Chen, J. Kelly, C. Quintana, R. Barends, B. Camp-bell, Y. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler,E. Lucero, E. Jeﬀrey, A. Megrant, J. Mutus, M. Nee-ley, C. Neill, P. J. J. O’Malley, P. Roushan, D. Sank,A. Vainsencher, J. Wenner, T. C. White, A. N. Korotkov,and J. M. Martinis, Phys. Rev. Lett. , 020501 (2016).[6] R. Barends, J. Kelly, A. Megrant, A. Veitia,D. Sank, E. Jeﬀrey, T. C. White, J. Mutus, A. G.Fowler, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, C. Neill, P. O’Malley, P. Roushan,A. Vainsencher, J. Wenner, A. N. Korotkov, A. N. Cle-land, and J. M. Martinis, Nature , 500 (2014).[7] S. Sheldon, E. Magesan, J. M. Chow, and J. M. Gam-betta, Physical Review A , 060302 (2016).[8] S. S. Hong, A. T. Papageorge, P. Sivarajah, G. Crossman,N. Didier, A. M. Polloreno, E. A. Sete, S. W. Turkowski,M. P. da Silva, and B. R. Johnson, Physical Review A (2020).[9] M. A. Rol, F. Battistel, F. K. Malinowski, C. C. Bultink,B. M. Tarasinski, R. Vollmer, N. Haider, N. Muthusubra-manian, A. Bruno, B. M. Terhal, and L. DiCarlo, Phys.Rev. Lett. , 120502 (2019).[10] V. Negˆırneac, H. Ali, N. Muthusubramanian, F. Bat-tistel, R. Sagastizabal, M. S. Moreira, J. F. Marques,W. Vlothuizen, M. Beekman, N. Haider, A. Bruno, andL. DiCarlo, (2020), arXiv:2008.07411 [quant-ph] .[11] F. Yan, P. Krantz, Y. Sung, M. Kjaergaard, D. L. Camp-bell, T. P. Orlando, S. Gustavsson, and W. D. Oliver,Physical Review Applied , 054062 (2018).[12] B. Foxen, C. Neill, A. Dunsworth, P. Roushan, B. Chiaro,A. Megrant, J. Kelly, Z. Chen, K. Satzinger, R. Barends,F. Arute, K. Arya, R. Babbush, D. Bacon, J. C.Bardin, S. Boixo, D. Buell, B. Burkett, Y. Chen,R. Collins, E. Farhi, A. Fowler, C. Gidney, M. Giustina,R. Graﬀ, M. Harrigan, T. Huang, S. V. Isakov, E. Jef-frey, Z. Jiang, D. Kafri, K. Kechedzhi, P. Klimov, A. Ko-rotkov, F. Kostritsa, D. Landhuis, E. Lucero, J. McClean,M. McEwen, X. Mi, M. Mohseni, J. Y. Mutus, O. Naa-man, M. Neeley, M. Niu, A. Petukhov, C. Quintana,N. Rubin, D. Sank, V. Smelyanskiy, A. Vainsencher,T. C. White, Z. Yao, P. Yeh, A. Zalcman, H. Neven,J. M. Martinis, and Google AI Quantum, Physical Re-view Letters (2020).[13] M. Kjaergaard, M. E. Schwartz, A. Greene, G. O.Samach, A. Bengtsson, M. O’Keeﬀe, C. M. McNally,J. Braum¨uller, D. K. Kim, P. Krantz, M. Marvian,A. Melville, B. M. Niedzielski, Y. Sung, R. Winik, J. Yo-der, D. Rosenberg, K. Obenland, S. Lloyd, T. P. Orlando,I. Marvian, S. Gustavsson, and W. D. Oliver, (2020),arXiv:2001.08838 [quant-ph] .[14] Y. Sung, L. Ding, J. Braum¨uller, A. Veps¨al¨ainen, B. Kan-nan, M. Kjaergaard, A. Greene, G. O. Samach, C. Mc-Nally, D. Kim, A. Melville, B. M. Niedzielski, M. E.Schwartz, J. L. Yoder, T. P. Orlando, S. Gustavsson, and W. D. Oliver, (2020), arXiv:2011.01261 [quant-ph] .[15] T. P. Harty, D. T. C. Allcock, C. J. Ballance, L. Guidoni,H. A. Janacek, N. M. Linke, D. N. Stacey, and D. M.Lucas, Phys. Rev. Lett. , 220501 (2014).[16] E. Jeﬀrey, D. Sank, J. Y. Mutus, T. C. White, J. Kelly,R. Barends, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth,A. Megrant, P. J. J. O’Malley, C. Neill, P. Roushan,A. Vainsencher, J. Wenner, A. N. Cleland, and J. M.Martinis, Phys. Rev. Lett. , 190504 (2014).[17] C. C. Bultink, M. A. Rol, T. E. O’Brien, X. Fu, B. C. S.Dikken, C. Dickel, R. F. L. Vermeulen, J. C. de Sterke,A. Bruno, R. N. Schouten, and L. DiCarlo, Phys. Rev.Appl. , 034008 (2016).[18] J. Heinsoo, C. K. Andersen, A. Remm, S. Krinner,T. Walter, Y. Salath´e, S. Gasparinetti, J.-C. Besse,A. Potoˇcnik, A. Wallraﬀ, and C. Eichler, Phys. Rev.Appl. , 034040 (2018).[19] S. Bravyi, D. Gosset, and R. K¨onig, Science , 308(2018).[20] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C.Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, P. Hu,X.-Y. Yang, W.-J. Zhang, H. Li, Y. Li, X. Jiang, L. Gan,G. Yang, L. You, Z. Wang, L. Li, N.-L. Liu, C.-Y. Lu,and J.-W. Pan, Science , 1460 (2020).[21] R. Babbush, J. McClean, C. Gidney, S. Boixo, andH. Neven, (2020), arXiv:2011.04149 [quant-ph] .[22] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin,R. Barends, S. Boixo, M. Broughton, B. B. Buck-ley, D. A. Buell, B. Burkett, N. Bushnell, Y. Chen,Z. Chen, B. Chiaro, R. Collins, W. Courtney, S. De-mura, A. Dunsworth, D. Eppens, E. Farhi, A. Fowler,B. Foxen, C. Gidney, M. Giustina, R. Graﬀ, S. Habegger,M. P. Harrigan, A. Ho, S. Hong, T. Huang, W. J. Hug-gins, L. Ioﬀe, S. V. Isakov, E. Jeﬀrey, Z. Jiang, C. Jones,D. Kafri, K. Kechedzhi, J. Kelly, S. Kim, P. V. Klimov,A. Korotkov, F. Kostritsa, D. Landhuis, P. Laptev,M. Lindmark, E. Lucero, O. Martin, J. M. Martinis, J. R.McClean, M. McEwen, A. Megrant, X. Mi, M. Mohseni,W. Mruczkiewicz, J. Mutus, O. Naaman, M. Neeley,C. Neill, H. Neven, M. Yuezhen Niu, T. E. O’Brien,E. Ostby, A. Petukhov, H. Putterman, C. Quintana,P. Roushan, N. C. Rubin, D. Sank, K. J. Satzinger,V. Smelyanskiy, D. Strain, K. J. Sung, M. Szalay, T. Y.Takeshita, A. Vainsencher, T. White, N. Wiebe, Z. J.Yao, P. Yeh, and A. Zalcman, Science , 1084–1089(2020).[23] J. Kelly, R. Barends, A. G. Fowler, A. Megrant, E. Jef-frey, T. C. White, D. Sank, J. Y. Mutus, B. Campbell,Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, I. C. Hoi,C. Neill, P. J. J. O’Malley, C. Quintana, P. Roushan,A. Vainsencher, J. Wenner, A. N. Cleland, and J. M.Martinis, Nature , 66 (2015).[24] D. Rist`e, S. Poletto, M. Z. Huang, A. Bruno, V. Vesteri-nen, O. P. Saira, and L. DiCarlo, Nat. Commun. , 6983(2015).[25] M. Takita, A. D. C´orcoles, E. Magesan, B. Abdo,M. Brink, A. Cross, J. M. Chow, and J. M. Gambetta,Phys. Rev. Lett. , 210505 (2016).[26] V. Negnevitsky, M. Marinelli, K. K. Mehta, H.-Y. Lo,C. Fl¨uhmann, and J. P. Home, Nature , 527 (2018).[27] C. C. Bultink, T. E. O’Brien, R. Vollmer, N. Muthusub-ramanian, M. W. Beekman, M. A. Rol, X. Fu, B. Tarasin-ski, V. Ostroukh, B. Varbanov, A. Bruno, and L. Di-Carlo, Science Advances , eaay3050 (2020). [28] C. K. Andersen, A. Remm, S. Lazar, S. Krinner, J. Hein-soo, J.-C. Besse, M. Gabureac, A. Wallraﬀ, and C. Eich-ler, npj Quantum Information (2019), 10.1038/s41534-019-0185-4.[29] C. K. Andersen, A. Remm, S. Lazar, S. Krinner,N. Lacroix, G. J. Norris, M. Gabureac, C. Eichler, andA. Wallraﬀ, Nature Physics , 875–880 (2020).[30] Z. Chen, K. J. Satzinger, J. Atalaya, A. N. Korotkov,A. Dunsworth, D. Sank, C. Quintana, M. McEwen,R. Barends, P. V. Klimov, S. Hong, C. Jones,A. Petukhov, D. Kafri, S. Demura, B. Burkett, C. Gid-ney, A. G. Fowler, H. Putterman, I. Aleiner, F. Arute,K. Arya, R. Babbush, J. C. Bardin, A. Bengtsson,A. Bourassa, M. Broughton, B. B. Buckley, D. A. Buell,N. Bushnell, B. Chiaro, R. Collins, W. Courtney, A. R.Derk, D. Eppens, C. Erickson, E. Farhi, B. Foxen,M. Giustina, J. A. Gross, M. P. Harrigan, S. D. Harring-ton, J. Hilton, A. Ho, T. Huang, W. J. Huggins, L. B.Ioﬀe, S. V. Isakov, E. Jeﬀrey, Z. Jiang, K. Kechedzhi,S. Kim, F. Kostritsa, D. Landhuis, P. Laptev, E. Lucero,O. Martin, J. R. McClean, T. McCourt, X. Mi, K. C.Miao, M. Mohseni, W. Mruczkiewicz, J. Mutus, O. Naa-man, M. Neeley, C. Neill, M. Newman, M. Y. Niu, T. E.O’Brien, A. Opremcak, E. Ostby, B. Pat´o, N. Redd,P. Roushan, N. C. Rubin, V. Shvarts, D. Strain, M. Sza-lay, M. D. Trevithick, B. Villalonga, T. White, Z. J. Yao,P. Yeh, A. Zalcman, H. Neven, S. Boixo, V. Smelyan-skiy, Y. Chen, A. Megrant, and J. Kelly, (2021),arXiv:2102.06132 [quant-ph] .[31] F. W. Strauch, P. R. Johnson, A. J. Dragt, C. J. Lobb,J. R. Anderson, and F. C. Wellstood, Phys. Rev. Lett. , 167005 (2003).[32] L. DiCarlo, J. M. Chow, J. M. Gambetta, L. S. Bishop,B. R. Johnson, D. I. Schuster, J. Majer, A. Blais, L. Frun-zio, S. M. Girvin, and R. J. Schoelkopf, Nature , 240(2009).[33] J. M. Martinis and M. R. Geller, Phys. Rev. A , 022307(2014).[34] V. Tripathi, M. Khezri, and A. N. Korotkov, Phys. Rev.A , 012301 (2019).[35] A. P. Babu, J. Tuorila, and T. Ala-Nissila, npj QuantumInformation (2021), 10.1038/s41534-020-00357-z.[36] M. Werninghaus, D. J. Egger, F. Roy, S. Machnes,F. K. Wilhelm, and S. Filipp, (2020), arXiv:2003.05952[quant-ph] .[37] P. Aliferis and B. M. Terhal, Quantum Info. Comput. ,139 (2007).[38] A. G. Fowler, Phys. Rev. A , 042308 (2013).[39] J. Ghosh, A. G. Fowler, J. M. Martinis, and M. R. Geller,Phys. Rev. A , 062329 (2013).[40] J. Ghosh and A. G. Fowler, Phys. Rev. A , 020302(2015).[41] M. Suchara, A. W. Cross, and J. M. Gambetta, Quan-tum Info. Comput. , 997 (2015).[42] N. C. Brown and K. R. Brown, Phys. Rev. A , 052301(2018).[43] N. C. Brown, M. Newman, and K. R. Brown, New Jour-nal of Physics , 073055 (2019).[44] N. C. Brown and K. R. Brown, Phys. Rev. A , 032325(2019).[45] B. M. Varbanov, F. Battistel, B. M. Tarasinski, V. P.Ostroukh, T. E. O’Brien, L. DiCarlo, and B. M. Terhal,npj Quantum Information (2020), 10.1038/s41534-020-00330-w. [46] N. C. Brown, A. W. Cross, and K. R. Brown, (2020),arXiv:2003.05843 [quant-ph] .[47] M. McEwen, D. Kafri, Z. Chen, J. Atalaya, K. J.Satzinger, C. Quintana, P. V. Klimov, D. Sank, C. Gid-ney, A. G. Fowler, F. Arute, K. Arya, B. Buckley, B. Bur-kett, N. Bushnell, B. Chiaro, R. Collins, S. Demura,A. Dunsworth, C. Erickson, B. Foxen, M. Giustina,T. Huang, S. Hong, E. Jeﬀrey, S. Kim, K. Kechedzhi,F. Kostritsa, P. Laptev, A. Megrant, X. Mi, J. Mu-tus, O. Naaman, M. Neeley, C. Neill, M. Niu, A. Paler,N. Redd, P. Roushan, T. C. White, J. Yao, P. Yeh,A. Zalcman, Y. Chen, V. N. Smelyanskiy, J. M. Marti-nis, H. Neven, J. Kelly, A. N. Korotkov, A. G. Petukhov,and R. Barends, (2021), arXiv:2102.06131 [quant-ph] .[48] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N.Cleland, Phys. Rev. A , 032324 (2012).[49] T. E. O’Brien, B. Tarasinski, and L. DiCarlo, npj Quan-tum Information (2017), 10.1038/s41534-017-0039-x.[50] D. Hayes, D. Stack, B. Bjork, A. Potter, C. Baldwin, andR. Stutz, Physical Review Letters (2020).[51] V. Langrock and D. P. DiVincenzo, (2020),arXiv:2012.09517 [quant-ph] .[52] P. Magnard, P. Kurpiers, B. Royer, T. Walter, J.-C.Besse, S. Gasparinetti, M. Pechal, J. Heinsoo, S. Storz,A. Blais, and A. Wallraﬀ, Phys. Rev. Lett. , 060502(2018).[53] S. Zeytino˘glu, M. Pechal, S. Berger, A. A. Abdumalikov,A. Wallraﬀ, and S. Filipp, Physical Review A (2015).[54] D. Egger, M. Werninghaus, M. Ganzhorn, G. Salis,A. Fuhrer, P. M¨uller, and S. Filipp, Phys. Rev. Applied , 044030 (2018).[55] D. Rist`e, C. C. Bultink, K. W. Lehnert, and L. DiCarlo,Physical Review Letters (2012).[56] R. Versluis, S. Poletto, N. Khammassi, B. Tarasinski,N. Haider, D. J. Michalak, A. Bruno, K. Bertels, andL. DiCarlo, Phys. Rev. Appl. , 034021 (2017).[57] The quantumsim package can be found at https://quantumsim.gitlab.io/ .[58] T. M. Stace and S. D. Barrett, Phys. Rev. A , 022317(2010).[59] S. Nagayama, A. G. Fowler, D. Horsman, S. J. Devitt,and R. V. Meter, New Journal of Physics , 023050(2017).[60] J. M. Auger, H. Anwar, M. Gimeno-Segovia, T. M. Stace,and D. E. Browne, Phys. Rev. A , 042316 (2017).[61] J. R. Schrieﬀer and P. A. Wolﬀ, Phys. Rev. , 491(1966).[62] S. Bravyi, D. P. DiVincenzo, and D. Loss, Annals ofPhysics , 2793 (2011).[63] E. Magesan and J. M. Gambetta, Physical Review A (2020).[64] M. Boissonneault, J. M. Gambetta, and A. Blais, Phys-ical Review A (2009).[65] B. Nijholt, J. Weston, J. Hoofwijk, and A. Akhmerov,“ Adaptive : parallel active learning of mathematical func-tions,” (2019).[66] H. P. Breuer and F. Petruccione,

The Theory of OpenQuantum Systems (Oxford University Press, Oxford,2002).[67] C. J. Wood and J. M. Gambetta, Phys. Rev. A ,032306 (2018).[68] S. Krinner, S. Lazar, A. Remm, C. Andersen, N. Lacroix,G. Norris, C. Hellings, M. Gabureac, C. Eichler, andA. Wallraﬀ, Phys. Rev. Applied , 024042 (2020). [69] S. T. Spitz, B. Tarasinski, C. W. J. Beenakker, and T. E.O’Brien, Advanced Quantum Technologies , 1800012(2018).[70] J. B. Hertzberg, E. J. Zhang, S. Rosenblatt, E. Mage-san, J. A. Smolin, J.-B. Yau, V. P. Adiga, M. Sand-berg, M. Brink, J. M. Chow, and J. S. Orcutt, (2020),arXiv:2009.00781 [quant-ph] . Appendix A: Approximate transmon-resonator Hamiltonian1. Schrieﬀer-Wolﬀ Transformation

In this section we explain the concept of the Schrieﬀer-Wolﬀ transformation (SWT) [61–63] and derive the equationsthat we use in the following sections.Consider a Hamiltonian H = H + (cid:15)V (A1)expressed in a certain basis {| ψ n (cid:105)} , where H is block diagonal with respect to this basis and the perturbation V can be taken as block oﬀ-diagonal without loss of generality (block-diagonal terms can be included in the deﬁnitionof H ). Furthermore, we assume || V || = O (1) and (cid:15) (cid:28) ∆ ij , where we set ∆ ij as the minimum energy separationbetween blocks i and j .The SWT corresponds to ﬁnding an anti-hermitian matrix S such that H (cid:48) := e S He − S (A2)is block diagonal. In other words, calling {| ¯ ψ n (cid:105)} the basis of eigenstates of H , e S = (cid:80) n | ψ n (cid:105) (cid:104) ¯ ψ n | . The matrix S canbe expanded in a series S = ∞ (cid:88) k =1 (cid:15) k S k (A3)where each S k is block oﬀ-diagonal. If (cid:15) (cid:28) ∆ ij one can expect the ﬁrst order ( S ) to provide a good approximation,otherwise one needs to consider higher orders depending on (cid:15) (although the series does not always converge forextensive systems [62]). Using the Baker-Campbell-Hausdorﬀ formula one gets H (cid:48) = e S He − S = ∞ (cid:88) k =0 k ! [ S, [ S, . . . [ S, (cid:124) (cid:123)(cid:122) (cid:125) k times H ] . . . ]] . (A4)The usual procedure for the SWT is to group terms of the same order in (cid:15) in this formula and set the block oﬀ-diagonalpart of H (cid:48) to 0, thus getting equations for { S k } , in the usual case with two blocks [62]. One uses the relationships[diagonal , diagonal] = diagonal , (A5)[diagonal , oﬀ-diagonal] = oﬀ-diagonal , (A6)[oﬀ-diagonal , oﬀ-diagonal] = diagonal . (A7)However, the last line only holds for the case with two blocks. In the following we consider the generalization ofthe SWT to the case with an arbitrary number of blocks [63]. We use the notation O D and O OD for the blockdiagonal and oﬀ-diagonal parts of an operator O = O D + O OD , respectively.Here we expand H and S up to k = 3 in Eq. (A4), assuming that the 4th-order block oﬀ-diagonal term is negligible.We get the following pieces:0th order : H (A8)1st order : V + [ S , H ] (A9)2nd order : [ S , V ] + 12 [ S , [ S , H ]] + [ S , H ] (A10)3rd order : [ S , V ] + 12 (cid:16) [ S , [ S , H ]] + [ S , [ S , V ]] + [ S , [ S , H ]] (cid:17) + 16 [ S , [ S , [ S , H ]]] + [ S , H ] (A11)4th order : [ S , V ] + 12 (cid:16) [ S , [ S , H ]] + [ S , [ S , H ]] + [ S , [ S , H ]] + [ S , [ S , V ]] + [ S , [ S , V ]] (cid:17) + 16 (cid:16) [ S , [ S , [ S , V ]]] + [ S , [ S , [ S , H ]]] + [ S , [ S , [ S , H ]]] + [ S , [ S , [ S , H ]]] (cid:17) + 124 [ S , [ S , [ S , [ S , H ]]]] . (A12)4Setting the block oﬀ-diagonal parts at 1st, 2nd and 3rd order to 0 we get[ H , S ] = V (A13)[ H , S ] = 12 [ S , V ] OD (A14)[ H , S ] = 12 [ S , V ] OD + 13 [ S , [ S , V ] D ] OD + 112 [ S , [ S , V ] OD ] OD , (A15)where we have used the ﬁrst equation to simplify the following ones. These equations can be solved iteratively for S k (given knowledge of the eigenstates of H ). The Hamiltonian H (cid:48) is then block diagonal up to 4th order and is explicitlygiven by H (cid:48) = H + (cid:15) S , V ] D + (cid:15) (cid:16)

12 [ S , V ] D + 112 [ S , [ S , V ] OD ] D (cid:17) + (cid:15) (cid:16)

12 [ S , V ] D − (cid:2) S , [ S , [ S , V ] D ] OD (cid:3) D −

16 [ S , [ S , V ] OD ] D + 112 [ S , [ S , V ] OD ] D (cid:17) . (A16)This expression has been simpliﬁed using Eqs. (A13) to (A15), together with the fact that e.g. [ S k , [ . . . , . . . ] D ] D = 0since S k is block oﬀ-diagonal.

2. SWT of the capacitive coupling

We consider the Hamiltonian H = H + H c + H d of a driven transmon capacitively coupled to a resonator, as givenin the main text in Eqs. (1) to (4).The SWT of H c up to 1st order in the perturbation parameter (cid:15) = g/ ∆, where ∆ = ω q − ω r , is implemented usingthe matrix [64] S = g ∞ (cid:88) m =1 √ m ∆ + α ( m − (cid:16) a | m (cid:105) (cid:104) m − | − h.c. (cid:17) , (A17)where {| m (cid:105)} are transmon states and where we have absorbed (cid:15) in the deﬁnition of S . The Hamiltonian in theunitarily transformed frame as deﬁned in the main text is then given by H D ≈ e S He − S = e S ( H + H c ) e − S + e S H d e − S (A18)with e S ( H + H c ) e − S = H + 12 [ S , H c ] (A19) ≈ δ r a † a + ∞ (cid:88) m =1 (cid:16) mδ q + α m ( m −

1) + g m ∆ m − (cid:17) | m (cid:105) (cid:104) m | − a † a ∞ (cid:88) m =0 g ∆ − ∆ m ∆ m − | m (cid:105) (cid:104) m | (A20):= H D (A21)where we deﬁne ∆ m = ∆ + αm = ∆ − | α | m as α < S , H c ], since it is proportionalto gα/ (∆ m ∆ m − ). This is negligible for low anharmonicity and, secondly, for ω r > ω q as then ∆ < | ∆ m | increases with m . If instead ω r < ω q , ∆ > | ∆ m | decreases with m , so even if the approximation is good for thetwo lowest levels, there can be some higher level which does not sit well within the dispersive regime. However, inthis work we consider a system with ω r > ω q , hence we do not need to take this into account.The drive Hamiltonian in the unitarily transformed frame takes the form e S H d e − S = Ω e iφ b + h.c. (cid:124) (cid:123)(cid:122) (cid:125) := H Dd + Ω e iφ (cid:32) a ∞ (cid:88) m =0 g ∆ − ∆ m ∆ m − | m (cid:105) (cid:104) m | + a † ∞ (cid:88) m =0 gα √ m + 1 √ m + 2∆ m ∆ m +1 | m (cid:105) (cid:104) m + 2 | (cid:33) + h.c. (cid:124) (cid:123)(cid:122) (cid:125) := H Dd (A22)The last term contains a 1st-order approximation in g/ ∆ of the | (cid:105) ↔ | (cid:105) eﬀective coupling ˜ g , which is linear in Ω.However, the “pure” drive term H Dd can be quite strong, so we need to evaluate how it aﬀects ˜ g and the rest of theHamiltonian.5

3. SWT of the pure drive Hamiltonian

Summarizing, in the unitarily transformed frame the original Hamiltonian H takes (approximately) the form H D ≈ H D + H Dd + H Dd , (A23)where H D is given in Eq. (A20) and H Dd , H Dd are given in Eq. (A22).We now want to ﬁnd an additional SWT transformation S (cid:48) = S (cid:48) + S (cid:48) + S (cid:48) , with H Dd taking the role of V in Appendix A 1, deﬁning a “double-dressed” Hamiltonian H DD := e S (cid:48) H D e − S (cid:48) = e S (cid:48) ( H D + H Dd ) e − S (cid:48) (cid:124) (cid:123)(cid:122) (cid:125) =: H DD + e S (cid:48) H Dd e − S (cid:48) (cid:124) (cid:123)(cid:122) (cid:125) =: H DDd (A24)such that H DD is fully diagonal up to 3rd order in the perturbation parameter (cid:15) = Ω /δ q . Then H DDd gives thecouplings within the manifold of interest ( | (cid:105) , | (cid:105) ) and outside of it. We absorb (cid:15) k in the deﬁnition of S (cid:48) k so it doesnot explicitly appear below.Following Appendix A 1, to ﬁnd S (cid:48) we need to solve Eq. (A13), i.e. (cid:2) H D , S (cid:48) (cid:3) = H Dd (A25)in this speciﬁc case. Bracketing it with the eigenstates {| ml (cid:105)} of H D , with the notation | transmon , resonator (cid:105) , we getthe matrix elements of S (cid:48) as (cid:104) ml | S (cid:48) | nk (cid:105) = (cid:104) ml | H Dd | nk (cid:105) E Dml − E Dnk , (A26)where { E Dml } are the eigenenergies of H D , which can be easily inferred from Eq. (A20). We neglect the dispersiveshift since it is proportional to α/ ∆. Then (cid:104) ml | S (cid:48) | nk (cid:105) = Ω2 (cid:32) − √ m + 1 δ m,n − δ l,k δ q + αm + g ∆ − ∆ m − ∆ m e iφ + √ mδ m,n +1 δ l,k δ q + α ( m −

1) + g ∆ − ∆ m − ∆ m − e − iφ (cid:33) , (A27)where δ i,j is the Kronecker delta. From this equation one can infer that S (cid:48) = − Ω2 e iφ ∞ (cid:88) m =0 √ m + 1 δ qm | m (cid:105) (cid:104) m + 1 | − h.c. , (A28)where we have deﬁned δ qm = δ q + αm + g ∆ − ∆ m − ∆ m .Having derived S (cid:48) , we can compute S (cid:48) from Eq. (A14), i.e. (cid:2) H D , S (cid:48) (cid:3) = 12 (cid:2) S (cid:48) , H Dd (cid:3) OD (A29)with (cid:2) S (cid:48) , H Dd (cid:3) = − Ω ∞ (cid:88) m =0 ˜ δ qm δ qm δ qm − | m (cid:105) (cid:104) m | − Ω ∞ (cid:88) m =0 √ m + 1 √ m + 2 (cid:16) δ qm − δ qm +1 (cid:17) ( e iφ | m (cid:105) (cid:104) m + 2 | + h.c.) , (A30)where ˜ δ qm = δ q − α + g ∆ − ∆ m ∆ m ∆ m − ∆ m − . Clearly the ﬁrst term is the diagonal part while the second term is the oﬀ-diagonalone. With a similar procedure as the one used for S (cid:48) , it follows that S (cid:48) = Ω e iφ ∞ (cid:88) m =0 √ m + 1 √ m + 2 δ qm + δ qm +1 (cid:16) δ qm − δ qm +1 (cid:17) | m (cid:105) (cid:104) m + 2 | − h.c. (A31)We can then compute S (cid:48) from Eq. (A15), i.e. (cid:2) H D , S (cid:48) (cid:3) = 12 (cid:2) S (cid:48) , H Dd (cid:3) OD + 13 (cid:2) S (cid:48) , (cid:2) S (cid:48) , H Dd (cid:3) D (cid:3) OD + 112 (cid:2) S (cid:48) , (cid:2) S (cid:48) , H Dd (cid:3) OD (cid:3) OD . (A32)6The result is S (cid:48) = Ω e iφ ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 1 | (cid:32) √ m + 1( δ qm ) (cid:16) ˜ δ qm +1 δ qm +1 − ˜ δ qm δ qm − (cid:17) + 196 δ qm (cid:32) ( m + 2) √ m + 1 δ qm + 4 δ qm +1 δ qm +1 ( δ qm + δ qm +1 ) (cid:16) δ qm − δ qm +1 (cid:17) − √ m + 1 m δ qm − + δ qm δ qm − ( δ qm − + δ qm ) (cid:16) δ qm − − δ qm (cid:17)(cid:33)(cid:33) − h.c.+ Ω e iφ ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 3 | √ m + 1 √ m + 2 √ m + 3 δ qm + δ qm +1 + δ qm +2 (cid:32) δ qm +2 − δ qm +1 − δ qm δ qm +2 ( δ qm + δ qm +1 ) (cid:16) δ qm − δ qm +1 (cid:17) − δ qm − δ qm +1 − δ qm +2 δ qm ( δ qm +1 + δ qm +2 ) (cid:16) δ qm +1 − δ qm +2 (cid:17)(cid:33) − h.c. (A33)We can eventually use Eqs. (A28), (A31) and (A33) together with Eq. (A16) to obtain H DD (deﬁned in Eq. (A24)): H DD = δ r a † a + ∞ (cid:88) m =0 | m (cid:105) (cid:104) m | (cid:32) mδ q + α m ( m −

1) + g m ∆ m − − Ω ˜ δ qm δ qm δ qm − − Ω (cid:32) m + 1( δ qm ) (cid:16) ˜ δ qm +1 δ qm +1 − ˜ δ qm δ qm − (cid:17) − m ( δ qm − ) (cid:16) ˜ δ qm δ qm − ˜ δ qm − δ qm − (cid:17)(cid:33) − Ω (cid:32) δ qm (cid:32) ( m + 2)( m + 1) δ qm + 5 δ qm +1 δ qm +1 ( δ qm + δ qm +1 ) (cid:16) δ qm − δ qm +1 (cid:17) − ( m + 1) m δ qm − + δ qm δ qm − ( δ qm − + δ qm ) (cid:16) δ qm − − δ qm (cid:17)(cid:33) − δ qm − (cid:32) ( m + 1) m δ qm − + 5 δ qm δ qm ( δ qm − + δ qm ) (cid:16) δ qm − − δ qm (cid:17) − m ( m −

1) 5 δ qm − + δ qm − δ qm − ( δ qm − + δ qm − ) (cid:16) δ qm − − δ qm − (cid:17)(cid:33)(cid:33) + Ω (cid:32) ( m + 2)( m + 1) δ qm + δ qm +1 (cid:16) δ qm − δ qm +1 (cid:17) − m ( m − δ qm − + δ qm − (cid:16) δ qm − − δ qm − (cid:17) (cid:33)(cid:33) − a † a (cid:88) m g ∆ − ∆ m ∆ m − | m (cid:105) (cid:104) m | . (A34)The approximate coupling Hamiltonian H DDd (deﬁned in Eq. (A24)) up to 2nd order in Ω /δ q is instead given by H DDd = H Dd + (cid:2) S (cid:48) , H Dd (cid:3) + (cid:2) S (cid:48) , H Dd (cid:3) + 12 (cid:2) S (cid:48) , (cid:2) S (cid:48) , H Dd (cid:3)(cid:3) (A35)=: H DD eﬀ . coupl . + H DD resid . , (A36)where H DD eﬀ . coupl . = e iφ a † ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 2 | (cid:32) ˜ g m (cid:16) − Ω (cid:16) m + 3( δ qm +2 ) + m + 2( δ qm +1 ) + m + 1( δ qm ) + m ( δ qm − ) (cid:17)(cid:17) + Ω (cid:16) √ m + 1 √ m + 3 δ qm δ qm +2 ˜ g m +1 + √ m √ m + 2 δ qm − δ qm +1 ˜ g m − (cid:17) + Ω √ m + 1 √ m + 2 (cid:32) g (cid:48) m +2 δ qm ( δ qm + δ qm +1 ) − g (cid:48) m +1 δ qm δ qm +1 + g (cid:48) m δ qm +1 ( δ qm + δ qm +1 ) (cid:33)(cid:33) + h.c.(A37)7with ˜ g m := gα Ω √ m + 1 √ m + 22∆ m ∆ m +1 (A38) g (cid:48) m := g Ω∆ − m ∆ m − , (A39)and H DD resid . = ( e iφ a + h.c.) ∞ (cid:88) m =0 | m (cid:105) (cid:104) m | (cid:32) g (cid:48) m (cid:16) − Ω (cid:16) m + 1( δ qm ) + m ( δ qm − ) (cid:17)(cid:17) + Ω (cid:16) m + 1( δ qm ) g (cid:48) m +1 + m ( δ qm − ) g (cid:48) m − (cid:17) + Ω (cid:16) √ m + 1 √ m + 2˜ g m δ qm ( δ qm + δ qm +1 ) + √ m √ m + 1˜ g m − δ qm δ qm − + √ m − √ m ˜ g m − δ qm − ( δ qm − + δ qm − ) (cid:17)(cid:33) − Ω2 e iφ a ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 1 | √ m + 1 δ qm ( g (cid:48) m +1 − g (cid:48) m ) + h.c. − Ω2 a † ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 1 | (cid:16) √ m + 1 δ qm ( g (cid:48) m +1 − g (cid:48) m ) + √ m + 2 δ qm +1 ˜ g m − √ mδ qm − ˜ g m − (cid:17) + h.c.+ Ω e iφ a ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 2 | √ m + 1 √ m + 2 (cid:32) g (cid:48) m +2 δ qm ( δ qm + δ qm +1 ) − g (cid:48) m +1 δ qm δ qm +1 + g (cid:48) m δ qm +1 ( δ qm + δ qm +1 ) (cid:33) + h.c. − Ω2 e iφ a † ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 3 | (cid:16) √ m + 1 δ qm ˜ g m +1 − √ m + 3 δ qm +2 ˜ g m (cid:17) + h.c.+ Ω e iφ a † ∞ (cid:88) m =0 | m (cid:105) (cid:104) m + 4 | (cid:32) √ m + 1 √ m + 2˜ g m +2 δ qm ( δ qm + δ qm +1 ) − √ m + 4 √ m + 1˜ g m +1 δ qm δ qm +3 + √ m + 3 √ m + 4˜ g m δ qm +3 ( δ qm +3 + δ qm +2 ) (cid:33) + h.c.(A40)All terms in H DD resid . are relatively small and oﬀ-resonant with the | (cid:105) ↔ | (cid:105) transition so we expect them to have asmall eﬀect and we do not proceed with higher orders of SWTs.

4. Analysis of the | (cid:105) ↔ | (cid:105) avoided crossing In this section we give the methods used to calculate the curves in Fig. 1 c, d .We deﬁne ω ∗ d as the drive frequency corresponding to the center of the | (cid:105) ↔ | (cid:105) avoided crossing of the fullHamiltonian H as given in the main text in Eq. (1). Then the exact value of the eﬀective | (cid:105) ↔ | (cid:105) coupling ˜ g is given by half the energy separation at that point. The avoided crossing can be found numerically by exactdiagonalization as a function of ω d .In the subspace S = span {| (cid:105) , | (cid:105)} we can write H as H | S ≡ − η ( ω d ) Z/ g ( ω d )[cos( φ ) X + sin( φ ) Y ] = − η ( ω d ) Z/ g ( ω d ) X for φ = 0 as in the main text. As we want to implement a | (cid:105) ↔ | (cid:105) π rotation, we no-tice that the choice of φ , i.e. the choice of rotation axis in the equator of the Bloch sphere, is irrelevant. We have alsoignored a term proportional to the identity I , which gives a phase diﬀerence with respect to states outside of S , inparticular between the computational and leakage subspaces of the transmon. However, this phase is irrelevant if | (cid:105) is swapped entirely onto | (cid:105) since the latter decays and dephases fast, thus suppressing any phase coherence. Asdemonstrated in Section I B the res-LRU can reach a very high R , for which the eﬀect of this phase is then minimal.Assuming that H DD resid . in Eq. (A40) is negligible, an analytical approximation of η is given by η ( ω d ) ≈ (cid:104) | H DD ( ω d ) | (cid:105) − (cid:104) | H DD ( ω d ) | (cid:105) , (A41)where we have made the dependence of H DD in Eq. (A34) on ω d explicit. This holds since then H DD accounts for allthe Stark shifts of | (cid:105) and | (cid:105) due to the capacitive coupling and the drive (up to the given orders). The center ofthe avoided crossing is found by imposing the condition η ( ω d ) = 0. As the explicit expression that can be extractedfrom Eq. (A34) is not analytically solvable, we use the secant method available in scipy to ﬁnd ω ∗ d that fulﬁlls thiscondition in Eq. (A41). It is then straightforward to compute the (approximate) analytical estimate for the eﬀectivecoupling as ˜ g ( ω ∗ d ) = | (cid:104) | H DD eﬀ . coupl . ( ω ∗ d ) | (cid:105) | from Eq. (A37), which is plotted in Fig. 1 d .8 / π (MHz)5 . . . . ω d / π ( G H z ) a / π (MHz) b . . . . . T ( µs ) 10 . . . . T ( µs ) Figure 6: Eﬀective T ( a ) and T ( b ) which account for the extra decoherence caused by the drive during the time slot T slot =440 ns. We can see that the variation is small as a function of the drive amplitude compared to the values at Ω = 0. Thewhite star indicates the optimal point (Ω / π ≈

204 MHz, ω d / π ≈ .

247 GHz, t p = 108 . ω ∗ d as a function of Ω (see Fig. 1 c ). The heatmaps aresampled using the adaptive package [65]. Appendix B: Further characterization of the readout-resonator LRU1. Eﬀective T and T due to the drive In this section we discuss the eﬀects of the readout-resonator LRU within the computational subspace when appliedto a non-leaked transmon. As pulses at diﬀerent ( ω d , Ω) points have diﬀerent duration t p , it would not be fair to reportan eﬀective T and T during t p . That is, stronger pulses potentially produce lower T and T , but they also takeless time to implement the LRU. However, the overall disturbance to the qubit is a combination of these two factors.We thus report an eﬀective T and T during the whole time slot of T slot = 440 ns, leading to a uniform metric forthe whole ( ω d , Ω) landscape. Speciﬁcally, to estimate T we prepare the state | (cid:105) (cid:104) | ⊗ σ th , we simulate the Lindbladequation in Eq. (12) and we evaluate the remaining population p | (cid:105) in | (cid:105) at the end of the time slot after tracingout the resonator. Assuming that p | (cid:105) = e − T slot /T we then compute T by inverting this formula. To estimate T weprepare | + (cid:105) (cid:104) + | ⊗ σ th and we evaluate the decay of the oﬀ-diagonal matrix element | (cid:105) (cid:104) | as this is directly availablein simulation (rather than simulating a full Ramsey experiment). We then invert |(cid:104) | Tr r ( ρ ( T slot )) | (cid:105)| = e − T slot /T / T .Figure 6 shows the resulting eﬀective T and T . One can see that T barely decreases as a function of Ω, showingthat a short t p largely counterbalances the eﬀect of a strong Ω. One can also see that the value of T is about 11 . µ sat Ω = 0, i.e. when no pulse is applied. This has to be contrasted with the input T parameter of 30 µ s inserted inthe Lindblad equation (see Table I). We assume that that implicitly accounts for dephasing caused by ﬂux noise only.Photon-shot noise from the resonator is a further dephasing source which is explicitly included in these simulations.The combination of ﬂux and photon-shot noise leads to the actual eﬀective T reported in Fig. 6 b . We note thatif ¯ n = 0 then the eﬀective T at Ω = 0 would exactly match the input of 30 µ s. While the eﬀective T can be restoredfrom 11 . µ s to 30 µ s with colder resonators or by engineering diﬀerent system parameters altogether, the importantinformation from Fig. 6 b is that T barely changes as a function of Ω. Combined with the similar result for T ,this means that the drive causes only a marginal eﬀect within the computational subspace. Notice that in the regionwhere the readout-resonator LRU is most eﬀective (just above the purple line in Fig. 6 b ), T is even slightly higherthan at Ω = 0 (11 . . µ s). We attribute this to the fact that the pulse temporarily reduces the excited-statepopulation in the resonator (see Fig. 2 d ). In this way photon-shot noise is reduced until the resonator re-thermalizes,however at the cost of some leakage of the transmon qubit.9 t (ns)0 . . . . . . P o pu l a t i o n . . . t rise Ω ( t ) | i| i| i| i Figure 7: Time evolution from the initial state | (cid:105) (cid:104) | ⊗ σ th for t rise = 30 ns and for an otherwise always-on drive. This issimulated with the same Ω / π ≈

204 MHz and ω d / π ≈ .

247 GHz as at the optimal point in Fig. 2.

2. Long-drive limit and its drawbacks as a LRU

In this section we compare the reset schemes in [53, 54] versus [52] in terms of their performance as a LRU. Theapproach of [53, 54], which we have adopted in the main text, aims at swapping | (cid:105) and | (cid:105) by targeting the ﬁrstminimum of the oscillations induced by the drive (switching the drive oﬀ afterwards). As shown in Section I B, thisapproach allows at most for a residual leakage population p opt . | (cid:105) ≈ .

7% at the optimal point (see Fig. 2 a ), given ourparameters (see Table I). While this already gives almost an optimal advantage in Surface-17 (see Appendix C 2), theapproach in [52] could be used in general to achieve an even lower p | (cid:105) for various Ωs (in particular for lower Ωs).The approach in [52] keeps the drive on for a much longer period of time (a few oscillations) allowing both thepopulations in | (cid:105) and | (cid:105) to decay to almost 0, modulo thermal excitations. Figure 7 shows that it is indeedpossible to suppress these populations to thermal-state levels (here ¯ n = 0 . , ω d ) as at theoptimal point in the main text. However, we see that for the optimal point there is almost no gain by using thisapproach. Furthermore, this approach costs much more time, even more than the allowed T slot = 440 ns. Given thechosen system parameters, the ﬁrst few minima after the ﬁrst one are slightly higher, due to transmon decoherence,and one needs to wait even longer to overcome this eﬀect. The issue of the long drive time could be mitigated by usingdiﬀerent parameters for the resonator, such as a larger relaxation rate κ or capacitive coupling g to the transmon,although each has its own drawbacks. Another disadvantage of the approach in [52] is that the disturbance to thequbit is stronger as the drive is kept on for a longer period of time. E.g., in Fig. 7 one can see that | (cid:105) and | (cid:105) reachan equilibrium thanks to the drive (even in the presence of relaxation), where the population in | (cid:105) is relatively high.Furthermore, if one chooses to use a t p > T slot then the QEC cycle would get longer, aﬀecting the coherence of allqubits, not only of the high-frequency data qubits to which the res-LRU is applied. Appendix C: Further Surface-17 characterization1. Details about the density-matrix simulations

The parameters used in this work are reported in Table II. A comprehensive review of the density-matrix simu-lations and the use of the quantumsim package [57] is available at [45, 49]. In this section we explain the speciﬁcimplementation of the newly introduced res-LRU, expressed in the Pauli Transfer Matrix formalism.We construct a “phenomenological” Lindblad model with input parameters

R, L

LRU1 and t res-LRU . We use the PauliTransfer Matrix S res-LRU = S ↑ S ↓ , where S ↓ is the Pauli Transfer Matrix of the superoperator S ↓ = e t res-LRU L ↓ and theLindbladian L ↓ has the quantum jump operator K ↓ = 1 (cid:113) t res-LRU − log(1 − R sim ) | (cid:105) (cid:104) | (C1)0 Parameter ValueRelaxation time T µ sSweetspot pure-dephasing time T φ, max µ sHigh-freq. pure-dephasing timeat interaction point T φ, int µ sMid-freq. pure-dephasing timeat interaction point T φ, int µ sMid-freq. pure-dephasing timeat parking point T φ, park µ sLow-freq. pure-dephasing timeat parking point T φ, park µ sSingle-qubit gate time t gate

20 nsTwo-qubit interaction time t int

30 nsSingle-qubit phase-correction time t pc

10 nsReadout-resonator LRU time t res-LRU

100 ns | (cid:105) ↔ | (cid:105) π -pulse time t π -LRU

20 nsMeasurement time t m

580 nsQEC-cycle time t c

800 nsTable II: The parameters for the qubit coherence times and for the gate, LRU, measurement and QEC-cycle durations used inthe density-matrix simulations. The interaction point corresponds to the frequency to which a transmon is ﬂuxed to implementa CZ, whereas the parking point to the frequency at which the ancilla qubits are parked during measurement [56]. with R sim to be determined. Besides this, L ↓ has the standard jump operators for relaxation and dephasing. On theother hand, S ↑ is the Pauli Transfer Matrix of the superoperator S ↑ = e L ↑ and the Lindbladian L ↑ has a single jumpoperator K ↑ = 1 (cid:113) − log(1 − L LRU1 ) | (cid:105) (cid:104) | (C2)since relaxation and dephasing during t res-LRU are already accounted for by S ↓ . In this way, calling p i | j (cid:105) , p f | j (cid:105) thepopulations before and after the res-LRU, if we apply S res-LRU on a non-leaked transmon we get p f | (cid:105) = 2 L LRU1 p i | (cid:105) ,consistently with Section II B 1. Instead, if we apply S res-LRU to a leaked transmon ( p i | (cid:105) = 1) we get p f | (cid:105) ≈ − R sim +2 L LRU1 . By ﬁxing R sim = R + 2 L LRU1 we match the deﬁnition of R in Section II B 1 as well. The approximation isvery good for large R and low L LRU1 , which is precisely the interesting regime for res-LRU that we have explored.

2. Logical error rate as a function of the LRU parameters

We study the variation in the logical error rate ε L per QEC cycle as a function of the performance parameters ofthe LRUs. Here we ﬁx L = 0 .

5% as it is easier to visualize variations in ε L with a relatively large L . The leakage-reduction rate R and the readout probability p M (2 |

2) play similar roles for the res-LRU and π -LRU, respectively.In Fig. 8 a, c one can see that this is the case and that the values of ε L at the parameters used in the main text( R = 95% and p M (2 |

2) = 90%) are very close to their best values (at least for this system size). This shows thatthe advantages of a larger R or p M (2 |

2) are marginal. We attribute this to the fact that leakage is exponentiallysuppressed with an already quite large exponent. Furthermore, the parameters L LRU1 and 1 − p M (1 |

1) = p M (2 | b, d show. We see that ε L is more sensitive to L LRU1 and 1 − p M (1 |

1) compared to R and p M (2 | ε L is slightly larger at the parameters used inthe main text ( L LRU1 = 0 .

25% and 1 − p M (1 |

1) = 0 . R (%)1 . . . . . . (cid:15) L ( % ) res-LRU L LRU1 = 0% a UBMWPM 0 . . . . . L LRU1 (%)res-LRU R = 95% b UBMWPM0 20 40 60 80 100 p M (2 |

2) (%)1 . . . . . . (cid:15) L ( % ) π -LRU p M (1 |

1) = 100% c UBMWPM 0 . . . . . . . − p M (1 |

1) (%) π -LRU p M (2 |

2) = 90% d UBMWPM

Figure 8: Logical error rate ε L per QEC cycle as a function of various LRU parameters. a, b use only the res-LRU, while c, d the π -LRU. We ﬁx L = 0 .

5% for all. Vertical dashed lines indicate the values considered in the main text. These results areextracted from 2 ×4