Noise is the notorious adversary of quantum computing. Qubits are sensitive to the slightest environmental perturbations, quickly causing errors to accumulate and make the results of even the simplest quantum algorithms too noisy to be meaningful.
Quantum error correction (QEC) circumvents this problem by using many noisy physical qubits to encode logical qubits effectively immune to noise. Errors are identified by repeatedly performing measurements on some subset of the noisy physical qubits to produce so-called error syndromes. These syndromes can then be decoded to infer the nature and location of errors so that they can be tracked and eventually fixed, enabling a quantum algorithm to complete without corruption.
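The syndrome mechanics described above can be sketched in a few lines for the simplest case, a 3-qubit bit-flip repetition code. This is a plain-NumPy illustration, not any NVIDIA API: two parity checks compare neighboring qubits, and a lookup table maps each syndrome to its most likely single-qubit error.

```python
import numpy as np

# Parity-check matrix for a 3-qubit bit-flip repetition code:
# each row measures the parity of a neighboring qubit pair.
H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)

def syndrome(error: np.ndarray) -> np.ndarray:
    """Compute the error syndrome s = H e (mod 2)."""
    return (H @ error) % 2

def decode(s: np.ndarray) -> np.ndarray:
    """Look up the most likely (lowest-weight) error for a syndrome."""
    table = {
        (0, 0): np.array([0, 0, 0], dtype=np.uint8),  # no error
        (1, 0): np.array([1, 0, 0], dtype=np.uint8),  # flip on qubit 0
        (1, 1): np.array([0, 1, 0], dtype=np.uint8),  # flip on qubit 1
        (0, 1): np.array([0, 0, 1], dtype=np.uint8),  # flip on qubit 2
    }
    return table[tuple(s)]

error = np.array([0, 1, 0], dtype=np.uint8)   # bit flip on the middle qubit
correction = decode(syndrome(error))
# Applying the correction cancels the error.
assert np.array_equal((error + correction) % 2, np.zeros(3, dtype=np.uint8))
```

Real QEC codes replace the tiny lookup table with the sophisticated decoders discussed below, but the loop is the same: measure syndromes, infer errors, track or fix them.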
Identifying efficient error correction protocols and understanding how to implement them at scale remain grand challenges that must be solved to realize useful quantum computing. The decoding operation described earlier is a primary challenge, as it must be performed accurately within a tight window. Increasing the complexity of a code can improve protection against errors, but puts even more pressure on this decoding step.
These decoding demands mean that practical and scalable quantum error correction requires carefully integrated classical and quantum compute resources: a hardware architecture that tightly couples QPUs and GPUs, paired with a kernel-based programming model to ensure performance. It also requires libraries that can fully leverage accelerated computing for many other aspects of quantum error correction research, including code generation, testing, and synthetic data generation.
At GTC 25, NVIDIA announced a collection of tools to accelerate all of these tasks and catalyze QEC research for the entire ecosystem.
Making time for decoding
The narrow window of time available for decoding is further squeezed by the additional time required to transfer data between the QPU and the AI supercomputer. If the latency between the supercomputer and the QPU is too large, the decoder has no time left to identify and track errors, leading to a complete failure of the error correction process.
NVIDIA and Quantum Machines developed the NVIDIA DGX Quantum reference architecture to solve this problem (Figure 1). DGX Quantum enables GPUs to connect to quantum hardware with ultra-low round-trip latencies of less than 4 μs, so they can be used for calibration, control, decoding, and other key tasks.
The DGX Quantum system combines NVIDIA Grace Hopper superchips with Quantum Machines’ OPX control system to provide scalable and modular connectivity between QPUs and AI supercomputers.

At GTC 25, NVIDIA and Quantum Machines announced the first set of DGX Quantum Alpha customers, who will be receiving shipments starting in April. Pioneering researchers at MIT, Fraunhofer IAF, Diraq, Academia Sinica, and Ecole Normale Supérieure de Lyon will be among the first to demonstrate how tightly coupled GPU-QPU systems accelerate quantum computing development.
Quantum companies such as SEEQC are also working to develop solutions for tightly coupling QPUs to GPUs. SEEQC designed an entirely digital link between their Single Flux Quantum QPU controller and NVIDIA GPUs. By removing key analog-to-digital roadblocks, the bandwidth requirements for linking QPUs and AI supercomputers are reduced from TB/s to GB/s, eliminating the need for high-bandwidth protocols.
At GTC 25, SEEQC announced the first end-to-end workflow using this protocol to enable the decoding of a five-qubit repetition code running on an emulated QPU. Using a GPU-based neural network decoder, the round-trip latency (emulated QPU to GPU and back) was only 6 μs, well within the acceptable range for effective QEC.
Expanding the CUDA-Q QEC toolbox
NVIDIA announced CUDA-Q QEC v0.2 at GTC 25, including new tools for generating quantum low-density parity-check (qLDPC) codes and accelerating their decoding.
qLDPC codes are a promising class of QEC codes that encode logical qubits more efficiently while tolerating a relatively high threshold of physical qubit noise. The downside is that qLDPC codes tend to require complex qubit connectivity schemes and are much harder to decode. This motivates the continual exploration of new qLDPC codes with more favorable properties.
CUDA-Q QEC is now integrated with the Infleqtion library for generating new qLDPC codes and their associated parity check matrices. You can now input these codes directly into CUDA-Q QEC, streamlining sophisticated QEC experiments for assessing each generated code’s merit.
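As an illustration of the kind of sanity check such parity-check matrices enable, the sketch below verifies the CSS commutation condition Hx Hz^T = 0 (mod 2), using the well-known Steane [[7,1,3]] code as a stand-in for a generated code. This is plain NumPy; it does not show Infleqtion's or CUDA-Q QEC's actual APIs.

```python
import numpy as np

# Stand-in for a generated CSS code: the Steane [[7,1,3]] code, whose X and
# Z parity-check matrices are both the [7,4] Hamming check matrix.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]], dtype=np.uint8)
Hx, Hz = H, H

def is_valid_css(Hx: np.ndarray, Hz: np.ndarray) -> bool:
    """A valid CSS code satisfies Hx @ Hz^T = 0 (mod 2): every X-type
    stabilizer commutes with every Z-type stabilizer."""
    return not np.any((Hx @ Hz.T) % 2)

print(is_valid_css(Hx, Hz))  # True
```

Checks like this are cheap compared to the decoding experiments that follow, so they are worth running on every candidate code before committing compute to it.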
The bottleneck of these experiments remains the decoding step. While efficient decoding algorithms exist for certain subsets of QEC codes, the general qLDPC decoding problem is far too costly to solve in practice.
Good heuristic approaches to decoding, such as belief propagation with ordered statistics decoding (BP+OSD), can decode qLDPC codes with only cubic scaling of decoding time as the size of the error-correcting code increases.
The BP+OSD decoder works in two stages (Figure 2). The BP part is an iterative procedure that propagates local qubit information and is often sufficient to decode syndromes corresponding to few errors. More complicated syndromes require OSD, which performs matrix factorizations to rank the most likely errors to occur.
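A toy version of the BP stage can be written down directly. The min-sum sketch below is plain NumPy, not the CUDA-Q QEC implementation, and it omits the OSD fallback and the normalization and scheduling tricks of production decoders; it simply iterates message passing until it finds an error consistent with the observed syndrome.

```python
import numpy as np

def bp_minsum_decode(H, syndrome, p=0.1, max_iter=20):
    """Toy min-sum belief propagation for syndrome decoding:
    find a low-weight e with H e = syndrome (mod 2)."""
    m, n = H.shape
    L = np.log((1 - p) / p)        # prior log-likelihood ratio per qubit
    msg_vc = H * L                 # variable-to-check messages (0 off-edge)
    sgn_s = (-1.0) ** syndrome     # a hot syndrome bit flips the check's sign
    for _ in range(max_iter):
        # Check-to-variable: product of signs and min of magnitudes
        # over the other edges of each check.
        msg_cv = np.zeros_like(msg_vc)
        for c in range(m):
            vs = np.flatnonzero(H[c])
            for v in vs:
                others = vs[vs != v]
                signs = np.prod(np.sign(msg_vc[c, others]))
                mag = np.min(np.abs(msg_vc[c, others]))
                msg_cv[c, v] = sgn_s[c] * signs * mag
        # Beliefs and hard decision.
        belief = L + msg_cv.sum(axis=0)
        e = (belief < 0).astype(np.uint8)
        if np.array_equal((H @ e) % 2, syndrome):
            return e               # converged: syndrome reproduced
        # Variable-to-check: total belief minus the incoming edge.
        msg_vc = H * (belief - msg_cv)
    return e                       # best effort if not converged

H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)   # 3-qubit repetition code
e = bp_minsum_decode(H, np.array([1, 0], dtype=np.uint8))
print(e)  # [1 0 0]: a flip on qubit 0 reproduces the observed syndrome
```

When BP fails to converge, which happens for the degenerate error patterns common in quantum codes, the OSD stage takes over, using matrix factorizations of H to rank the most likely error candidates.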

BP+OSD is a necessity for evaluating both the speed and the accuracy of new qLDPC codes, so state-of-the-art decoder implementations are critical for accelerating the assessment of candidate codes.
At GTC 25, NVIDIA announced an accelerated BP+OSD decoder, now available in CUDA-Q QEC v0.2. Tested on the [[144,12,12]] code from High-threshold and low-overhead fault-tolerant quantum memory, the BP+OSD decoder provides an order-of-magnitude speedup for two different circuit-level error probabilities (Figure 3), run on an NVIDIA Grace Hopper Superchip.
More importantly, the NVIDIA implementation decodes the average syndrome on the order of a few milliseconds, much closer to the coherence time of some commercially available QPUs.

Using batched decoding to make more efficient use of the CPUs and GPUs can provide an additional speedup of over 40x for high-throughput scenarios.
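The idea behind batched decoding can be illustrated with a lookup-table decoder: precompute the correction for every syndrome once, then decode an entire batch with a single vectorized gather. This is a toy NumPy sketch for a 3-qubit repetition code, not the CUDA-Q QEC batching API.

```python
import numpy as np

H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)   # 3-qubit repetition code
m, n = H.shape

# Precompute a minimum-weight lookup table over all 2^m syndromes of this
# tiny code by enumerating candidate errors from lowest weight upward.
table = np.zeros((2 ** m, n), dtype=np.uint8)
filled = np.zeros(2 ** m, dtype=bool)
weights = 1 << np.arange(m)[::-1]            # syndrome bits -> integer index
for b in sorted(range(2 ** n), key=lambda b: bin(b).count("1")):
    e = np.array([(b >> i) & 1 for i in range(n)], dtype=np.uint8)
    idx = int(((H @ e) % 2) @ weights)
    if not filled[idx]:
        filled[idx], table[idx] = True, e

def decode_batch(syndromes: np.ndarray) -> np.ndarray:
    """Decode a whole batch of syndromes with one vectorized table gather."""
    return table[syndromes @ weights]

print(decode_batch(np.array([[1, 0], [0, 1]], dtype=np.uint8)))
# [[1 0 0]
#  [0 0 1]]
```

Full lookup tables do not scale beyond small codes, but the batching pattern does: amortizing per-call overhead and keeping the GPU saturated across many syndromes is what yields the high-throughput speedups described above.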
Combining the utility of Infleqtion’s code generator and the NVIDIA-accelerated BP+OSD decoder makes CUDA-Q QEC a powerful tool for efficiently identifying and testing new qLDPC codes. This means that you can spend more time achieving QEC breakthroughs, rather than preparing and waiting for experiments.
Making a lot of noise data with CUDA-Q
For noise to be conquered, it must be understood through the collection and analysis of large amounts of data that capture the intricacies of quantum noise. Simulation provides a fast and cheap way to generate this data based on approximate noise models and, in many cases, to study systems beyond the reach of today’s experiments. Simulation is an important complement to experiment, and combining both approaches yields a powerful yet cost-effective means of studying quantum noise.
CUDA-Q version 0.10 addresses this need by introducing the world’s most powerful accelerated noisy statevector and tensor network-based quantum circuit simulators. You can now run multi-GPU, multi-node simulations to generate noise data in a fraction of the time and cost associated with using physical QPUs or even other simulators.
NVIDIA researchers used these capabilities to derive even greater speedups with strategic batching of preselected noisy runs, such that multiple data points can be taken from each combination of noise (Kraus) operators (Figure 4).
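The batching idea can be illustrated with a purely classical toy model, not the CUDA-Q API: enumerate the noise (Kraus-branch) combinations, allocate shots to each in proportion to its probability, and then reuse each fixed noise realization for all of its shots.

```python
import numpy as np
from itertools import product

# Toy model: each of g noisy gate slots flips a bit with probability p
# (one Kraus branch) or acts as identity (the other). Pre-selecting a
# branch combination fixes the noise realization, so many shots can be
# batched per combination instead of re-sampling noise on every shot.
g, p, total_shots = 3, 0.05, 100_000
rng = np.random.default_rng(0)

combos = list(product([0, 1], repeat=g))              # flip pattern per slot
probs = np.array([p**sum(c) * (1 - p)**(g - sum(c)) for c in combos])
probs /= probs.sum()                                  # guard against FP drift
shots = rng.multinomial(total_shots, probs)           # shots per combination

# With the noise fixed, each combination's outcome is computed once and
# reused for all of its allocated shots.
counts = {0: 0, 1: 0}
for c, k in zip(combos, shots):
    counts[sum(c) % 2] += k       # outcome = parity of the flips

# The aggregated flip rate matches the analytic value (1 - (1 - 2p)^g) / 2.
print(counts[1] / total_shots)    # ~0.1355
```

In a real noisy circuit simulation, "computed once and reused" means one statevector or tensor-network evolution per Kraus combination, with many measurement samples drawn from it, which is where the batching speedup comes from.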
Running QuEra’s 35-qubit magic state distillation circuit from Experimental Demonstration of Logical Magic State Distillation, CUDA-Q’s statevector simulator generated 1T shots of noisy data in under 1.2K H100 GPU node hours on the NVIDIA Eos supercomputer.

CUDA-Q’s noisy tensor network simulator enabled noise generation at much larger scale to model QuEra’s 85-qubit circuit. Even for such a large circuit, 1M shots of noisy data can be generated with 3.1K H100 GPU node hours.
This simulation of noise is not only fast, but enables you to efficiently generate more realistic data. It’s not constrained to Clifford-only noise but can model the noise profiles of arbitrary quantum operations. Simulation quality is limited only by the underlying noise model previously constructed using insights from experimental data.
CUDA-Q’s noise simulator also provides options to focus data collection on error modes that satisfy user-defined criteria, such as the most probable errors, errors within some probability range, or errors with certain characteristics.
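As a toy sketch of what a probability-window filter over enumerated error modes might look like (the selection criterion here is illustrative, not CUDA-Q’s actual option set):

```python
from itertools import product

# Hypothetical circuit with g noisy slots, each flipping with probability p.
p, g = 0.01, 4
combos = list(product([0, 1], repeat=g))
prob = lambda c: p**sum(c) * (1 - p)**(g - sum(c))

# Keep only error modes at least as likely as a two-error event,
# discarding the long tail of rare high-weight errors.
lo = p**2 * (1 - p)**(g - 2)
selected = [c for c in combos if prob(c) >= lo]

# With p = 0.01 this keeps the no-error mode, all single-error modes,
# and all two-error modes: 1 + 4 + 6 = 11 of the 16 combinations.
print(len(selected))  # 11
```

Focusing simulation effort this way concentrates shots on the error modes a decoder will actually encounter, rather than spending them on astronomically unlikely events.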
Another benefit of CUDA-Q’s simulator is that the errors behind each result are known, information that is obfuscated in experiment and in most other quantum circuit simulations. This makes it easy to create massive amounts of labeled, high-quality synthetic data, which can be used for training AI decoders or for assessing experimental defects by serving as a digital twin for QPU design.
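A minimal version of this labeled-data generation looks like the following, with plain NumPy on a 3-qubit repetition code standing in for CUDA-Q’s noisy simulation:

```python
import numpy as np

# Sample i.i.d. bit-flip errors and compute their syndromes. Because the
# injected error is known for every sample, each syndrome comes with a
# ground-truth label, ready for supervised decoder training.
H = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=np.uint8)   # 3-qubit repetition code
rng = np.random.default_rng(42)
p, num_samples = 0.1, 10_000

errors = (rng.random((num_samples, H.shape[1])) < p).astype(np.uint8)
syndromes = (errors @ H.T) % 2

# Features and labels for an AI decoder: predict the error from the syndrome.
X_train, y_train = syndromes, errors
```

Generating labeled pairs from hardware would require knowing the true error behind every measured syndrome, which is exactly what experiments cannot provide; simulation sidesteps this entirely.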
Accelerate your research
NVIDIA is building the tools to make practical QEC a reality. Each tool helps you easily leverage AI supercomputing to catalyze your QEC research and benefit the entire community with breakthroughs towards useful quantum computing.
To get started accelerating your research, install CUDA-Q. For more information about using the BP+OSD Decoder and Infleqtion’s qLDPC code generator, see CUDA-Q QEC. For more information about how NVIDIA is working with partners across the quantum community, see NVIDIA Quantum Computing.