Accelerated computing is enabling giant leaps in performance and energy efficiency compared to traditional CPU computing. Delivering these advancements requires full-stack innovation at data-center scale, spanning chips, systems, networking, software, and algorithms. Choosing the right architecture for the right workload with the best energy efficiency is critical to maximizing the performance and��
]]>As models grow larger and are trained on more data, they become more capable, making them more useful. To train these models quickly, more performance, delivered at data center scale, is required. The NVIDIA Blackwell platform, launched at GTC 2024 and now in full production, integrates seven types of chips: GPU, CPU, DPU, NVLink Switch chip, InfiniBand Switch, and Ethernet Switch.
]]>AI and scientific computing applications are great examples of distributed computing problems. The problems are too large and the computations too intensive to run on a single machine. These computations are broken down into parallel tasks that are distributed across thousands of compute engines, such as CPUs and GPUs. To achieve scalable performance, the system relies on dividing workloads��
]]>In the era of generative AI, accelerated networking is essential to build high-performance computing fabrics for massively distributed AI workloads. NVIDIA continues to lead in this space, offering state-of-the-art Ethernet and InfiniBand solutions that maximize the performance and efficiency of AI factories and cloud data centers. At the core of these solutions are NVIDIA SuperNICs��a new��
]]>NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on OpenSHMEM, NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams.
]]>A common technological misconception is that performance and complexity are directly linked. That is, the highest-performance implementation is also the most challenging to implement and manage. When considering data center networking, however, this is not the case. InfiniBand is a protocol that sounds daunting and exotic in comparison to Ethernet, but because it is built from the ground up��
]]>The world of computing is on the precipice of a seismic shift. The demand for computing power, particularly in high-performance computing (HPC), is growing year over year, which in turn means so too is energy consumption. However, the underlying issue is, of course, that energy is a resource with limitations. So, the world is faced with the question of how we can best shift our computational��
]]>Generative AI is rapidly transforming computing, unlocking new use cases and turbocharging existing ones. Large language models (LLMs), such as OpenAI��s GPT models and Meta��s Llama 2, skillfully perform a variety of tasks on text-based content. These tasks include summarization, translation, classification, and generation of new content such as computer code, marketing copy, poetry, and much more.
]]>Traditional cloud data centers have served as the bedrock of computing infrastructure for over a decade, catering to a diverse range of users and applications. However, data centers have evolved in recent years to keep up with advancements in technology and the surging demand for AI-driven computing. This post explores the pivotal role that networking plays in shaping the future of data centers��
]]>In MLPerf Inference v3.0, NVIDIA made its first submissions to the newly introduced Network division, which is now part of the MLPerf Inference Datacenter suite. The Network division is designed to simulate a real data center setup and strives to include the effect of networking��including both hardware and software��in end-to-end inference performance. In the Network division��
]]>We all know that AI is changing the world. For network admins, AI can improve day-to-day operations in some amazing ways: However, AI is no replacement for the know-how of an experienced network admin. AI is meant to augment your capabilities, like a virtual assistant. So, AI may become your best friend, but generative AI is also a new data center workload that brings a new paradigm��
]]>Recent years have seen a proliferation of large language models (LLMs) that extend beyond traditional language tasks to generative AI. This includes models like ChatGPT and Stable Diffusion. As this generative AI focus continues to grow, there is a rising need for a modern machine learning (ML) infrastructure that makes scalability accessible to the everyday practitioner.
]]>The most exciting computing applications currently rely on training and running inference on complex AI models, often in demanding, real-time deployment scenarios. High-performance, accelerated AI platforms are needed to meet the demands of these applications and deliver the best user experiences. New AI models are constantly being invented to enable new capabilities��
]]>Data centers can be optimized by updating key network architectures in two ways: through networking technologies or operational efficiency in NetDevOps. In this post, we identify and evaluate technologies that you can apply to your network architecture to optimize your network. We address five updates that you should consider for improving your data center: VXLAN is an overlay��
]]>The latest update to NVIDIA Nsight Systems��a performance analysis tool��is now available for download. Designed to help you tune and scale software across CPUs and GPUs, this release introduces several improvements aimed to enhance the profiling experience. Nsight Systems is part of the powerful debugging and profiling NVIDIA Nsight Tools Suite. You can start with Nsight Systems for an overall��
]]>Supercomputers are significant investments. However they are extremely valuable tools for researchers and scientists. To effectively and securely share the computational might of these data centers, NVIDIA introduced the Cloud-Native Supercomputing architecture. It combines bare metal performance, multitenancy, and performance isolation for supercomputing. Magnum IO, the I/
]]>Today��s data centers host many users and a wide variety of applications. They have even become the key element of competitive advantage for research, technology, and global industries. With the increased complexity of scientific computing, data center operational costs also continue to rise. In addition to the operational disruption of security threats, keeping a data center intact and running��
]]>This is the third post in the Accelerating IO series, which has the goal of describing the architecture, components, and benefits of Magnum IO, the IO subsystem of the modern data center. The first post in this series introduced the Magnum IO architecture; positioned it in the broader context of CUDA, CUDA-X, and vertical application domains; and listed the four major components of the��
]]>NVIDIA Collective Communications Library (NCCL) provides optimized implementation of inter-GPU communication operations, such as allreduce and variants. Developers using deep learning frameworks can rely on NCCL��s highly optimized, MPI compatible and topology aware routines, to take full advantage of all available GPUs within and across multiple nodes. NCCL is optimized for high bandwidth and��
]]>