The exponential growth of AI workloads is increasing data center power demands. Traditional 54 V in-rack power distribution, designed for kilowatt (kW)-scale racks, cannot support the megawatt (MW)-scale racks coming soon to modern AI factories. NVIDIA is leading the transition to 800 V HVDC data center power infrastructure to support 1 MW IT racks and beyond, starting in 2027.
The exponential growth of generative AI, large language models (LLMs), and high-performance computing has created unprecedented demands on data center infrastructure. Traditional server architectures struggle to accommodate the power density, thermal requirements, and rapid iteration cycles of modern accelerated computing. This post explains the benefits of NVIDIA MGX…
In a recent DC Anti-Conference Live presentation, Wade Vinson, chief data center distinguished engineer at NVIDIA, shared insights based on NVIDIA's work designing, building, and operating NVIDIA DGX SuperPOD multi-megawatt data centers since 2016. NVIDIA is helping make data centers more accessible, resource-efficient, energy-efficient, and business-efficient, as well as scalable to any…
AI has proven to be a force multiplier, helping to create a future where scientists can design entirely new materials, while engineers seamlessly transform these designs into production plans—all without ever setting foot in a lab. As AI continues to redefine the boundaries of innovation, this once elusive vision is now more within reach. Recognizing this paradigm shift…
For the past few months, the NVIDIA Collective Communications Library (NCCL) developers have been working hard on a set of new library features and bug fixes. In this post, we discuss the details of the NCCL 2.22 release and the pain points it addresses. NVIDIA Magnum IO NCCL is a library designed to optimize inter-GPU and multi-node communication, crucial for efficient parallel computing…
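The release details are in the post itself; as background, here is a minimal, hedged sketch of the kind of collective NCCL optimizes: a single-process all-reduce across all visible GPUs. The buffer size and single-process setup are illustrative assumptions, not taken from the release notes.

```cuda
// Minimal NCCL all-reduce sketch (single process, one communicator per GPU).
#include <nccl.h>
#include <cuda_runtime.h>
#include <vector>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    std::vector<int> devs(ndev);
    for (int i = 0; i < ndev; ++i) devs[i] = i;

    // Create one communicator per visible GPU in this process.
    std::vector<ncclComm_t> comms(ndev);
    ncclCommInitAll(comms.data(), ndev, devs.data());

    const size_t count = 1 << 20;  // illustrative element count
    std::vector<float*> sendbuf(ndev), recvbuf(ndev);
    std::vector<cudaStream_t> streams(ndev);
    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&sendbuf[i], count * sizeof(float));  // left uninitialized for brevity
        cudaMalloc(&recvbuf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // Sum-reduce across all GPUs; the calls are grouped so NCCL can launch them together.
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    for (int i = 0; i < ndev; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        cudaFree(sendbuf[i]);
        cudaFree(recvbuf[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}
```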
NVSHMEM is a parallel programming interface that provides efficient and scalable communication for NVIDIA GPU clusters. Part of NVIDIA Magnum IO and based on OpenSHMEM, NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams.
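To make the programming model concrete, here is a minimal sketch (not taken from the post) of a GPU-initiated put into a peer's symmetric memory. It assumes one PE per GPU and an NVSHMEM launcher or MPI bootstrap outside the snippet, and it must be compiled with relocatable device code and linked against NVSHMEM.

```cuda
// Minimal NVSHMEM sketch: each PE writes its rank into its right neighbor's
// symmetric buffer with a fine-grained, GPU-initiated put.
#include <nvshmem.h>
#include <cuda_runtime.h>

__global__ void put_to_neighbor(int *sym_buf) {
    int me   = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (me + 1) % npes;
    // GPU-initiated write into the peer's copy of the symmetric allocation.
    nvshmem_int_p(sym_buf, me, peer);
}

int main() {
    nvshmem_init();                                            // join the NVSHMEM job
    int *sym_buf = (int *) nvshmem_malloc(sizeof(int));        // symmetric allocation on every PE

    put_to_neighbor<<<1, 1>>>(sym_buf);
    cudaDeviceSynchronize();
    nvshmem_barrier_all();                                     // all puts visible on all PEs

    nvshmem_free(sym_buf);
    nvshmem_finalize();
    return 0;
}
```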
What is the interest in trillion-parameter models? We know many of the use cases today, and interest is growing due to the promise of an increased capacity for: The benefits are great, but training and deploying large models can be computationally expensive and resource-intensive. Computationally efficient, cost-effective, and energy-efficient systems, architected to deliver real-time…
At AWS re:Invent 2023, AWS and NVIDIA announced that AWS will be the first cloud provider to offer NVIDIA GH200 Grace Hopper Superchips interconnected with NVIDIA NVLink technology through NVIDIA DGX Cloud and running on Amazon Elastic Compute Cloud (Amazon EC2). This is a game-changing technology for cloud computing. The NVIDIA GH200 NVL32, a rack-scale solution within NVIDIA DGX Cloud or an…
Diamond Light Source is a world-renowned synchrotron facility in the UK that provides scientists with access to intense beams of X-rays, infrared, and other forms of light to study materials and biological structures. The facility boasts over 30 experimental stations, or beamlines, and is home to some of the most advanced and complex scientific research projects in the world. I08-1…
Computational energy efficiency has become a primary decision criterion for most supercomputing centers. Data centers, once built, are capped in terms of the amount of power they can use without expensive and time-consuming retrofits. Maximizing insight in the form of workload throughput then means maximizing workload per watt. NVIDIA products have, for several generations…
AI and its newest subdomain, generative AI, are dramatically accelerating the pace of change in scientific computing research. From pharmaceuticals and materials science to astronomy, this game-changing technology is opening up new possibilities and driving progress at an unprecedented rate. In this post, we explore some new and exciting applications of generative AI in science…
NVIDIA T4 was introduced 4 years ago as a universal GPU for use in mainstream servers. T4 GPUs achieved widespread adoption and are now the highest-volume NVIDIA data center GPU. T4 GPUs were deployed into use cases for AI inference, cloud gaming, video, and visual computing. At the NVIDIA GTC 2023 keynote, NVIDIA introduced several inference platforms for AI workloads…
You could make an argument that the history of civilization and technological advancement is the history of the search and discovery of materials. Ages are named not for leaders or civilizations but for the materials that defined them: Stone Age, Bronze Age, and so on. The current digital or information age could be renamed the Silicon or Semiconductor Age and retain the same meaning.
This post was updated April 2023. Scientific instruments are being upgraded to deliver 10–100x more sensitivity and resolution over the next decade, requiring a corresponding scale-up in storage and processing. The data produced by these enhanced instruments will reach limits that Moore's law cannot adequately address, and it will challenge traditional operating models based solely on HPC…
Supercomputers are significant investments. However, they are extremely valuable tools for researchers and scientists. To effectively and securely share the computational might of these data centers, NVIDIA introduced the Cloud-Native Supercomputing architecture. It combines bare-metal performance, multitenancy, and performance isolation for supercomputing. Magnum IO, the I/