
    NVIDIA Blackwell Ultra for the Era of AI Reasoning


For years, advancements in AI have followed a clear trajectory through pretraining scaling: larger models, more data, and greater computational resources lead to breakthrough capabilities. Over the last five years, pretraining scaling has increased compute requirements by an incredible 50 million times. However, building more intelligent systems is no longer just about pretraining bigger models; it's also about refining them and making them think.

Post-training scaling refines models for specialized tasks, improving how conversationally they respond. Tuning models with domain-specific and synthetic data enhances their ability to understand nuanced context and provide accurate outputs. Because synthetic data generation places no upper limit on the content available to teach a model, post-training scaling creates a significant demand for compute resources.

    Now, a new scaling law has emerged to amplify intelligence: test-time scaling. 

Also known as long thinking, test-time scaling dynamically increases compute during AI inference to enable deeper reasoning. AI reasoning models don't just generate responses in a single pass; they actively think, weigh multiple possibilities, and refine answers in real time. This moves us closer to true agentic intelligence: AI that can think and act independently to take on more sophisticated tasks and deliver more useful answers.

    This shift towards post-training scaling and test-time scaling demands exponentially more compute, real-time processing, and high-speed interconnects. Post-training can require 30x more compute than pretraining to develop customized derivative models, and long thinking can require 100x more compute than a single inference pass to solve for incredibly complex tasks. 
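As a back-of-the-envelope illustration of why long thinking is so much more expensive, a dense decoder spends roughly 2 x n_params FLOPs per generated token, so a reasoning trace that emits 100x more tokens costs about 100x more compute. Every concrete number in the sketch below is an illustrative assumption, not a figure from NVIDIA:

```python
# Why long thinking can need ~100x the compute of a single inference pass:
# decoder inference costs roughly 2 * n_params FLOPs per generated token,
# so a reasoning trace emitting 100x more tokens needs ~100x more compute.
N_PARAMS = 70e9                      # hypothetical model size
FLOPS_PER_TOKEN = 2 * N_PARAMS       # standard dense-decoder estimate

single_pass_tokens = 500             # direct answer (illustrative)
long_thinking_tokens = 50_000        # extended reasoning trace (illustrative)

print(f"single pass:   {single_pass_tokens * FLOPS_PER_TOKEN:.2e} FLOPs")
print(f"long thinking: {long_thinking_tokens * FLOPS_PER_TOKEN:.2e} FLOPs "
      f"({long_thinking_tokens // single_pass_tokens}x)")
```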

    Blackwell Ultra: NVIDIA GB300 NVL72

    To meet this demand, NVIDIA introduced Blackwell Ultra, an accelerated computing platform built for the age of AI reasoning, which includes training, post-training, and test-time scaling. Blackwell Ultra is designed for massive-scale AI reasoning inference, delivering smarter, faster, and more efficient AI with optimal TCO. 

Blackwell Ultra will be at the heart of the NVIDIA GB300 NVL72, a liquid-cooled, rack-scale system connecting 36 NVIDIA Grace CPUs and 72 Blackwell Ultra GPUs in a single 72-GPU NVLink domain that acts as one massive GPU with a total NVLink bandwidth of 130 TB/s.
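As a quick sanity check, 130 TB/s is consistent with each GPU contributing the fifth-generation NVLink rate of 1.8 TB/s; that per-GPU rate is our assumption, not a figure stated in this post:

```python
# Sanity check on the aggregate NVLink bandwidth of a GB300 NVL72 rack,
# assuming each GPU contributes the NVLink 5 rate of 1.8 TB/s (assumption).
GPUS_PER_DOMAIN = 72
NVLINK_TBPS_PER_GPU = 1.8

total = GPUS_PER_DOMAIN * NVLINK_TBPS_PER_GPU
print(f"{total:.1f} TB/s aggregate")   # ~129.6 TB/s, i.e. ~130 TB/s
```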

                        GB300 NVL72          vs. GB200 NVL72   vs. HGX H100
FP4 Inference¹          1.4 | 1.1 ExaFLOPS   1.5x              70x
HBM Memory              20 TB                1.5x              30x
Fast Memory             40 TB                1.3x              65x
Networking Bandwidth    14.4 TB/s            2x                20x

Table 1. NVIDIA Blackwell Ultra specifications compared to NVIDIA GB200 NVL72 and NVIDIA HGX H100

¹ With sparsity | without sparsity

Blackwell Ultra brings even more AI inference performance for real-time, multi-agent AI system pipelines and long-context reasoning. New Blackwell Ultra Tensor Cores deliver 1.5x more AI compute FLOPS than Blackwell GPUs, or 70x more AI FLOPS for GB300 NVL72 compared to HGX H100. Blackwell Ultra also supports multiple community FP4 formats, optimizing memory usage for state-of-the-art AI.
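FP4 here refers to 4-bit floating point encodings such as E2M1 (one sign bit, two exponent bits, one mantissa bit), typically paired with a per-block scale factor. The sketch below is a minimal illustration assuming a block size of 16 and simple max-based scaling; neither assumption is a specification of NVIDIA's hardware formats:

```python
import numpy as np

# Positive magnitudes representable by FP4 E2M1 (sign is a separate bit).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_fp4(x: np.ndarray, block_size: int = 16) -> np.ndarray:
    """Round x to a block-scaled FP4 (E2M1) grid.

    The block size of 16 and max-based scaling are illustrative
    assumptions, not NVIDIA's exact scheme.
    """
    out = x.astype(np.float32)
    flat = out.reshape(-1)
    for start in range(0, flat.size, block_size):
        block = flat[start:start + block_size]
        amax = float(np.max(np.abs(block)))
        scale = amax / 6.0 if amax > 0 else 1.0   # map the block max to 6.0
        # Snap each |value| / scale to the nearest representable magnitude.
        idx = np.abs(np.abs(block[:, None]) / scale - E2M1_GRID).argmin(axis=1)
        block[:] = np.sign(block) * E2M1_GRID[idx] * scale
    return out

x = np.random.randn(64).astype(np.float32)
print("max abs error:", float(np.max(np.abs(x - quantize_fp4(x)))))
```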

With up to 288 GB of HBM3e memory per GPU and up to 40 TB of high-speed, coherent GPU and CPU memory per GB300 NVL72 rack, Blackwell Ultra opens the door to breakthroughs in AI, research, real-time analytics, and more. It provides the large-scale memory needed to serve many large models simultaneously and to handle a high volume of complex tasks from many concurrent users, improving performance and reducing latency.

Blackwell Ultra Tensor Cores are also supercharged with 2x the attention-layer acceleration of Blackwell, for processing the massive end-to-end context lengths critical to real-time agentic and reasoning AI applications that ingest millions of input tokens.
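To see why attention-layer throughput becomes the bottleneck at long context, note that the two large attention matmuls each cost on the order of n² x d_model FLOPs, so compute grows quadratically with context length. A rough sketch, with the hidden size as an illustrative assumption:

```python
# Attention-cost scaling with context length n: the score (Q @ K^T) and
# value-mixing (attn @ V) matmuls each cost ~n^2 * d_model multiply-adds,
# so doubling context roughly quadruples attention compute.
D_MODEL = 8192  # illustrative hidden size, not a specific model

def attention_flops(n_tokens: int, d_model: int = D_MODEL) -> int:
    # 2 matmuls, 2 FLOPs per multiply-accumulate.
    return 2 * 2 * (n_tokens ** 2) * d_model

for n in (8_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens: {attention_flops(n):.2e} attention FLOPs")
```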

    Optimized large-scale, multi-node inference

    Efficient orchestration and coordination of AI inference requests across large-scale GPU deployments is essential for minimizing operational costs and maximizing token-based revenue generation in AI factories.

To support these benefits, Blackwell Ultra features PCIe Gen6 connectivity with the NVIDIA ConnectX-8 800G SuperNIC, increasing available network bandwidth to 800 Gb/s per GPU.
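That per-GPU figure also lines up with the rack-level networking number in Table 1, if the table counts both send and receive directions; the quick check below treats the bidirectional interpretation as an assumption:

```python
# Rack-level check: 72 GPUs at 800 Gb/s each. The factor of 2 assumes
# Table 1's 14.4 TB/s counts both directions (our assumption about
# how the figure is tallied).
GPUS = 72
GBPS_PER_GPU = 800

tbps_one_way = GPUS * GBPS_PER_GPU / 8 / 1000   # Gb/s -> TB/s
print(f"one way: {tbps_one_way:.1f} TB/s, both ways: {2 * tbps_one_way:.1f} TB/s")
```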

    More network bandwidth means more performance at scale. Take advantage of this with NVIDIA Dynamo, an open-source library to scale up reasoning AI services. Dynamo is a modular inference framework for serving AI models in multi-node environments. It scales inference workloads across GPU nodes and dynamically allocates GPU workers to ease traffic bottlenecks.

    Dynamo also features disaggregated serving, which separates the context (prefill) and generation (decode) phases for large language model (LLM) inference across GPUs to optimize performance, scale more easily, and reduce costs.
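Conceptually, disaggregated serving routes the compute-bound prefill phase and the memory-bandwidth-bound decode phase to separately sized and scaled GPU pools, handing the KV cache from one to the other. The sketch below is a hypothetical illustration of that control flow only; the class and method names are invented and are not Dynamo's actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of disaggregated LLM serving; names are invented.

@dataclass
class KVCacheHandle:
    request_id: str
    location: str  # which GPU pool currently holds the cache

class PrefillPool:
    """Stands in for GPUs running the compute-bound prefill (context) phase."""
    def prefill(self, request_id: str, prompt: str) -> KVCacheHandle:
        # A real system runs the full prompt through the model here and
        # materializes the attention KV cache on the prefill GPUs.
        return KVCacheHandle(request_id, location="prefill-pool")

class DecodePool:
    """Stands in for GPUs running the bandwidth-bound decode (generation) phase."""
    def import_cache(self, cache: KVCacheHandle) -> KVCacheHandle:
        # A real system moves the KV cache over NVLink or RDMA.
        return KVCacheHandle(cache.request_id, location="decode-pool")

    def decode(self, cache: KVCacheHandle, max_tokens: int = 8) -> str:
        # A real system generates one token per step against the cache.
        return " ".join(f"<tok{i}>" for i in range(max_tokens))

def serve(prompt: str) -> str:
    prefill, decode = PrefillPool(), DecodePool()
    cache = prefill.prefill("req-1", prompt)   # 1. context (prefill) phase
    cache = decode.import_cache(cache)         # 2. KV-cache handoff
    return decode.decode(cache)                # 3. generation (decode) phase

print(serve("Why does disaggregation help?"))
```

Because the two phases stress different resources, operators can scale each pool independently instead of overprovisioning one GPU type for both.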

    With 800 Gb/s of total data throughput available for each GPU in the system, GB300 NVL72 seamlessly integrates with the NVIDIA Quantum-X800 and NVIDIA Spectrum-X networking platforms, enabling AI factories and cloud data centers to readily handle the demands of the three scaling laws.

    50x more AI factory output

Figure 1. A 50x increase in AI factory output with GB300 NVL72 relative to Hopper

Figure 1 shows two key parameters that determine the operating points available for maximizing AI factory output. The vertical axis represents throughput in tokens per second for a 1-megawatt (MW) data center, while the horizontal axis quantifies user interactivity as tokens per second (TPS) for a single user.

AI factories with NVIDIA GB300 NVL72 will deliver a 10x boost in TPS per user and a 5x improvement in TPS per MW compared to Hopper. The combined effect (10x x 5x) yields a potential 50x overall increase in AI factory output.

    Summary 

Faster AI reasoning with Blackwell Ultra enables real-time insights, more intelligent and responsive chatbots, enhanced predictive analytics, and more productive AI agents across industries such as finance, healthcare, and e-commerce. This cutting-edge platform lets organizations handle larger models and AI reasoning workloads without sacrificing speed, making advanced AI capabilities more accessible and practical for real-world applications.

NVIDIA Blackwell Ultra products are expected to be available from partners in the second half of 2025 and will be supported by all major cloud service providers and server makers.
