The HPC SDK v25.3 release includes support for NVIDIA Blackwell GPUs and an optimized allocator for Arm CPUs.
NVIDIA AI Workbench 2025.03.10 features streamlined onboarding and enhanced UX for multicontainer projects.
The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node (MGMN) communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVLink, or networking. It uses advanced topology detection, optimized communication graphs…
NVIDIA cuDSS is a first-generation sparse direct solver library designed to accelerate engineering and scientific computing. cuDSS is increasingly adopted in data centers and other environments and supports single-GPU, multi-GPU, and multi-node (MGMN) configurations. cuDSS has become a key tool for accelerating computer-aided engineering (CAE) workflows and scientific computations across…
NVIDIA AI Workbench is a free development environment manager for developing, customizing, and prototyping AI applications on your GPUs. AI Workbench provides a frictionless experience across PCs, workstations, servers, and the cloud for AI, data science, and machine learning (ML) projects. This post provides details about the January 2025 release of NVIDIA AI Workbench…
Bringing support for the NVIDIA Blackwell architecture across data center and GeForce products, NVIDIA cuDNN 9.7 delivers speedups of up to 84% for FP8 Flash Attention operations and optimized GEMM capabilities with advanced fusion support to accelerate deep learning workloads.
The latest release of the CUDA Toolkit, version 12.8, continues to push accelerated computing performance in data science, AI, scientific computing, and computer graphics and simulation, using the latest NVIDIA CPUs and GPUs. This post highlights some of the new features and enhancements included with this release: CUDA Toolkit 12.8 is the first version of the Toolkit to support…
RAPIDS 24.12 introduces cuDF packages on PyPI, speeds up aggregations and reading files from AWS S3, enables larger-than-GPU-memory queries in the Polars GPU engine, and delivers faster graph neural network (GNN) training on real-world graphs. Starting with the 24.12 release of RAPIDS, CUDA 12 builds of cuDF and other RAPIDS libraries, along with all of their dependencies, are now available on PyPI. As a result…
NVIDIA JetPack has continuously evolved to offer cutting-edge software tailored to the growing needs of edge AI and robotics developers. With each release, JetPack has enhanced its performance, introduced new features, and optimized existing tools to deliver increased value to its users. This means that your existing Jetson Orin-based products experience performance optimizations by upgrading to…
NVIDIA JetPack SDK powers NVIDIA Jetson modules, offering a comprehensive solution for building end-to-end accelerated AI applications. JetPack 6 expands the Jetson platform's flexibility and scalability with microservices and a host of new features. It's the most downloaded version of JetPack in 2024. With the JetPack 6.0 production release now generally available…
NVIDIA HPC SDK 24.5 updates include support for new NVPL components and CUDA 12.4.
Nsight Compute 2024.2 adds Python syntax highlighting and call stacks, a redesigned report header, and source page statistics to make CUDA optimization easier.
CUDA Toolkit 12.5 supports new NVIDIA L20 and H20 GPUs and simultaneous compute and graphics to DirectX, and updates Nsight Compute and CUDA-X Libraries.
PhysicsNeMo v24.04 delivers an optimized CorrDiff model and Earth2Studio for exploring weather AI models.
NVIDIA AI Workbench is now in beta, bringing a wealth of new features to streamline how enterprise developers create, use, and share AI and machine learning (ML) projects. Announced at SIGGRAPH 2023, NVIDIA AI Workbench enables developers to create, collaborate, and migrate AI workloads on their GPU-enabled environment of choice. To learn more, see Develop and Deploy Scalable Generative AI Models…
cuBLASDx allows you to perform BLAS calculations inside your CUDA kernel, improving the performance of your application. Available to download in Preview now.
This NVIDIA HPC SDK 23.9 update expands platform support and provides minor updates.
NVIDIA PhysicsNeMo 23.09 is now available, providing ease-of-use updates, fixes, and other enhancements.
Explore the latest streaming analytics features and advancements with this new release.
NVIDIA HPC SDK version 23.7 is now available and provides minor updates and enhancements.
The latest release of the CUDA Toolkit, version 12.2, introduces a range of essential new features, modifications to the programming model, and enhanced support for hardware capabilities that accelerate CUDA applications. Now generally available from NVIDIA, CUDA Toolkit 12.2 includes many new capabilities, both major and minor. The following post offers an overview of many of the key…