• <xmp id="om0om">
  • Simulation / Modeling / Design

    Programming Distributed Multi-GPU Tensor Operations with cuTENSOR v1.4

    Today, NVIDIA is announcing the availability of cuTENSOR version 1.4, which supports up to 64-dimensional tensors and distributed multi-GPU tensor operations, and improves the tensor contraction performance model. The software is available now as a free download.

    Download the cuTENSOR software.

    What’s New?

    • Supports up to 64-dimensional tensors.
    • Supports distributed, multi-GPU tensor operations.
    • Improved tensor contraction performance model (i.e., algo CUTENSOR_ALGO_DEFAULT).
    • Improved performance for tensor contractions that have an overall large contracted dimension (i.e., a parallel reduction was added).
    • Improved performance for tensor contractions that have a tiny contracted dimension (<= 8).
    • Improved performance for outer-product-like tensor contractions (e.g., C[a,b,c,d] = A[b,d] * B[a,c]).
    • Additional bug fixes.
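
To make the contraction notation in the list above concrete, here is a minimal reference sketch in plain Python. It is not the cuTENSOR API and says nothing about how the library implements contractions; the `contract` helper, the dict-based tensor layout, and the sample extents are all assumptions made for illustration. Modes that appear in an input but not in the output are the contracted modes, and they are summed over.

```python
from itertools import product

def contract(a, modes_a, b, modes_b, modes_c, extent):
    """Reference contraction (hypothetical helper, not a cuTENSOR call).

    Tensors are dicts mapping index tuples to values. `extent` maps each
    mode label to its size. Modes present in A or B but absent from C are
    contracted, i.e. summed over.
    """
    contracted = [m for m in dict.fromkeys(modes_a + modes_b) if m not in modes_c]
    all_modes = list(modes_c) + contracted
    c = {}
    for idx in product(*(range(extent[m]) for m in all_modes)):
        pos = dict(zip(all_modes, idx))
        key_a = tuple(pos[m] for m in modes_a)
        key_b = tuple(pos[m] for m in modes_b)
        key_c = tuple(pos[m] for m in modes_c)
        c[key_c] = c.get(key_c, 0) + a[key_a] * b[key_b]
    return c

# Outer-product-like case C[a,b,c,d] = A[b,d] * B[a,c]: no mode is contracted.
extent = {"a": 2, "b": 2, "c": 3, "d": 2}
A = {(b, d): 10 * b + d + 1 for b in range(2) for d in range(2)}
B = {(a, c): 100 * a + c + 1 for a in range(2) for c in range(3)}
C = contract(A, "bd", B, "ac", "abcd", extent)

# Matrix multiply as a contraction over the single contracted mode k.
M = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
N = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
P = contract(M, "mk", N, "kn", "mn", {"m": 2, "k": 2, "n": 2})
```

In the outer-product case every element of C is a single product, so there is no reduction to parallelize; in the matrix-multiply case the size of mode `k` is the contracted dimension that the release notes refer to as "large" or "tiny".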

    For more information, see the cuTENSOR Release Notes.

    About cuTENSOR

    cuTENSOR is a high-performance CUDA library for tensor primitives; its key features include:

    • Extensive mixed-precision support:
      • FP64 inputs with FP32 compute.
      • FP32 inputs with FP16, BF16, or TF32 compute.
      • Complex-times-real operations.
      • Conjugate (without transpose) support.
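
As a rough illustration of what "FP64 inputs with FP32 compute" trades away, the sketch below emulates FP32 arithmetic in plain Python by rounding every input, product, and partial sum to single precision. This is an assumption-laden emulation, not cuTENSOR code: the helper names `to_fp32` and `dot` are hypothetical, and real GPU kernels may order and accumulate the reduction differently.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot(xs, ys, rnd=lambda v: v):
    """Sequential dot product; `rnd` is applied after every operation.

    With the default identity `rnd` this is ordinary FP64 arithmetic.
    With `rnd=to_fp32` inputs, products, and partial sums are all rounded
    to single precision, emulating an FP32 compute type.
    """
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = rnd(acc + rnd(rnd(x) * rnd(y)))
    return acc

xs = [0.1] * 1000
ys = [1.0] * 1000

fp64_result = dot(xs, ys)               # FP64 inputs, FP64 compute
fp32_result = dot(xs, ys, rnd=to_fp32)  # FP64 inputs, emulated FP32 compute
error = abs(fp32_result - fp64_result)

print(f"FP64 compute: {fp64_result!r}")
print(f"FP32 compute: {fp32_result!r}")
print(f"difference:   {error:.3e}")
```

The two results differ because the lower-precision compute type rounds at every step; whether that difference is acceptable in exchange for higher throughput depends on the application.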
