
    Deploying GPUDirect RDMA on the EGX Stack with the NVIDIA Network Operator

    Edge computing takes place close to the data source to reduce network stress and improve latency. GPUs are an ideal compute engine for edge computing because they are programmable and deliver phenomenal performance per dollar. However, the complexity associated with managing a fleet of edge devices can erode the GPU’s favorable economics.

    In 2019, NVIDIA introduced the GPU Operator to simplify scale-out GPU deployment and management on the EGX stack. Since then, NVIDIA customers have successfully applied GPUs to a wide range of edge AI use cases, and the GPU Operator has been featured in reference architectures published by HPE and Dell.

    Graphic of the EGX stack layers: Linux distribution, container runtime, and Kubernetes beneath the Mellanox Network Operator and GPU Operator Pods.
    Figure 1. The NVIDIA Network Operator is installed alongside the NVIDIA GPU Operator to automate GPUDirect RDMA configuration on the EGX stack.

    Compute performance is only half of the equation. The availability of cheap sensors continues to push data processing demands at the edge. It is not unusual for a single GPU to ingest continuous data streams from dozens of sensors simultaneously. This makes network performance a critical design consideration.

    GPUDirect RDMA is a technology that creates a fast data path between NVIDIA GPUs and RDMA-capable network interfaces. It can deliver line-rate throughput and low latency for network-bound GPU workloads. GPUDirect RDMA technology is featured in NVIDIA ConnectX SmartNICs and BlueField DPUs and plays a key role in realizing the benefits of GPUs at the edge.

    This post introduces the NVIDIA Network Operator. Analogous to the NVIDIA GPU Operator, the Network Operator simplifies scale-out network design for Kubernetes by automating aspects of network deployment and configuration that would otherwise require manual work. It loads the required drivers, libraries, and device plugins on any cluster node with a ConnectX network interface. When installed alongside the GPU Operator, it enables GPUDirect RDMA. The following sections describe the Network Operator architecture and demonstrate how to test GPUDirect RDMA in Kubernetes.

    NVIDIA Network Operator architecture

    The NVIDIA Network Operator leverages Kubernetes custom resources and the Operator framework to configure fast networking, RDMA, and GPUDirect. The Network Operator’s goal is to install the host networking components required to enable RDMA and GPUDirect in a Kubernetes cluster. It does so by configuring a high-speed data path for IO intensive workloads on a secondary network in each cluster node.

    The NVIDIA Network Operator was released as an open source project on GitHub under the Apache 2.0 license. The operator is currently in an alpha state, and NVIDIA does not support it for production workloads.

    The Network Operator includes the following components:

    • MOFED driver
    • Kubernetes RDMA shared device plugin
    • NVIDIA peer memory driver

    MOFED driver

    The Mellanox OpenFabrics Enterprise Distribution (MOFED) is a set of networking libraries and drivers packaged and tested by the NVIDIA networking team. MOFED supports Remote Direct Memory Access (RDMA) over both InfiniBand and Ethernet interconnects. The Network Operator deploys a precompiled MOFED driver container onto each Kubernetes host using node labels. The container loads the MOFED drivers when it starts and unloads them when it stops.

    Kubernetes RDMA shared device plugin

    The device plugin framework advertises system hardware resources to the Kubelet agent running on a Kubernetes node. The Network Operator deploys a device plugin that advertises RDMA resources to Kubelet and exposes RDMA devices to Pods running on the node. It allows the Pods to perform RDMA operations. All Pods running on the node share access to the same RDMA device files.
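
    As a quick sketch of the consumer side, a Pod requests the shared RDMA resource through the standard Kubernetes resource mechanism. The resource name rdma/hca_shared_devices_a is taken from the example NicClusterPolicy shown later in this post:

      resources:
        limits:
          rdma/hca_shared_devices_a: 1   # shared RDMA resource advertised by the device plugin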

    NVIDIA peer memory driver

    The NVIDIA peer memory driver is a client that interacts with the network drivers to enable the network adapter to perform RDMA directly to and from GPU memory. The Network Operator installs the NVIDIA peer memory driver on nodes that have both a ConnectX network controller and an NVIDIA GPU. This driver is also loaded and unloaded automatically when the container is started and stopped.

    A graphic showing host file system mounts from the GPU driver container and the Mellanox OFED driver container. The Mellanox Network Operator remounts both host file system mounts into the NVIDIA peer memory driver container.
    Figure 2. The Network Operator accesses hostPath volume mounts from the NVIDIA GPU driver container and the MOFED driver container to build and link the NVIDIA peer memory driver container.

    The NVIDIA peer memory driver requires headers from both the MOFED and NVIDIA drivers to link against the running kernel. Both the Mellanox network driver container and the NVIDIA driver container in the GPU operator expose the required header files through Kubernetes volume mounts to the host filesystem. Currently, the NVIDIA peer memory driver is delivered through the Network Operator. In the future, the NVIDIA GPU Operator will deliver it. This architecture allows ConnectX network controllers to DMA to non-NVIDIA clients in a standard way.
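
    To illustrate the mechanism, the following sketch shows how a driver build container can pick up the GPU driver sources through a hostPath volume. The /run/nvidia/driver path matches the gpuDriverSourcePath value in the NicClusterPolicy shown later in this post; the container and volume names are illustrative.

    # Fragment of a Pod template that mounts the GPU driver sources from the host
    volumes:
    - name: nvidia-driver-src
      hostPath:
        path: /run/nvidia/driver          # exposed on the host by the GPU Operator driver container
    containers:
    - name: nv-peer-mem-driver            # illustrative container name
      volumeMounts:
      - name: nvidia-driver-src
        mountPath: /run/nvidia/driver     # headers used to build and link the peer memory module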

    Kubernetes DaemonSets ensure that each node runs the appropriate driver containers. The containers are scheduled to nodes based on the node labels and the Pod node selectors in each Pod specification.

    Pod                   Node selectors
    rdma-device-plugin    pci-15b3.present
    ofed-driver           pci-15b3.present, kernel-version.full
    nv-peer-mem-driver    pci-15b3.present, pci-10de.present
    Table 1. Pod-to-node scheduling based on node selectors.

    The pci-*.present node selectors tell the Kubernetes scheduler that the Pods must be scheduled on nodes containing a PCI device with the corresponding vendor ID: 15b3 is the PCI vendor ID for Mellanox and 10de is the vendor ID for NVIDIA. The MOFED driver DaemonSet and the RDMA device plugin DaemonSet are scheduled to nodes with Mellanox devices. The nv-peer-mem-driver DaemonSet is scheduled to nodes with both Mellanox and NVIDIA devices.

    The node labels can either be added by hand or set automatically by node feature discovery. Version 0.6.0 or higher of node-feature-discovery is required to discover ConnectX devices and enable RDMA feature support.
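
    For example, a DaemonSet that should run only on nodes with both a ConnectX NIC and an NVIDIA GPU can use a nodeSelector over these labels. This is a sketch; the DaemonSets generated by the operator may include additional selectors, such as the kernel version.

      nodeSelector:
        feature.node.kubernetes.io/pci-15b3.present: "true"   # Mellanox PCI vendor ID
        feature.node.kubernetes.io/pci-10de.present: "true"   # NVIDIA PCI vendor ID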

    Deploy the Network Operator

    In this section, we describe how to deploy the Network Operator and test GPUDirect RDMA. First, prepare the environment by validating the host and GPU configuration. Next, install the network operator and configure the secondary network interface. After you apply the Kubernetes custom resource that creates the driver containers, you can launch test Pods that run a network performance benchmark to validate GPUDirect RDMA.

    Prepare the environment

    Check the operating system and kernel version on the Kubernetes nodes. You must have Ubuntu 18.04 with the 4.15.0-109-generic kernel.

    $ kubectl get nodes -o wide
    NAME       STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
    ubuntu     Ready    master   52d   v1.17.5   10.136.5.105   <none>        Ubuntu 18.04.4 LTS   4.15.0-109-generic   docker://19.3.5
    ubuntu00   Ready    <none>   48d   v1.17.5   10.136.5.14    <none>        Ubuntu 18.04.5 LTS   4.15.0-109-generic   docker://19.3.12

    Verify that the ConnectX network card and NVIDIA GPU are recognized by the operating system kernel. In this example output, the system has four NVIDIA GPUs and a dual-port Mellanox ConnectX-5 adapter.

    $ lspci | egrep 'Mell|NV'
    12:00.0 3D controller: NVIDIA Corporation Device 1eb8 (rev a1)
    37:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
    37:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
    86:00.0 3D controller: NVIDIA Corporation Device 1eb8 (rev a1)
    af:00.0 3D controller: NVIDIA Corporation Device 1eb8 (rev a1)
    d8:00.0 3D controller: NVIDIA Corporation Device 1eb8 (rev a1)

    Verify that the nodes are labeled with the PCI vendor IDs for Mellanox and NVIDIA:

    $ kubectl describe nodes  | egrep 'hostname|10de|15b3'
                        feature.node.kubernetes.io/pci-10de.present=true
                        feature.node.kubernetes.io/pci-15b3.present=true
                        kubernetes.io/hostname=ubuntu
                        feature.node.kubernetes.io/pci-10de.present=true
                        feature.node.kubernetes.io/pci-15b3.present=true
                        kubernetes.io/hostname=ubuntu00

    If the Node Feature Discovery operator is not in use, manually label the nodes with the required PCI vendor IDs:

    $ kubectl label nodes ubuntu feature.node.kubernetes.io/pci-15b3.present=true
    node/ubuntu labeled
    
    $ kubectl label nodes ubuntu00 feature.node.kubernetes.io/pci-15b3.present=true
    node/ubuntu00 labeled
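
    The describe output earlier also shows the NVIDIA vendor ID label. If you label nodes by hand, nodes that should run the NVIDIA peer memory driver additionally need that label, for example:

    $ kubectl label nodes ubuntu feature.node.kubernetes.io/pci-10de.present=true
    $ kubectl label nodes ubuntu00 feature.node.kubernetes.io/pci-10de.present=true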

    Verify that the GPU operator is installed and running:

    $ helm ls -a
    NAME                      NAMESPACE      REVISION   UPDATED                                   STATUS     CHART                APP VERSION
    gpu-operator-1595438607   gpu-operator   1          2020-07-22 10:23:30.445666838 -0700 PDT   deployed   gpu-operator-1.1.7   1.1.7
    
    $ kubectl get ds -n gpu-operator-resources
    NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                      AGE
    nvidia-container-toolkit-daemonset   2         2         2       2            2           feature.node.kubernetes.io/pci-10de.present=true   32d
    nvidia-dcgm-exporter                 2         2         2       2            2           feature.node.kubernetes.io/pci-10de.present=true   32d
    nvidia-device-plugin-daemonset       2         2         2       2            2           feature.node.kubernetes.io/pci-10de.present=true   32d
    nvidia-driver-daemonset              2         2         2       2            2           feature.node.kubernetes.io/pci-10de.present=true   32d

    Install the Network Operator

    Clone the network-operator repository at the v0.1.0 tag:

    $ git clone --branch v0.1.0 --single-branch git@github.com:Mellanox/network-operator.git
    Cloning into 'network-operator'...
    remote: Enumerating objects: 1, done.
    remote: Counting objects: 100% (1/1), done.
    remote: Total 729 (delta 0), reused 0 (delta 0), pack-reused 728
    Receiving objects: 100% (729/729), 195.29 KiB | 1.70 MiB/s, done.
    Resolving deltas: 100% (405/405), done.
    Note: checking out '09088d76ef29e5208673e3d1ec90c787754c1bee'.

    Deploy the network operator with the deploy-operator.sh script.

    $ cd ~/network-operator/example
    
    ~/network-operator/example$ sudo ./deploy-operator.sh
    Deploying Network Operator:
    ###########################
    namespace/mlnx-network-operator created
    customresourcedefinition.apiextensions.k8s.io/nicclusterpolicies.mellanox.com created
    role.rbac.authorization.k8s.io/network-operator created
    clusterrole.rbac.authorization.k8s.io/network-operator created
    serviceaccount/network-operator created
    rolebinding.rbac.authorization.k8s.io/network-operator created
    clusterrolebinding.rbac.authorization.k8s.io/network-operator created
    deployment.apps/network-operator created

    View the resources created by the installer script.

    $ kubectl get all -n mlnx-network-operator
    NAME                                    READY   STATUS    RESTARTS   AGE
    pod/network-operator-7b6846c69f-skms6   1/1     Running   0          14m
    
    NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/network-operator   1/1     1            1           14m
    
    NAME                                          DESIRED   CURRENT   READY   AGE
    replicaset.apps/network-operator-7b6846c69f   1         1         1       14m

    Configure the secondary network

    As mentioned earlier, the Network Operator deploys and configures the MOFED drivers and an RDMA shared device plugin. It also installs the NVIDIA peer memory driver for GPUDirect RDMA on nodes with an NVIDIA GPU. Additional configuration is required before RDMA workloads can run on the Kubernetes cluster:

    • Configuring persistent and predictable network device names.
    • Initializing network devices.
    • Deploying a secondary Kubernetes macvlan network on the RDMA network device.
    • Configuring IP address management (IPAM) for the secondary network.
    • Defining workloads that request the RDMA device, its corresponding secondary network, and a GPU if needed.

    This environment-specific configuration is not included in the automation.

    Edit the RDMA network custom resource to configure the IP address range and the macvlan master device. In this example, the secondary network interface is device ens2f0.

    ~/network-operator/example$ egrep 'range|master|mtu' networking/rdma-net-cr-whereabouts-ipam.yml
      # Configuration below assumes 'ens2f0' as master device for macvlan CNI,
                        "range": "192.168.111.0/24",
                    "master": "ens2f0",
                    "mtu": 9000

    Run the deploy-rdma-net-ipam.sh script. This script configures the secondary network and manages IP address assignment.

    ~/network-operator/example$ sudo ./deploy-rdma-net-ipam.sh
    Deploying Secondary Network with Whereabouts IPAM: "rdma-net-ipam" with RDMA resource : "rdma/hca_shared_devices_a"
    #######################################################################################################################
    customresourcedefinition.apiextensions.k8s.io/network-attachment-definitions.k8s.cni.cncf.io created
    clusterrole.rbac.authorization.k8s.io/multus created
    clusterrolebinding.rbac.authorization.k8s.io/multus created
    serviceaccount/multus created
    configmap/multus-cni-config created
    daemonset.apps/kube-multus-ds-amd64 created
    daemonset.apps/kube-multus-ds-ppc64le created
    customresourcedefinition.apiextensions.k8s.io/ippools.whereabouts.cni.cncf.io created
    serviceaccount/whereabouts created
    clusterrolebinding.rbac.authorization.k8s.io/whereabouts created
    clusterrole.rbac.authorization.k8s.io/whereabouts-cni created
    daemonset.apps/whereabouts created
    networkattachmentdefinition.k8s.cni.cncf.io/rdma-net-ipam created

    Set the devicePlugin.config field in the custom resource to match the RDMA netdevice name. In this example, it is ens2f0.

    ~/network-operator/example$ grep devices \
      deploy/crds/mellanox.com_v1alpha1_nicclusterpolicy_cr.yaml
        # Replace 'devices' with your (RDMA-capable) netdevice name.
                "resourceName": "hca_shared_devices_a",
                "devices": ["ens2f0"]

    Apply the NIC cluster policy custom resource to specify the driver container versions.

    ~/network-operator/example$ kubectl create -f deploy/crds/mellanox.com_v1alpha1_nicclusterpolicy_cr.yaml
    nicclusterpolicy.mellanox.com/example-nicclusterpolicy created

    Verify that the driver containers are deployed successfully.

    $ kubectl get pods -n mlnx-network-operator
    NAME                                                            READY   STATUS    RESTARTS   AGE
    network-operator-7b6846c69f-tccv8                               1/1     Running   2          51m
    nv-peer-mem-driver-amd64-ubuntu18.04-ds-2qrjh                   1/1     Running   16         33m
    nv-peer-mem-driver-amd64-ubuntu18.04-ds-5k5wb                   1/1     Running   0          33m
    ofed-driver-amd64-ubuntu18.04-kver4.15.0-109-generic-ds-2knvc   1/1     Running   0          75s
    ofed-driver-amd64-ubuntu18.04-kver4.15.0-109-generic-ds-c9xz9   1/1     Running   0          33m
    rdma-shared-dp-ds-4cgr2                                         1/1     Running   0          33m
    rdma-shared-dp-ds-lqmk7                                         1/1     Running   2          33m

    The OFED driver container is precompiled to support a specific MOFED version, operating system, and kernel version combination.

    Component           Version
    OFED                5.0-2.1.8.0
    Operating system    ubuntu-18.04
    Kernel              4.15.0-109-generic
    Table 2. Precompiled support in the OFED driver container.

    Validate that the NVIDIA peer memory driver is loaded on the nodes.

    $ lsmod | grep nv_peer_mem
    nv_peer_mem            16384  0
    ib_core               323584  11 rdma_cm,ib_ipoib,mlx4_ib,nv_peer_mem,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,ib_ucm
    nvidia              20385792  117 nvidia_uvm,nv_peer_mem,nvidia_modeset

    The NIC cluster policy custom resource is the configuration interface for the network operator. The following code example shows the contents of the custom resource specification.

    apiVersion: mellanox.com/v1alpha1
    kind: NicClusterPolicy
    metadata:
      name: example-nicclusterpolicy
      namespace: mlnx-network-operator
    spec:
      ofedDriver:
        image: ofed-driver
        repository: mellanox
        version: 5.0-2.1.8.0
      devicePlugin:
        image: k8s-rdma-shared-dev-plugin
        repository: mellanox
        version: v1.0.0
        config: |
          {
            "configList": [
              {
                "resourceName": "hca_shared_devices_a",
                "rdmaHcaMax": 1000,
                "devices": ["ens2f0"]
              }
            ]
          }
      nvPeerDriver:
        image: nv-peer-mem-driver
        repository: mellanox
        version: 1.0-9
        gpuDriverSourcePath: /run/nvidia/driver

    This file defines the OFED driver, RDMA shared device plugin, and NVIDIA peer memory client driver versions. It also specifies the RDMA netdevice to share and the maximum number of shared RDMA device resources that the device plugin advertises (rdmaHcaMax).

    Validate the installation

    Create Pods to test shared RDMA on the secondary network.

    ~/network-operator/example$ kubectl create -f rdma-test-pod1.yml
    pod/rdma-test-pod-1 created
     
    ~/network-operator/example$ kubectl create -f rdma-test-pod2.yml
    pod/rdma-test-pod-2 created
     
    ~/network-operator/example$ kubectl get pods
    NAME              READY   STATUS    RESTARTS   AGE
    rdma-test-pod-1   1/1     Running   0          7s
    rdma-test-pod-2   1/1     Running   0          5s
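
    The exact test Pod specs are in the repository's example directory. A minimal sketch of the key pieces such a spec needs, assuming the network attachment and resource names used earlier in this post (the image name is a placeholder), looks roughly like the following:

    apiVersion: v1
    kind: Pod
    metadata:
      name: rdma-test-pod-1
      annotations:
        k8s.v1.cni.cncf.io/networks: rdma-net-ipam   # attach the secondary macvlan network
    spec:
      containers:
      - name: rdma-test
        image: rdma-test-image                       # placeholder; the repo references a test image
        securityContext:
          capabilities:
            add: ["IPC_LOCK"]                        # RDMA tools typically need to pin memory
        resources:
          limits:
            rdma/hca_shared_devices_a: 1             # shared RDMA resource from the device plugin
        command: ["sh", "-c", "sleep infinity"]      # keep the Pod running for interactive tests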

    Find the name of the active network device by running ibdev2netdev in the OFED driver Pod.

    ~/network-operator/example$ kubectl exec -it -n mlnx-network-operator ofed-driver-amd64-ubuntu18.04-kver4.15.0-109-generic-ds-2knvc ibdev2netdev
    mlx5_0 port 1 ==> ens2f0 (Up)
    mlx5_1 port 1 ==> ens2f1 (Down)
    mlx5_2 port 1 ==> eno5 (Down)
    mlx5_3 port 1 ==> eno6 (Down)

    Run the perftest server on Pod 1.

    ~/network-operator/example$ kubectl exec -it rdma-test-pod-1 -- bash
    [root@rdma-test-pod-1 /]# ib_write_bw -d mlx5_0 -a -F --report_gbits -q 1
    ************************************
    * Waiting for client to connect... *
    ************************************

    In a separate terminal, print the network address of the secondary interface on Pod 1.

    ~/network-operator/example$ kubectl exec rdma-test-pod-1 -- ip addr show dev net1
    5: net1@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default
        link/ether 62:51:fb:13:88:ce brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 192.168.111.1/24 brd 192.168.111.255 scope global net1
           valid_lft forever preferred_lft forever

    Start an interactive terminal on Pod 2 and ping the secondary interface on Pod 1.

    ~$ kubectl exec -it rdma-test-pod-2 -- bash
    [root@rdma-test-pod-2 /]# ping -c 3 192.168.111.1
    PING 192.168.111.1 (192.168.111.1) 56(84) bytes of data.
    64 bytes from 192.168.111.1: icmp_seq=1 ttl=64 time=0.293 ms
    64 bytes from 192.168.111.1: icmp_seq=2 ttl=64 time=0.120 ms
    64 bytes from 192.168.111.1: icmp_seq=3 ttl=64 time=0.125 ms
    --- 192.168.111.1 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2037ms
    rtt min/avg/max/mdev = 0.120/0.179/0.293/0.081 ms

    Run the perftest write benchmark. This test measures maximum RDMA network throughput between the Pods.

    [root@rdma-test-pod-2 /]# ib_write_bw -d mlx5_0 -a -F --report_gbits -q 1 192.168.111.1
    ---------------------------------------------------------------------------------------
                        RDMA_Write BW Test
     Dual-port       : OFF          Device         : mlx5_0
     Number of qps   : 1            Transport type : IB
     Connection type : RC           Using SRQ      : OFF
     TX depth        : 128
     CQ Moderation   : 100
     Mtu             : 4096[B]
     Link type       : Ethernet
     GID index       : 2
     Max inline data : 0[B]
     rdma_cm QPs : ON
     Data ex. method : rdma_cm
    ---------------------------------------------------------------------------------------
     local address: LID 0000 QPN 0x01fd PSN 0x4ebaee
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:111:02
     remote address: LID 0000 QPN 0x01f8 PSN 0xbe97c2
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:111:01
    ---------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
     2          5000           0.083670            0.082376            5.148474
     4          5000             0.17               0.17                  5.229830
     8          5000             0.34               0.33                  5.229340
     16         5000             0.68               0.67                  5.213672
     32         5000             1.35               1.34                  5.248994
     64         5000             2.70               2.68                  5.228968
     128        5000             5.41               5.40                  5.275896
     256        5000             10.80              10.73                 5.239736
     512        5000             21.42              21.30                 5.200598
     1024       5000             42.67              42.66                 5.207166
     2048       5000             76.99              76.27                 4.655341
     4096       5000             96.15              90.05                 2.748027
     8192       5000             97.64              97.29                 1.484576
     16384      5000             97.84              97.74                 0.745729
     32768      5000             97.82              97.76                 0.372912
     65536      5000             97.96              97.95                 0.186826
     131072     5000             97.94              97.57                 0.093055
     262144     5000             97.95              97.54                 0.046513
     524288     5000             97.95              97.65                 0.023282
     1048576    5000             98.02              98.01                 0.011684
     2097152    5000             98.03              98.03                 0.005843
     4194304    5000             97.38              97.27                 0.002899
     8388608    5000             98.02              97.64                 0.001455
    -------------------------------------------------------------------------------------

    The benchmark achieved approximately 98 Gbps throughput, close to the line rate of this 100 Gb/s network adapter.

    Delete the RDMA test Pods.

    $ kubectl delete pod rdma-test-pod-1
    pod "rdma-test-pod-1" deleted
    $ kubectl delete pod rdma-test-pod-2
    pod "rdma-test-pod-2" deleted

    Repeat the same test using the rdma-gpu-test Pod specs. These Pods test GPUDirect RDMA performance between the network card in one system and the GPU in the other.

    ~/network-operator/example$ kubectl create -f rdma-gpu-test-pod1.yml
    pod/rdma-gpu-test-pod-1 created
    
    ~/network-operator/example$ kubectl create -f rdma-gpu-test-pod2.yml
    pod/rdma-gpu-test-pod-2 created
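
    These Pod specs differ from the plain RDMA test Pods mainly in that they also request a GPU through the NVIDIA device plugin. A sketch of the resource section (see the repository files for the exact specs):

      resources:
        limits:
          rdma/hca_shared_devices_a: 1   # shared RDMA device
          nvidia.com/gpu: 1              # GPU exposed by the NVIDIA device plugin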

    Start an interactive terminal on one of the Pods.

    $ kubectl exec -it rdma-gpu-test-pod-1 -- bash

    Start the performance benchmark with the --use_cuda option.

    rdma-gpu-test-pod-1:~# ib_write_bw -d mlx5_0 -a -F --report_gbits -q 1 --use_cuda=0
    -------------------------------------------------------------------------------------
    ************************************
    * Waiting for client to connect... *
    ************************************

    Connect to the second Pod and start the performance test.

    $ kubectl exec -it rdma-gpu-test-pod-2 -- bash
    rdma-gpu-test-pod-2:~# ib_write_bw -d mlx5_0 -a -F --report_gbits -q 1 192.168.111.1
                        RDMA_Write BW Test
     Dual-port       : OFF          Device         : mlx5_0
     Number of qps   : 1            Transport type : IB
     Connection type : RC           Using SRQ      : OFF
     PCIe relax order: Unsupported
     ibv_wr* API     : OFF
     TX depth        : 128
     CQ Moderation   : 100
     Mtu             : 4096[B]
     Link type       : Ethernet
     GID index       : 2
     Max inline data : 0[B]
     rdma_cm QPs : OFF
     Data ex. method : Ethernet
    -------------------------------------------------------------------------------------
     local address: LID 0000 QPN 0x020e PSN 0x29b8e1 RKey 0x014e8d VAddr 0x007f4a2f74b000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:111:02
     remote address: LID 0000 QPN 0x020d PSN 0x24bc9e RKey 0x012864 VAddr 0x007fca15800000
     GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:111:01
    -------------------------------------------------------------------------------------
     #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
     2          5000           0.076617            0.074692            4.668252
     4          5000             0.16               0.15                  4.781188
     8          5000             0.31               0.31                  4.838186
     16         5000             0.62               0.62                  4.812724
     32         5000             1.25               1.23                  4.817500
     64         5000             2.49               2.42                  4.724497
     128        5000             4.99               4.93                  4.816670
     256        5000             9.99               9.85                  4.811366
     512        5000             19.97              19.56                 4.776414
     1024       5000             39.64              34.57                 4.219684
     2048       5000             75.75              73.78                 4.503022
     4096       5000             81.89              81.70                 2.493145
     8192       5000             80.60              80.58                 1.229511
     16384      5000             82.08              82.08                 0.626200
     32768      5000             82.11              82.11                 0.313206
     65536      5000             82.12              81.95                 0.156307
     131072     5000             82.13              82.12                 0.078317
     262144     5000             82.10              82.10                 0.039147
     524288     5000             82.12              82.10                 0.019575
     1048576    5000             82.12              82.11                 0.009789
     2097152    5000             82.11              82.06                 0.004891
     4194304    5000             82.12              82.06                 0.002446
     8388608    5000             82.12              82.07                 0.001223
    -------------------------------------------------------------------------------------

    The RDMA sample Pods do not enforce NUMA alignment between the GPU, network controller, and the Pod CPU socket. Performance can vary across runs, depending on which resources are presented to the Pod.

    Delete the RDMA GPU test Pods.

    $ kubectl delete pod rdma-gpu-test-pod-1
    pod "rdma-gpu-test-pod-1" deleted
    
    $ kubectl delete pod rdma-gpu-test-pod-2
    pod "rdma-gpu-test-pod-2" deleted

    Conclusion and next steps

    The NVIDIA GPU Operator automates GPU deployment and management on Kubernetes. This post introduced the NVIDIA Network Operator: software that automates the deployment and configuration of the network stack on Kubernetes. When deployed together, they enable GPUDirect RDMA, a fast data path between NVIDIA GPUs and RDMA-capable network interfaces. This is a critical technology enabler for data-intensive edge workloads.

    The Network Operator is still in the early stages of development and is not yet supported by NVIDIA. Clone the network-operator GitHub repo to contribute code or raise issues. Directions for future work include improving NUMA alignment between the devices, expanding the device plugin to include SR-IOV, supporting additional operating systems, and configuring GPUDirect RDMA between multiple NIC/GPU pairs.
