
    Choosing a Development Environment for NVIDIA BlueField DPU Applications

    NVIDIA DOCA libraries simplify the development process of BlueField DPU applications

    Step-A 

    Step-B 

    Go get a cup of coffee… 

    Step-C 

    How often have you seen “Go get a coffee” in the instructions? As a developer, I found early on that this pesky quip is the bane of my life. Context switches, no matter the duration, are a high cost to pay in the application development cycle. Of all the steps that require you to step away, waiting for an application to compile is the hardest to shake off. 

    As we all enter the new world of NVIDIA BlueField DPU application development, it is important to set up the build step efficiently, so that you can move through {code => compile => unit-test} seamlessly. In this post, I go over different ways to compile an application for the DPU. 

    Free Range Routing with the DOCA dataplane plugin 

    In the DPU application development series, I talked about creating a DOCA dataplane plugin in FRR for offloading policies. FRR’s code count is close to a million lines (789,678 SLOC), which makes it a great candidate for measuring build times.  

    Developing directly on the BlueField DPU 

    The DPU has an Arm64 architecture, and one quick way to get started on DPU applications is to develop directly on the DPU. This test was done with an NVIDIA BlueField-2 DPU with 8 GB of RAM and eight Cortex-A72 CPU cores. 

    I installed the BlueField boot file (BFB), which provides the Ubuntu 20.04.3 OS image for the DPU. It also includes the libraries for DOCA-1.2 and DPDK-20.11.3. To build an application with the DOCA libraries, I added the DPDK pkgconfig location to PKG_CONFIG_PATH.

    root@dpu-arm:~# export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig 
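    As a quick sanity check, you can ask pkg-config whether the DPDK libraries are now discoverable; it should report the DPDK version bundled with DOCA (20.11.3 here). This assumes the standard libdpdk.pc file shipped with the DPDK install:

    root@dpu-arm:~# pkg-config --modversion libdpdk 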

    Next, I set up my code workspace on the DPU by cloning FRR and switching to the DOCA dataplane plugin branch.

    root@dpu-arm:~/code# git clone https://github.com/AnuradhaKaruppiah/frr.git 
    root@dpu-arm:~/code# cd frr 
    root@dpu-arm:~/code/frr# git checkout dp-doca 

    FRR requires a list of constantly evolving prerequisites that are enumerated in the FRR community docs; a representative install is sketched below. With those dependencies installed, I configured FRR to include the DPDK and DOCA dataplane plugins.
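    For illustration, the packages below cover most of FRR's build dependencies on Ubuntu 20.04 at the time of writing. Treat this as a rough sketch and the FRR community docs as the authoritative list; libyang in particular may need to be built separately, per those docs.

    root@dpu-arm:~# apt-get install -y git autoconf automake libtool make \
      libreadline-dev texinfo pkg-config libpam0g-dev libjson-c-dev bison flex \
      libc-ares-dev python3-dev python3-sphinx python3-pytest install-info \
      build-essential libsnmp-dev perl libcap-dev libelf-dev 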

    root@dpu-arm:~/code/frr# ./bootstrap.sh 

    root@dpu-arm:~/code/frr# ./configure --build=aarch64-linux-gnu --prefix=/usr --includedir=\${prefix}/include --mandir=\${prefix}/share/man --infodir=\${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=\${prefix}/lib/aarch64-linux-gnu --libexecdir=\${prefix}/lib/aarch64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --enable-exampledir=/usr/share/doc/frr/examples/ --localstatedir=/var/run/frr --sbindir=/usr/lib/frr --sysconfdir=/etc/frr --with-vtysh-pager=/usr/bin/pager --libdir=/usr/lib/aarch64-linux-gnu/frr --with-moduledir=/usr/lib/aarch64-linux-gnu/frr/modules "LIBTOOLFLAGS=-rpath /usr/lib/aarch64-linux-gnu/frr" --disable-dependency-tracking --disable-dev-build --enable-systemd=yes --enable-rpki --with-libpam --enable-doc --enable-doc-html --enable-snmp --enable-fpm --disable-zeromq --enable-ospfapi --disable-bgp-vnc --enable-multipath=128 --enable-user=root --enable-group=root --enable-vty-group=root --enable-configfile-mask=0640 --enable-logfile-mask=0640 --disable-address-sanitizer --enable-cumulus=yes --enable-datacenter=yes --enable-bfdd=no --enable-sharpd=yes --enable-dp-doca=yes --enable-dp-dpdk=yes 

    As I used the DPU as my development environment, I built and installed the FRR binaries in place:

    root@dpu-arm:~/code# make -j12 all; make install 

    Here’s how the build times fared. I measured them in two ways:

    • Time to build and install the binaries using make -j12 all and make install
    • Time to build the same binaries and also assemble them into a Debian package using dpkg-buildpackage -j12 -uc -us 

    The first method is what you use while coding and unit testing. The second, generating debs, is needed to compare build times with the external development environments that follow.
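    For reference, the two measurements were taken roughly as follows (a minimal sketch mirroring the commands above, run from the FRR source tree):

    root@dpu-arm:~/code/frr# time sh -c 'make -j12 all && make install' 
    root@dpu-arm:~/code/frr# time dpkg-buildpackage -j12 -uc -us 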

    DPU-Arm Build Times | Real | User | Sys
    DPU Arm (Complete make) | 2min 40.529sec | 16min 29.855sec | 2min 1.534sec
    DPU Arm (Debian package) | 5min 23.067sec | 20min 33.614sec | 2min 49.628sec

    Table 1. DPU-Arm build times

    The difference in times is expected. Generating a package involves several additional steps. 

    There are some clear advantages to using the DPU as your development environment.

    • You can code, build and install, and then unit-test without leaving your workspace.
    • You can optimize the build for incremental code changes.

    Incremental builds usually give a massive reduction in build time compared to a complete build. For example, I modified the DOCA dataplane code in FRR and rebuilt with these results:

    root@dpu-arm:~/code/frr# time make -j12 

    >>>>>>>>>>>>> snipped make output >>>>>>>>>>>> 

    real    0m3.119s 

    user   0m2.794s 

    sys     0m0.479s 

    While that may make things easier, it requires reserving a DPU indefinitely for every developer for the sole purpose of application development or maintenance. Your development environment may also require more memory and horsepower, making this a less viable option long-term. 

    Developing on an x86 server 

    My BlueField-2 DPU was hosted by an x86-64 Ubuntu 20.04 server, and I used this server as my development environment.

    root@server1-x86:~# lscpu |grep "CPU(s):\|Model name" 

    CPU(s):               32 

    Model name:    Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz 

    root@server1-x86:~# grep MemTotal /proc/meminfo 

    MemTotal:       131906300 kB 

    In this case, the build machine is x86 and the host machine where the app runs is the Arm64 DPU. There are several ways to handle this cross-architecture build:

    • Use an Arm emulation on the x86 build-machine. A DOCA development container is available as a part of the DOCA packages.
    • Use a cross-compilation toolchain. 

    In this test, I used the first option as it was the easiest. The second option may give different performance, but creating that toolchain has its own challenges.

    I downloaded and loaded the bfb_builder_doca_ubuntu_20.04 container on my x86 server and fired it up.

    root@server1-x86:~# sudo docker load -i bfb_builder_doca_ubuntu_20.04-mlnx-5.4.tar 
    root@server1-x86:~# docker run -v ~/code:/code --privileged -it -e container=docker doca_v1.11_bluefield_os_ubuntu_20.04-mlnx-5.4:latest 
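    Running an aarch64 container on an x86 host relies on QEMU user-mode emulation. My server already had this wired up, but if the container fails to start, you may need to install qemu-user-static and register the binfmt handlers first. This is an assumption about your Docker setup; multiarch/qemu-user-static is the commonly used helper image:

    root@server1-x86:~# apt install -y qemu-user-static binfmt-support 
    root@server1-x86:~# docker run --rm --privileged multiarch/qemu-user-static --reset -p yes 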

    The DOCA and DPDK libraries come preinstalled in this container, and I just had to add them to the PKG_CONFIG path.

    root@86b87b0ab0c2:/code # export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig 

    I set up the workspace and FRR prerequisites within the container, same as with the previous option.

    root@86b87b0ab0c2:/code # git clone https://github.com/AnuradhaKaruppiah/frr.git 
    root@86b87b0ab0c2:/code # cd frr 
    root@86b87b0ab0c2:/code/frr # git checkout dp-doca 

    I could build my application within this DOCA container, but I couldn’t test it in place. So the FRR binaries had to be built and packaged into debs, which I then copied over to the BlueField DPU for testing. I set up the FRR Debian rules to match the FRR build configuration used in the previous option and generated the package:

    root@86b87b0ab0c2:/code/frr # dpkg-buildpackage -j12 -uc -us 
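    The generated packages land in the parent directory of the FRR source tree, which in this setup maps back to ~/code on the host (because ~/code is mounted at /code). A minimal sketch for getting them onto the DPU and installing them, assuming the DPU is reachable over SSH as dpu-arm (hostname and paths are placeholders):

    root@server1-x86:~/code# scp frr_*.deb root@dpu-arm:/tmp/ 
    root@server1-x86:~/code# ssh root@dpu-arm "apt install -y /tmp/frr_*.deb" 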

    Table 2 shows how the build time compares with previous methods.

    DPU-Arm & X86 Build Times | Real | User | Sys
    DPU Arm (Complete make) | 2min 40.529sec | 16min 29.855sec | 2min 1.534sec
    DPU Arm (Debian package) | 5min 23.067sec | 20min 33.614sec | 2min 49.628sec
    X86 + DOCA dev container (Debian package) | 24min 19.051sec | 139min 39.286sec | 3min 58.081sec

    Table 2. DPU-Arm and X86 build times

    The giant jump in build time surprised me because I have an amply stocked x86 server and no Docker limits. So, it seems throwing CPUs and RAM at a problem doesn’t always help! This performance degradation comes from the cross-architecture emulation, as you can see with the next option. 

    Developing in an AWS Graviton instance 

    Next, I tried building my app natively on Arm but this time on an external server with more horsepower. I used an Amazon EC2 Graviton instance for this purpose with specs comparable to my x86 server. 

    • Arm64 arch, Ubuntu 20.04 OS
    • 128 GB RAM 
    • 32 vCPUs 
    root@ip-172-31-28-243:~#  lscpu |grep "CPU(s):\|Model name" 
    CPU(s):              32 
    Model name:   Neoverse-N1 
    root@ip-172-31-28-243:~# grep MemTotal /proc/meminfo 
    MemTotal:       129051172 kB 

    To set up the DOCA and DPDK libraries in this instance, I installed the DOCA SDK repo meta package.

    root@ip-172-31-28-243:~#  dpkg -i doca-repo-aarch64-ubuntu2004-local_1.1.1-1.5.4.2.4.1.3.bf.3.7.1.11866_arm64.deb 
    root@ip-172-31-28-243:~#  apt update 
    root@ip-172-31-28-243:~# apt install doca-sdk 

    The remaining steps for cloning and building the FRR Debian package are the same as the previous option.  
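    For completeness, here is a condensed recap of those steps on the Graviton instance, identical to what was done before and shown only so the flow stays in one place:

    root@ip-172-31-28-243:~# export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig 
    root@ip-172-31-28-243:~# git clone https://github.com/AnuradhaKaruppiah/frr.git && cd frr && git checkout dp-doca 
    root@ip-172-31-28-243:~/frr# dpkg-buildpackage -j12 -uc -us 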

    Table 3 shows how the build fared on the AWS Arm instance.

    DPU-Arm, X86 & AWS-Arm Build Times | Real | User | Sys
    DPU Arm (Complete make) | 2min 40.529sec | 16min 29.855sec | 2min 1.534sec
    DPU Arm (Debian package) | 5min 23.067sec | 20min 33.614sec | 2min 49.628sec
    X86 + DOCA dev container (Debian package) | 24min 19.051sec | 139min 39.286sec | 3min 58.081sec
    AWS-Arm (Debian package) | 1min 30.480sec | 6min 6.056sec | 0min 35.921sec

    Table 3. DPU-Arm, X86, and AWS-Arm build times

    The AWS Graviton instance is the clear winner; no coffee needed.

    Figure 1 shows the compile times in these environments.

    Figure 1. FRR build times with different development environments

    Summary 

    In this post, I discussed several development environments for DPU applications:

    • BlueField DPU 
    • DOCA dev container on an x86 server
    • AWS Graviton compute instance 

    You can prototype your app directly on the DPU, experiment with developing in the x86 DOCA development container, and grab an AWS Graviton instance with DOCA to punch it into hyperspeed! 

