Samples

The following samples showcase schedule execution on different hardware platforms:

CPU Simple sample : This sample executes a schedule (cpu_simple.stm) that contains the constraints for CPU runnables defined in the client stm_test_cpu_simple.
CPU-GPU Simple sample : This sample executes a schedule (gpu_multistream_multiprocess.stm) that contains the constraints for GPU submitters and submittees defined in the clients stm_test_gpuX & stm_test_gpuY.
VPU Simple sample : This sample executes a schedule (vpu_simple.stm) that contains the constraints for PVA submitters and submittees defined in the client stm_test_vpu.
Schedule Switch sample : STM packages a sample schedule manager. This sample schedule manager switches between two schedules (cpu_gpu1.stm and cpu_gpu2.stm) to demonstrate the schedule switch functionality.

The following are the instructions to run sample files:

Prerequisites

Make sure that /etc/nvsciipc.cfg on target contains the entries in

/usr/local/driveworks/targets/aarch64-Linux/config/nvsciipc.cfg

(append them to the existing /etc/nvsciipc.cfg file if they are not present). Ensure that the entries are unique in /etc/nvsciipc.cfg, then reboot the system.

NOTE: Ensure that there are no newlines at the end of /etc/nvsciipc.cfg.

After the reboot, run "sudo service nv_nvsciipc_init status". If this command returns an error, re-check the contents of /etc/nvsciipc.cfg.
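The check described above can be scripted. The following is a minimal sketch, assuming one entry per line and exact line-for-line matching between the two files (an entry that differs only in whitespace would be reported as missing). The function names missing_entries and duplicate_entries are illustrative and are not part of STM or NvSciIpc.

```shell
#!/usr/bin/env bash
# Sketch only: assumes one entry per line and exact, line-for-line matching.

# Print each non-empty line of the reference file that is absent from the target file.
missing_entries() {
    local ref="$1" cfg="$2" line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        grep -Fxq -- "$line" "$cfg" || printf '%s\n' "$line"
    done < "$ref"
}

# Print each non-empty line that occurs more than once in the given file.
duplicate_entries() {
    grep -v '^[[:space:]]*$' "$1" | sort | uniq -d
}
```

For example, `missing_entries /usr/local/driveworks/targets/aarch64-Linux/config/nvsciipc.cfg /etc/nvsciipc.cfg` would list the entries that still need to be appended. Both functions print nothing when the file is already in order.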

Mqueue length of at least 4096 needs to be supported. On Linux, do either of the following:
- Change the contents of file /proc/sys/fs/mqueue/msg_max to 4096 (does not persist across reboots).
- Add fs.mqueue.msg_max = 4096 and fs.mqueue.queues_max = 1024 to /etc/sysctl.conf and restart (persists across reboot)
In addition to the minimum mqueue message count, the total mqueue size (ulimit -q) must be increased because this build uses larger messages. The maximum realtime thread priority must also be raised so that STM can set the correct thread priorities. Either run as sudo or add these lines to /etc/security/limits.conf:
```
<user> soft msgqueue unlimited
<user> hard msgqueue unlimited
<user> soft rtprio 99
<user> hard rtprio 99
```
This allows <user> (replace it with the appropriate user name) to use an mqueue of unlimited size and realtime priorities up to 99.
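Before running the samples, the limits described above can be inspected from a shell. The following is a rough sketch for Linux with bash, using only /proc/sys/fs/mqueue/msg_max and the bash `ulimit` builtin. The helper names limit_ok and report_limits are hypothetical, and the script only reports values; it does not change them.

```shell
#!/usr/bin/env bash
# Sketch only: reports the current limits, it does not modify any setting.

# Succeed when a limit is "unlimited" or a number that is at least the minimum.
limit_ok() {
    local value="$1" minimum="$2"
    [ "$value" = "unlimited" ] || [ "$value" -ge "$minimum" ] 2>/dev/null
}

# Print the mqueue and realtime-priority limits seen by the current shell.
report_limits() {
    local msg_max
    msg_max=$(cat /proc/sys/fs/mqueue/msg_max 2>/dev/null || echo 0)
    if limit_ok "$msg_max" 4096; then
        echo "msg_max ok ($msg_max)"
    else
        echo "msg_max too low ($msg_max), need at least 4096"
    fi
    echo "mqueue size limit (ulimit -q): $(ulimit -q)"
    echo "realtime priority limit (ulimit -r): $(ulimit -r)"
}
```

Calling `report_limits` as the user that will run the samples shows whether the limits.conf entries have taken effect; a new login session is typically needed after editing that file.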

Run Sample Binaries

To run the sample binaries directly on x86

ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
Note: Run the following command only if PDK < 6.0.5.0
- sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/x86_64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH

Commands for each sample:

CPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_cpu_simple
CPU-GPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_gpuX & sudo /usr/local/driveworks/bin/stm_test_gpuY

Commands for Schedule Switch Sample

Execute the following commands in order and in different terminals to view the schedule switch:
- Run the stm_master along with list of schedules:
  - sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_gpu1.stm,/usr/local/driveworks/bin/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
- Run the test schedule manager binary:
  - sudo /usr/local/driveworks/bin/stm_sample_manager default -v
- Run client binaries
  - sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
Each cycle of execution has 1 schedule switch (one switch between the two schedules passed as input to stm_master) and by default the schedules will switch with a time period of 1000 milliseconds. There should be 10 cycles of execution for the above commands. The schedule switches can be seen in the logs of stm_sample_manager. Use -v with stm_sample_manager for verbose outputs.

To run the sample binaries directly on the target

ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
Note: Run the following command only if PDK < 6.0.5.0
- sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
[For Linux] : export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
[For QNX] : export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-qnx/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH

Commands for each sample on the target:

CPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_cpu_simple
CPU-GPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_gpuX & sudo /usr/local/driveworks/bin/stm_test_gpuY

VPU Simple:

sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/vpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_vpu

To run the vpu_simple sample, you must edit the PVA allowlist as follows:

# For Linux:
echo 0 | sudo tee /sys/kernel/debug/pva0/vpu_app_authentication

# Set allowlist value back to 1 after sample runs:
echo 1 | sudo tee /sys/kernel/debug/pva0/vpu_app_authentication

# For QNX:
echo 0 > /dev/nvpvadebugfs/pva0/allowlist_ena

# Set allowlist value back to 1 after sample runs:
echo 1 > /dev/nvpvadebugfs/pva0/allowlist_ena

(Note: The vpu_simple app is only available for PDKs 6.0.4.0+ and requires the presence of cuPVA SDK v2.0.0 libraries)
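Because the allowlist value should be set back to 1 after the sample runs, it can help to wrap the run so that the restore happens even if the sample fails. The following is a minimal sketch for Linux. ALLOWLIST and WRITE are hypothetical variables introduced for illustration, and with_allowlist_disabled is not an STM or PVA tool. On QNX, the path /dev/nvpvadebugfs/pva0/allowlist_ena shown above could be substituted, with WRITE=tee if sudo is not needed there (an assumption).

```shell
#!/usr/bin/env bash
# Sketch only: ALLOWLIST and WRITE are illustrative variables.
ALLOWLIST="${ALLOWLIST:-/sys/kernel/debug/pva0/vpu_app_authentication}"
WRITE="${WRITE:-sudo tee}"

# with_allowlist_disabled <command> [<args> ...]
# Writes 0 to the allowlist control, runs the command, then writes 1 back
# and returns the exit status of the command.
with_allowlist_disabled() {
    local status=0
    echo 0 | $WRITE "$ALLOWLIST" >/dev/null
    "$@" || status=$?
    echo 1 | $WRITE "$ALLOWLIST" >/dev/null
    return "$status"
}
```

The vpu_simple command line can then be passed as the arguments, for example through `sh -c '...'` so that stm_master and stm_test_vpu are launched together.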

Commands for Schedule Switch Sample on the target:

Execute the following commands in order and in different terminals to view the schedule switch:
- Run the stm_master along with list of schedules:
  - sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_gpu1.stm,/usr/local/driveworks/bin/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
- Run the test schedule manager binary:
  - sudo /usr/local/driveworks/bin/stm_sample_manager default -v
- Run client binaries
  - sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
Each cycle of execution has 1 schedule switch (one switch between the two schedules passed as input to stm_master) and by default the schedules will switch with a time period of 1000 milliseconds. There should be 10 cycles of execution for the above commands. The schedule switches can be seen in the logs of stm_sample_manager. Use -v with stm_sample_manager for verbose outputs.

To use the tools given by STM on x86

STMCompiler : /usr/local/driveworks/tools/stmcompiler -i /path/to/input_file.yml -o /path/to/output_file.stm
STMVizschedule : /usr/local/driveworks/tools/stmvizschedule -i /path/to/input_file.stm -o /path/to/output_file.html
STMVizGraph : /usr/local/driveworks/tools/stmvizgraph -i /path/to/input_file.yml -o /path/to/output_file.svg

NOTE: STMVizGraph needs GraphViz installed on the system (sudo apt install graphviz)

STM Analytics: /usr/local/driveworks/tools/stmanalyze -s /path/to/input_file.stm -l /path/to/log_file -f html

NOTE: The log file is obtained after running the sample binaries above.

To compile and run samples from src

cd /usr/local/driveworks/samples/src/stm/src/

STM Compiler Step per sample:

CPU Simple: /usr/local/driveworks/tools/stmcompiler -i cpu_simple/cpu_simple.yml -o cpu_simple.stm
CPU-GPU Simple: /usr/local/driveworks/tools/stmcompiler -i cpu_gpu_simple/gpu_multistream_multiprocess.yml -o gpu_multistream_multiprocess.stm
VPU Simple: /usr/local/driveworks/tools/stmcompiler -i vpu_simple/vpu_simple.yml -o vpu_simple.stm
Schedule Switch:
/usr/local/driveworks/tools/stmcompiler -i sample_complete_swap/cpu_gpu1.yml -o cpu_gpu1.stm
/usr/local/driveworks/tools/stmcompiler -i sample_complete_swap/cpu_gpu2.yml -o cpu_gpu2.stm
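The per-sample compiler steps above can also be run in one loop. The following is a sketch, assuming it is run from /usr/local/driveworks/samples/src/stm/src/ and that every listed .yml file is present (the vpu_simple entry may need to be dropped on PDKs where that sample is not available). STMCOMPILER is a hypothetical override variable and compile_all is an illustrative name; only the -i and -o options shown above are used.

```shell
#!/usr/bin/env bash
# Sketch only: STMCOMPILER is an illustrative override, not an STM setting.
STMCOMPILER="${STMCOMPILER:-/usr/local/driveworks/tools/stmcompiler}"

# Compile every sample schedule into the current directory.
# Each output is named after its input, with .yml replaced by .stm.
compile_all() {
    local yml
    for yml in cpu_simple/cpu_simple.yml \
               cpu_gpu_simple/gpu_multistream_multiprocess.yml \
               vpu_simple/vpu_simple.yml \
               sample_complete_swap/cpu_gpu1.yml \
               sample_complete_swap/cpu_gpu2.yml; do
        "$STMCOMPILER" -i "$yml" -o "$(basename "$yml" .yml).stm" || return 1
    done
}
```

The loop stops at the first schedule that fails to compile, so a non-zero return from `compile_all` means that at least one .stm file was not produced.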

STM Runtime:

NOTES:

For cross compilation, ensure that driveworks_stm_cross.deb is installed
If you are using the NVIDIA DRIVE OS NGC Docker Container for NVONLINE:
```
export NV_WORKSPACE=/drive
```
If you installed STM and DRIVE OS using the NVIDIA DRIVE OS Debian Package Repository for NVONLINE, ensure that the environment variable NV_WORKSPACE is set to the same value that you used during installation.

If you installed STM and DRIVE OS using NVIDIA SDK Manager:

export NV_WORKSPACE=$HOME/nvidia/nvidia_sdk/DRIVE_OS_*_SDK_Linux_DRIVE_AGX_ORIN_DEVKITS/DRIVEOS

Adjust the path as appropriate if you chose a non-default Target HW Image Folder in the NVIDIA SDK Manager GUI.

Steps to compile:

cd /usr/local/driveworks/samples/src/stm/src/
mkdir stm-build && cd stm-build

To cross-compile for aarch64-Linux:

cmake -DCMAKE_BUILD_TYPE:STRING=Release .. \
-DCMAKE_TOOLCHAIN_FILE:FILEPATH=cmake/Toolchain-V5L.cmake \
-DVIBRANTE_PDK:PATH=$NV_WORKSPACE/drive-linux \
-DCUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda \
-DSTM_BASE_DIR:PATH=/usr/local/driveworks/targets/aarch64-Linux/ \
-DVIBRANTE_PDK_FOUNDATION:PATH=$NV_WORKSPACE/drive-foundation

To cross-compile for QNX:

cmake -DCMAKE_BUILD_TYPE:STRING=Release .. \
-DCMAKE_TOOLCHAIN_FILE:FILEPATH=cmake/Toolchain-V5Q.cmake \
-DVIBRANTE_PDK:PATH=$NV_WORKSPACE/drive-qnx \
-DCUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda-safe-11.4 \
-DSTM_BASE_DIR:PATH=/usr/local/driveworks/targets/aarch64-QNX/ \
-DVIBRANTE_PDK_FOUNDATION:PATH=$NV_WORKSPACE/drive-foundation

For x86:

cmake -DCMAKE_BUILD_TYPE=Release .. \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
-DSTM_BASE_DIR=/usr/local/driveworks/targets/x86_64-Linux/

make install -j \<number of jobs\>

To run the built samples on x86

ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
Note: Run the following command only if PDK < 6.0.5.0
- sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/x86_64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH

Commands for each sample on x86:

CPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_simple/client/stm_test_cpu_simple
CPU-GPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientX/stm_test_gpuX & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientY/stm_test_gpuY

Commands for Schedule Switch Sample on x86

Execute the following commands in order and in different terminals to view the schedule switch:
- Run the stm_master along with list of schedules:
  - sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
- Run the test schedule manager binary:
  - sudo /usr/local/driveworks/bin/stm_sample_manager default -v
- Run client binaries
  - sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
Each cycle of execution has 1 schedule switch (one switch between the two schedules passed as input to stm_master) and by default the schedules will switch with a time period of 1000 milliseconds. There should be 10 cycles of execution for the above commands. The schedule switches can be seen in the logs of stm_sample_manager. Use -v with stm_sample_manager for verbose outputs.

To run the built samples on the Target

NOTE: Rsync the built samples to the equivalent folder on the target.

ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
Note: Run the following command only if PDK < 6.0.5.0
- sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH

Commands for each sample:

CPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_simple/client/stm_test_cpu_simple
CPU-GPU Simple: sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientX/stm_test_gpuX & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientY/stm_test_gpuY

Commands for Schedule Switch Sample on target

Execute the following commands in order and in different terminals to view the schedule switch:
- Run the stm_master along with list of schedules:
  - sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
- Run the test schedule manager binary:
  - sudo /usr/local/driveworks/bin/stm_sample_manager default -v
- Run client binaries
  - sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
Each cycle of execution has 1 schedule switch (one switch between the two schedules passed as input to stm_master) and by default the schedules will switch with a time period of 1000 milliseconds. There should be 10 cycles of execution for the above commands. The schedule switches can be seen in the logs of stm_sample_manager. Use -v with stm_sample_manager for verbose outputs. Use the stmanalyze tool provided by STM on x86 to analyze the performance recorded in the logs produced by these steps.