The following samples showcase schedule execution on different hardware platforms:
cpu_simple.stm: contains the constraints for CPU runnables defined in the client stm_test_cpu_simple.
gpu_multistream_multiprocess.stm: contains the constraints for GPU submitters and submittees defined in the clients stm_test_gpuX and stm_test_gpuY.
vpu_simple.stm: contains the constraints for PVA submitters and submittees defined in the client stm_test_vpu.
cpu_gpu1.stm and cpu_gpu2.stm: demonstrate the schedule switch functionality.
The following are the instructions to run sample files:
Make sure that /etc/nvsciipc.cfg on the target contains the entries in /usr/local/driveworks/targets/aarch64-Linux/config/nvsciipc.cfg (append them to the existing /etc/nvsciipc.cfg file if they are not present). Ensure that the entries are unique in /etc/nvsciipc.cfg, then reboot the system.
NOTE: Ensure that there are no newlines at the end of /etc/nvsciipc.cfg.
After the reboot, run "sudo service nv_nvsciipc_init status". If this command returns an error, re-check the contents of /etc/nvsciipc.cfg.
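The entry check described above can be scripted. The following is a minimal sketch, not part of the DriveWorks tooling: the function names missing_entries and duplicate_entries are hypothetical, and it assumes one entry per line with blank lines and lines starting with # ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with DriveWorks).
# Assumption: one entry per line; blank lines and '#' comment lines are ignored.

# Print entries of the reference file that are absent from the installed file.
missing_entries() {
    local reference="$1" installed="$2"
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
    grep -Ev '^[[:space:]]*(#.*)?$' "$reference" | grep -Fxv -f "$installed" || true
}

# Print entries that occur more than once in the given file.
duplicate_entries() {
    local file="$1"
    grep -Ev '^[[:space:]]*(#.*)?$' "$file" | sort | uniq -d
}
```

For example, `missing_entries /usr/local/driveworks/targets/aarch64-Linux/config/nvsciipc.cfg /etc/nvsciipc.cfg` lists what still has to be appended, and `duplicate_entries /etc/nvsciipc.cfg` should print nothing. The comparison is exact, so entries that differ only in whitespace are reported as missing.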
Set /proc/sys/fs/mqueue/msg_max to 4096 (does not persist across reboots). Alternatively, add fs.mqueue.msg_max = 4096 and fs.mqueue.queues_max = 1024 to /etc/sysctl.conf and restart (persists across reboots).
Add the following to /etc/security/limits.conf:
<user> soft msgqueue unlimited
<user> hard msgqueue unlimited
<user> soft rtprio 99
<user> hard rtprio 99
This allows <user> (replace with the appropriate user name) to have an unlimited-size mqueue.
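Before running the samples, it may help to confirm that the message-queue limit is in effect. The sketch below is an illustration only: mqueue_limit_ok is a hypothetical helper, and the optional second argument exists just so the check can be pointed at another file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the value stored in a file is at least
# the given minimum. Defaults match the limit and path mentioned above.
mqueue_limit_ok() {
    local minimum="${1:-4096}" file="${2:-/proc/sys/fs/mqueue/msg_max}"
    local current
    current=$(cat "$file") || return 2   # file unreadable or absent
    [ "$current" -ge "$minimum" ]
}
```

A possible use is `mqueue_limit_ok 4096 || echo 4096 | sudo tee /proc/sys/fs/mqueue/msg_max`. In bash, `ulimit -q` reports the POSIX message-queue byte limit of the current shell, which is one way to see whether the limits.conf entries were picked up after logging in again.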
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/x86_64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
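The cleanup command above kills every process whose ps -ef line mentions framesync or stm_. To preview which PIDs it would select without killing anything, the selection part of the pipeline can be factored out. This is a sketch; select_stm_pids is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ps -ef` style lines on stdin and print the PID
# (second column) of every line mentioning framesync or stm_, skipping the
# grep process itself.
select_stm_pids() {
    grep -e framesync -e stm_ | grep -v grep | awk '{print $2}'
}
```

Running `ps -ef | select_stm_pids` prints the candidate PIDs, one per line. Because the match is on the whole line, any unrelated process with stm_ in its command line is selected too.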
Commands for each sample on x86:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_gpuX & sudo /usr/local/driveworks/bin/stm_test_gpuY
Commands for Schedule Switch Sample on x86:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_gpu1.stm,/usr/local/driveworks/bin/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
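In the stm_master command above, the -s argument carries both schedules as a single comma-separated value. When the list of schedule files is built in a script, a small helper can assemble that value. This is a sketch; join_schedules is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join all arguments with commas, matching the form of
# the -s argument used in the schedule switch command.
join_schedules() {
    local IFS=,
    printf '%s\n' "$*"   # "$*" joins arguments with the first character of IFS
}
```

For example, `sudo /usr/local/driveworks/bin/stm_master -s "$(join_schedules /usr/local/driveworks/bin/cpu_gpu1.stm /usr/local/driveworks/bin/cpu_gpu2.stm)" -l x.log -e 500 -i 2 -N default` reproduces the command shown above.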
For stm_sample_manager, use -v for verbose outputs.
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
[For QNX] : export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-qnx/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
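The LD_LIBRARY_PATH exports above differ only in the target directory name (x86_64-Linux, aarch64-Linux, or aarch64-qnx). The sketch below builds the value from that name. driveworks_ld_path is a hypothetical helper, and the CUDA paths are copied from the exports above. It also leaves out the trailing colon when LD_LIBRARY_PATH was previously empty; on Linux, the dynamic linker treats an empty entry as the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an LD_LIBRARY_PATH value for a DriveWorks target
# directory name, followed by the existing value when it is non-empty.
driveworks_ld_path() {
    local target="$1" current="${2-}"
    local path="/usr/local/driveworks/targets/${target}/lib"
    path="${path}:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64"
    # ${current:+...} expands to nothing when current is empty or unset.
    printf '%s\n' "${path}${current:+:${current}}"
}
```

A possible use on a Linux target is `export LD_LIBRARY_PATH="$(driveworks_ld_path aarch64-Linux "${LD_LIBRARY_PATH-}")"`.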
Commands for each sample on the target:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_gpuX & sudo /usr/local/driveworks/bin/stm_test_gpuY
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/vpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_vpu
# For Linux:
echo 0 | sudo tee /sys/kernel/debug/pva0/vpu_app_authentication
# Set allowlist value back to 1 after sample runs:
echo 1 | sudo tee /sys/kernel/debug/pva0/vpu_app_authentication
# For QNX:
echo 0 > /dev/nvpvadebugfs/pva0/allowlist_ena
# Set allowlist value back to 1 after sample runs:
echo 1 > /dev/nvpvadebugfs/pva0/allowlist_ena
(Note: The vpu_simple app is only available for PDKs 6.0.4.0+ and requires the presence of cuPVA SDK v2.0.0 libraries.)
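Because the allowlist value has to be set back to 1 after the sample runs, wrapping the two writes around the command makes the restore harder to forget. The following is a sketch under stated assumptions: with_allowlist_disabled is a hypothetical helper, and it writes with plain redirection, so it has to run from a shell that may write the control file (on Linux, a root shell, or swap the redirections for the sudo tee form shown above).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: write 0 to a control file, run a command, then write
# 1 back even if the command fails. Returns the command's exit status.
with_allowlist_disabled() {
    local control="$1" status=0
    shift
    echo 0 > "$control" || return 1
    "$@" || status=$?
    echo 1 > "$control"
    return "$status"
}
```

For example, on QNX: `with_allowlist_disabled /dev/nvpvadebugfs/pva0/allowlist_ena /usr/local/driveworks/bin/stm_test_vpu`. The wrapper does not cover the case where the shell itself is killed before the restore runs.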
Commands for Schedule Switch Sample on the target:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_gpu1.stm,/usr/local/driveworks/bin/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
For stm_sample_manager, use -v for verbose outputs.
/usr/local/driveworks/tools/stmcompiler -i /path/to/input_file.yml -o /path/to/output_file.stm
/usr/local/driveworks/tools/stmvizschedule -i /path/to/input_file.stm -o /path/to/output_file.html
/usr/local/driveworks/tools/stmvizgraph -i /path/to/input_file.yml -o /path/to/output_file.svg
NOTE: STMVizGraph needs GraphViz installed on the system (sudo apt install graphviz)
/usr/local/driveworks/tools/stmanalyze -s /path/to/input_file.stm -l /path/to/log_file -f html
NOTE: The log file is obtained after running the sample binaries above.
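Since stmanalyze depends on a log produced by an earlier run, a script can check that the log exists and is non-empty before invoking the tool. This is a sketch; require_log is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail with a message when the log file is missing or empty.
require_log() {
    local log="$1"
    if [ ! -s "$log" ]; then
        echo "log file '$log' is missing or empty; run a sample first" >&2
        return 1
    fi
}
```

A possible use is `require_log x.log && /usr/local/driveworks/tools/stmanalyze -s /path/to/input_file.stm -l x.log -f html`.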
cd /usr/local/driveworks/samples/src/stm/src/
STM Compiler Step per sample:
/usr/local/driveworks/tools/stmcompiler -i cpu_simple/cpu_simple.yml -o cpu_simple.stm
/usr/local/driveworks/tools/stmcompiler -i cpu_gpu_simple/gpu_multistream_multiprocess.yml -o gpu_multistream_multiprocess.stm
/usr/local/driveworks/tools/stmcompiler -i vpu_simple/vpu_simple.yml -o vpu_simple.stm
/usr/local/driveworks/tools/stmcompiler -i sample_complete_swap/cpu_gpu1.yml -o cpu_gpu1.stm
/usr/local/driveworks/tools/stmcompiler -i sample_complete_swap/cpu_gpu2.yml -o cpu_gpu2.stm
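Each compiler step above names its output after the base name of the YAML input. That pattern can be generated rather than typed. The sketch below only prints the commands so they can be reviewed first; print_compile_commands is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one stmcompiler command per YAML input, naming
# each output after the input's base name with a .stm extension.
print_compile_commands() {
    local yml
    for yml in "$@"; do
        printf '%s -i %s -o %s\n' /usr/local/driveworks/tools/stmcompiler \
            "$yml" "$(basename "$yml" .yml).stm"
    done
}
```

For example, `print_compile_commands cpu_simple/cpu_simple.yml vpu_simple/vpu_simple.yml` prints the first and third commands listed above. Piping the output to `sh` would execute them, assuming the paths contain no spaces.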
STM Runtime:
NOTES:
export NV_WORKSPACE=/drive
Ensure that NV_WORKSPACE is set to the same value that you used during installation:
export NV_WORKSPACE=$HOME/nvidia/nvidia_sdk/DRIVE_OS_*_SDK_Linux_DRIVE_AGX_ORIN_DEVKITS/DRIVEOS
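The cmake commands in the compile steps read $NV_WORKSPACE/drive-linux or $NV_WORKSPACE/drive-qnx, plus $NV_WORKSPACE/drive-foundation, so a wrong NV_WORKSPACE tends to surface only at configure time. The sketch below checks those directories up front. check_workspace is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a workspace directory contains the PDK
# subdirectories referenced by the cmake commands.
#   $1: workspace path (for example "$NV_WORKSPACE")
#   $2: OS-specific PDK directory name, drive-linux or drive-qnx
check_workspace() {
    local workspace="$1" os_dir="$2" dir
    for dir in "$os_dir" drive-foundation; do
        if [ ! -d "$workspace/$dir" ]; then
            echo "missing $workspace/$dir" >&2
            return 1
        fi
    done
}
```

A possible use is `check_workspace "$NV_WORKSPACE" drive-linux` before running cmake. The * in the path above looks like a placeholder for the installed version; in bash it is not expanded inside an assignment, so a check like this can catch a literal * left in the value.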
Steps to compile:
cd /usr/local/driveworks/samples/src/stm/src/
mkdir stm-build && cd stm-build
cmake -DCMAKE_BUILD_TYPE:STRING=Release .. \
    -DCMAKE_TOOLCHAIN_FILE:FILEPATH=cmake/Toolchain-V5L.cmake \
    -DVIBRANTE_PDK:PATH=$NV_WORKSPACE/drive-linux \
    -DCUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda \
    -DSTM_BASE_DIR:PATH=/usr/local/driveworks/targets/aarch64-Linux/ \
    -DVIBRANTE_PDK_FOUNDATION:PATH=$NV_WORKSPACE/drive-foundation
cmake -DCMAKE_BUILD_TYPE:STRING=Release .. \
    -DCMAKE_TOOLCHAIN_FILE:FILEPATH=cmake/Toolchain-V5Q.cmake \
    -DVIBRANTE_PDK:PATH=$NV_WORKSPACE/drive-qnx \
    -DCUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda-safe-11.4 \
    -DSTM_BASE_DIR:PATH=/usr/local/driveworks/targets/aarch64-QNX/ \
    -DVIBRANTE_PDK_FOUNDATION:PATH=$NV_WORKSPACE/drive-foundation
cmake -DCMAKE_BUILD_TYPE=Release .. \
    -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
    -DSTM_BASE_DIR=/usr/local/driveworks/targets/x86_64-Linux/
make install -j <number of jobs>
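The number of jobs is commonly set to the number of available CPU cores. A minimal sketch, assuming nproc (GNU coreutils) or getconf is available on the build host; job_count is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a job count for make, falling back to 1 when the
# core count cannot be determined.
job_count() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}
```

A possible use is `make install -j "$(job_count)"`.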
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/x86_64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
Commands for each sample on x86:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_simple/client/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientX/stm_test_gpuX & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientY/stm_test_gpuY
Commands for Schedule Switch Sample on x86
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
For stm_sample_manager, use -v for verbose outputs.
NOTE: Rsync the built samples to the equivalent folder on the target.
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
Commands for each sample:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_simple/client/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientX/stm_test_gpuX & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/cpu_gpu_simple/clientY/stm_test_gpuY
Commands for Schedule Switch Sample on target
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
For stm_sample_manager, use -v for verbose outputs.
Use the stmanalyze tool provided by STM on x86 to analyze the performance recorded in the logs produced by these steps.