The following samples showcase schedule execution on different hardware platforms:
cpu_simple.stm: contains the constraints for CPU runnables defined in the client stm_test_cpu_simple.
gpu_multistream_multiprocess.stm: contains the constraints for GPU submitters and submittees defined in the clients stm_test_gpuX and stm_test_gpuY.
vpu_simple.stm: contains the constraints for PVA submitters and submittees defined in the client stm_test_vpu.
cpu_gpu1.stm and cpu_gpu2.stm: demonstrate the schedule switch functionality.
The following are the instructions to run sample files:
Make sure that /etc/nvsciipc.cfg on target contains the entries in /usr/local/driveworks/targets/aarch64-Linux/config/nvsciipc.cfg (append them to the existing /etc/nvsciipc.cfg file if they are not present). Reboot the system after this step.
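The append step can be scripted so that entries already present are not duplicated. The following is a minimal sketch; the function name append_missing_entries is hypothetical, and it assumes that comparing whole lines verbatim is enough to decide whether an entry is present (entries that differ only in whitespace would be treated as missing).

```shell
# append_missing_entries SRC DST
# Appends every non-empty line of SRC that does not already appear
# verbatim in DST. Whole-line matching is an assumption of this sketch.
append_missing_entries() {
    src=$1
    dst=$2
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines.
        [ -z "$line" ] && continue
        # -x: match the whole line, -F: fixed string, -q: no output.
        if ! grep -qxF -- "$line" "$dst"; then
            printf '%s\n' "$line" >> "$dst"
        fi
    done < "$src"
}
```

On target this would be called as root with /usr/local/driveworks/targets/aarch64-Linux/config/nvsciipc.cfg as SRC and /etc/nvsciipc.cfg as DST, followed by the reboot.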
Set /proc/sys/fs/mqueue/msg_max to 4096 (does not persist across reboots), or add fs.mqueue.msg_max = 4096 to /etc/sysctl.conf and restart (persists across reboots).
Add the following lines to /etc/security/limits.conf:
<user> soft msgqueue unlimited
<user> hard msgqueue unlimited
This allows <user> (change it to the appropriate user name) to have an unlimited-size mqueue.
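Before launching the samples, the current message queue limit can be verified. The following is a minimal sketch; the function name check_msg_max is hypothetical, and treating any value of at least 4096 as acceptable is an assumption (the instructions above only name 4096).

```shell
# check_msg_max FILE REQUIRED
# Succeeds when the integer stored in FILE is at least REQUIRED.
# Returns 2 when FILE cannot be read.
check_msg_max() {
    current=$(cat "$1") || return 2
    [ "$current" -ge "$2" ]
}
```

A typical call would be check_msg_max /proc/sys/fs/mqueue/msg_max 4096, raising the limit only when the check fails.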
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/x86_64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
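Because the cleanup command above sends KILL to every process whose ps entry matches framesync or stm_, it can help to preview the matching PIDs first. The following is a minimal sketch; the function name stm_pids is hypothetical, and the filter is copied from the cleanup command.

```shell
# stm_pids
# Reads `ps -ef` style output on stdin and prints the PID column
# (second field) of every line mentioning framesync or stm_,
# excluding the grep processes themselves.
stm_pids() {
    grep -e framesync -e stm_ | grep -v grep | awk '{print $2}'
}
```

Running ps -ef | stm_pids lists the PIDs that the cleanup command would pass to kill.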
Commands for each sample on x86:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_gpuX & sudo /usr/local/driveworks/bin/stm_test_gpuY
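The sample commands above start stm_master in the background and only the last client runs in the foreground, so a failure in a background process goes unnoticed. The following is a minimal sketch of a hypothetical helper, run_all, that launches every command in the background, waits for all of them, and reports failure if any of them failed. It is not part of STM.

```shell
# run_all CMD...
# Starts each CMD string in its own background shell, waits for all
# of them, and returns 1 if at least one exited with a non-zero status.
run_all() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return $status
}
```

For the CPU sample, the two arguments would be the stm_master command line and the stm_test_cpu_simple command line shown above, each quoted as one string. This assumes sudo credentials are already cached, since a background process cannot prompt for a password.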
Commands for Schedule Switch Sample on x86
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
Commands for each sample on target:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_gpuX & sudo /usr/local/driveworks/bin/stm_test_gpuY
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/bin/vpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/bin/stm_test_vpu
(Note: The vpu_simple app is only available for PDKs 6.0.4.0+ and requires the presence of cuPVA SDK v2.0.0 libraries)
Commands for Schedule Switch Sample on target
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
Use -v with stm_sample_manager for verbose outputs.
/usr/local/driveworks/tools/stmcompiler -i /path/to/input_file.yml -o /path/to/output_file.stm
/usr/local/driveworks/tools/stmvizschedule -i /path/to/input_file.stm -o /path/to/output_file.html
/usr/local/driveworks/tools/stmvizgraph -i /path/to/input_file.yml -o /path/to/output_file.svg
NOTE: Needs GraphViz installed on the system (sudo apt install graphviz)
/usr/local/driveworks/tools/stmanalyze -s /path/to/input_file.stm -l /path/to/log_file -f html
NOTE: The log file is obtained after running the sample binaries above.
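The compiler and visualization tools above are usually run together on one graph description. The following is a minimal sketch that prints the three command lines for a given YAML file without executing them. The function name stm_tool_commands and the STM_TOOLS_DIR variable are hypothetical, and placing the outputs next to the input file is an assumption; paths containing spaces are not handled.

```shell
# Location of the STM tools; override by setting STM_TOOLS_DIR.
STM_TOOLS_DIR=${STM_TOOLS_DIR:-/usr/local/driveworks/tools}

# stm_tool_commands INPUT.yml
# Prints the stmcompiler, stmvizgraph and stmvizschedule command lines
# for INPUT.yml, deriving output names by replacing the .yml suffix.
stm_tool_commands() {
    yml=$1
    base=${yml%.yml}
    printf '%s\n' \
        "$STM_TOOLS_DIR/stmcompiler -i $yml -o $base.stm" \
        "$STM_TOOLS_DIR/stmvizgraph -i $yml -o $base.svg" \
        "$STM_TOOLS_DIR/stmvizschedule -i $base.stm -o $base.html"
}
```

Piping the output to sh, for example stm_tool_commands test_cpu_simple/cpu_simple.yml | sh, runs the three tools in order.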
cd /usr/local/driveworks/samples/src/stm/src/
STM Compiler Step:
/usr/local/driveworks/tools/stmcompiler -i test_cpu_gpu_simple/gpu_multistream_multiprocess.yml -o gpu_multistream_multiprocess.stm
/usr/local/driveworks/tools/stmcompiler -i test_cpu_simple/cpu_simple.yml -o cpu_simple.stm
/usr/local/driveworks/tools/stmcompiler -i test_vpu_simple/vpu_simple.yml -o vpu_simple.stm
STM Runtime Step:
NOTE: For cross compilation, ensure that driveworks_stm_cross.deb is installed
cd /usr/local/driveworks/samples/src/stm/src/
mkdir stm-build && cd stm-build
For aarch64 Linux cross compilation:
cmake -DCMAKE_BUILD_TYPE=Release .. -DCMAKE_TOOLCHAIN_FILE=cmake/Toolchain-V5L.cmake -DVIBRANTE_PDK:STRING=/drive/drive-linux -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DSTM_BASE_DIR=/usr/local/driveworks/targets/aarch64-Linux/ -DVIBRANTE_PDK_FOUNDATION:STRING=/drive/drive-foundation
For aarch64 QNX cross compilation:
cmake -DCMAKE_BUILD_TYPE=Release .. -DCMAKE_TOOLCHAIN_FILE=cmake/Toolchain-V5Q.cmake -DVIBRANTE_PDK:STRING=/drive/drive-qnx -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-safe-11.4 -DSTM_BASE_DIR=/usr/local/driveworks/targets/aarch64-QNX/ -DVIBRANTE_PDK_FOUNDATION:STRING=/drive/drive-foundation
For x86:
cmake -DCMAKE_BUILD_TYPE=Release .. -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DSTM_BASE_DIR=/usr/local/driveworks/targets/x86_64-Linux/
make install -j <number of jobs>
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/x86_64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
Commands for each sample on x86:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/test_cpu_simple/client/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/test_cpu_gpu_simple/clientX/stm_test_gpuX & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/test_cpu_gpu_simple/clientY/stm_test_gpuY
Commands for Schedule Switch Sample on x86
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
Use -v with stm_sample_manager for verbose outputs.
NOTE: Rsync the built samples to the equivalent folder on the target.
ps -ef | grep -e framesync -e stm_ | grep -v grep | awk '{print $2}' | xargs -rt sudo kill -s KILL || true
sudo rm -rf /dev/shm/* /dev/mqueue/*
export CUDA_VISIBLE_DEVICES=1
export LD_LIBRARY_PATH=/usr/local/driveworks/targets/aarch64-Linux/lib:/usr/local/cuda-11.4/lib:/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH
Commands for each sample on target:
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/cpu_simple.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/test_cpu_simple/client/stm_test_cpu_simple
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/gpu_multistream_multiprocess.stm -l x.log -e 50 & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/test_cpu_gpu_simple/clientX/stm_test_gpuX & sudo /usr/local/driveworks/samples/src/stm/src/stm-build/test_cpu_gpu_simple/clientY/stm_test_gpuY
Commands for Schedule Switch Sample on target
sudo /usr/local/driveworks/bin/stm_master -s /usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu1.stm,/usr/local/driveworks/samples/src/stm/src/sample_complete_swap/cpu_gpu2.stm -l x.log -e 500 -i 2 -N default
sudo /usr/local/driveworks/bin/stm_sample_manager default -v
sudo /usr/local/driveworks/bin/stm_sample_gpuX & sudo /usr/local/driveworks/bin/stm_sample_gpuY
Use -v with stm_sample_manager for verbose outputs.
Use the stmanalyze tool provided by STM on x86 to obtain the final performance from the logs produced by these steps.