NvSciStream Performance Test Application
NvSciStream provides a test application to measure KPIs when streaming buffers between a CPU producer and CPU consumers. This test focuses on NvSciStream performance and does not use CUDA, NvMedia, or other hardware engines. To simplify measuring packet-delivery latency for each payload, the stream uses FIFO mode.
This test application is intended for performance testing only. It may simplify some setup steps, and it attaches synchronization objects and fences to the CPU endpoints, even though they are not strictly required there, so that fence transport latency is included in the measurement. To see how to create a stream with the NvSciStream API, refer to NvSciStream Sample Application.
This test uses the NvPlayFair library (see Benchmarking Library) to record timestamps, set the rate limit, save raw latency data, and calculate the latency statistics (such as the min, max, and mean value) on different platforms and operating systems.
The test app supports a variety of test cases:
- Single-process, inter-process and inter-chip streaming
- Unicast and multicast streaming
The test can set different stream configurations:
- Number of packets allocated in pool.
- Number of payloads transmitted between producer and consumers.
- Buffer size for each element.
- Number of synchronization objects used by each endpoint.
- Frame rate: the frequency at which the producer presents payloads.
- Memory type: vidmem or sysmem.
The test measures several performance KPIs:
- Latency for each process:
  - Total initialization time
  - Stream setup time
  - Streaming time
- Latency for each payload:
  - Duration to wait for an available or ready packet
  - End-to-end packet-delivery latency
- PCIe bandwidth in inter-chip stream
The README file in the test folder explains these KPIs in more detail.
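The end-to-end packet-delivery latency KPI is conceptually the difference between the CPU timestamp taken when the producer presents a payload and the CPU timestamp taken when the consumer acquires it. The sketch below illustrates that computation with plain clock_gettime timestamps; it is not the test's actual implementation, which records and aggregates these samples with the NvPlayFair library, and the helper name now_us is chosen here only for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Illustrative helper: monotonic CPU timestamp in microseconds.
 * The real test records and aggregates such samples with NvPlayFair. */
static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

int main(void)
{
    /* Producer side: timestamp taken just before presenting a payload.
     * In an inter-process stream this value travels with the payload. */
    uint64_t presentUs = now_us();

    /* ... payload is delivered through the NvSciStream FIFO queue ... */

    /* Consumer side: timestamp taken right after acquiring the payload. */
    uint64_t acquireUs = now_us();

    /* End-to-end packet-delivery latency for this payload. */
    printf("packet-delivery latency: %llu us\n",
           (unsigned long long)(acquireUs - presentUs));
    return 0;
}
```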
Prerequisites
NvSciIpc
Where inter-process streaming is used, the performance test application streams packets between a producer process and a consumer process using inter-process communication (NvSciIpc) channels. The following channel entries must be present in the NvSciIpc configuration file on the target:
INTER_PROCESS nvscistream_0 nvscistream_1 16 24576
INTER_PROCESS nvscistream_2 nvscistream_3 16 24576
INTER_PROCESS nvscistream_4 nvscistream_5 16 24576
INTER_PROCESS nvscistream_6 nvscistream_7 16 24576
Where inter-chip streaming is used, the test application streams packets between different chips via NvSciIpc (INTER_CHIP, PCIe) channels. For more information, see Chip-to-Chip Communication.
NvPlayFair
This performance test application uses the performance utility functions in the NvPlayFair library.
Building the NvSciStream Performance Test Application
The NvSciStream performance test includes source code, README, and a Makefile.
On the host system, navigate to the test directory:
cd <top>/drive-linux/samples/nvsci/nvscistream/perf_tests/
make clean
make
Running the NvSciStream Performance Test Application
The test application supports the following command-line options:

Option | Meaning | Default |
---|---|---|
-h | Prints supported test options. | |
-n <count> | Specifies the number of consumers. Set in the producer process. | 1 |
-k <count> | Specifies the number of packets in the pool. Set in the producer process for the primary pool and in the consumer process for the C2C pool. | 1 |
-f <count> | Specifies the number of payloads. Set in all processes. | 100 |
-b <size> | Specifies the buffer size (MB) per packet. | 1 |
-s <count> | Specifies the number of sync objects per client. Set by each process. | 1 |
-r <count> | Specifies the producer frame-present rate (fps). | |
-t <0\|1> | Specifies the memory type: 0 for sysmem, 1 for vidmem. Pass the dGPU UUID with -u when using vidmem. | 0 |
-u <uuid> | Specifies the dGPU UUID. Required for vidmem buffers. The UUID can be retrieved with the 'nvidia-smi -L' command on x86. | |
-l | Measures latency. Skipped if vidmem is used. Set in all processes. | False |
-v | Saves the raw latency data in a CSV file. Ignored if latency is not measured. | False |
-a <target> | Specifies the average KPI target (us) for packet-delivery latency. The test result is compared against the target with a 5% tolerance. Ignored if latency is not measured. | |
-m <target> | Specifies the 99.99th percentile KPI target (us) for packet-delivery latency. The test result is compared against the target with a 5% tolerance. Ignored if latency is not measured. | |
For inter-process operation: | | |
-p | Inter-process producer. | |
-c <index> | Inter-process indexed consumer. | |
For inter-chip operation: | | |
-P <index> <Ipc endpoint> | Inter-SoC producer; the NvSciIpc endpoint name connected to the indexed consumer. | |
-C <index> <Ipc endpoint> | Inter-SoC consumer; the NvSciIpc endpoint used by this indexed consumer. | |
Copy the test binary to the target filesystem:
cp <top>/drive-linux/samples/nvsci/nvscistream/perf_tests/test_nvscistream_perf <top>/drive-linux/targetfs/home/nvidia/
Following are examples of running the performance test application with different configurations:
- Measure latency for single-process unicast stream with default setup:
./test_nvscistream_perf -l
- Measure latency for single-process unicast stream with three packets in pool:
./test_nvscistream_perf -l -k 3
- Measure latency for single-process multicast stream with two consumers:
./test_nvscistream_perf -n 2 -l
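- Measure latency for a single-process unicast stream and compare the result against KPI targets. This is an illustrative combination of the -a and -m options described above; the target values (100 us average, 200 us 99.99th percentile) are arbitrary examples, not official KPI targets:
./test_nvscistream_perf -l -a 100 -m 200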
- Measure latency for inter-process unicast stream with default setup:
./test_nvscistream_perf -p -l &
./test_nvscistream_perf -c 0 -l
- Measure latency for inter-process unicast stream with a fixed producer-present rate at 100 fps, which transmits 10,000 payloads:
./test_nvscistream_perf -p -f 10000 -l -r 100 &
./test_nvscistream_perf -c 0 -f 10000 -l
- Measure latency and save raw latency data in nvscistream_*.csv file for inter-process unicast stream, which transmits 10 payloads:
./test_nvscistream_perf -p -f 10 -l -v &
./test_nvscistream_perf -c 0 -f 10 -l -v
- Measure PCIe bandwidth for the inter-chip unicast stream with a 12.5 MB buffer size per packet, which transmits 10,000 frames. The two commands are run on different SoCs connected by the <pcie_s0_1> <pcie_s1_1> PCIe channel pair:
On chip s0:
./test_nvscistream_perf -P 0 pcie_s0_1 -l -b 12.5 -f 10000
On chip s1:
./test_nvscistream_perf -C 0 pcie_s1_1 -l -b 12.5 -f 10000
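- Measure latency for an inter-process multicast stream with two consumers. This is an illustrative combination of the options described above, not an example from the shipped README:
./test_nvscistream_perf -p -n 2 -l &
./test_nvscistream_perf -c 0 -l &
./test_nvscistream_perf -c 1 -l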
The test_nvscistream_perf application must run as the root user (with sudo).
For the inter-process use case, clean up stale message queues and shared memory before running the test:
sudo rm -rf /dev/mqueue/*
sudo rm -rf /dev/shm/*
The use case is only supported on Linux.
Ensure different SoCs are set with different SoC IDs. For Tegra-x86 use
cases, set a non-zero SoC ID on the NVIDIA Tegra side, because x86 uses 0
as the SoC ID. For more information, refer to the "Bind Options" section in the AV PCT
Configuration chapter of the NVIDIA DRIVE OS 6.0 Developer Guide.