    DriveWorks SDK Reference
    5.8.83 Release
    For Test and Development only

    DriveNet Sample

    Description

    The NVIDIA® DriveNet sample is a sophisticated, multi-class, higher-resolution example that uses the proprietary NVIDIA DriveNet deep neural network (DNN) to perform object detection.

    The DriveNet sample application detects objects by performing inference on each frame of a RAW video or camera stream. It then clusters these detections using parameters defined within the sample application.

    A follow-up algorithm clusters detections from both images to compute a more stable response.
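
    The flow can be pictured as a per-frame loop that first runs the DNN and then merges overlapping raw detections into stable objects. The sketch below illustrates that detect-then-cluster pattern in plain C++; every type and function in it (Detection, runInference, clusterDetections, the IoU threshold) is an illustrative placeholder, not the DriveWorks API, and the fabricated detections exist only so the clustering step has something to merge.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    struct Box { float x, y, w, h; };                       // bounding box in pixels
    struct Detection { Box box; int classId; float confidence; };

    // Placeholder for one DNN forward pass over a frame (hypothetical;
    // the real sample obtains detections through DriveWorks modules).
    std::vector<Detection> runInference(int frameIndex) {
        // Two overlapping "car" detections, fabricated so clustering has work to do.
        return { {{100.f + frameIndex, 50.f, 80.f, 60.f}, 0, 0.9f},
                 {{104.f + frameIndex, 52.f, 78.f, 58.f}, 0, 0.7f} };
    }

    // Intersection over union of two boxes, used as the overlap measure.
    float iou(const Box& a, const Box& b) {
        float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
        float x2 = std::min(a.x + a.w, b.x + b.w), y2 = std::min(a.y + a.h, b.y + b.h);
        float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
        return inter / (a.w * a.h + b.w * b.h - inter);
    }

    // Greedy clustering: detections of the same class whose boxes overlap more
    // than iouThreshold are merged, keeping the highest-confidence box.
    std::vector<Detection> clusterDetections(const std::vector<Detection>& dets, float iouThreshold) {
        std::vector<Detection> clustered;
        std::vector<bool> used(dets.size(), false);
        for (size_t i = 0; i < dets.size(); ++i) {
            if (used[i]) continue;
            Detection best = dets[i];
            for (size_t j = i + 1; j < dets.size(); ++j) {
                if (!used[j] && dets[j].classId == best.classId &&
                    iou(dets[j].box, best.box) > iouThreshold) {
                    used[j] = true;
                    if (dets[j].confidence > best.confidence) best = dets[j];
                }
            }
            clustered.push_back(best);
        }
        return clustered;
    }

    int main() {
        const float iouThreshold = 0.5f;                    // clustering parameter
        for (int frame = 0; frame < 3; ++frame) {           // stand-in for the video loop
            std::vector<Detection> raw = runInference(frame);
            std::vector<Detection> objects = clusterDetections(raw, iouThreshold);
            std::printf("frame %d: %zu raw detections -> %zu objects\n",
                        frame, raw.size(), objects.size());
        }
        return 0;
    }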

    Running the Sample

    The DriveNet sample, sample_drivenet, accepts the following optional parameters. If none are specified, it performs detections on a supplied pre-recorded video.

    ./sample_drivenet --input-type=[video|camera|cameraCustom]
                      --video=[path/to/video]
                      --camera-type=[camera]
                      --camera-group=[a|b|c|d]
                      --camera-index=[0|1|2|3]
                      --cameraCustomString=[camera-parameter-string]
                      --slave=[0|1]
                      --precision=[int8|fp16|fp32]
                      --useCudaGraph=[0|1]
                      --stopFrame=[frame]
                      --enableUrgency=[0|1]
                      --stateless=[0|1]
    

    Where:

    --input-type=[video|camera|cameraCustom]
            Defines whether the input comes from a live camera or from a recorded video.
            Live camera input is supported only on NVIDIA DRIVE™ platforms.
            It is not supported on Linux (x86 architecture) host systems.
            Default value: video
    
    --video=[path/to/video]
            Specifies the absolute or relative path of a raw, lraw or h264 recording.
            Only applicable if --input-type=video.
            Default value: path/to/data/samples/lraw/AR820_RGGB_8MP_v10.lraw
    
    --camera-type=[camera]
            Only applicable if --input-type=camera.
            Default value: ar0231-rccb-bae-sf3324
    
    --camera-group=[a|b|c|d]
            Specifies the camera group to which the camera is connected.
            Only applicable if --input-type=camera.
            Default value: a
    
    --camera-index=[0|1|2|3]
            Indicates the camera index on the given port.
            Default value: 0
    
    --cameraCustomString=[camera-parameter-string]
            Parameter string for custom cameras.
            Only applicable if --input-type=cameraCustom
            Default value: camera-name=SF3324,interface=csi-a,link=0,output-format=processed
    
    --slave=[0|1]
            Setting this parameter to 1 when running the sample on Xavier B accesses the camera
            on Xavier A.
            Applicable only when --input-type=camera.
            Default value: 0
    
    --precision=[int8|fp16|fp32]
            Defines the precision of the DriveNet DNN. The following precision levels are supported.
            - int8
              - 8-bit signed integer precision.
              - Supported GPUs: compute capability >= 6.1.
              - Faster than fp16 and fp32 on GPUs with compute capability equal to 6.1 or greater than 6.2.
            - fp16 (default)
              - 16-bit floating point precision.
              - Supported GPUs: compute capability >= 6.2
              - Faster than fp32.
              - If fp16 is selected on a Pascal GPU, the precision will be set to fp32.
            - fp32
              - 32-bit floating point precision.
              - Supported GPUs: Only Pascal GPUs (compute capability 6.1)
              - Default for Pascal GPUs.
            When using DLA engines, only fp16 is allowed.
            Default value: fp16
    
    --useCudaGraph=[0|1]
            Setting this parameter to 1 runs DriveNet DNN inference with CUDA Graphs if the hardware supports it.
            Default value: 0
    
    --stopFrame=[number]
            Runs DriveNet only on the first <number> frames and then exits the application.
            A value of 0 causes the sample to run endlessly.
            Default value: 0
    
    --enableUrgency=[0|1]
            Enables object urgency prediction using a temporal model.
            Urgency is predicted only for cars and pedestrians on the front camera with a 60° field of view.
            Default value: 0
    
    --stateless=[0|1]
            Setting this parameter to 0 runs the stateful temporal model. Setting it to 1 runs the stateless temporal model.
            The stateful model uses all past frames to predict urgency, while the stateless model only uses the most recent frames.
            Only applicable if --enableUrgency=1.
            Default value: 0
    

    Examples

    To run the sample on a video

    ./sample_drivenet --input-type=video --video=<video file.raw>
    

    To run the sample on a camera on NVIDIA DRIVE platforms

    ./sample_drivenet --input-type=camera --camera-type=<camera type> --camera-group=<camera group> --camera-index=<camera idx on camera group>
    

    where <camera type> is a supported RCCB sensor. See Cameras Supported for the list of supported cameras for each platform.

    To run the sample on a DLA engine on an NVIDIA DRIVE platform

    On NVIDIA DRIVE platforms, you can run DriveNet on DLA engines with the following command line:

    ./sample_drivenet --dla=1 --dlaEngineNo=0
    

    To run the sample on a video for the first 3000 frames

    ./sample_drivenet --video=<video file.raw> --stopFrame=3000
    

    To run the sample with different precisions

    ./sample_drivenet --precision=int8
    

    To run the sample with urgency predictions

    ./sample_drivenet --enableUrgency=1
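
    To run the sample with the stateless temporal model for urgency predictions

    ./sample_drivenet --enableUrgency=1 --stateless=1


    To run the sample with CUDA Graph-based inference, if the hardware supports it

    ./sample_drivenet --useCudaGraph=1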
    

    Output

    The sample creates a window, displays a video, and overlays bounding boxes for detected objects. The color of the bounding boxes represents the classes that the sample detects, as follows:

    • Red: Cars and trucks (both labeled as cars).
    • Green: Traffic signs.
    • Blue: Bicycles.
    • Magenta: Pedestrians.
    • Orange: Traffic lights.
    • Yellow: Curb.
    • Cyan: Other.
    • Grey: Unknown.

    When urgency prediction is enabled, the predicted urgency value is displayed after the object class name. The color of the bounding box then represents the urgency value using a color map that transitions smoothly from green through white to red: green indicates negative urgency, white indicates zero urgency, and red indicates positive urgency.
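
    As a rough illustration of that mapping, the sketch below linearly blends green through white to red, assuming the urgency value has been normalized to [-1, 1]; the actual range and interpolation used by the sample are not specified here, so treat the numbers as placeholders.

    #include <algorithm>
    #include <cstdio>

    struct Color { float r, g, b; };                        // components in [0, 1]

    // Hypothetical green-white-red color map: -1 -> green, 0 -> white, +1 -> red.
    Color urgencyToColor(float urgency) {
        float u = std::max(-1.f, std::min(1.f, urgency));   // clamp to assumed range
        if (u < 0.f)
            return { 1.f + u, 1.f, 1.f + u };               // blend green toward white
        return { 1.f, 1.f - u, 1.f - u };                   // blend white toward red
    }

    int main() {
        for (float u : { -1.f, -0.5f, 0.f, 0.5f, 1.f }) {
            Color c = urgencyToColor(u);
            std::printf("urgency %+.1f -> RGB(%.2f, %.2f, %.2f)\n", u, c.r, c.g, c.b);
        }
        return 0;
    }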

    Multiclass object detector on an RCCB stream using DriveNet

    Limitations

    Warning
    DriveNet DNN currently has limitations that could affect its performance:
    • It is optimized for daytime, clear-weather data. As a result, it does not perform well in dark or rainy conditions.
    • It is trained primarily on data collected in the United States. As a result, it may have reduced accuracy in other locales, particularly for road sign shapes that do not exist in the U.S.
