The image module is composed of 3 submodules.

The image module contains structures and methods that allow the user to create and set image handles compatible with NVIDIA® DriveWorks modules. An image is represented generically as a handle, dwImageHandle_t, which can be passed to a DriveWorks module for processing, or more specifically as a C struct whose content differs based on the image type and properties. All images share common properties.

The image properties are:
- DW_IMAGE_MEMORY_TYPE_DEFAULT
- DW_IMAGE_MEMORY_TYPE_PITCH
- DW_IMAGE_MEMORY_TYPE_BLOCK

The memory type represents the arrangement of data in memory. Only CUDA and NvMedia images can handle both layouts; CPU images are strictly pitch and GL images are strictly block. The default memory type automatically chooses the proper layout once the image is given to a DriveWorks module.

The dwImageMetaData struct provides extra information about the image, including sensor statistics and important flags. Among this information is a dwImageAllocationAttrListHandle_t, which is filled by APIs from other modules of the form dwX_appendAllocationAttributes() (see dwSensorCamera_appendAllocationAttributes for example). These attributes are external and are required by hardware units in order to interface with the created image. The call can be made on the same dwImageProperties for all modules that will need to use the image being created. Special care must be taken when reusing a dwImageProperties that has this handle set: it can accumulate attributes from multiple modules, leading to incorrect allocation if unintended.

Any image can be created by calling dwImage_create() and should be destroyed with dwImage_destroy() when the image is no longer needed. Creation is specific to the image type, and there are 4 supported types. After the image is created, its handle can be passed to DriveWorks modules that accept the opaque handle; otherwise it is possible to retrieve a struct specific to the image type. The struct allows direct access to the content of the image, and any modification will affect the original image.
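The create/destroy lifecycle can be sketched as follows (a minimal sketch, assuming an already-initialized context `ctx`; error checking is omitted, and exact signatures should be verified against the headers of your SDK version):

```c
#include <dw/image/Image.h>

// Describe the image to allocate: a 1920x1080 interleaved RGBA CUDA image.
dwImageProperties props = {0};
props.type   = DW_IMAGE_CUDA;
props.format = DW_IMAGE_FORMAT_RGBA_UINT8;
props.width  = 1920;
props.height = 1080;

dwImageHandle_t image = DW_NULL_HANDLE;
dwImage_create(&image, props, ctx);  // allocation is driven by the properties

// ... pass `image` to DriveWorks modules that accept the opaque handle ...

dwImage_destroy(image);  // release the image when it is no longer needed
```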
A CPU image is stored as a pitch-linear memory buffer, represented by an array of pointers, an array of pitches, and properties. Its content can be retrieved from a dwImageHandle_t by calling dwImage_getCPU(), which returns a dwImageCPU containing:

- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The CPU image is created by specifying the DW_IMAGE_CPU type in the properties and calling dwImage_create().
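As an illustration, pitch-linear access to the pixel data might look like the following sketch (error checking omitted; field names follow the dwImageCPU struct described above, and `y` is a hypothetical row index):

```c
dwImageCPU* imgCPU = NULL;
dwImage_getCPU(&imgCPU, image);  // `image` must be of type DW_IMAGE_CPU

// Pitch-linear access: plane 0, row `y` starts at data[0] + y * pitch[0].
uint8_t* row = (uint8_t*)imgCPU->data[0] + y * imgCPU->pitch[0];

// Timestamp of acquisition; 0 for user-created images.
dwTime_t acquisitionTime = imgCPU->timestamp_us;
```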
A CUDA image can have 2 forms: a pitch-linear pointer or a CUDA array. The two forms are allocated in and occupy different domains of GPU memory, one being a pitch-linear pointer, the other being a block-memory CUDA array (which can be thought of as a texture). It is possible to retrieve the content by calling dwImage_getCUDA(), which returns a dwImageCUDA struct containing:

- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The CUDA image is created by specifying the DW_IMAGE_CUDA type in the properties and calling dwImage_create().
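The two forms can be accessed roughly as follows (a sketch; which fields are valid depends on the memory layout the image was allocated with, and signatures should be verified against the headers):

```c
dwImageCUDA* imgCUDA = NULL;
dwImage_getCUDA(&imgCUDA, image);  // `image` must be of type DW_IMAGE_CUDA

// Pitch-linear form: device pointers and pitches, usable directly in kernels.
void*  devPtr = imgCUDA->dptr[0];
size_t pitch  = imgCUDA->pitch[0];

// Block form: a cudaArray per plane, which can be bound to a texture object.
cudaArray_t plane0 = imgCUDA->array[0];
```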
Note: CUDA images created with one of the formats listed below (see the NvMedia Images section) are streamable from CUDA to NvMedia.
A GL image is stored as a GLuint texture on the GPU. An invalid texture has a texID of 0; a properly created texture has a positive value. It is possible to retrieve the content by calling dwImage_getGL(), which returns a dwImageGL containing:

- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The GL image is created by specifying the DW_IMAGE_GL type in the properties and calling dwImage_create().
An NvMedia image is stored as a pointer to the low level NvMedia API image struct. For specific information on NvMedia images, see the following information in NVIDIA DRIVE 5.1 PDK:
It is possible to access the pointer by calling dwImage_getNvMedia() and receive a dwImageNvMedia that contains:
- dwTime_t timestamp_us: the timestamp of acquisition from a sensor. If the image is created by the user, it is 0.

The NvMedia image is created by specifying the DW_IMAGE_NVMEDIA type in the properties and calling dwImage_create(). This creates the handle and also creates an NvMediaImage using low-level NvMedia API calls, based on the properties. Destroying such an image also destroys the NvMediaImage using the low-level NvMedia API. Here is a list of supported formats:

Calling dwImage_createAndBindNvMedia() will create the handle and use an NvMediaImage created by the user. The function trusts that the user's NvMediaImage matches the specified properties. Destroying such an image only destroys the handle; ownership of the NvMediaImage remains with the user. Note that images created with this API are not streamable to CUDA.
Images can be stored in memory in various formats. One dimension of this variation is interleaved vs planar storage for multi-channel images. For example, an interleaved RGB image has 1 plane with 3 channels. A YUV420 planar image has 3 planes, with 1 channel each.
Memory layout can be either pitch or block, depending on the type. CPU images are always pitch, GL images are always block, whereas CUDA and NvMedia images can be either.
The image format describes the data type, color space, and arrangement of the pixels.
Images can be converted into a different format, while retaining the same type. The user must allocate the output image and the conversion will be based on the properties of the input and output images. Only CUDA and NvMedia images support this operation. The converter will not change the size of the image. If all properties are identical, the converter will perform an identical copy.
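A conversion might be sketched as follows (assuming dwImage_copyConvert() as the conversion entry point and an initialized context `ctx`; `input` and `inProps` are hypothetical names, and the exact API should be verified against your SDK version):

```c
// Input: e.g. a YUV420 CUDA image. Output: an RGBA CUDA image, same size.
dwImageProperties outProps = inProps;          // same type, size, and layout
outProps.format = DW_IMAGE_FORMAT_RGBA_UINT8;  // only the format changes

dwImageHandle_t output = DW_NULL_HANDLE;
dwImage_create(&output, outProps, ctx);        // the user allocates the output

// The conversion is driven by the properties of the input and output images.
dwImage_copyConvert(output, input, ctx);
```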
The following table shows the formats allowed in conversion. This list is for CUDA images in pitch memory. A subset of these formats is also convertible for NvMedia images, indicated with *.
An image streamer streams an image from a type X to a type Y, preserving the rest of the properties. All streamers (see Note B) need to be initialized in order to allocate the resources necessary for streaming (for example, an image pool), depending on the type of streamer. At a low level, all streamers differ in behavior and performance, so the choice and number of streamers should be planned wisely (Note D). Streaming is based on a producer/consumer model.
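The producer/consumer logic can be sketched roughly as follows (a sketch streaming CUDA images to GL; function names follow the image streamer API, but exact signatures and timeout semantics should be verified against the headers of your SDK version):

```c
// Initialize a streamer from CUDA image properties to the GL type.
dwImageStreamerHandle_t streamer = DW_NULL_HANDLE;
dwImageStreamer_initialize(&streamer, &cudaProps, DW_IMAGE_GL, ctx);

// Producer side: post a CUDA image into the streamer.
dwImageStreamer_producerSend(cudaImage, streamer);

// Consumer side: receive the GL image, use it, then return it.
dwImageHandle_t glImage = DW_NULL_HANDLE;
dwImageStreamer_consumerReceive(&glImage, 30000, streamer);
// ... render with glImage ...
dwImageStreamer_consumerReturn(&glImage, streamer);

// Producer side: wait until the image is returned before reusing it.
dwImageStreamer_producerReturn(NULL, 30000, streamer);

dwImageStreamer_release(streamer);
```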
DriveWorks 5.0 changes the image streamer backend from EGL-based to NvSci-based streamers. This applies to cross-process as well as intra-process image streamers.

The intra-process image streamer handles everything internally, and applications using the DriveWorks image streamer APIs do not require any code changes. DW 5.0 image streamers use NvSciBuf internally and therefore involve no copy when streaming to another API. Note: to use images with intra-process streamers, they should be created with the dwImage_create() API. Externally allocated buffers bound using dwImage_createAndBindBuffer() are not NvSciBuf-based and hence are not streamable. For the bound-image case, the cross-process streamer can be leveraged for streaming.

Cross-process image streamers are based on the NvStreams API and involve a copy from the application's image to an image allocation maintained internally by the NvSci pool. The allocations of this pool are available in the producer and consumer processes. Since this is copy-based, images created with both dwImage_create() and dwImage_createAndBindBuffer() are supported.

Note: The copy is done only on the producer side, using the available hardware engines.

Cross-process image streamers in DW 5.0 abstract the functionality to set up NvSciStream internally for streaming images from producer to consumer. As shown in the table below, comma-separated key-value pairs can be specified in the parameter string as preferred by the application. Of these, only the stream name is mandatory; the others are optional.
Key | Value Type | Description |
---|---|---|
streamName | string | Corresponds to the endpoint name as specified in nvsciipc.cfg on Linux or the DTB on QNX |
streamNames | csv strings | Used for multicast on the producer side. See example below |
fifo-size | numeric | Optional. Specifies the FIFO size used by the streamer. By default this value is internally set to 4 |
timeoutUsec | numeric | Timeout value in microseconds passed to NvSciStream. Producer and consumer should use a value large enough to account for the latency of one being created after the other |
Examples of parameter string:
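Since no example string survives at this point in the text, here is an illustrative one built from the keys in the table above (the endpoint name is hypothetical and must match an entry in nvsciipc.cfg on Linux or the DTB on QNX):

```
streamName=nvscistream_0,fifo-size=4,timeoutUsec=1000000
```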
The following table describes the possible streaming combinations, given by image type (dwImageType).
From (row) \ To (column) | CPU | GL | CUDA | NvMedia |
---|---|---|---|---|
CPU | - | X* | X* | X |
GL | X* | - | X | X |
CUDA | X | X* | X (only cross-process) | X |
NvMedia | X | X | X | X (only cross-process) |
CUDA->CPU and vice versa support all formats.

NvMedia->CUDA and vice versa support:

CPU/NvMedia/CUDA->GL and vice versa support only DW_IMAGE_FORMAT_RGBA_UINT8.
Note A: The streamable images are based on NvSci and are required to be in sysmem.
Note B: Due to technical limitations, the CUDA->GL streamer on dGPU allocates extra resources beyond those needed and performs extra operations during streaming, leading to performance penalties. Cross-process streamers also keep a pool of internal resources that are shared across processes and are copied to from the image streamed by the application. In-process streamers incur zero copy when images are streamed from one API to another within the same process.

Note C: Some formats are stored by NvMedia in a different order than the format name suggests. Specifically, for YUV420/422 planar, the UV planes are actually ordered as VU. The order is restored to that of the format name when streamed to either CPU or CUDA.
The following table describes the mechanism for each streaming combination. 'X' indicates the combination is not available.
From (row) \ To (column) | CPU | CUDA Pitch | CUDA Block | GL | NvMedia |
---|---|---|---|---|---|
CPU | X | cudaMemcpy2DAsync | cudaMemcpy2DToArrayAsync | glBufferData - GL_STATIC_DRAW | NvSci mapping |
CUDA Pitch | cudaMemcpy2DAsync | X | X | cudaMemcpy3DAsync (iGPU, X86) - GL->CPU->CUDA (dGPU) | NvSci mapping |
CUDA Block | cudaMemcpy2DFromArrayAsync | X | X | cudaMemcpy3DAsync (iGPU, X86) - GL->CPU->CUDA (dGPU) | NvSci mapping |
GL | glReadPixels | cudaMemcpy3DAsync (iGPU, X86) - X (dGPU) | cudaMemcpy3DAsync (iGPU, X86) - X (dGPU) | X | EGL |
NvMedia | direct map (only for pitch linear) | NvSci mapping | NvSci mapping | EGL | X |
Note: EGL is not available in the safety build and will be discontinued in DRIVE OS 6.0.
The NvSci streaming mechanism, within the same process, has minimal overhead. Note also that when creating images, the pointers will reside on the GPU that is current at the time of creation; therefore, accessing and streaming must be done while ensuring the same GPU is current (see dwContext_getCurrentGPU()).
The following table gives the streaming performance on the NVIDIA DRIVE AGX Developer Kit. Values are given in microseconds and represent the average of 1000 runs; standard deviation and spike values are in parentheses.
'D' indicates dGPU performance and 'I' iGPU. If 'D' or 'I' is not specified, then the performance is independent of the GPU.
 | RGBA 8bit | RAW 16bit | YUV 420 SP 8bit |
---|---|---|---|
CPU->CUDA | 20 D (4.2, 117) 402 I (38.8, 643) | 20 D (5.1, 160) 364 I (38.0, 804) | 34 D (8.6, 404) 426 I (38.3, 654) |
CPU->GL | 11 (7.9, 263) | NA | NA |
CPU->NvMedia | 19 (3.5, 56) | 690 (4.1, 711) | NA |
CUDA->CPU | 24 D (6.4, 139) 407 I (29.2, 616) | 23 D (4.6, 147) 422 I (35.6, 798) | 41 D (7.1, 168) 449 I (56.1, 632) |
CUDA->GL | 175 (73.9, 1436) | NA | NA |
CUDA->NvMedia | NA | NA | NA |
NvMedia->CPU | 7 (3.9, 71) | 8 (3.1, 35) | 14 (5.3, 138) |
NvMedia->CUDA | 52 D (11.6, 2161) 34 I (7.5, 908) | 49 D (9.4, 2020) 37 I (11.8, 724) | 71 D (12.6, 3786) 36 I (16.5, 923) |
NvMedia->GL | 38 (13.2, 282) | NA | NA |
GL->CPU | 75 (25.1, 784) | NA | NA |
GL->CUDA | 1950 (146.4, 2411) | NA | NA |
GL->NvMedia | 136 (180.9, 1635) | NA | NA |
Note 1: GL-based times were taken on iGPU.
Note 2: Some streamers, especially EGL-based, have spikes for the first few frames, due to hidden optimizations that are performed during the first few iterations. Similar spikes may also occur for CUDA images.
A frame capture has 2 purposes: