Many of the types in DriveWorks require alignment, e.g. dwVector4f, dwTransformation3f. There are some special considerations to take into account when using aligned types. This page tries to cover the basics and potential pitfalls.
Vectorized instructions can drastically improve efficiency. For example, a dwVector4f can be loaded from memory with a single vectorized 16-byte operation (ld.global.v4.f32), whereas a non-vectorized load requires four 4-byte operations (ld.global.f32). This can have an enormous impact on latency. The caveat is that memory must be aligned to use vectorized instructions. That is, for a 16-byte operation, the memory address must be a multiple of 16.
The compiler will automatically use vectorized operations whenever possible. So all we need to take advantage of them is to tag our types with the required alignment. For example:
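As a minimal sketch of tagging a type, the block below uses the standard C++11 alignas specifier on a hypothetical stand-in type named Vec4f. The actual DriveWorks headers may declare dwVector4f differently (for instance with a compiler attribute or a macro), and the isAligned helper is likewise an illustration, not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an aligned DriveWorks type such as dwVector4f.
// alignas(16) tells the compiler that every object of this type must start
// at an address that is a multiple of 16.
struct alignas(16) Vec4f {
    float x, y, z, w;
};

static_assert(alignof(Vec4f) == 16, "Vec4f must be 16-byte aligned");
static_assert(sizeof(Vec4f) == 16, "Vec4f is four packed floats");

// Illustrative helper: true when ptr is a multiple of alignment.
inline bool isAligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Objects with automatic storage are placed at suitably aligned addresses
// by the compiler, so no extra work is needed for local variables.
inline bool stackObjectIsAligned() {
    Vec4f v{1.0f, 2.0f, 3.0f, 4.0f};
    return isAligned(&v, alignof(Vec4f));
}
```

With the type tagged this way, the compiler is free to pick 16-byte loads and stores for it. Whether it actually does depends on the target and optimization level.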
In DriveWorks we limit the maximum alignment to 16 bytes to avoid interoperability issues between GCC and NVCC. This limit might be increased in the future.
Members of structs that require alignment will have an impact on the memory layout of the struct. First, the struct will take the alignment of the member with the biggest alignment requirement. Second, the compiler will introduce padding bytes between members to ensure that all members are aligned (given that the start of the struct is already aligned). For example, consider the following struct
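A minimal sketch of such a struct follows. Vec2f and Vec4f are hypothetical stand-ins for the DriveWorks 2D and 4D float vector types, and the static assertions spell out the layout the compiler is expected to produce under these assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the DriveWorks vector types.
struct Vec2f {
    float x, y;       // 8 bytes, natural 4-byte alignment
};
struct alignas(16) Vec4f {
    float x, y, z, w; // 16 bytes, 16-byte alignment
};

struct Vertex {
    Vec2f position;   // offset 0, size 8
    // 8 bytes of padding inserted here by the compiler
    Vec4f color;      // offset 16, size 16
};

// The struct inherits the strictest member alignment.
static_assert(alignof(Vertex) == 16, "alignment propagates from color");
// color must start on a 16-byte boundary, hence the padding after position.
static_assert(offsetof(Vertex, color) == 16, "8 bytes of padding expected");
static_assert(sizeof(Vertex) == 32, "8 + 8 (padding) + 16");
```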
The compiler propagates the 16-byte alignment from color to Vertex. Moreover, 8 bytes of padding are added between position and color.
Note that, by definition, the alignment of a type cannot be larger than the sizeof the type itself; the size is always a multiple of the alignment. That means that an array of objects of the same type never has padding between objects.
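The following sketch illustrates this with two hypothetical types. When a type's members do not fill a multiple of its alignment, the compiler adds tail padding inside the object, so consecutive array elements are still exactly sizeof bytes apart and each one is aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical types used only for illustration.
struct alignas(16) Vec4f {
    float x, y, z, w;
};
struct alignas(16) PaddedVec3f {
    float x, y, z; // 12 bytes of data, 4 bytes of tail padding
};

// sizeof is always a multiple of alignof.
static_assert(sizeof(Vec4f) % alignof(Vec4f) == 0, "size is a multiple of alignment");
static_assert(sizeof(PaddedVec3f) == 16, "tail padding lives inside the object");

// Checks that array elements are contiguous and individually aligned.
inline bool arrayElementsAreContiguousAndAligned() {
    Vec4f values[4] = {};
    const auto base = reinterpret_cast<std::uintptr_t>(&values[0]);
    for (std::size_t i = 0; i < 4; ++i) {
        const auto addr = reinterpret_cast<std::uintptr_t>(&values[i]);
        if (addr != base + i * sizeof(Vec4f)) {
            return false; // would indicate padding between elements
        }
        if (addr % alignof(Vec4f) != 0) {
            return false; // would indicate a misaligned element
        }
    }
    return true;
}
```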
The default allocation functions (malloc for C and operator new for C++14) do not take alignment into account. That means that they might assign a non-aligned address to a variable requiring alignment. This can lead to memory corruption and crashes. To avoid this, we recommend aligning all allocations to the maximum possible alignment (i.e. 16 bytes). In C, use posix_memalign; in C++, override operator new to use posix_memalign. The provided samples framework includes an example of how to override operator new, which is used to ensure alignment in all the samples.
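A minimal sketch of this recommendation on a POSIX system is shown below. It is not the samples framework's implementation: the names kMaxAlignment, allocAligned, and Vec4f are assumptions made for illustration. The block replaces the global operator new and operator delete so that every heap allocation is 16-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h> // posix_memalign, free (POSIX)

// Assumed maximum alignment, matching the 16-byte limit described above.
constexpr std::size_t kMaxAlignment = 16;

// Illustrative helper: returns 16-byte aligned memory, or nullptr on failure.
// Memory obtained from posix_memalign is released with free().
inline void* allocAligned(std::size_t size) {
    void* ptr = nullptr;
    if (size == 0) {
        size = 1; // operator new must return a unique non-null pointer
    }
    if (posix_memalign(&ptr, kMaxAlignment, size) != 0) {
        return nullptr;
    }
    return ptr;
}

// Global replacements: all new-expressions now return aligned memory.
void* operator new(std::size_t size) {
    void* ptr = allocAligned(size);
    if (ptr == nullptr) {
        throw std::bad_alloc();
    }
    return ptr;
}
void* operator new[](std::size_t size) {
    return operator new(size);
}
void operator delete(void* ptr) noexcept {
    free(ptr);
}
void operator delete[](void* ptr) noexcept {
    free(ptr);
}

// Hypothetical stand-in for an aligned DriveWorks type.
struct alignas(16) Vec4f {
    float x, y, z, w;
};

inline bool isAligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

In C code, the same effect can be had by calling posix_memalign directly wherever malloc would otherwise be used, and releasing the memory with free.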
Code that expects a specific memory layout (like OpenGL) might not support padding. Among the DriveWorks renderers, dwRenderEngine supports padding like that in the Vertex struct above, but dwRenderer does not.
Forcing a memory layout, for example with #pragma pack(1) in GCC, will remove padding and break alignment. The compiler may still generate vectorized operations for those types, and the program may crash or corrupt memory. Forcing a memory layout with types requiring alignment is strongly discouraged.
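The effect can be sketched by comparing a naturally laid out struct with a packed one. The types below are hypothetical stand-ins, and the offsets in the comments describe the behavior expected from GCC and Clang on common Linux targets; other compilers may treat the combination of packing and alignas differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the DriveWorks vector types.
struct Vec2f {
    float x, y;
};
struct alignas(16) Vec4f {
    float x, y, z, w;
};

// Natural layout: color sits at offset 16 after 8 bytes of padding.
struct Vertex {
    Vec2f position;
    Vec4f color;
};

// Packed layout: the padding is removed, so color is expected to land at
// offset 8, which is not a multiple of 16.
#pragma pack(push, 1)
struct PackedVertex {
    Vec2f position;
    Vec4f color;
};
#pragma pack(pop)

inline std::size_t colorOffset() { return offsetof(Vertex, color); }
inline std::size_t packedColorOffset() { return offsetof(PackedVertex, color); }
inline std::size_t packedAlignment() { return alignof(PackedVertex); }
```

Any code that reads the color member of PackedVertex through a pointer or reference to Vec4f is then working with an address the type's alignment requirement does not permit.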
Unchecked reinterpret casts may violate alignment. For example
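As a minimal sketch, the block below places four floats at a byte offset that is not a multiple of 16. The unsafe cast is left as a comment, and the data is read with memcpy instead, which has no alignment requirement on its source. Vec4f, isAligned, and loadVec4f are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an aligned DriveWorks type.
struct alignas(16) Vec4f {
    float x, y, z, w;
};

inline bool isAligned(const void* ptr, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Copies a Vec4f out of an arbitrary byte position into an aligned object.
inline Vec4f loadVec4f(const std::uint8_t* bytes) {
    Vec4f out;
    std::memcpy(&out, bytes, sizeof(Vec4f));
    return out;
}

inline bool readFromMisalignedOffset() {
    alignas(16) std::uint8_t buffer[64] = {};
    const float values[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    std::uint8_t* src = buffer + 4; // offset 4: not a multiple of 16
    std::memcpy(src, values, sizeof(values));

    // UNSAFE, do not do this: the address violates Vec4f's alignment.
    // const Vec4f* bad = reinterpret_cast<const Vec4f*>(src);

    if (isAligned(src, alignof(Vec4f))) {
        return false; // not expected: buffer + 4 cannot be 16-byte aligned
    }
    const Vec4f v = loadVec4f(src);
    return v.x == 1.0f && v.y == 2.0f && v.z == 3.0f && v.w == 4.0f;
}
```

Checking the address before casting, or copying into a properly aligned object as done here, avoids the undefined behavior of dereferencing a misaligned pointer.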
This is only a summary of alignment issues. For more details, refer to the documentation of the relevant compilers and instruction sets.
Useful links: