Why alignment?

Vectorized instructions can drastically improve efficiency. For example, a dwVector4f can be loaded from memory with a single vectorized 16-byte operation (ld.global.v4.f32), whereas a non-vectorized load requires four 4-byte operations (ld.global.f32). This can have an enormous impact on latency. The caveat is that memory must be aligned to use vectorized instructions: for a 16-byte operation, the memory address must be a multiple of 16.
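To make the 16-byte requirement concrete, here is a minimal C++ sketch of an alignment check on an address. `isAligned` is a hypothetical helper introduced for illustration, not part of the DriveWorks API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a DriveWorks API): returns true if `ptr` is a
// multiple of `alignment` bytes, e.g. alignment = 16 for a 16-byte vector load.
inline bool isAligned(const void* ptr, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```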

The compiler automatically uses vectorized operations whenever possible, so all we need to do to take advantage of them is tag our types with the required alignment. For example:

struct dwVector4f
{
   alignas(16)   // aligning the first member raises the alignment of the whole struct to 16 bytes
   float32_t x;
   float32_t y;
   float32_t z;
   float32_t w;
};
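The effect of the tag can be verified at compile time. This sketch assumes `float32_t` is a 32-bit `float` (assumed to mirror the DriveWorks typedef); the checks confirm that the whole struct inherits the 16-byte alignment and contains no padding.

```cpp
using float32_t = float;  // assumption: mirrors the DriveWorks float32_t typedef

struct dwVector4f
{
    alignas(16)  // alignment of the first member propagates to the struct
    float32_t x;
    float32_t y;
    float32_t z;
    float32_t w;
};

// The struct is 16-byte aligned and exactly 16 bytes long, so a single
// 16-byte vectorized load can cover all four components.
static_assert(alignof(dwVector4f) == 16, "dwVector4f must be 16-byte aligned");
static_assert(sizeof(dwVector4f) == 16, "dwVector4f must be 16 bytes with no padding");
```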

In DriveWorks we limit the maximum alignment to 16 bytes to avoid interoperability issues between GCC and NVCC. This limit might be increased in the future.
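If you want to enforce the same 16-byte cap in your own types, a compile-time check is one option. This is a sketch; `kMaxAlignment` and `isWithinMaxAlignment` are hypothetical names, not DriveWorks APIs.

```cpp
#include <cstddef>

using float32_t = float;  // assumption: mirrors the DriveWorks float32_t typedef

// Hypothetical project-wide cap mirroring the 16-byte limit described above.
constexpr std::size_t kMaxAlignment = 16;

// True if T does not require more than kMaxAlignment bytes of alignment.
template <typename T>
constexpr bool isWithinMaxAlignment()
{
    return alignof(T) <= kMaxAlignment;
}

struct dwVector4f
{
    alignas(16) float32_t x;
    float32_t y;
    float32_t z;
    float32_t w;
};

// Fails to compile if a shared type ever exceeds the cap.
static_assert(isWithinMaxAlignment<dwVector4f>(), "alignment exceeds the 16-byte limit");
```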

Padding

Struct members that require alignment affect the memory layout of the struct. First, the struct takes the alignment of the member with the largest alignment requirement. Second, the compiler inserts padding bytes between members to ensure that every member is aligned (given that the start of the struct is already aligned). For example, consider the following struct:

struct Vertex                   // implicitly aligned to 16 bytes
{
        dwVector2f position;    // aligned to 8
                                // 8-byte padding is inserted here
        dwVector4f color;       // aligned to 16
};

The compiler propagates the 16-byte alignment from color to Vertex. Moreover, 8 bytes of padding are added between position and color, so sizeof(Vertex) is 32 bytes rather than 24.
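The layout described above can be checked with offsetof and static_assert. This is a sketch: dwVector2f and dwVector4f are simplified stand-ins for the DriveWorks types, assuming 8- and 16-byte alignment as annotated in the Vertex example.

```cpp
#include <cstddef>

using float32_t = float;  // assumption: mirrors the DriveWorks float32_t typedef

// Simplified stand-ins for the DriveWorks vector types.
struct dwVector2f { alignas(8) float32_t x; float32_t y; };
struct dwVector4f { alignas(16) float32_t x; float32_t y; float32_t z; float32_t w; };

struct Vertex
{
    dwVector2f position;  // bytes 0-7
                          // bytes 8-15: padding
    dwVector4f color;     // bytes 16-31
};

static_assert(alignof(Vertex) == 16, "alignment propagates from color");
static_assert(offsetof(Vertex, position) == 0, "position starts the struct");
static_assert(offsetof(Vertex, color) == 16, "8 bytes of padding follow position");
static_assert(sizeof(Vertex) == 32, "8 + 8 (padding) + 16 bytes");
```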

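Note that the size of a type is always a multiple of its alignment, because the compiler adds trailing padding where necessary; in particular, the alignment of a type can never be larger than its size. That means that an array of objects of the same type never has padding between consecutive objects.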
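The size/alignment relationship can be seen in a short sketch, again using simplified stand-in types (assumptions, not the DriveWorks headers). The hypothetical Tail struct shows the compiler rounding the size up with trailing padding so that array elements stay aligned.

```cpp
#include <cstddef>

using float32_t = float;  // assumption: mirrors the DriveWorks float32_t typedef

struct dwVector2f { alignas(8) float32_t x; float32_t y; };
struct dwVector4f { alignas(16) float32_t x; float32_t y; float32_t z; float32_t w; };
struct Vertex { dwVector2f position; dwVector4f color; };

// sizeof is always a multiple of alignof, so consecutive array elements are
// exactly sizeof(T) apart and each one keeps the alignment of the first.
static_assert(sizeof(Vertex) % alignof(Vertex) == 0, "size is a multiple of alignment");
static_assert(sizeof(Vertex[4]) == 4 * sizeof(Vertex), "no padding between elements");

// Hypothetical struct: its members need only 17 bytes, but trailing padding
// rounds the size up to 32, the next multiple of its 16-byte alignment.
struct Tail { dwVector4f v; char tag; };
static_assert(sizeof(Tail) == 32, "trailing padding up to a multiple of 16");
```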

Potential pitfalls

1. Allocation

The default allocation functions (malloc in C and operator new in C++14) do not take extended alignment into account: they only guarantee the platform's fundamental alignment, which is not required to be 16 bytes. That means they might return an address that is not suitably aligned for a variable requiring alignment, which can lead to memory corruption and crashes. To avoid this, we recommend aligning all allocations to the maximum possible alignment (i.e. 16 bytes). In C, use posix_memalign; in C++, override operator new to use posix_memalign. The samples framework provides an example of such an operator new override, which is used to ensure alignment in all the samples.
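A minimal sketch of a global operator new override along these lines, assuming a POSIX system where posix_memalign is available. It is not the samples framework's actual implementation, and kAllocAlignment is a hypothetical constant. In C, the equivalent is calling posix_memalign(&ptr, 16, size) directly and releasing the memory with free().

```cpp
#include <cstddef>
#include <cstdlib>  // posix_memalign (POSIX), std::free
#include <new>      // std::bad_alloc

// Hypothetical constant: the maximum alignment any type may require.
constexpr std::size_t kAllocAlignment = 16;

void* operator new(std::size_t size)
{
    void* ptr = nullptr;
    // posix_memalign returns 0 on success. Request at least 1 byte so that
    // new of size 0 still yields a unique, freeable pointer.
    if (posix_memalign(&ptr, kAllocAlignment, size == 0 ? 1 : size) != 0)
    {
        throw std::bad_alloc();
    }
    return ptr;
}

void* operator new[](std::size_t size) { return operator new(size); }

// Memory obtained from posix_memalign is released with free().
void operator delete(void* ptr) noexcept { std::free(ptr); }
void operator delete[](void* ptr) noexcept { std::free(ptr); }
void operator delete(void* ptr, std::size_t) noexcept { std::free(ptr); }
void operator delete[](void* ptr, std::size_t) noexcept { std::free(ptr); }
```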

2. Padding

Code that expects a specific memory layout (such as OpenGL) might not support padding. Among the DriveWorks renderers, dwRenderEngine supports padding like in the Vertex struct above, but dwRenderer does not.
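If a consumer cannot handle padding, one possible workaround is to copy the data into a tightly packed buffer before handing it over. The sketch below is hypothetical (packVertices is not a DriveWorks function) and uses simplified stand-ins for the vector types; it assumes the consumer expects six consecutive floats per vertex.

```cpp
#include <cstddef>
#include <vector>

using float32_t = float;  // assumption: mirrors the DriveWorks float32_t typedef

struct dwVector2f { alignas(8) float32_t x; float32_t y; };
struct dwVector4f { alignas(16) float32_t x; float32_t y; float32_t z; float32_t w; };
struct Vertex { dwVector2f position; dwVector4f color; };  // 32 bytes, 8 of them padding

// Hypothetical helper: copies padded Vertex structs into a tightly packed array
// of floats (x, y, r, g, b, a per vertex, i.e. 24 bytes per vertex, no padding).
std::vector<float32_t> packVertices(const Vertex* vertices, std::size_t count)
{
    std::vector<float32_t> packed;
    packed.reserve(count * 6);
    for (std::size_t i = 0; i < count; ++i)
    {
        const Vertex& v = vertices[i];
        packed.insert(packed.end(), {v.position.x, v.position.y,
                                     v.color.x, v.color.y, v.color.z, v.color.w});
    }
    return packed;
}
```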

3. Packing

Forcing a memory layout, for example with #pragma pack(1) in GCC, removes padding and breaks alignment. The compiler may still generate vectorized operations for those types, and the program may then crash or corrupt memory. Forcing a memory layout on types that require alignment is strongly discouraged.
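One way to catch a forced layout early is a compile-time check that every aligned member still sits at a multiple of its alignment. This is a sketch with simplified stand-in types: if a member ends up at an offset that violates its alignment, as #pragma pack(1) can cause, the static_assert fails at compile time instead of the program misbehaving at run time.

```cpp
#include <cstddef>

using float32_t = float;  // assumption: mirrors the DriveWorks float32_t typedef

struct dwVector2f { alignas(8) float32_t x; float32_t y; };
struct dwVector4f { alignas(16) float32_t x; float32_t y; float32_t z; float32_t w; };

struct Vertex { dwVector2f position; dwVector4f color; };

// Guard against the alignment of the vector type itself being reduced.
static_assert(alignof(dwVector4f) == 16, "dwVector4f lost its 16-byte alignment");

// Guard against members being moved to misaligned offsets inside Vertex.
static_assert(offsetof(Vertex, position) % alignof(dwVector2f) == 0, "position is misaligned");
static_assert(offsetof(Vertex, color) % alignof(dwVector4f) == 0, "color is misaligned");
```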

4. Reinterpreting

Unchecked reinterpret casts may violate alignment. For example:

uint8_t buffer[256];                                       // No alignment requirement
dwVector4f* a = reinterpret_cast<dwVector4f*>(buffer);     // Maybe aligned, maybe not
*a = {1.f, 2.f, 3.f, 4.f};                                 // Undefined behavior: may crash if not aligned
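Two alignment-safe alternatives, sketched under the same stand-in type assumptions: give the raw storage the alignment of the type it will hold and construct the object there with placement new, or copy the bytes with std::memcpy, which works for any address. alignedBuffer, constructInBuffer, loadVector4f, and storeVector4f are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>
#include <new>

using float32_t = float;  // assumption: mirrors the DriveWorks float32_t typedef
struct dwVector4f { alignas(16) float32_t x; float32_t y; float32_t z; float32_t w; };

// Option 1: give the raw storage the alignment of the type it will hold,
// then construct the object in place instead of casting.
alignas(dwVector4f) std::uint8_t alignedBuffer[256];

dwVector4f* constructInBuffer()
{
    return new (alignedBuffer) dwVector4f{1.f, 2.f, 3.f, 4.f};
}

// Option 2: copy bytes to or from a properly aligned object. memcpy has no
// alignment requirement, so `bytes` may point anywhere.
dwVector4f loadVector4f(const std::uint8_t* bytes)
{
    dwVector4f v;
    std::memcpy(&v, bytes, sizeof(v));
    return v;
}

void storeVector4f(std::uint8_t* bytes, const dwVector4f& v)
{
    std::memcpy(bytes, &v, sizeof(v));
}
```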
