Video Memory Overview

Video memory is much more heterogenous than system memory.

NVIDIA video memory allocation algorithms have to take the following into account:

There are multiple types of video memory types. The number and names of the types vary by GPU model, but GPUs generally have at least two; linear, which is essentially unformatted, and one or more GPU specific types. The GPU tracks different types of memory, and will access and write them differently. The types are important because GPU native types can be faster for a given set of operations; in some GPU architectures, the difference is small, on the order of 10-15%. On others, it can be quite large, more than 100% faster than linear memory.
Video memory is often banked, especially for mipmapped textures. In most architecture, alternating mipmap levels for a given texture must be put in separate banks. This separation is mandatory in most NVIDIA GPUs.
In addition to the restrictions above, different memory regions have different alignment restrictions, to match host pages, improve DMA performance, or speed up framebuffer scan out. These alignment requirements may be orthogonal to the memory types, adding further complication.
The allocator may have other special restrictions that enhance performance, such as distributing allocations to a sequence of different banks to improve allocation speed.
These extra constraints complicate the video memory allocator, and make allocations much more sensitive to reductions in available video memory. This is the major reason why NVIDIA does not support multiple independent heaps in video memory, instead requiring the application to allocate in such a way as to minimize fragmentation.