GPU management and access API.
For related information, see Discrete GPU in the Development Guide.
nvrm_gpu follows an object-oriented design. In general, objects are created by API functions ending with "Open" or "Create" and destroyed by API functions ending with "Close". For example, NvRmGpuLibOpen() creates the library object and NvRmGpuLibClose() destroys it.
The objects typically have a real-world counterpart or they represent a logical construct. For example, a device object represents an NVIDIA GPU.
All objects are referred to by handles. A handle is a pointer-to-struct type where the struct provides typing. For example, a library object is referred to by a variable of type NvRmGpuLib *. Handles cannot be dereferenced by the nvrm_gpu user.
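The handle convention described above corresponds to the common C opaque-pointer idiom. The following is a minimal sketch of that idiom only. The names DemoLib, DemoLibOpen(), DemoLibClose(), and DemoLibGetVersion() are hypothetical stand-ins and are not part of the nvrm_gpu API.

```c
#include <assert.h>
#include <stdlib.h>

/* --- Public view: what a header would expose --------------------------- */
/* Hypothetical stand-in type. Only the struct tag is declared here, so
 * user code can hold a DemoLib * but cannot look inside it. */
typedef struct DemoLibRec DemoLib;

DemoLib *DemoLibOpen(void);
void DemoLibClose(DemoLib *hLib);
int DemoLibGetVersion(const DemoLib *hLib);

/* --- Private view: what the library source file would contain ---------- */
struct DemoLibRec {
    int version;
};

DemoLib *DemoLibOpen(void)
{
    DemoLib *hLib = malloc(sizeof(*hLib));
    if (hLib != NULL) {
        hLib->version = 1;
    }
    return hLib; /* NULL signals an allocation failure */
}

void DemoLibClose(DemoLib *hLib)
{
    free(hLib); /* free(NULL) is a no-op, so a NULL handle is accepted */
}

int DemoLibGetVersion(const DemoLib *hLib)
{
    return hLib->version;
}
```

In this sketch the struct definition sits in the same file for brevity. In a library it would be kept out of the public header, which is what makes the handle non-dereferenceable for the user while still giving each object kind a distinct pointer type.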
The handle lifetime is the same as the underlying object lifetime. That is, a valid handle is created by an Open/Create function and a Close function will invalidate the handle as well as destroy the object. For example, NvRmGpuDeviceOpen() creates a device object and returns a handle to the device. NvRmGpuDeviceClose() destroys the device object and invalidates the device handle.
In general, before an object is destroyed, all of its child objects must be destroyed first. For example, before closing the library with NvRmGpuLibClose(), all devices opened within the library scope must be closed with NvRmGpuDeviceClose(). Failing to do so produces undefined behavior in the form of dangling pointers in nvrm_gpu internal objects and data structures.
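The child-before-parent rule can be illustrated with a small model. This is a sketch under stated assumptions, not the nvrm_gpu implementation: all Demo* names and status codes are hypothetical, and the open-device counter is just one simple way to detect an out-of-order close (comparable in spirit to the NvError_GpuOutOfOrderFree code that this document lists for safety builds). The model also treats closing a NULL handle as a no-operation, as this document describes for Close functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes; not the NvError values. */
typedef enum {
    DEMO_OK = 0,
    DEMO_OUT_OF_ORDER_FREE,
    DEMO_NO_MEMORY
} DemoStatus;

/* Hypothetical parent and child objects. */
typedef struct DemoLibRec { int openDevices; } DemoLib;
typedef struct DemoDeviceRec { DemoLib *parent; } DemoDevice;

DemoLib *DemoLibOpen(void)
{
    return calloc(1, sizeof(DemoLib)); /* openDevices starts at 0 */
}

DemoStatus DemoDeviceOpen(DemoLib *hLib, DemoDevice **phDevice)
{
    DemoDevice *hDevice = malloc(sizeof(*hDevice));
    if (hDevice == NULL) {
        return DEMO_NO_MEMORY;
    }
    hDevice->parent = hLib;
    hLib->openDevices++;          /* parent tracks its live children */
    *phDevice = hDevice;
    return DEMO_OK;
}

DemoStatus DemoDeviceClose(DemoDevice *hDevice)
{
    if (hDevice == NULL) {
        return DEMO_OK;           /* closing a NULL handle is a no-op */
    }
    hDevice->parent->openDevices--;
    free(hDevice);
    return DEMO_OK;
}

DemoStatus DemoLibClose(DemoLib *hLib)
{
    if (hLib == NULL) {
        return DEMO_OK;           /* closing a NULL handle is a no-op */
    }
    if (hLib->openDevices != 0) {
        return DEMO_OUT_OF_ORDER_FREE; /* refuse; the library stays open */
    }
    free(hLib);
    return DEMO_OK;
}
```

With this model, calling DemoLibClose() while a device is still open returns DEMO_OUT_OF_ORDER_FREE and leaves the library intact, and the same call succeeds once DemoDeviceClose() has run.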
Unless otherwise stated, it is illegal to cross-reference handles under different NvRmGpuDevice instances. For instance, a channel belonging to one device object cannot be bound to the address space belonging to another device object.
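The cross-reference restriction can be pictured as an ownership check at bind time. The sketch below is illustrative only. The Demo* types, the DemoChannelBind() function, and the DEMO_BAD_PARAMETER status are hypothetical, and this document does not state how (or whether) nvrm_gpu itself detects such misuse.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; not the NvError values. */
typedef enum { DEMO_OK = 0, DEMO_BAD_PARAMETER } DemoStatus;

/* Hypothetical objects: every child records the device it belongs to. */
typedef struct DemoDeviceRec { int index; } DemoDevice;

typedef struct DemoAddressSpaceRec {
    const DemoDevice *owner;
} DemoAddressSpace;

typedef struct DemoChannelRec {
    const DemoDevice *owner;
    const DemoAddressSpace *boundAs;
} DemoChannel;

/* Bind a channel to an address space only if both belong to the same
 * device object. On mismatch the channel is left unchanged. */
DemoStatus DemoChannelBind(DemoChannel *hChannel, const DemoAddressSpace *hAs)
{
    if (hChannel->owner != hAs->owner) {
        return DEMO_BAD_PARAMETER;
    }
    hChannel->boundAs = hAs;
    return DEMO_OK;
}
```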
nvrm_gpu uses the following common error codes throughout the API:
Error code | Build configurations | Description | Remarks |
---|---|---|---|
NvError_GpuInvalidHandle | Safety builds only | Invalid API object handle | The standard build of nvrm_gpu does not perform handle validation, for performance reasons. |
NvError_GpuInvalidDvmsState | QNX builds only | API function not available in the current DVMS state | None |
NvError_GpuHwError | All | Error encountered during direct communication with the HW. | Examples: Unexpected register read value; unexpected control buffer value |
NvError_GpuOutOfOrderFree | Safety builds only | A parent (or other dependency) object was freed before its children (or other dependencies). | None |
NvError_InsufficientMemory | All | Insufficient memory to perform the operation. | None |
NvError_ResourceError | All | Error communicating with a kernel-mode driver (Linux) or a resource manager (QNX) | None |
NvError_GpuFatalLockdown | Safety builds only | nvrm_gpu is in lockdown mode due to a previous fatal error. | Once a fatal error has been encountered, most API calls return this error. The exceptions are API calls that never fail (e.g., NvRmGpuLibGetVersionInfo()) and API calls that always fail (e.g., calls disabled in the build). When a fatal error has been encountered, the integrity of nvrm_gpu is considered lost. The lockdown mode prevents any further undefined behavior. |
NvError_GpuFatalConsistencyError | Safety builds only | Internal state consistency check failed. | Potential sources include memory corruption or a programming defect. |
NvError_GpuFatalLogicError | Safety builds only | Internal logic error detected. | These errors are generally triggered by defensive mechanisms. For example: internal out-of-bounds check failed; integer overflow check failed; predicate check failed. |
NvError_GpuFatalOsError | Safety builds only | An OS call or a library call that should never fail with correct programming returned an error. | Example: pthread_mutex_lock() or pthread_mutex_unlock() returned an error. |
NvError_GpuFatalUncheckedException | Safety builds only | An unexpected (unchecked) exception was caught. | None |
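The lockdown behavior described for NvError_GpuFatalLockdown can be modeled as a latch that is set by the first fatal error and checked at the start of every ordinary call. The following is a minimal single-threaded sketch under that assumption. The Demo* names and status codes are hypothetical, and the simulateFault parameter exists only to trigger the defensive check on demand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes; not the NvError values. */
typedef enum {
    DEMO_OK = 0,
    DEMO_FATAL_LOGIC_ERROR,
    DEMO_FATAL_LOCKDOWN
} DemoStatus;

/* Latch set by the first fatal error. This sketch is single-threaded;
 * a multi-threaded library would need an atomic or a lock here. */
static bool g_lockdown = false;

/* Record the fatal error and return its code to the caller that hit it. */
static DemoStatus DemoEnterLockdown(DemoStatus cause)
{
    g_lockdown = true;
    return cause;
}

/* An ordinary API call: refuses to run once lockdown is active. */
DemoStatus DemoOperation(bool simulateFault)
{
    if (g_lockdown) {
        return DEMO_FATAL_LOCKDOWN;
    }
    if (simulateFault) {
        /* Stands in for a failed internal predicate or bounds check. */
        return DemoEnterLockdown(DEMO_FATAL_LOGIC_ERROR);
    }
    return DEMO_OK;
}

/* A call that never fails: it does not consult the latch. */
int DemoGetVersion(void)
{
    return 1;
}
```

In this model the call that detects the fault reports the specific fatal code, every later DemoOperation() call reports DEMO_FATAL_LOCKDOWN, and DemoGetVersion() keeps working because it cannot fail.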
Parameter sanity checking:
[...] NULL pointers are expected to be valid.
Objects and their relations:
[...] NULL handles, unless otherwise stated. Closing a NULL handle is a no-operation. This allows nvrm_gpu users to unconditionally close all handles in deinitialization, regardless of whether they refer to valid objects or NULL. (See NvRmGpuLibClose() for an example.)
Handles:
[...] NULL. Individual API functions may relax the non-NULL requirement by marking input handles as optional in the API function documentation, in which case a NULL pointer may be passed instead.
Pointers:
Forwards and backwards compatibility:
nvrm_gpu is thread-safe with the following rule: an object may be closed only if no concurrent operations are in progress on it. Attempting to close an object in one thread while another thread is still accessing it is a fatal error.
nvrm_gpu internally uses fine-grained locking to promote high-performance multi-threaded programming. The nvrm_gpu implementation uses the partial lock ordering technique to avoid deadlocks related to nested locking.
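Lock ordering avoids deadlock by ensuring that any two locks that are ever held together are always acquired in the same order. The sketch below shows the idea with POSIX mutexes. It is an illustration only: the DemoRankedLock type, the numeric ranks, and the DemoLockPair() and DemoUnlockPair() helpers are hypothetical, and a single integer rank is a simplification of a partial order. This document does not describe the ordering nvrm_gpu uses internally.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical lock wrapper: each lock carries a rank, and nested
 * acquisition always proceeds from lower rank to higher rank. */
typedef struct {
    pthread_mutex_t mutex;
    int rank;
} DemoRankedLock;

/* Acquire two locks of distinct rank in rank order, regardless of the
 * order in which the caller names them. Returns 0 on success or the
 * pthread error code; on failure no lock is left held. */
int DemoLockPair(DemoRankedLock *a, DemoRankedLock *b)
{
    DemoRankedLock *first = (a->rank < b->rank) ? a : b;
    DemoRankedLock *second = (first == a) ? b : a;

    int rc = pthread_mutex_lock(&first->mutex);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutex_lock(&second->mutex);
    if (rc != 0) {
        pthread_mutex_unlock(&first->mutex);
    }
    return rc;
}

/* Release both locks. Unlock order does not affect deadlock freedom. */
void DemoUnlockPair(DemoRankedLock *a, DemoRankedLock *b)
{
    pthread_mutex_unlock(&a->mutex);
    pthread_mutex_unlock(&b->mutex);
}
```

Because both DemoLockPair(&x, &y) and DemoLockPair(&y, &x) take the lower-ranked lock first, two threads that name the locks in opposite order cannot each end up holding one lock while waiting for the other.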
nvrm_gpu is subject to safety certification for specific releases. The nvrm_gpu library in a safety-certified release comes in the form of a special build called the safety build of nvrm_gpu. The safety build supports a subset of the nvrm_gpu API functionality. This subset is referred to as the safety subset.
The safety subset available in the safety build is denoted by API groups that have "safety subset" in their name. Functionality outside the "safety subset" groups is not available in the safety build and must not be used in a safety-critical context. The top-level safety subset API group is GPU access API (safety subset).
The use of nvrm_gpu in a safety-critical context is further subject to certain assumptions, restrictions, and recommendations. These are described in the safety manual provided in the safety-certified release.
The safety build of nvrm_gpu has additional internal checks enabled that are not available in the regular builds for performance reasons.
Modules
Module | Description |
---|---|
GPU access API: Device management | Device control, device capabilities, and device memory management. |
GPU access API: Library | Library management and device discovery. |