NVIDIA ACE is a suite of digital human technologies that bring game characters and digital assistants to life with generative AI. ACE on-device models enable agentic workflows for autonomous game characters that can perceive their environment, understand multi-modal inputs, strategically plan a set of actions and execute them all in real time, providing dynamic experiences for players.
To run these models alongside the game engine, the NVIDIA In-Game Inferencing (NVIGI) SDK enables you to integrate AI inference directly into C++ games and applications for optimal performance and latency.
This post shows how NVIGI integrates with ACE to enable seamless AI inference in game development. We cover the NVIGI architecture, key features, and how to get started creating autonomous characters with NVIDIA ACE on-device models.
NVIDIA ACE on-device models
NVIDIA ACE enables speech, intelligence, and animation powered by generative AI. It delivers a suite of AI models that enable game characters to perceive, reason, and act based on player interactions in real time:
- Perception: The upcoming NeMoAudio-4B-Instruct model enhances character interactions with greater contextual awareness from audio. You can easily integrate more multimodal models to further expand these capabilities by incorporating additional sensory inputs.
- Cognition: The Mistral-Nemo-Minitron-Instruct family of small language models tops the charts for instruction-following capability, enabling characters to role-play accurately.
- Memory: Embedding models like E5-Large-Unsupervised enable characters to recall past interactions, enriching immersion.
- Animation: Real-time AI-driven animation such as Audio2Face delivers accurate lip sync and dynamic, lively expressions.
- Action: Simple interfaces and custom logic enable characters to take meaningful actions, from selecting in-game responses to executing strategic plans through model-driven decision-making and prompts.
What is the NVIDIA In-Game Inferencing SDK?

The NVIGI SDK is a GPU-optimized, plugin-based inference manager designed to simplify the integration of ACE models into gaming and interactive applications. It offers the following functionality:
- Plugin flexibility: Add, update, and manage AI plugins (ASR, language models, embeddings) with ease.
- Windows-native DLLs: Streamline workflows for C++ games and applications.
- GPU optimization: Use compute-in-graphics (CIG) technology for efficient AI inference alongside rendering tasks.
By combining NVIGI with ACE, you can create autonomous characters with advanced generative AI capabilities, such as real-time NPC dialogue, contextual memory, and lifelike animation.
How NVIGI works
At its core, the NVIGI architecture is based on modular plugins that enable flexible integration of various AI functionalities:
- Core plugins: Provide AI capabilities such as automatic speech recognition (ASR), generative reasoning, and embedding retrieval.
- Helper plugins: Handle utilities such as GPU scheduling and network communication.
- Unified APIs: Simplify plugin management and reduce code complexity.
- Local and cloud execution: Supports both on-device inference (CPU/GPU) and cloud-based AI workflows.
These components work together to deliver a seamless AI-driven gaming experience. For example, imagine a player asking a question to an NPC. NVIGI orchestrates an agentic workflow, enabling characters to listen, reason, speak, and animate in real time.
This process follows a few key steps:
- Listen to users with ASR: The NPC processes player speech using NVIDIA Riva ASR, converting spoken input into text for further reasoning.
- Generate a response with an SLM: The transcribed text is passed to a small language model (SLM), such as Mistral-Nemo-Minitron-128K-Instruct, which generates a dynamic, contextually relevant response. If additional context is needed, a retrieval-augmented generation (RAG) approach can be used, where an embedding model like E5-Large-Unsupervised converts text into vector representations. These vectors are then used in a similarity search to retrieve relevant knowledge, enriching the SLM’s response with additional context.
- Give characters a voice with TTS: The generated response is synthesized into lifelike voices, ensuring natural intonation and expressiveness.
- Create interactive animation: The spoken response then drives facial blendshapes for real-time animation using Audio2Face-3D, ensuring accurate lip sync and expressive character movements.
Throughout this process, GPU scheduling and CIG ensure that AI inference workloads run alongside rendering tasks while minimizing the impact on frame rates. This ensures a seamless, real-time experience.
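To make the flow concrete, here is a minimal, hypothetical C++ sketch of this listen-reason-speak-animate loop. The helper functions are placeholders, not NVIGI APIs; in a real integration, each would wrap an inference call on the corresponding NVIGI plugin instance, as shown in the sections that follow.

// Hypothetical sketch of the agentic NPC loop; the helpers below are placeholder stubs,
// not NVIGI APIs. Each stands in for an inference call on the corresponding plugin.
#include <iostream>
#include <string>
#include <vector>

std::string transcribeSpeech(const std::vector<float>& micSamples) { return "Where is the blacksmith?"; }  // ASR
std::string generateReply(const std::string& playerText)           { return "Follow the road east."; }     // SLM (optionally with RAG)
std::vector<float> synthesizeVoice(const std::string& replyText)   { return {}; }                          // TTS
void driveFacialAnimation(const std::vector<float>& voiceSamples)  { /* Audio2Face-3D blendshapes */ }

int main() {
    std::vector<float> micSamples;                            // Captured player audio
    std::string playerText = transcribeSpeech(micSamples);    // 1. Listen (ASR)
    std::string npcReply   = generateReply(playerText);       // 2. Reason (SLM, optionally RAG)
    std::vector<float> npcVoice = synthesizeVoice(npcReply);  // 3. Speak (TTS)
    driveFacialAnimation(npcVoice);                           // 4. Animate (Audio2Face-3D)
    std::cout << "NPC: " << npcReply << std::endl;
    return 0;
}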
For more information about implementing these plugins and models for in-process execution, see the next sections.
Getting started with ACE on-device inference
This section outlines how to use NVIGI with ACE SLMs to enable in-process AI inference alongside your rendering workloads.
NVIGI includes batch files that fetch models from repositories like NVIDIA NGC and Hugging Face. This keeps the initial download minimal and ensures that you only have to download the models you want to work with. It also includes a suite of local inference plugins, which we highlight later to help you get started.
Initialize NVIGI
The first step is to initialize the NVIGI framework. This process sets up plugin paths, logging, and core configurations needed for NVIGI to function within your game.
The following code example shows how to initialize NVIGI in your game:
nvigi::Preferences preferences{};
preferences.logLevel = nvigi::LogLevel::eVerbose;          // Enable verbose logging
preferences.utf8PathsToPlugins = { "path/to/plugins" };    // Set plugin path
preferences.utf8PathToLogsAndData = "path/to/logs";        // Define log path

if (nvigiInit(preferences, nullptr, nvigi::kSDKVersion) != nvigi::kResultOk) {
    std::cerr << "Failed to initialize NVIGI." << std::endl;
}
Load plugins and model
The NVIGI architecture is built around a plugin system that offers modularity and flexibility. Plugins are included for AI model classes such as LLMs, ASR, and embedding retrieval. These plugins enable you to deploy AI models from the ACE range to implement the features and behaviors you need.
Each plugin is designed to support multiple models that share a designated backend and underlying API. NVIGI includes in-process plugins with support for popular backends such as GGML (llama.cpp, whisper.cpp, embedding.cpp), ONNX Runtime, and DirectML.
Here are some examples:
- nvigi.plugin.asr.ggml.cuda: Converts speech into text using GGML and CUDA.
- nvigi.plugin.gpt.ggml.cuda: Powers the AI-generated dialogue and reasoning.
- nvigi.plugin.embed.ggml.*: Finds relevant text based on semantic similarity, to provide better context.
To use the GPT plugin, first load its interface to query the model’s capabilities:
// Load GPT plugin interface
nvigi::IGeneralPurposeTransformer* gpt = nullptr;
nvigiGetInterfaceDynamic(nvigi::plugin::gpt::ggml::cuda::kId, &gpt, ptr_nvigiLoadInterface);
NVIGI includes a selection of manually downloadable models to get you started, and you can get more GGUF models from NGC or Hugging Face. You then define model parameters for the plugin, as in the following example:
// Configure model parameters
nvigi::CommonCreationParameters common{};
common.utf8PathToModels = "path/to/models";
common.numThreads = 8;                      // Number of CPU threads
common.vramBudgetMB = vram;                 // VRAM allocation in MB
common.modelGUID = "{YOUR_MODEL_GUID}";     // Model GUID
When the interface and model parameters are loaded, configure an instance for in-process inference:
// Create GPT instance
nvigi::InferenceInstance* gptInstance = nullptr;
if (gpt->createInstance(common, &gptInstance) != nvigi::kResultOk || !gptInstance) {
    std::cerr << "Failed to create GPT instance." << std::endl;
}
// Use GPT instance for inference
Create the runtime configuration and inference
The InferenceInstance interface provides the API for executing inference tasks and is configured through InferenceExecutionContext. This interface enables the setup of input slots, runtime parameters, and callback mechanisms to retrieve model responses.
Inference tasks rely on input data, such as player text or audio, combined with runtime configurations. The context for defining a game character’s personality and role can be established using the nvigi::kGPTDataSlotSystem slot:
// Define NPC role in a system prompt
std::string npcPrompt = "You are a helpful NPC named TJ in a fantasy game.";
nvigi::CpuData systemPromptData(npcPrompt.length() + 1, npcPrompt.c_str());
nvigi::InferenceDataText systemPromptSlot(systemPromptData);

// Set runtime parameters
nvigi::GPTRuntimeParameters runtime{};
runtime.tokensToPredict = 200;   // Limit token prediction to 200 tokens
runtime.interactive = true;      // Enable multi-turn conversations

std::vector<nvigi::InferenceDataSlot> slots = {
    {nvigi::kGPTDataSlotSystem, &systemPromptSlot}
};

// Inference context
nvigi::InferenceExecutionContext gptExecCtx{};
gptExecCtx.instance = gptInstance;
gptExecCtx.runtimeParameters = runtime;
gptExecCtx.inputs = slots.data();
Dynamic interaction between the player and game character can be managed as follows:
std::string userInput = "What’s your name?";   // Example user input
nvigi::CpuData userInputData(userInput.length() + 1, userInput.c_str());
nvigi::InferenceDataText userInputSlot(userInputData);
slots = {{nvigi::kGPTDataSlotUser, &userInputSlot}};
gptExecCtx.inputs = slots.data();
Run inference and handle the response asynchronously:
if (gptExecCtx.instance->evaluate(&gptExecCtx) == nvigi::kResultOk) {
    std::cout << "Inference completed successfully!" << std::endl;
}
Each turn of the conversation is processed through the nvigi::kGPTDataSlotUser input slot, maintaining context for multi-turn dialogue.
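As a rough illustration of that loop, the following sketch keeps feeding player turns through the user slot. It reuses the gptExecCtx and slots objects set up above; getPlayerInput() is a hypothetical placeholder for your game’s own chat or voice capture, not an NVIGI API.

// Hypothetical multi-turn loop; getPlayerInput() is a placeholder for the game's own
// input capture. Everything else reuses the NVIGI objects created above.
while (true) {
    std::string userInput = getPlayerInput();       // Placeholder, not an NVIGI API
    if (userInput.empty()) break;                   // Player ended the conversation

    nvigi::CpuData userInputData(userInput.length() + 1, userInput.c_str());
    nvigi::InferenceDataText userInputSlot(userInputData);
    slots = {{nvigi::kGPTDataSlotUser, &userInputSlot}};
    gptExecCtx.inputs = slots.data();

    // Each evaluate() call adds this turn to the interactive conversation; the response
    // arrives through the callback described in the next section.
    if (gptExecCtx.instance->evaluate(&gptExecCtx) != nvigi::kResultOk) {
        std::cerr << "Inference failed for this turn." << std::endl;
        break;
    }
}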
You can also implement a callback function to capture the character’s response for display in-game; an example is shown in the next section.
Enable GPU scheduling and rendering integration
AI workloads in games run alongside rendering tasks, so effective GPU scheduling is crucial to maintain frame rates. NVIGI uses CIG to schedule GPU workloads efficiently.
To schedule graphics and compute efficiently, NVIGI must get the D3D direct queue that your game is using for graphics. The D3D12Parameters structure ensures that NVIGI integrates directly with the rendering pipeline, enabling AI tasks to run in parallel without impacting graphical performance.
The following code example shows how to enable CIG for AI inference using the NVIGI IHWICuda interface:
// Enable Compute-in-Graphics (CIG)
nvigi::IHWICuda* icig = nullptr;
if (nvigiGetInterface(nvigi::plugin::hwi::cuda::kId, &icig) != nvigi::kResultOk || !icig) {
    std::cerr << "Failed to load CIG interface." << std::endl;
    return;
}

// Set up D3D12 parameters
nvigi::D3D12Parameters d3d12Params{};
d3d12Params.device = myD3D12Device;        // D3D12 device used for rendering
d3d12Params.queue = myD3D12CommandQueue;   // Graphics command queue

// Activate GPU scheduling for both inference and rendering
if (icig->enableComputeInGraphics(d3d12Params) != nvigi::kResultOk) {
    std::cerr << "Failed to enable Compute-in-Graphics." << std::endl;
    return;
}
std::cout << "Compute-in-Graphics enabled successfully." << std::endl;
To set up NVIGI GPU scheduling in Unreal Engine 5 (UE5), use the global dynamic rendering hardware interface (RHI) to access the game’s D3D device and command queue.
Configuring CIG in UE5 is straightforward:
// UE5-specific code to retrieve D3D12 resources
#include "ID3D12DynamicRHI.h"

ID3D12DynamicRHI* RHI = nullptr;
if (GDynamicRHI && GDynamicRHI->GetInterfaceType() == ERHIInterfaceType::D3D12) {
    RHI = static_cast<ID3D12DynamicRHI*>(GDynamicRHI);
}

ID3D12CommandQueue* CmdQ = nullptr;
ID3D12Device* D3D12Device = nullptr;
if (RHI) {
    CmdQ = RHI->RHIGetCommandQueue();               // Get graphics command queue
    int DeviceIndex = 0;
    D3D12Device = RHI->RHIGetDevice(DeviceIndex);   // Get D3D12 device
}

// Configure D3D12 parameters for NVIGI
nvigi::D3D12Parameters d3d12Params{};
d3d12Params.device = D3D12Device;
d3d12Params.queue = CmdQ;

// Pass the parameters to NVIGI instances
nvigi::CommonCreationParameters commonParams{};
commonParams.chain(d3d12Params);

// Example: Creating an ASR instance with CIG
// (iasr is the ASR plugin interface, loaded the same way as the GPT interface earlier)
nvigi::ASRCreationParameters asrParams{};
asrParams.common = &commonParams;
nvigi::InferenceInstance* asrInstance = nullptr;
iasr->createInstance(asrParams, &asrInstance);
Run inference
Inference tasks on NVIGI involve setting up a conversation context, processing user inputs, and generating responses dynamically. The following steps outline how to run inference tasks efficiently in your game environment.
To execute inference, you must create an inference context, which includes the following:
- Input slots: Prepare input data (user text, voice data) in a format that the model can process.
- Runtime parameters: Define the behavior of the inference, such as the number of tokens to predict or interactivity settings.
- Callback mechanisms: Specify how to handle the output results.
The inference context defines how inputs and outputs are processed. Start by enabling interactive mode and preparing runtime parameters:
// Configure runtime parameters for GPT
nvigi::GPTRuntimeParameters runtime{};
runtime.tokensToPredict = 200;   // Predict up to 200 tokens
runtime.interactive = true;      // Enable interactive mode

// Set up inference context
nvigi::InferenceExecutionContext gptExecCtx{};
gptExecCtx.instance = gptInstance;          // Use the GPT instance created earlier
gptExecCtx.runtimeParameters = runtime;
gptExecCtx.callback = [](const nvigi::InferenceExecutionContext* execCtx,
                         nvigi::InferenceExecutionState state, void* userData) {
    if (state == nvigi::kInferenceExecutionStateDone && execCtx->outputs) {
        const nvigi::InferenceDataText* responseText = nullptr;
        execCtx->outputs->findAndValidateSlot(nvigi::kGPTDataSlotResponse, &responseText);
        if (responseText) {
            std::cout << "NPC Response: " << responseText->getUtf8Text() << std::endl;
        }
    }
    return state;
};
You can begin the conversation by providing a system prompt that defines the NPC’s personality or role. Use the nvigi::kGPTDataSlotSystem slot for this purpose:
// Set up conversation context
std::string npcPrompt = "You are a helpful NPC in a fantasy game. Respond thoughtfully to player questions.";
nvigi::CpuData systemPromptData(npcPrompt.length() + 1, npcPrompt.c_str());
nvigi::InferenceDataText systemPromptSlot(systemPromptData);

std::vector<nvigi::InferenceDataSlot> slots = {
    {nvigi::kGPTDataSlotSystem, &systemPromptSlot}   // Set the system prompt
};
gptExecCtx.inputs = slots.data();
gptExecCtx.numInputs = slots.size();

// Execute to initialize conversation context
if (gptExecCtx.instance->evaluate(&gptExecCtx) != nvigi::kResultOk) {
    std::cerr << "Failed to initialize conversation context." << std::endl;
    return;
}
List of available NVIGI plugins
You can begin building agentic frameworks for on-device inference with the following speech and intelligence models today.
NVIGI Plugin | Supported Inference Hardware | Supported Models |
Speech – ASR Local GGML | CUDA-Enabled GPU or CPU | Whisper ASR |
Speech – ASR Local TRT | CUDA-Enabled GPU | NVIDIA Riva ASR (coming soon) |
Language – GPT Local ONNX DML | ONNX-supported GPU or CPU | Mistral-7B-Instruct |
Language – GPT Local GGML | CUDA-Enabled GPU or CPU | Llama-3.2-3B-Instruct, Nemotron-Mini-4B-Instruct, Mistral-Nemo-Minitron-2B-128k-Instruct, Mistral-Nemo-Minitron-4B-128k-Instruct, Mistral-Nemo-Minitron-8B-128k-Instruct, Nemovision-4B-Instruct |
RAG – Embed Local GGML | CUDA-Enabled GPU or CPU | E5 Large Unsupervised |
Conclusion
NVIDIA ACE and NVIGI represent the next step in the evolution of AI-driven game development. By combining ACE’s advanced generative AI models with NVIGI’s seamless integration and GPU-optimized performance, you can unlock new levels of interactivity and immersion.
From dynamic NPC dialogue and real-time speech recognition to lifelike animations and contextual memory, ACE and NVIGI provide a cost-effective, scalable solution for creating intelligent, autonomous characters.
Get started with NVIDIA ACE and NVIGI today.