
    Bring NVIDIA ACE AI Characters to Games with the New In-Game Inferencing SDK


    NVIDIA ACE is a suite of digital human technologies that bring game characters and digital assistants to life with generative AI. ACE on-device models enable agentic workflows for autonomous game characters that can perceive their environment, understand multi-modal inputs, strategically plan a set of actions and execute them all in real time, providing dynamic experiences for players.

    To run these models alongside the game engine, the NVIDIA In-Game Inferencing (NVIGI) SDK enables you to integrate AI inference directly into C++ games and applications for optimal performance and latency. 

    This post shows how NVIGI integrates with ACE to enable seamless AI inference in game development. We cover the NVIGI architecture, key features, and how to get started creating autonomous characters with NVIDIA ACE on-device models.

    NVIDIA ACE on-device models

    NVIDIA ACE enables speech, intelligence, and animation powered by generative AI. It delivers a suite of AI models that enable game characters to perceive, reason, and act based on player interactions in real time:

    • Perception: The upcoming NeMoAudio-4B-Instruct model enhances character interactions with greater contextual awareness from audio. You can easily integrate more multimodal models to further expand these capabilities by incorporating additional sensory inputs.
    • Cognition: The Mistral-Nemo-Minitron-Instruct family of small language models tops the charts for instruction-following capability, enabling characters to role-play accurately.
    • Memory: Embedding models like E5-Large-Unsupervised enable characters to recall past interactions, enriching immersion.
    • Animation: Real-time AI-driven animation such as Audio2Face delivers accurate lip sync along with lively, dynamic expressions.
    • Action: Simple interfaces and custom logic enable characters to take meaningful actions, from selecting in-game responses to executing strategic plans through model-driven decision-making and prompts.

    What is the NVIDIA In-Game Inferencing SDK?

    A diagram shows the NVIGI SDK core plugins, helper plugins, unified APIs, and local execution; NVIDIA ACE AI models; and the local PC.
    Figure 1. NVIGI SDK architecture

    The NVIGI SDK is a GPU-optimized, plugin-based inference manager designed to simplify the integration of ACE models into gaming and interactive applications. It offers the following functionality:

    • Plugin flexibility: Add, update, and manage AI plugins (ASR, language models, embeddings) with ease.
    • Windows-native DLLs: Streamline workflows for C++ games and applications.
    • GPU optimization: Use compute-in-graphics (CIG) technology for efficient AI inference alongside rendering tasks.

    By combining NVIGI with ACE, you can create autonomous characters with advanced generative AI capabilities, such as real-time NPC dialogue, contextual memory, and lifelike animation.

    How NVIGI works

    At its core, the NVIGI architecture is based on modular plugins that enable flexible integration of various AI functionalities:

    • Core plugins: Provide AI capabilities such as automatic speech recognition (ASR), generative reasoning, and embedding retrieval.
    • Helper plugins: Handle utilities such as GPU scheduling and network communication.
    • Unified APIs: Simplify plugin management and reduce code complexity.
    • Local and cloud execution: Run inference on-device (CPU/GPU) or through cloud-based AI workflows.

    These components work together to deliver a seamless AI-driven gaming experience. For example, imagine a player asking a question to an NPC. NVIGI orchestrates an agentic workflow, enabling characters to listen, reason, speak, and animate in real time. 

    This process follows a few key steps:

    1. Listen to users with ASR: The NPC processes player speech using NVIDIA Riva ASR, converting spoken input into text for further reasoning.
    2. Generate a response with an SLM: The transcribed text is passed to a small language model (SLM), such as Mistral-Nemo-Minitron-128K-Instruct, which generates a dynamic, contextually relevant response. If additional context is needed, a retrieval-augmented generation (RAG) approach can be used, where an embedding model like E5-Large-Unsupervised converts text into vector representations. These vectors are then used in a similarity search to retrieve relevant knowledge, enriching the SLM’s response with additional context.
    3. Give characters a voice with TTS: The generated response is synthesized into lifelike voices, ensuring natural intonation and expressiveness.
    4. Create interactive animation: The spoken response then drives facial blendshapes for real-time animation using Audio2Face-3D, ensuring accurate lip sync and expressive character movements.
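
    Conceptually, each player utterance flows through these four stages in sequence. The following sketch shows one conversational turn; the AudioBuffer type and helper functions are hypothetical placeholders for the corresponding NVIGI plugin calls detailed later in this post, not actual SDK entry points:

    #include <string>
     
    // Hypothetical placeholder type and helpers standing in for the NVIGI/ACE
    // plugin calls (ASR, SLM, TTS, Audio2Face); they are illustrative only.
    struct AudioBuffer { /* PCM samples, sample rate, ... */ };
     
    std::string transcribeWithAsr(const AudioBuffer& mic);            // 1. Listen  (Riva ASR)
    std::string generateWithSlm(const std::string& text,
                                const std::string& npcContext);       // 2. Reason  (SLM, optional RAG)
    AudioBuffer synthesizeSpeech(const std::string& reply);           // 3. Speak   (TTS)
    void        animateWithAudio2Face(const AudioBuffer& speech);     // 4. Animate (Audio2Face-3D)
     
    // One conversational turn: listen, reason, speak, animate
    void handlePlayerUtterance(const AudioBuffer& mic, const std::string& npcContext)
    {
        const std::string transcript = transcribeWithAsr(mic);
        const std::string reply      = generateWithSlm(transcript, npcContext);
        const AudioBuffer speech     = synthesizeSpeech(reply);
        animateWithAudio2Face(speech);
    }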

    Throughout this process, GPU scheduling and CIG ensure that AI inference workloads run alongside rendering tasks with minimal impact on frame rates, delivering a seamless, real-time experience.

    Video 1. NVIDIA ACE | inZOI – Create Simulated Cities with Co-Playable Characters

    For more information about implementing these plugins and models for in-process execution, see the next sections.

    Getting started with ACE on-device inference

    This section outlines how to use NVIGI with ACE SLMs to enable in-process AI inference alongside your rendering workloads.

    NVIGI includes batch files that fetch models from repositories such as NVIDIA NGC and Hugging Face. This keeps the initial download minimal and ensures that you only download the models you want to work with. It also includes a suite of local inference plugins, highlighted later in this post, to help you get started.

    Initialize NVIGI

    The first step is to initialize the NVIGI framework. This process sets up plugin paths, logging, and core configurations needed for NVIGI to function within your game.

    The following code example shows how to initialize NVIGI in your game:

    nvigi::Preferences preferences{};
    preferences.logLevel = nvigi::LogLevel::eVerbose; // Enable verbose logging
    preferences.utf8PathsToPlugins = {"path/to/plugins"}; // Set plugin path
    preferences.utf8PathToLogsAndData = "path/to/logs";    // Define log path
     
    if (nvigiInit(preferences, nullptr, nvigi::kSDKVersion) != nvigi::kResultOk) {
        std::cerr << "Failed to initialize NVIGI." << std::endl;
    }

    Load plugins and model

    The NVIGI architecture is built around a plugin system that offers modularity and flexibility. Plugins are included for AI model classes such as LLMs, ASR, and embedding retrieval. These plugins enable you to deploy AI models from the ACE range to implement the features and behaviors you need.

    Each plugin is designed to support multiple models that share a designated backend and underlying API. NVIGI includes in-process plugins with support for popular backends such as GGML (llama.cpp, whisper.cpp, embedding.cpp), ONNX Runtime, and DirectML.

    Here are some examples:

    • nvigi.plugin.asr.ggml.cuda: Converts speech into text using GGML and CUDA.
    • nvigi.plugin.gpt.ggml.cuda: Powers the AI-generated dialogue and reasoning.
    • nvigi.plugin.embed.ggml.*: Finds relevant text based on semantic similarity to provide better context.

    To use the GPT plugin, first load its interface to query the model’s capabilities:

    // Load GPT plugin interface
    // ptr_nvigiLoadInterface is the nvigiLoadInterface function pointer obtained
    // when the NVIGI core library was loaded
    nvigi::IGeneralPurposeTransformer* gpt = nullptr;
    nvigiGetInterfaceDynamic(nvigi::plugin::gpt::ggml::cuda::kId, &gpt, ptr_nvigiLoadInterface);

    NVIGI provides a selection of manually downloadable models to get you started, and you can obtain more GGUF models from NGC or Hugging Face. You then define model parameters for the plugin, as in the following example:

    // Configure model parameters
    nvigi::CommonCreationParameters common{};
    common.utf8PathToModels = "path/to/models";
    common.numThreads = 8;                    // Number of CPU threads
    common.vramBudgetMB = vram;               // VRAM allocation in MB
    common.modelGUID = "{YOUR_MODEL_GUID}";   // Model GUID

    When the interface and model parameters are loaded, configure an instance for in-process inference:

    // Create GPT instance
    nvigi::InferenceInstance* gptInstance = nullptr;
    if (gpt->createInstance(common, &gptInstance) != nvigi::kResultOk || !gptInstance) {
        std::cerr << "Failed to create GPT instance." << std::endl;
    }
     
    // Use GPT instance for inference

    Create the runtime configuration and inference

    The InferenceInstance interface provides the API for executing inference tasks and is configured through InferenceExecutionContext. This interface enables the setup of input slots, runtime parameters, and callback mechanisms to retrieve model responses.

    Inference tasks rely on input data, such as player text or audio, combined with runtime configurations. The context for defining a game character’s personality and role can be established using the nvigi::kGPTDataSlotSystem slot:

    // Define NPC role in a system prompt
    std::string npcPrompt = "You are a helpful NPC named TJ in a fantasy game.";
    nvigi::CpuData systemPromptData(npcPrompt.length() + 1, npcPrompt.c_str());
    nvigi::InferenceDataText systemPromptSlot(systemPromptData);
     
    // Set runtime parameters
    nvigi::GPTRuntimeParameters runtime{};
    runtime.tokensToPredict = 200;  // Limit token prediction to 200 tokens
    runtime.interactive = true;     // Enable multi-turn conversations
    std::vector<nvigi::InferenceDataSlot> slots = {
        {nvigi::kGPTDataSlotSystem, &systemPromptSlot}
    };
     
    // Inference context
    nvigi::InferenceExecutionContext gptExecCtx{};
    gptExecCtx.instance = gptInstance;
    gptExecCtx.runtimeParameters = runtime;
    gptExecCtx.inputs = slots.data();
    gptExecCtx.numInputs = slots.size();

    Dynamic interaction between the player and game character can be managed as follows:

    std::string userInput = "What's your name?"; // Example user input
    nvigi::CpuData userInputData(userInput.length() + 1, userInput.c_str());
    nvigi::InferenceDataText userInputSlot(userInputData);
     
    slots = {{nvigi::kGPTDataSlotUser, &userInputSlot}};
    gptExecCtx.inputs = slots.data();
    gptExecCtx.numInputs = slots.size();

    Run inference and handle the response asynchronously:

    if (gptExecCtx.instance->evaluate(&gptExecCtx) == nvigi::kResultOk) {
        std::cout << "Inference completed successfully!" << std::endl;
    }
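
    If evaluate blocks the calling thread until generation completes, a common pattern is to dispatch it from a worker thread so the game loop stays responsive. The threading shown below is an application-level choice using the C++ standard library, not part of the NVIGI API; the response itself can be surfaced through the callback mechanism shown in the next section:

    #include <chrono>
    #include <future>
     
    // Dispatch evaluate() on a worker thread so the game loop is not blocked.
    // gptExecCtx and the data its input slots point to must remain alive until
    // the future completes.
    auto inferenceFuture = std::async(std::launch::async, [&gptExecCtx]() {
        return gptExecCtx.instance->evaluate(&gptExecCtx);
    });
     
    // Poll once per frame (or wherever convenient) without blocking
    if (inferenceFuture.valid() &&
        inferenceFuture.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
        if (inferenceFuture.get() != nvigi::kResultOk) {
            std::cerr << "Inference failed." << std::endl;
        }
    }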

    Each turn of the conversation is processed through the nvigi::kGPTDataSlotUser input slot, maintaining context for multi-turn dialogue. 

    You can also implement a callback function to capture the character response for display in-game, with the example shown in the next section.

    Enable GPU scheduling and rendering integration

    AI workloads in games run alongside rendering tasks, so effective GPU scheduling is crucial to maintain frame rates. NVIGI uses CIG to schedule GPU workloads efficiently.

    To schedule graphics and compute efficiently, NVIGI must get the D3D direct queue that your game is using for graphics. The D3D12Parameters structure ensures that NVIGI integrates directly with the rendering pipeline, enabling AI tasks to run in parallel without impacting graphical performance. 

    The following code example shows how to enable CIG for AI inference using the NVIGI IHWICuda interface, using the ASR plugin as an example:

    // Enable Compute-in-Graphics (CIG)
    nvigi::IHWICuda* icig = nullptr;
    if (nvigiGetInterface(nvigi::plugin::hwi::cuda::kId, &icig) != nvigi::kResultOk || !icig) {
        std::cerr << "Failed to load CIG interface." << std::endl;
        return;
    }
     
    // Set up D3D12 parameters
    nvigi::D3D12Parameters d3d12Params{};
    d3d12Params.device = myD3D12Device;            // D3D12 device used for rendering
    d3d12Params.queue = myD3D12CommandQueue;      // Graphics command queue
     
    // Activate GPU scheduling for both inference and rendering
    if (icig->enableComputeInGraphics(d3d12Params) != nvigi::kResultOk) {
        std::cerr << "Failed to enable Compute-in-Graphics." << std::endl;
        return;
    }
     
    std::cout << "Compute-in-Graphics enabled successfully." << std::endl;

    To set up NVIGI GPU scheduling in Unreal Engine 5 (UE5), use the global dynamic rendering hardware interface (RHI) to access the game’s D3D device and command queue. 

    Configuring CIG in UE5 is straightforward:

    // UE5-specific code to retrieve D3D12 resources
    #include "ID3D12DynamicRHI.h"
     
    ID3D12DynamicRHI* RHI = nullptr;
    if (GDynamicRHI && GDynamicRHI->GetInterfaceType() == ERHIInterfaceType::D3D12)
    {
        RHI = static_cast<ID3D12DynamicRHI*>(GDynamicRHI);
    }
     
    ID3D12CommandQueue* CmdQ = nullptr;
    ID3D12Device* D3D12Device = nullptr;
     
    if (RHI) {
        CmdQ = RHI->RHIGetCommandQueue();  // Get graphics command queue
        int DeviceIndex = 0;
        D3D12Device = RHI->RHIGetDevice(DeviceIndex);  // Get D3D12 device
    }
     
    // Configure D3D12 parameters for IGI
    nvigi::D3D12Parameters d3d12Params{};
    d3d12Params.device = D3D12Device;
    d3d12Params.queue = CmdQ;
     
    // Pass the parameters to IGI instances
    nvigi::CommonCreationParameters commonParams{};
    commonParams.chain(d3d12Params);
     
    // Example: Creating an ASR instance with CIG
    // (iasr is the ASR plugin interface, loaded earlier in the same way as the GPT plugin)
    nvigi::ASRCreationParameters asrParams{};
    asrParams.common = &commonParams;
     
    nvigi::InferenceInstance* asrInstance = nullptr;
    iasr->createInstance(asrParams, &asrInstance);

    Run inference

    Inference tasks on NVIGI involve setting up a conversation context, processing user inputs, and generating responses dynamically. The following steps outline how to run inference tasks efficiently in your game environment.

    To execute inference, you must create an inference context, which includes the following:

    • Input slots: Prepare input data (user text, voice data) in a format that the model can process.
    • Runtime parameters: Define the behavior of the inference, such as the number of tokens to predict or interactivity settings.
    • Callback mechanisms: Specify how to handle the output results.

    The inference context defines how inputs and outputs are processed. Start by enabling interactive mode and preparing runtime parameters:

    // Configure runtime parameters for GPT
    nvigi::GPTRuntimeParameters runtime{};
    runtime.tokensToPredict = 200;  // Predict up to 200 tokens
    runtime.interactive = true;     // Enable interactive mode
     
    // Set up inference context
    nvigi::InferenceExecutionContext gptExecCtx{};
    gptExecCtx.instance = gptInstance;   // Use the GPT instance created earlier
    gptExecCtx.runtimeParameters = runtime;
     
    gptExecCtx.callback = [](const nvigi::InferenceExecutionContext* execCtx, nvigi::InferenceExecutionState state, void* userData) {
        if (state == nvigi::kInferenceExecutionStateDone && execCtx->outputs){
            const nvigi::InferenceDataText* responseText = nullptr;
            execCtx->outputs->findAndValidateSlot(nvigi::kGPTDataSlotResponse, &responseText);
            if (responseText) {
                std::cout << "NPC Response: " << responseText->getUtf8Text() << std::endl;
            }
        }
        return state;
    };

    You can begin the conversation by providing a system prompt that defines the NPC’s personality or role. Use the nvigi::kGPTDataSlotSystem slot for this purpose:

    // Set up conversation context
    std::string npcPrompt = "You are a helpful NPC in a fantasy game. Respond thoughtfully to player questions.";
     
    nvigi::CpuData systemPromptData(npcPrompt.length() + 1, npcPrompt.c_str());
    nvigi::InferenceDataText systemPromptSlot(systemPromptData);
     
    std::vector<nvigi::InferenceDataSlot> slots = {
        {nvigi::kGPTDataSlotSystem, &systemPromptSlot}  // Set the system prompt
    };
     
    gptExecCtx.inputs = slots.data();
    gptExecCtx.numInputs = slots.size();
     
    // Execute to initialize conversation context
    if (gptExecCtx.instance->evaluate(&gptExecCtx) != nvigi::kResultOk) {
        std::cerr << "Failed to initialize conversation context." << std::endl;
        return;
    }
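
    With the context initialized, each subsequent player line can be fed through the nvigi::kGPTDataSlotUser slot in a loop, reusing the callback registered on gptExecCtx to surface the NPC's reply. The following is a minimal sketch built from the pieces above; isConversationActive and getPlayerInput are hypothetical game-side helpers:

    // Multi-turn conversation loop: interactive mode keeps context across turns.
    // isConversationActive() and getPlayerInput() are hypothetical game-side helpers.
    while (isConversationActive()) {
        std::string userInput = getPlayerInput();  // For example, text from the ASR plugin
     
        nvigi::CpuData userInputData(userInput.length() + 1, userInput.c_str());
        nvigi::InferenceDataText userInputSlot(userInputData);
     
        std::vector<nvigi::InferenceDataSlot> turnSlots = {
            {nvigi::kGPTDataSlotUser, &userInputSlot}  // Player input for this turn
        };
        gptExecCtx.inputs = turnSlots.data();
        gptExecCtx.numInputs = turnSlots.size();
     
        // The callback registered on gptExecCtx receives the NPC response
        if (gptExecCtx.instance->evaluate(&gptExecCtx) != nvigi::kResultOk) {
            std::cerr << "Inference failed for this turn." << std::endl;
            break;
        }
    }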

    List of available NVIGI plugins

    You can begin building agentic frameworks for on-device inference with the following speech and intelligence models today.

    NVIGI Plugin                  | Supported Inference Hardware | Supported Models
    Speech – ASR Local GGML       | CUDA-enabled GPU or CPU      | Whisper ASR
    Speech – ASR Local TRT        | CUDA-enabled GPU             | NVIDIA Riva ASR (coming soon)
    Language – GPT Local ONNX DML | ONNX-supported GPU or CPU    | Mistral-7B-Instruct
    Language – GPT Local GGML     | CUDA-enabled GPU or CPU      | Llama-3.2-3B-Instruct, Nemotron-Mini-4B-Instruct, Mistral-Nemo-Minitron-2B-128k-Instruct, Mistral-Nemo-Minitron-4B-128k-Instruct, Mistral-Nemo-Minitron-8B-128k-Instruct, Nemovision-4B-Instruct
    RAG – Embed Local GGML        | CUDA-enabled GPU or CPU      | E5-Large-Unsupervised
    Table 1. Available speech and intelligence models

    Conclusion

    NVIDIA ACE and NVIGI represent the next step in the evolution of AI-driven game development. By combining ACE's advanced generative AI models with NVIGI's seamless integration and GPU-optimized performance, you can unlock new levels of interactivity and immersion.

    From dynamic NPC dialogue and real-time speech recognition to lifelike animations and contextual memory, ACE and NVIGI provide a cost-effective, scalable solution for creating intelligent, autonomous characters. 

    Get started with NVIDIA ACE and NVIGI today.

