With emerging use cases such as digital humans, agents, podcasts, and image and video generation, generative AI is changing the way we interact with PCs. This paradigm shift calls for new ways of interfacing with and programming generative AI models. However, getting started can be daunting for PC developers and AI enthusiasts.
Today, NVIDIA released a suite of NVIDIA NIM microservices for NVIDIA RTX AI PCs to jumpstart AI development and experimentation on the PC. NIM microservices are currently in beta and offer AI foundation models spanning language, speech, animation, content generation, and vision capabilities.
The industry-standard APIs make it easy to move from experimentation to building with NVIDIA NIM on NVIDIA RTX AI PCs. The microservices are simple to download and run, span the top modalities for PC development, and are compatible with leading ecosystem applications and tools.
How NVIDIA NIM works
Bringing AI to the PC poses unique challenges. The AI software stack, from libraries and frameworks to SDKs and models, is evolving rapidly. The number of possible combinations across this stack is enormous, and an incompatibility in any single layer can break an entire workflow. Making AI performant on the PC also imposes its own constraints: intricate resource management and stringent latency and throughput requirements.
On the PC, developers have been curating models, adapting them to their application needs with custom data, quantizing them to optimize memory utilization, and connecting them to custom PC-only inference backends.
NIM helps address these challenges. It provides prepackaged, state-of-the-art AI models that are optimized for deployment across NVIDIA GPUs. Each NIM microservice is packaged as a container for self-hosting accelerated pretrained and customized AI models, and is built with pre-optimized inference engines for NVIDIA GPUs, including NVIDIA TensorRT and NVIDIA TensorRT-LLM.
With NIM, you get standard APIs and a unified development and deployment experience across NVIDIA platforms, from the cloud and datacenter to the NVIDIA RTX AI PC and workstation.
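To make this concrete, NIM microservices expose OpenAI-compatible endpoints, so a locally running microservice can be called with the standard openai Python client. This is a minimal sketch: the port (8000) and model ID below are illustrative placeholders, not guaranteed defaults; use the values your microservice reports when it starts.

```python
# Minimal sketch: call a locally running NIM microservice through its
# OpenAI-compatible API. Port and model ID are placeholder assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local NIM endpoint (assumed port)
    api_key="not-used",  # local microservices don't require a real key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # hypothetical model ID; check client.models.list()
    messages=[{"role": "user", "content": "Summarize what NVIDIA NIM is in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```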
NIM microservices are also optimized for the new NVIDIA GeForce RTX 50 Series GPUs, based on the NVIDIA Blackwell architecture. These GPUs include support for FP4 compute and come with up to 32 GB of VRAM, which helps boost AI inference performance by up to 2x and enables larger generative AI models to run locally on device.
On NVIDIA RTX AI PCs, NIM microservices run through WSL2. NVIDIA and Microsoft collaborated to bring CUDA acceleration to WSL2, making it possible for you to run NIM microservices using the Podman container toolkit and runtime on WSL2. With this, you can build, run, share, and verify your AI workloads anywhere.
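As a rough sketch of what that looks like in practice, the following Python snippet launches a NIM container with Podman and polls its readiness endpoint. The image name, port mapping, and any required credentials are assumptions for illustration; consult the microservice's deployment instructions for the actual values.

```python
# Sketch: start a NIM container with Podman on WSL2 and wait until it is ready.
# The image name and port are illustrative placeholders, not real artifact names;
# some microservices may also require credentials or extra environment variables.
import subprocess
import time
import urllib.request

subprocess.run(
    [
        "podman", "run", "-d", "--rm",
        "--device", "nvidia.com/gpu=all",   # CDI device flag for GPU access
        "-p", "8000:8000",                  # host:container port (assumed)
        "nvcr.io/nim/example-model:latest", # hypothetical image name
    ],
    check=True,
)

# Poll the readiness endpoint until the model has loaded.
for _ in range(60):
    try:
        with urllib.request.urlopen("http://localhost:8000/v1/health/ready", timeout=2) as r:
            if r.status == 200:
                print("NIM microservice is ready")
                break
    except OSError:
        pass  # container still starting; connection refused
    time.sleep(5)
```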

Get started with NIM microservices on the NVIDIA RTX AI PC
The new suite of NIM microservices for NVIDIA RTX AI PCs spans use cases such as LLMs, VLMs, image generation, speech, embedding models for RAG, PDF extraction, and computer vision.
- Language and reasoning:
- Image generation: Flux.dev
- Audio:
- RAG: Llama-3.2-NV-EmbedQA-1B-v2 (see the embedding request sketch after this list)
- Computer vision and understanding: NV-CLIP, PaddleOCR, Yolo-X-v1
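For the RAG embedding model above, a minimal sketch of requesting embeddings through the OpenAI-compatible endpoint might look like the following. The port is assumed, and the input_type parameter follows NVIDIA's retrieval-embedding convention, which may vary by model version; verify both against your microservice's documentation.

```python
# Sketch: request embeddings from a locally running embedding NIM.
# Port is assumed; input_type ("query" vs. "passage") follows NVIDIA's
# retrieval-embedding convention and may differ by model version.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

result = client.embeddings.create(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    input=["What GPUs support FP4 compute?"],
    extra_body={"input_type": "query"},  # model-specific parameter (assumption)
)
print(len(result.data[0].embedding))  # embedding dimensionality
```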
There are several ways to get started with NIM microservices on the PC today:
- Download from the NVIDIA API Catalog
- Integrate with other frameworks
- Use NVIDIA RTX AI PC interfaces
NVIDIA API Catalog
Download, install, and run NIM microservices directly from the NVIDIA API Catalog.
Select your microservice and choose Deploy. For Target environment, choose Windows on RTX AI PCs (Beta).
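Once the microservice is running, a quick way to verify the deployment is to query its OpenAI-compatible model listing. The port below is an assumption; use the one shown in the deployment instructions.

```python
# Sketch: confirm a freshly deployed NIM microservice is serving by
# listing its available models. The port is a placeholder.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
for m in client.models.list():
    print(m.id)
```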
Integrate with other frameworks
NIM microservices integrate with application development tools and frameworks, including low-code and no-code tools such as Flowise.
With these native integrations, you can connect your workflows built on these frameworks to AI models running in NIM through industry-standard endpoints, enabling you to use the latest technology with a unified interface across cloud, datacenter, workstation, and PC.
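Because the endpoints are OpenAI-compatible everywhere, the same client code can target the cloud-hosted catalog or a local microservice just by swapping the base URL. A short sketch of that idea follows; the local port and model ID are illustrative assumptions.

```python
# Sketch: the same request works against cloud or local NIM endpoints;
# only base_url (and, for the cloud, a real API key) changes.
from openai import OpenAI

LOCAL = "http://localhost:8000/v1"             # NIM on an RTX AI PC (assumed port)
CLOUD = "https://integrate.api.nvidia.com/v1"  # NVIDIA API Catalog endpoint

client = OpenAI(base_url=LOCAL, api_key="not-used")  # swap to CLOUD plus an API key as needed
reply = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # hypothetical model ID
    messages=[{"role": "user", "content": "Hello from a unified interface."}],
)
print(reply.choices[0].message.content)
```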
As an example, here’s how to access NIM microservices on Flowise:
- In Flowise, choose Chat Models and drag the Chat NVIDIA NIM node onto the board.
- Choose Set up NIM Locally and download the NIM installer.
- When complete, select a model to download and choose Next.
- After the model finishes downloading, configure memory constraints as needed and set the host port to one that is not already in use to run the microservice (a quick way to check ports is shown after these steps).
- Start the container, save the chatflow, and begin prompting in the chat window.
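If you're unsure which host ports are free, a small check like the following (plain Python, no Flowise dependency) can help:

```python
# Sketch: ask the OS for an unused TCP port, or test a specific candidate.
import socket

def free_port() -> int:
    """Bind to port 0 and let the OS pick an available port."""
    with socket.socket() as s:
        s.bind(("localhost", 0))
        return s.getsockname()[1]

def port_in_use(port: int) -> bool:
    """True if something is already listening on localhost:port."""
    with socket.socket() as s:
        return s.connect_ex(("localhost", port)) == 0

print(free_port())        # e.g. 49523
print(port_in_use(8000))  # check a candidate port
```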
Use NVIDIA RTX AI PC interfaces
Experience NIM on NVIDIA RTX AI PCs through user-friendly interfaces:
- AnythingLLM
- ComfyUI (coming soon)
- Microsoft AI Toolkit for VS Code
- ChatRTX
As an example, here’s how to use NIM with AnythingLLM:
- In AnythingLLM, choose Config > AI Providers > LLM.
- Under Provider, select NVIDIA NIM and choose Run NVIDIA NIM Installer.
- After the NVIDIA installer is complete, choose Swap to Managed Mode.
- Choose Import NIMs from NVIDIA, select a model to download, and accept the terms of use.
- Set the desired NIM model as active, choose Start NIM, and navigate back to workspaces.
- Open a new chat and begin prompting.
Video 1. Run State-of-the-art LLMs on RTX | NVIDIA NIM x AnythingLLM
Here’s an example of how to use NIM with the Microsoft AI Toolkit for VS Code:
- Use the NVIDIA NIM Installer to set up the environment for NIM microservices.
- On the AI Toolkit extension tab in Visual Studio Code, choose Model catalog.
- To view available models, choose Hosted by > NVIDIA NIM.
- To download a model, choose Add. When complete, choose Playground > Model selection and launch NIM locally. After the model loads, the microservice on AI Toolkit is ready for action.
Check out the user guide that highlights step-by-step instructions on using NIM microservices with ChatRTX.
Experience the power of NIM through AI Blueprints
Coming soon to NVIDIA RTX AI PCs, NVIDIA AI Blueprints will give you a head start on building generative AI workflows with NIM microservices. These blueprints act as reference samples that include everything needed to build advanced AI workflows that run locally: NIM microservices, sample code, and documentation. They are also modular and can be quickly customized for any use case.
Example blueprints include the following:
- PDF to Podcast: Transforms documents into audio content so that you can learn on the go. It extracts text, images, and tables from a PDF, generates a script, and then produces a full podcast using available voices. Using retrieval-augmented generation (RAG) techniques, users can also have a real-time conversation with the podcast hosts to learn more about specific topics.
- 3D Guided Generative AI: Gives you full control of image generation. With Flux image models and a 3D rendering app similar to Blender, you can define scene elements, adjust camera angles, and use AI to enhance composition and structure into high-quality visuals. This workflow integrates with the Flux NIM microservice, Flux.dev models, and ComfyUI and is easily accessible through a one-click installer.
Summary
For building and experimentation, get started with NVIDIA NIM on the NVIDIA RTX AI PC.
Stay connected and up to date by joining the NVIDIA Developer Discord community. For technical support, visit the NVIDIA Developer Forums to get your questions answered.