This updated post was originally published on March 18, 2025. Organizations are embracing AI agents to enhance productivity and streamline operations. To maximize their impact, these agents need strong reasoning abilities to navigate complex problems, uncover hidden connections, and make logical decisions autonomously in dynamic environments. Due to their ability to tackle complex…
]]>As large language models (LLMs) gain popularity in various question-answering systems, retrieval-augmented generation (RAG) pipelines have also become a focal point. RAG pipelines combine the generation power of LLMs with external data sources and retrieval mechanisms, enabling models to access domain-specific information that may not have existed during fine-tuning.
]]>Nearly 300,000 women across the globe die each year due to complications arising from pregnancy or childbirth. The number of stillbirths and babies who die within their first month approaches 4M every year. April 7 marks World Health Day, which this year focuses on raising awareness about efforts to end preventable maternal and newborn deaths. Giving women and infants better access to…
]]>Join the hackathon to build open-source AI solutions, optimize models, enhance workflows, connect with peers, and win prizes.
]]>The newest generation of the popular Llama AI models is here with Llama 4 Scout and Llama 4 Maverick. Accelerated by NVIDIA open-source software, they can achieve over 40K output tokens per second on NVIDIA Blackwell B200 GPUs, and are available to try as NVIDIA NIM microservices. The Llama 4 models are now natively multimodal and multilingual using a mixture-of-experts (MoE) architecture.
]]>The compute demands for large language model (LLM) inference are growing rapidly, fueled by the combination of growing model sizes, real-time latency requirements, and, most recently, AI reasoning. At the same time, as AI adoption grows, the ability of an AI factory to serve as many users as possible, all while maintaining good per-user experiences, is key to maximizing the value it generates.
]]>The past few years have witnessed the rise in popularity of generative AI and large language models (LLMs), as part of a broad AI revolution. As LLM-based applications are rolled out across enterprises, there is a need to determine the cost efficiency of different AI serving solutions. The cost of an LLM application deployment depends on how many queries it can process per second while being…
]]>Since the release of ChatGPT in November 2022, the capabilities of large language models (LLMs) have surged, and the number of available models has grown exponentially. With this expansion, LLMs now vary widely in cost, performance, and specialization. For example, straightforward tasks like text summarization can be efficiently handled by smaller, general-purpose models. In contrast…
]]>From hyperlocal forecasts that guide daily operations to planet-scale models illuminating new climate insights, the world is entering a new frontier in weather and climate resilience. The combination of space-based observations and GPU-accelerated AI delivers near-instant, context-rich insights to enterprises, governments, researchers, and solution providers worldwide. It also marks a rare…
]]>Electric vehicles (EVs) are transforming transportation, but challenges such as cost, longevity, and range remain barriers to widespread adoption. At the heart of these challenges lies battery technology—specifically, the electrolyte, a critical component that enables energy storage and delivery. The electrolyte’s properties directly impact a battery’s charging speed, power output, stability…
]]>With emerging use cases such as digital humans, agents, podcasts, images, and video generation, generative AI is changing the way we interact with PCs. This paradigm shift calls for new ways of interfacing with and programming generative AI models. However, getting started can be daunting for PC developers and AI enthusiasts. Today, NVIDIA released a suite of NVIDIA NIM microservices on…
]]>Microsoft, in collaboration with NVIDIA, announced transformative performance improvements for the Meta Llama family of models on its Azure AI Foundry platform. These advancements, enabled by NVIDIA TensorRT-LLM optimizations, deliver significant gains in throughput, reduced latency, and improved cost efficiency, all while preserving the quality of model outputs. With these improvements…
]]>For years, advancements in AI have followed a clear trajectory through pretraining scaling: larger models, more data, and greater computational resources lead to breakthrough capabilities. In the last 5 years, pretraining scaling has increased compute requirements by an incredible 50M times. However, building more intelligent systems is no longer just about pretraining bigger models.
]]>In the United Arab Emirates (UAE), extreme weather events disrupt daily life, delaying flights, endangering transportation, and complicating urban planning. High daytime temperatures limit human activity outdoors, while dense nighttime fog is a frequent cause of severe and often fatal car crashes. Meanwhile, 2024 saw the heaviest precipitation event in the country in 75 years…
]]>The growing volume and complexity of medical data—and the pressing need for early disease diagnosis and improved healthcare efficiency—are driving unprecedented advancements in medical AI. Among the most transformative innovations in this field are multimodal AI models that simultaneously process text, images, and video. These models offer a more comprehensive understanding of patient data than…
]]>Generative chemistry with AI has the potential to revolutionize how scientists approach drug discovery and development, health, and materials science and engineering. Instead of manually designing molecules with “chemical intuition” or screening millions of existing chemicals, researchers can train neural networks to propose novel molecular structures tailored to the desired properties.
]]>NVIDIA Parabricks is a scalable genomics analysis software suite that solves omics challenges with accelerated computing and deep learning to unlock new scientific breakthroughs. Released at NVIDIA GTC 2025, NVIDIA Parabricks v4.5 supports the growing quantity of data by including support for the latest NVIDIA GPU architectures, and improved alignment and variant calling with the…
]]>With the rise of physical AI, video content generation has surged exponentially. A single camera-equipped autonomous vehicle can generate more than 1 TB of video daily, while a robotics-powered manufacturing facility may produce 1 PB of data daily. To leverage this data for training and fine-tuning world foundation models (WFMs), you must first process it efficiently.
]]>Enterprises are generating and storing more multimodal data than ever before, yet traditional retrieval systems remain largely text-focused. While they can surface insights from written content, they aren’t extracting critical information embedded in tables, charts, and infographics—often the most information-dense elements of a document. Without a multimodal retrieval system…
]]>With the release of NVIDIA AgentIQ—an open-source library for connecting and optimizing teams of AI agents—developers, professionals, and researchers can create their own agentic AI applications. This tutorial shows you how to develop apps in AgentIQ through an example of AI code generation. We build a test-driven coding agent using LangGraph and reasoning models to scale test-time computation.
]]>As agentic AI systems evolve and become essential for optimizing business processes, it is crucial for developers to update them regularly to stay aligned with ever-changing business and user needs. Continuously refining these agents with AI and human feedback ensures that they remain effective and relevant. NVIDIA NeMo microservices is a fully accelerated, enterprise-grade solution designed…
]]>NVIDIA announced the release of NVIDIA Dynamo today at GTC 2025. NVIDIA Dynamo is a high-throughput, low-latency open-source inference serving framework for deploying generative AI and reasoning models in large-scale distributed environments. The framework boosts the number of requests served by up to 30x, when running the open-source DeepSeek-R1 models on NVIDIA Blackwell.
]]>NVIDIA announced world-record DeepSeek-R1 inference performance at NVIDIA GTC 2025. A single NVIDIA DGX system with eight NVIDIA Blackwell GPUs can achieve over 250 tokens per second per user or a maximum throughput of over 30,000 tokens per second on the massive, state-of-the-art 671 billion parameter DeepSeek-R1 model. These rapid advancements in performance at both ends of the performance…
]]>The next generation of AI-driven robots like humanoids and autonomous vehicles depends on high-fidelity, physics-aware training data. Without diverse and representative datasets, these systems don’t get proper training and face testing risks due to poor generalization, limited exposure to real-world variations, and unpredictable behavior in edge cases. Collecting massive real-world datasets for…
]]>AI is transforming how we experience our favorite games. It is unlocking new levels of visuals, performance, and gameplay possibilities with neural rendering and generative AI-powered characters. With game development becoming more complex, AI is also playing a role in helping artists and engineers realize their creative visions. At GDC 2025, NVIDIA is building upon NVIDIA RTX Kit…
]]>Building AI systems with foundation models requires a delicate balancing of resources such as memory, latency, storage, compute, and more. One size does not fit all for developers managing cost and user experience when bringing generative AI capability to the rapidly growing ecosystem of AI-powered applications. You need options for high-quality, customizable models that can support large…
]]>With the recent advancements in generative AI and vision foundational models, vision language models (VLMs) present a new wave of visual computing wherein the models are capable of highly sophisticated perception and deep contextual understanding. These intelligent solutions offer a promising means of enhancing semantic comprehension in XR settings. By integrating VLMs, developers can significantly improve how XR…
]]>Large language models (LLMs) have shown remarkable generalization capabilities in natural language processing (NLP). They are used in a wide range of applications, including translation, digital assistants, recommendation systems, context analysis, code generation, cybersecurity, and more. In automotive applications, there is growing demand for LLM-based solutions for both autonomous driving and…
]]>Training AI models on massive GPU clusters presents significant challenges for model builders. Because manual intervention becomes impractical as job scale increases, automation is critical to maintaining high GPU utilization and training productivity. An exceptional training experience requires resilient systems that provide low-latency error attribution and automatic failover based on root…
]]>Learn from and connect with leading AI developers building the next generation of AI agents.
]]>Applications requiring high-performance information retrieval span a wide range of domains, including search engines, knowledge management systems, AI agents, and AI assistants. These systems demand retrieval processes that are accurate and computationally efficient to deliver precise insights, enhance user experiences, and maintain scalability. Retrieval-augmented generation (RAG) is used to…
]]>Join these sessions to learn how accelerated computing, generative AI, and physics-based world simulation are advancing physical and embodied AI.
]]>Discover cutting-edge AI and data science innovations from top generative AI teams at NVIDIA GTC 2025.
]]>Safeguarding AI agents and other conversational AI applications to ensure safe, on-brand and reliable behavior is essential for enterprises. NVIDIA NeMo Guardrails offers robust protection with AI guardrails for content safety, topic control, jailbreak detection, and more to evaluate and optimize guardrail performance. In this post, we explore techniques for measuring and optimizing your AI…
]]>AI agents are transforming business operations by automating processes, optimizing decision-making, and streamlining actions. Their effectiveness hinges on expert reasoning, enabling smarter planning and efficient execution. Agentic AI applications could benefit from the capabilities of models such as DeepSeek-R1. Built for solving problems that require advanced AI reasoning…
]]>As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NAVER is a popular South Korean search engine company that offers Naver Place, a geo-based service that provides detailed information about millions of businesses and points of interest across Korea. Users can search for different places, leave reviews, and place bookings or orders in real time.
]]>Large language models (LLMs) have permeated every industry and changed the potential of technology. However, due to their massive size, they are not practical for the current resource constraints that many companies have. The rise of small language models (SLMs) bridges quality and cost by creating models with a smaller resource footprint. SLMs are a subset of language models that tend to…
]]>In today’s data-driven world, the ability to retrieve accurate information from even modest amounts of data is vital for developers seeking streamlined, effective solutions for quick deployments, prototyping, or experimentation. One of the key challenges in information retrieval is managing the diverse modalities in unstructured datasets, including text, PDFs, images, tables, audio, video…
]]>A well-crafted systematic review is often the initial step for researchers exploring a scientific field. For scientists new to this field, it provides a structured overview of the domain. For experts, it refines their understanding and sparks new ideas. In 2024 alone, 218,650 review articles were indexed in the Web of Science database, highlighting the importance of these resources in research.
]]>Vision language models (VLMs) are evolving at a breakneck speed. In 2020, the first VLMs revolutionized the generative AI landscape by bringing visual understanding to large language models (LLMs) through the use of a vision encoder. These initial VLMs were limited in their abilities, only able to understand text and single image inputs. Fast-forward a few years and VLMs are now capable of…
]]>Chip and hardware design presents numerous challenges stemming from its complexity and advancing technologies. These challenges result in longer turn-around time (TAT) for optimizing performance, power, area, and cost (PPAC) during synthesis, verification, physical design, and reliability loops. Large language models (LLMs) have shown a remarkable capacity to comprehend and generate natural…
]]>There is an activity where people provide inputs to generative AI technologies, such as large language models (LLMs), to see if the outputs can be made to deviate from acceptable standards. This use of LLMs began in 2023 and has rapidly evolved to become a common industry practice and a cornerstone of trustworthy AI. How can we standardize and define LLM red teaming?
]]>Agentic workflows are the next evolution in AI-powered tools. They enable developers to chain multiple AI models together to perform complex activities, enable AI models to use tools to access additional data or automate user actions, and enable AI models to operate autonomously, analyzing and performing complex tasks with a minimum of human involvement or interaction. Because of their power…
]]>Generative AI, powered by advanced machine learning models and deep neural networks, is revolutionizing industries by generating novel content and driving innovation in fields like healthcare, finance, and entertainment. NVIDIA is leading this transformation with its cutting-edge GPU architectures and software ecosystems, such as the H100 Tensor Core GPU and CUDA platform…
]]>NVIDIA AI Enterprise is the cloud-native software platform for the development and deployment of production-grade AI solutions. The latest release of the NVIDIA AI Enterprise infrastructure software collection adds support for the latest NVIDIA data center GPU, NVIDIA H200 NVL, giving your enterprise new options for powering cutting-edge use cases such as agentic and generative AI with some of the…
]]>Traditional design and engineering workflows in the manufacturing industry have long been characterized by a sequential, iterative approach that is often time-consuming and resource intensive. These conventional methods typically involve stages such as requirement gathering, conceptual design, detailed design, analysis, prototyping, and testing, with each phase dependent on the results of previous…
]]>NVIDIA has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry. Earlier versions of NVIDIA Riva, a collection of GPU-accelerated speech and translation AI microservices for ASR, TTS, and NMT, support English-Spanish and English-Japanese code-switching ASR models based on the Conformer architecture, along with a model supporting multiple…
]]>Join us on February 27 to learn how to transform PDFs into AI podcasts using the NVIDIA AI Blueprint.
]]>Generative AI, especially with breakthroughs like AlphaFold and RosettaFold, is transforming drug discovery and how biotech companies and research laboratories study protein structures, unlocking groundbreaking insights into protein interactions. Proteins are dynamic entities. It has been postulated that a protein’s native state is known by its sequence of amino acids alone…
]]>AI has evolved from an experimental curiosity to a driving force within biological research. The convergence of deep learning algorithms, massive omics datasets, and automated laboratory workflows has allowed scientists to tackle problems once thought intractable—from rapid protein structure prediction to generative drug design, increasing the need for AI literacy among scientists.
]]>Learn from researchers, scientists, and industry leaders across a variety of topics including AI, robotics, and data science.
]]>Large language models (LLMs) that specialize in coding have been steadily adopted into developer workflows. From pair programming to self-improving AI agents, these models assist developers with various tasks, including enhancing code, fixing bugs, generating tests, and writing documentation. To promote the development of open-source LLMs, the Qwen team recently released Qwen2.5-Coder…
]]>Master prompt engineering, fine-tuning, and customization to build video analytics AI agents.
]]>As AI models extend their capabilities to solve more sophisticated challenges, a new scaling law known as test-time scaling or inference-time scaling is emerging. Also known as AI reasoning or long-thinking, this technique improves model performance by allocating additional computational resources during inference to evaluate multiple possible outcomes and then selecting the best one…
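The idea of spending extra inference compute to evaluate several possible outcomes and keep the best one can be illustrated with a best-of-N selection loop. This is only a minimal sketch: `generate_candidates` and `score` are hypothetical stand-ins for a model sampler and a verifier, and the target value 12 is made up for the toy scoring rule. Nothing here is taken from the post itself.

```python
import random

def generate_candidates(prompt, n, rng):
    # Hypothetical stand-in for sampling n answers from a model.
    return [rng.randint(0, 20) for _ in range(n)]

def score(prompt, candidate):
    # Hypothetical verifier: higher is better. Here, closeness to a made-up target of 12.
    return -abs(candidate - 12)

def best_of_n(prompt, n, seed=0):
    # Spend more inference-time compute by sampling n candidates,
    # then keep the one the verifier scores highest.
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n, rng)
    return max(candidates, key=lambda c: score(prompt, c))

small_budget = best_of_n("toy prompt", n=4)
large_budget = best_of_n("toy prompt", n=16)
```

With a fixed seed, the 16 candidates include the first 4, so the larger budget can never select a worse-scoring answer than the smaller one in this toy setup.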
]]>Model pruning and knowledge distillation are powerful cost-effective strategies for obtaining smaller language models from an initial larger sibling. The How to Prune and Distill Llama-3.1 8B to an NVIDIA Llama-3.1-Minitron 4B Model post discussed the best practices of using large language models (LLMs) that combine depth, width, attention, and MLP pruning with knowledge distillation…
]]>In the rapidly evolving landscape of AI systems and workloads, achieving optimal model training performance extends far beyond chip speed. It requires a comprehensive evaluation of the entire stack, from compute to networking to model framework. Navigating the complexities of AI system performance can be difficult. There are many application changes that you can make…
]]>Explore the latest advancements in academia, including advanced research, innovative teaching methods, and the future of learning and technology.
]]>Translation plays an essential role in enabling companies to expand across borders, with requirements varying significantly in terms of tone, accuracy, and technical terminology handling. The emergence of sovereign AI has highlighted critical challenges in large language models (LLMs), particularly their struggle to capture nuanced cultural and linguistic contexts beyond English-dominant…
]]>Matrix multiplication and attention mechanisms are the computational backbone of modern AI workloads. While libraries like NVIDIA cuDNN provide highly optimized implementations, and frameworks such as CUTLASS offer deep customization, many developers and researchers need a middle ground that combines performance with programmability. The open-source Triton compiler on the NVIDIA Blackwell…
]]>NVIDIA AI Workbench is a free development environment manager to develop, customize, and prototype AI applications on your GPUs. AI Workbench provides a frictionless experience across PCs, workstations, servers, and cloud for AI, data science, and machine learning (ML) projects. The user experience includes: This post provides details about the January 2025 release of NVIDIA AI Workbench…
]]>Connect AI applications to enterprise data using embedding and reranking models for information retrieval.
]]>NVIDIA DLSS 4 is the latest iteration of DLSS introduced with the NVIDIA GeForce RTX 50 Series GPUs. It includes several new features: Here’s how you can get started with DLSS 4 in your integrations. This post focuses on the Streamline SDK, which provides a plug-and-play framework for simplified plugin integration. The NVIDIA Streamline SDK is an open-source framework that…
]]>NVIDIA recently announced a new generation of PC GPUs—the GeForce RTX 50 Series—alongside new AI-powered SDKs and tools for developers. Powered by the NVIDIA Blackwell architecture, fifth-generation Tensor Cores and fourth-generation RT Cores, the GeForce RTX 50 Series delivers breakthroughs in AI-driven rendering, including neural shaders, digital human technologies, geometry and lighting.
]]>Evaluating large language models (LLMs) and retrieval-augmented generation (RAG) systems is a complex and nuanced process, reflecting the sophisticated and multifaceted nature of these systems. Unlike traditional machine learning (ML) models, LLMs generate a wide range of diverse and often unpredictable outputs, making standard evaluation metrics insufficient. Key challenges include the…
]]>Despite the success of large language models (LLMs) as general-purpose AI tools, their high demand for computational resources makes their deployment challenging in many real-world scenarios. The sizes of the model and conversation state are limited by the available high-bandwidth memory, limiting the number of users that can be served and the maximum conversation length. At present…
]]>As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. The explosion of AI-driven applications has placed unprecedented demands on both developers, who must balance delivering cutting-edge performance with managing operational complexity and cost, and AI infrastructure. NVIDIA is empowering developers with full-stack innovations—spanning chips, systems…
]]>As of 3/18/25, NVIDIA Triton Inference Server is now NVIDIA Dynamo. NVIDIA NIM microservices are model inference containers that can be deployed on Kubernetes. In a production environment, it’s important to understand the compute and memory profile of these microservices to set up a successful autoscaling plan. In this post, we describe how to set up and use Kubernetes Horizontal Pod…
]]>At NVIDIA, the Sales Operations team equips the Sales team with the tools and resources needed to bring cutting-edge hardware and software to market. Managing this across NVIDIA’s diverse technology is a complex challenge shared by many enterprises. Through collaboration with our Sales team, we found that they rely on internal and external documentation…
]]>Language models generate text by predicting the next token, given all the previous tokens including the input text tokens. Key and value elements of the previous tokens are used as historical context in LLM serving for generation of the next set of tokens. Caching these key and value elements from previous tokens avoids expensive recomputation and effectively leads to higher throughput. However…
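The caching of key and value elements described above can be sketched with a toy single-head attention step. This is an illustrative sketch, not how any particular serving stack implements it: `project` is a hypothetical stand-in for learned projections, and the scale factors and token vectors are made up.

```python
import math

def project(x, scale):
    # Hypothetical stand-in for a learned linear projection.
    return [scale * v for v in x]

def attend(query, keys, values):
    # Scaled dot-product attention of one query over all available positions.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(d)]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0  # number of key/value projections performed

    def step(self, x):
        # Project only the newest token; earlier keys and values are reused.
        self.keys.append(project(x, 0.5))
        self.values.append(project(x, 2.0))
        self.projections += 1
        return attend(project(x, 1.0), self.keys, self.values)

def step_without_cache(tokens):
    # Recompute keys and values for every token seen so far.
    keys = [project(t, 0.5) for t in tokens]
    values = [project(t, 2.0) for t in tokens]
    return attend(project(tokens[-1], 1.0), keys, values), len(tokens)

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
recomputed = 0
outputs_match = True
for i, tok in enumerate(tokens):
    cached_out = cache.step(tok)
    full_out, n = step_without_cache(tokens[: i + 1])
    recomputed += n
    outputs_match = outputs_match and all(
        abs(a - b) < 1e-12 for a, b in zip(cached_out, full_out)
    )
```

Both paths produce the same attention output at every step, but the cached path performs one key/value projection per token while the uncached path repeats the work for the whole prefix each time.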
]]>The introduction of the NVIDIA Jetson Orin Nano Super Developer Kit sparked a new age of generative AI for small edge devices. The new Super Mode delivered an unprecedented generative AI performance boost of up to 1.7x on the developer kit, making it the most affordable generative AI supercomputer. JetPack 6.2 is now available to support Super Mode for Jetson Orin Nano and Jetson Orin NX…
]]>AI agents present a significant opportunity for businesses to scale and elevate customer service and support interactions. By automating routine inquiries and enhancing response times, these agents improve efficiency and customer satisfaction, helping organizations stay competitive. However, alongside these benefits, AI agents come with risks. Large language models (LLMs) are vulnerable to…
]]>In recent years, large language models (LLMs) have achieved extraordinary progress in areas such as reasoning, code generation, machine translation, and summarization. However, despite their advanced capabilities, foundation models have limitations when it comes to domain-specific expertise such as finance or healthcare or capturing cultural and language nuances beyond English.
]]>Generative AI has revolutionized how people bring ideas to life, and agentic AI represents the next leap forward in this technological evolution. By leveraging sophisticated, autonomous reasoning and iterative planning, AI agents can tackle complex, multistep problems with remarkable efficiency. As AI continues to revolutionize industries, the demand for running AI models locally has surged.
]]>In a recent DC Anti-Conference Live presentation, Wade Vinson, chief data center distinguished engineer at NVIDIA, shared insights based upon work by NVIDIA designing, building, and operating NVIDIA DGX SuperPOD multi-megawatt data centers since 2016. NVIDIA is helping make data centers more accessible, resource-efficient, energy-efficient, and business-efficient, as well as scalable to any…
]]>In the rapidly evolving landscape of artificial intelligence, the quality of the data used for training models is paramount. High-quality data ensures that models are accurate, reliable, and capable of generalizing well across various applications. The recent NVIDIA webinar, Enhance Generative AI Model Accuracy with High-Quality Multimodal Data Processing, dove into the intricacies of data…
]]>Traditional computational drug discovery relies almost exclusively on highly task-specific computational models for hit identification and lead optimization. Adapting these specialized models to new tasks requires substantial time, computational power, and expertise—challenges that grow when researchers simultaneously work across multiple targets or properties.
]]>Designing a therapeutic protein that specifically binds its target in drug discovery is a staggering challenge. Traditional workflows are often a painstaking trial-and-error process—iterating through thousands of candidates, each synthesis and validation round taking months if not years. Considering the average human protein is 430 amino acids long, the number of possible designs translates to…
]]>NVIDIA is excited to announce the release of Nemotron-CC, a 6.3-trillion-token English language Common Crawl dataset for pretraining highly accurate large language models (LLMs), including 1.9 trillion tokens of synthetically generated data. One of the keys to training state-of-the-art LLMs is a high-quality pretraining dataset, and recent top LLMs, such as the Meta Llama series…
]]>Powered by the new GB10 Grace Blackwell Superchip, Project DIGITS can tackle large generative AI models of up to 200B parameters.
]]>As robotics and autonomous vehicles advance, accelerating development of physical AI—which enables autonomous machines to perceive, understand, and perform complex actions in the physical world—has become essential. At the center of these systems are world foundation models (WFMs)—AI models that simulate physical states through physics-aware videos, enabling machines to make accurate decisions and…
]]>Tune in January 16th at 9:00 AM PT for a live recap, followed by a Q&A of the latest developer announcements at CES 2025.
]]>Generative AI has evolved from text-based models to multimodal models, with a recent expansion into video, opening up new potential uses across various industries. Video models can create new experiences for users or simulate scenarios for training autonomous agents at scale. They are helping revolutionize various industries including robotics, autonomous vehicles, and entertainment.
]]>AI development has become a core part of modern software engineering, and NVIDIA is committed to finding ways to bring optimized accelerated computing to every developer that wants to start experimenting with AI. To address this, we’ve been working on making the accelerated computing stack more accessible with NVIDIA Launchables: preconfigured GPU computing environments that enable you to…
]]>This post was originally published July 29, 2024 but has been extensively revised with NVIDIA AI Blueprint information. Traditional video analytics applications and their development workflow are typically built on fixed-function, limited models that are designed to detect and identify only a select set of predefined objects. With generative AI, NVIDIA NIM microservices…
]]>Training physical AI models used to power autonomous machines, such as robots and autonomous vehicles, requires huge amounts of data. Acquiring large sets of diverse training data can be difficult, time-consuming, and expensive. Data is often limited due to privacy restrictions or concerns, or simply may not exist for novel use cases. In addition, the available data may not apply to the full range…
]]>Agentic AI, the next wave of generative AI, is a paradigm shift with the potential to revolutionize industries by enabling AI systems to act autonomously and achieve complex goals. Agentic AI combines the power of large language models (LLMs) with advanced reasoning and planning capabilities, opening a world of possibilities across industries, from healthcare and finance to manufacturing and…
]]>NVIDIA today unveiled next-generation hardware for gamers, creators, and developers—the GeForce RTX 50 Series desktop and laptop GPUs. Alongside these GPUs, NVIDIA introduced NVIDIA RTX Kit, a suite of neural rendering technologies to ray trace games with AI, render scenes with immense geometry, and create game characters with lifelike visuals. RTX Kit enhances geometry, textures, materials…
]]>Innovation in medical devices continues to accelerate, with a record number authorized by the FDA every year. When these new or updated devices are introduced to clinicians and patients, they require training to use them properly and safely. Once in use, clinicians or patients may need help troubleshooting issues. Medical devices are often accompanied by lengthy and technically complex…
]]>Classifier models are specialized in categorizing data into predefined groups or classes, playing a crucial role in optimizing data processing pipelines for fine-tuning and pretraining generative AI models. Their value lies in enhancing data quality by filtering out low-quality or toxic data, ensuring only clean and relevant information feeds downstream processes. Beyond filtering…
]]>Filmmaking is an intricate and complex process that involves a diverse team of artists, writers, visual effects professionals, technicians, and countless other specialists. Each member brings their unique expertise to the table, collaborating to transform a simple idea into a captivating cinematic experience. From the initial spark of a story to the final cut, every step requires creativity…
]]>Large language models (LLMs) are rapidly changing the business landscape, offering new capabilities in natural language processing (NLP), content generation, and data analysis. These AI-powered tools have improved how companies operate, from streamlining customer service to enhancing decision-making processes. However, despite their impressive general knowledge, LLMs often struggle with…
]]>Recurrent drafting (referred to as ReDrafter) is a novel speculative decoding technique developed and open-sourced by Apple for large language model (LLM) inference now available with NVIDIA TensorRT-LLM. ReDrafter helps developers significantly boost LLM workload performance on NVIDIA GPUs. NVIDIA TensorRT-LLM is a library for optimizing LLM inference. It provides an easy-to-use Python API to…
]]>Knowledge distillation is an approach for transferring the knowledge of a much larger teacher model to a smaller student model, ideally yielding a compact, easily deployable student with comparable accuracy to the teacher. Knowledge distillation has gained popularity in pretraining settings, but there are fewer resources available for performing knowledge distillation during supervised fine-tuning…
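One common way to express the teacher-to-student transfer is a temperature-softened KL divergence between the two models' output distributions. The sketch below assumes that formulation; the post may use a different loss, and the logits and the temperature value are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # (an assumed, commonly used formulation).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

teacher = [4.0, 1.0, 0.5]
matched_student = [4.0, 1.0, 0.5]
untrained_student = [0.0, 0.0, 0.0]

loss_matched = distillation_loss(matched_student, teacher)
loss_untrained = distillation_loss(untrained_student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the student's distribution drifts away from the teacher's.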
]]>NVIDIA just announced a series of small language models (SLMs) that increase the amount and type of information digital humans can use to augment their responses. This includes new large-context models that provide more relevant answers and new multi-modal models that allow images as inputs. These models are available now as part of NVIDIA ACE, a suite of digital human technologies that brings…
]]>Meta’s Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 provides enhanced performance relative to the older Llama 3.1 70B model and can even match the capabilities of the larger, more computationally expensive Llama 3.1 405B model on several tasks including math, reasoning, coding…
]]>Efficient text retrieval is critical for a broad range of information retrieval applications such as search, question answering, semantic textual similarity, summarization, and item recommendation. It also plays a pivotal role in retrieval-augmented generation (RAG), a technique that enables large language models (LLMs) to access external context without modifying underlying parameters.
]]>The generative AI landscape is rapidly evolving, with new large language models (LLMs), visual language models (VLMs), and vision language action (VLA) models emerging daily. To stay at the forefront of this transformative era, developers need a platform powerful enough to seamlessly deploy the latest models from the cloud to the edge with optimized inferencing and open ML frameworks using CUDA.
]]>Agentic AI workflows often involve the execution of large language model (LLM)-generated code to perform tasks like creating data visualizations. However, this code should be sanitized and executed in a safe environment to mitigate risks from prompt injection and errors in the returned code. Sanitizing Python with regular expressions and restricted runtimes is insufficient…
]]>