Speech AI

Mar 04, 2025
Top Conversational AI Sessions at NVIDIA GTC 2025
Learn how to accelerate the full pipeline, from multilingual speech recognition and translation to generative AI and speech synthesis.
1 MIN READ

Feb 26, 2025
Latest Multimodal Addition to Microsoft Phi SLMs Trained on NVIDIA GPUs
Large language models (LLMs) have permeated every industry and changed the potential of technology. However, due to their massive size they are not practical...
4 MIN READ

Feb 20, 2025
Deploying NVIDIA Riva Multilingual ASR with Whisper and Canary Architectures While Selectively Deactivating NMT
NVIDIA has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry. Earlier versions of NVIDIA Riva, a...
12 MIN READ

Sep 18, 2024
Quickly Voice Your Apps with NVIDIA NIM Microservices for Speech and Translation
NVIDIA NIM, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models...
11 MIN READ

Sep 05, 2024
Achieving State-of-the-Art Zero-Shot Waveform Audio Generation across Audio Types
Stunning audio content is an essential component of virtual worlds. Audio generative AI plays a key role in creating this content, and NVIDIA is continuously...
6 MIN READ

Aug 05, 2024
Developing Robust Georgian Automatic Speech Recognition with FastConformer Hybrid Transducer CTC BPE
Building an effective automatic speech recognition (ASR) model for underrepresented languages presents unique challenges due to limited data resources. In...
9 MIN READ

Jul 02, 2024
Addressing Hallucinations in Speech Synthesis LLMs with the NVIDIA NeMo T5-TTS Model
NVIDIA NeMo has released the T5-TTS model, a significant advancement in text-to-speech (TTS) technology. Based on large language models (LLMs), T5-TTS produces...
4 MIN READ

Apr 18, 2024
New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model
NVIDIA NeMo is an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises. The NeMo team...
4 MIN READ

Apr 18, 2024
Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT
NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere—on any cloud and on-premises—recently released...
6 MIN READ

Apr 18, 2024
Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models
NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere—on any cloud and on-premises—released the...
6 MIN READ

Mar 19, 2024
NVIDIA Speech and Translation AI Models Set Records for Speed and Accuracy
Speech and translation AI models developed at NVIDIA are pushing the boundaries of performance and innovation. The NVIDIA Parakeet automatic speech recognition...
8 MIN READ

Feb 29, 2024
Event: Speech and Generative AI Developer Day at NVIDIA GTC 2024
Learn how to build a RAG-powered application with a human voice interface at NVIDIA GTC 2024 Speech and Generative AI Developer Day.
1 MIN READ

Jan 16, 2024
New Support for Dutch and Persian Released by NVIDIA NeMo ASR
Breaking barriers in speech recognition, NVIDIA NeMo proudly presents pretrained models tailored for Dutch and Persian—languages often overlooked in the AI...
2 MIN READ

Jan 09, 2024
Enhancing Phone Customer Service with ASR Customization
At the core of understanding people correctly and having natural conversations is automatic speech recognition (ASR). To make customer-led voice assistants and...
7 MIN READ

Jan 08, 2024
Spotlight: Convai Reinvents Non-Playable Character Interactions
Convai is a versatile developer platform for designing characters with advanced multimodal perception abilities. These characters are designed to integrate...
5 MIN READ

Dec 04, 2023
Create Lifelike Avatars with AI Animation and Speech Features in NVIDIA ACE
NVIDIA today unveiled major upgrades to the NVIDIA Avatar Cloud Engine (ACE) suite of technologies, bringing enhanced realism and accessibility to AI-powered...
3 MIN READ