Learn how to accelerate the full pipeline, from multilingual speech recognition and translation to generative AI and speech synthesis.
Large language models (LLMs) have permeated every industry and changed the potential of technology. However, their massive size makes them impractical under the resource constraints many companies face. Small language models (SLMs) bridge quality and cost by offering a smaller resource footprint. SLMs are a subset of language models that tend to…
NVIDIA has consistently developed automatic speech recognition (ASR) models that set the benchmark in the industry. Earlier versions of NVIDIA Riva, a collection of GPU-accelerated speech and translation AI microservices for ASR, TTS, and NMT, support English-Spanish and English-Japanese code-switching ASR models based on the Conformer architecture, along with a model supporting multiple…
NVIDIA NIM, part of NVIDIA AI Enterprise, provides containers to self-host GPU-accelerated inferencing microservices for pretrained and customized AI models across clouds, data centers, and workstations. NIM microservices for speech and translation are now available. The new speech and translation microservices leverage NVIDIA Riva and provide automatic speech recognition (ASR)…
Stunning audio content is an essential component of virtual worlds. Audio generative AI plays a key role in creating this content, and NVIDIA is continuously pushing the limits in this field of research. BigVGAN, developed in collaboration with the NVIDIA Applied Deep Learning Research and NVIDIA NeMo teams, is a generative AI model specialized in audio waveform synthesis that achieves state-of-the-art…
Building an effective automatic speech recognition (ASR) model for underrepresented languages presents unique challenges due to limited data resources. In this post, I discuss best practices for preparing the dataset, configuring the model, and training it effectively, along with the evaluation metrics and the challenges encountered. By following these practices…
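The standard evaluation metric mentioned above is word error rate (WER): the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. A minimal sketch in plain Python (production pipelines typically use a library implementation):

```python
# Word error rate (WER): Levenshtein (edit) distance between reference and
# hypothesis word sequences, normalized by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six words
```

Lower is better; a WER of 0.0 means an exact match, and values above 1.0 are possible when the hypothesis is much longer than the reference.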
NVIDIA NeMo has released the T5-TTS model, a significant advancement in text-to-speech (TTS) technology. Based on large language models (LLMs), T5-TTS produces more accurate and natural-sounding speech. By improving alignment between text and audio, T5-TTS eliminates hallucinations such as repeated spoken words and skipped text. Additionally, T5-TTS makes up to 2x fewer word pronunciation errors…
NVIDIA NeMo is an end-to-end platform for the development of multimodal generative AI models at scale anywhere, on any cloud and on-premises. The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. Canary also provides bidirectional translation between English and the three other supported…
NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere, on any cloud and on-premises, recently released Parakeet-TDT. This new addition to the NeMo ASR Parakeet model family boasts better accuracy and 64% greater speed than the previous best model, Parakeet-RNNT-1.1B. This post explains Parakeet-TDT and how to use it to generate highly accurate…
NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere, on any cloud and on-premises, released the Parakeet family of automatic speech recognition (ASR) models. These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy. This post details Parakeet ASR models that are…
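As a rough sketch of what using one of these checkpoints looks like: a pretrained model can be pulled down and run with a few lines of NeMo. The checkpoint name and audio path below are illustrative assumptions (substitute any Parakeet checkpoint NVIDIA publishes); running it requires the `nemo_toolkit[asr]` package and, realistically, a GPU, so the heavy import is kept inside the function.

```python
# Sketch: load a pretrained Parakeet ASR checkpoint with NeMo and transcribe
# local audio files. Model name and file path are illustrative assumptions.
def transcribe(audio_paths, model_name="nvidia/parakeet-tdt-1.1b"):
    # Lazy import: nemo_toolkit is a heavy, GPU-oriented dependency.
    import nemo.collections.asr as nemo_asr

    # Downloads the checkpoint on first use, then caches it locally.
    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    return model.transcribe(audio_paths)

# Hypothetical usage (assumes NeMo is installed and the file exists):
#   texts = transcribe(["meeting_recording.wav"])
```

The same `ASRModel.from_pretrained` / `transcribe` pattern applies across the Parakeet family; only the checkpoint name changes.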
Speech and translation AI models developed at NVIDIA are pushing the boundaries of performance and innovation. The NVIDIA Parakeet automatic speech recognition (ASR) family of models and the NVIDIA Canary multilingual, multitask ASR and translation model currently top the Hugging Face Open ASR Leaderboard. In addition, a multilingual P-Flow-based text-to-speech (TTS) model won the LIMMITS '24…
Learn how to build a RAG-powered application with a human voice interface at NVIDIA GTC 2024 Speech and Generative AI Developer Day.
Breaking barriers in speech recognition, NVIDIA NeMo proudly presents pretrained models tailored for Dutch and Persian, languages often overlooked in the AI landscape. These models leverage the recently introduced FastConformer architecture and were trained simultaneously with CTC and transducer objectives to maximize each model's accuracy. Automatic speech recognition (ASR) is a…
At the core of understanding people correctly and having natural conversations is automatic speech recognition (ASR). To make customer-led voice assistants and automate customer service interactions over the phone, companies must solve the unique challenge of gaining a caller's trust through qualities such as understanding, empathy, and clarity. Telephony-bound voice is inherently challenging…
Convai is a versatile developer platform for designing characters with advanced multimodal perception abilities. These characters are designed to integrate seamlessly into both the virtual and real worlds. Whether you're a creator, game designer, or developer, Convai enables you to quickly modify a non-playable character (NPC), from backstory and knowledge to voice and personality.
NVIDIA today unveiled major upgrades to the NVIDIA Avatar Cloud Engine (ACE) suite of technologies, bringing enhanced realism and accessibility to AI-powered avatars and digital humans. These latest animation and speech capabilities enable more natural conversations and emotional expressions. Developers can now easily implement and scale intelligent avatars across applications using new…
Meetings are the lifeblood of an organization. They foster collaboration and informed decision-making. They eliminate silos through brainstorming and problem-solving. And they further strategic goals and planning. Yet leading meetings that accomplish these goals, especially those involving cross-functional teams and external participants, can be challenging. A unique blend of people…
The integration of speech and translation AI into our daily lives is rapidly reshaping our interactions, from virtual assistants to call centers and augmented reality experiences. Speech AI Day provided valuable insights into the latest advancements in speech AI, showcasing how this technology addresses real-world challenges. In this first of three Speech AI Day sessions…
From startups to large enterprises, businesses use cloud marketplaces to find the solutions they need to transform quickly. Cloud marketplaces are online storefronts where customers can purchase software and services with flexible billing models, including pay-as-you-go, subscriptions, and privately negotiated offers. Businesses further benefit from committed spending at…
On Sept. 20, join experts from leading companies at NVIDIA-hosted Speech AI Day.
The Speech AI Summit is an annual conference that brings together experts in the field of AI and speech technology to discuss the latest industry trends and advancements. This post summarizes the top questions asked during Overview of Zero-Shot Multi-Speaker TTS System, a recorded talk from the 2022 summit featuring Coqui.ai. Text-to-speech (TTS) systems have significantly advanced in…
Voice-enabled technology is becoming ubiquitous. But many are being left behind by an anglocentric and demographically biased algorithmic world. Mozilla Common Voice (MCV) and NVIDIA are collaborating to change that by partnering on a public crowdsourced multilingual speech corpus, now the largest of its kind in the world, and open-source pretrained models. It is now easier than ever before to…
The telecom sector is transforming how communication happens. Striving to provide reliable, uninterrupted service, businesses are tackling the challenge of delivering an optimal customer experience, something many long-time customers of large telecom service providers do not have. Take Jack, for example. He was on hold for 10 minutes…
Join Infosys, NVIDIA, and Quantiphi on June 7 to learn how to use speech and translation AI to improve agent-assist solutions in multiple languages.
Agent-assist technology uses AI and ML to provide facts and make real-time suggestions that help human agents across retail, telecom, and other industries conduct conversations with customers.
The telecommunication industry has seen a proliferation of AI-powered technologies in recent years, with speech recognition and translation leading the charge. Multilingual AI virtual assistants, digital humans, chatbots, agent assists, and audio transcription are technologies that are revolutionizing the telco industry. Businesses are implementing AI in call centers to address incoming requests…
Join Infosys, Quantiphi, Talkmap, and NVIDIA on May 31 for a live webinar to learn how telecommunications companies are using AI to improve operational efficiency and enhance customer engagement.
This hands-on workshop guides you through the process of voice-enabling your product, from familiarizing yourself with NVIDIA Riva to assessing the costs and resources required for your project.
At NVIDIA GTC 2023, NVIDIA showed how AI workflows can be leveraged to accelerate the development of AI solutions for a range of use cases. AI workflows are cloud-native, packaged reference examples showing how NVIDIA AI frameworks can be used to efficiently build AI solutions such as intelligent virtual assistants, digital fingerprinting for cybersecurity…
Learn about the latest tools, trends, and technologies for building and deploying conversational AI.
Explore the latest advances in accurate and customizable automatic speech recognition, multi-language translation, and text-to-speech.
Over 55% of the global population uses social media, easily sharing online content with just one click. While connecting with others and consuming entertaining content, you can also spot harmful narratives posing real-life threats. That's why Ammar Haris, VP of Engineering at Pendulum, wants his company's AI to help clients gain deeper insight into the harmful content being generated…
Multilingual automatic speech recognition (ASR) models have gained significant interest because of their ability to transcribe speech in more than one language. This is fueled by the growing multilingual communities as well as by the need to reduce complexity. You only need one model to handle multiple languages. This post explains how to use pretrained multilingual NeMo ASR models from the…
Learn to build an engaging and intelligent virtual assistant with NVIDIA AI workflows powered by NVIDIA Riva in this free hands-on lab from NVIDIA LaunchPad.
Join this webinar on January 25 and learn how to build a voice-enabled intelligent virtual assistant to improve customer experiences at contact centers.
OpenAI researchers recently released a paper describing the development of GPT-3, a state-of-the-art language model with 175 billion parameters. For comparison, the previous version, GPT-2, had 1.5 billion parameters, and the largest Transformer-based language model before GPT-3, released by Microsoft earlier this month, has 17 billion. "GPT-3 achieves strong…
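To put those parameter counts in perspective, a back-of-the-envelope calculation shows how much memory is needed just to store the weights at a given precision. The sketch below assumes 16-bit (2 bytes per parameter) storage; real serving or training needs considerably more for activations, optimizer state, and caches.

```python
# Rough memory footprint of model weights alone, at a chosen precision.
# This deliberately ignores activations, optimizer state, and framework
# overhead, so real requirements are higher.
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    return num_params * bytes_per_param / 2**30  # bytes -> GiB

for name, n in [("GPT-2", 1.5e9), ("Microsoft 17B", 17e9), ("GPT-3", 175e9)]:
    print(f"{name}: ~{weight_memory_gib(n):,.0f} GiB at 16-bit precision")
```

The jump from GPT-2 to GPT-3 moves the weights from something that fits on a single consumer GPU to hundreds of GiB, which is why models of this scale must be sharded across many accelerators.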