Speech Recognition / Diarization – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/ (updated 2025-03-27)

Elena Rastorgueva, "New Standard for Speech Recognition and Translation from the NVIDIA NeMo Canary Model" (2024-04-18), http://www.open-lab.net/blog/?p=80661

NVIDIA NeMo is an end-to-end platform for the development of multimodal generative AI models at scale anywhere, on any cloud and on-premises. The NeMo team just released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. Canary also provides bidirectional translation between English and the three other supported…

Hainan Xu, "Turbocharge ASR Accuracy and Speed with NVIDIA NeMo Parakeet-TDT" (2024-04-18), http://www.open-lab.net/blog/?p=80732

NVIDIA NeMo, an end-to-end platform for developing multimodal generative AI models at scale anywhere, on any cloud and on-premises, recently released Parakeet-TDT. This new addition to the NeMo ASR Parakeet model family delivers better accuracy and 64% greater speed than the previously best model, Parakeet-RNNT-1.1B. This post explains Parakeet-TDT and how to use it to generate highly accurate…

Somshubra Majumdar, "Pushing the Boundaries of Speech Recognition with NVIDIA NeMo Parakeet ASR Models" (2024-04-18), http://www.open-lab.net/blog/?p=80564

NVIDIA NeMo, an end-to-end platform for the development of multimodal generative AI models at scale anywhere, on any cloud and on-premises, released the Parakeet family of automatic speech recognition (ASR) models. These state-of-the-art ASR models, developed in collaboration with Suno.ai, transcribe spoken English with exceptional accuracy. This post details Parakeet ASR models that are…

Gordana Neskovic, "NVIDIA Speech and Translation AI Models Set Records for Speed and Accuracy" (2024-03-19), http://www.open-lab.net/blog/?p=79365

Speech and translation AI models developed at NVIDIA are pushing the boundaries of performance and innovation. The NVIDIA Parakeet automatic speech recognition (ASR) family of models and the NVIDIA Canary multilingual, multitask ASR and translation model currently top the Hugging Face Open ASR Leaderboard. In addition, a multilingual P-Flow-based text-to-speech (TTS) model won the LIMMITS '24…

Piotr Żelasko, "New Support for Dutch and Persian Released by NVIDIA NeMo ASR" (2024-01-16), http://www.open-lab.net/blog/?p=76636

Breaking barriers in speech recognition, NVIDIA NeMo proudly presents pretrained models tailored for Dutch and Persian, languages often overlooked in the AI landscape. These models leverage the recently introduced FastConformer architecture and were trained simultaneously with CTC and transducer objectives to maximize each model's accuracy. Automatic speech recognition (ASR) is a…

Paweł Budzianowski, "Enhancing Phone Customer Service with ASR Customization" (2024-01-09), http://www.open-lab.net/blog/?p=75584

At the core of understanding people correctly and having natural conversations is automatic speech recognition (ASR). To make customer-led voice assistants and automate customer service interactions over the phone, companies must solve the unique challenge of gaining a caller's trust through qualities such as understanding, empathy, and clarity. Telephony-bound voice is inherently challenging…

Seth Schneider, "Building Lifelike Digital Avatars with NVIDIA ACE Microservices" (2024-01-08), http://www.open-lab.net/blog/?p=76147

Generative AI technologies are revolutionizing how games are produced and played. Game developers are exploring how these technologies can accelerate their content pipelines and provide new gameplay experiences previously thought impossible. One area of focus, digital avatars, will have a transformative impact on how gamers interact with non-playable characters (NPCs). Historically…

Mohamed Elshenawy, "Boost Meeting Productivity with AI-Powered Note-Taking and Summarization" (2023-11-29), http://www.open-lab.net/blog/?p=73964

Meetings are the lifeblood of an organization. They foster collaboration and informed decision-making. They eliminate silos through brainstorming and problem-solving. And they further strategic goals and planning. Yet, leading meetings that accomplish these goals, especially those involving cross-functional teams and external participants, can be challenging. A unique blend of people…

Belen Tegegn, "Video: Exploring Speech AI from Research to Practical Production Applications" (2023-11-07), http://www.open-lab.net/blog/?p=72433

The integration of speech and translation AI into our daily lives is rapidly reshaping our interactions, from virtual assistants to call centers and augmented reality experiences. Speech AI Day provided valuable insights into the latest advancements in speech AI, showcasing how this technology addresses real-world challenges. In this first of three Speech AI Day sessions…

Tanya Lenz, "Workshop: Building Conversational AI Applications" (2023-09-20), http://www.open-lab.net/blog/?p=70919

Learn how to build and deploy production-quality conversational AI apps with real-time transcription and NLP.

Sven Chilton, "How to Deploy NVIDIA Riva Speech and Translation AI in the Public Cloud" (2023-08-29), http://www.open-lab.net/blog/?p=69702

From start-ups to large enterprises, businesses use cloud marketplaces to find the new solutions needed to quickly transform their businesses. Cloud marketplaces are online storefronts where customers can purchase software and services with flexible billing models, including pay-as-you-go, subscriptions, and privately negotiated offers. Businesses further benefit from committed spending at…

Sirisha Rella, "Speech AI Spotlight: Visualizing Spoken Language and Sounds on AR Glasses" (2023-06-23), http://www.open-lab.net/blog/?p=66701

Audio can include a wide range of sounds, from human speech to non-speech sounds like barking dogs and sirens. When designing accessible applications for people with hearing difficulties, the application should be able to recognize sounds and understand speech. Such technology would help deaf or hard-of-hearing individuals visualize speech, like human conversations, and non-speech…

Caroline Gottlieb, "Unlocking Speech AI Technology for Global Language Users: Top Q&As" (2023-06-06), http://www.open-lab.net/blog/?p=66216

Voice-enabled technology is becoming ubiquitous. But many are being left behind by an anglocentric and demographically biased algorithmic world. Mozilla Common Voice (MCV) and NVIDIA are collaborating to change that by partnering on a public crowdsourced multilingual speech corpus, now the largest of its kind in the world, and open-source pretrained models. It is now easier than ever before to…

Vishal Manchanda, "How Language Neutralization Is Transforming Customer Service Contact Centers" (2023-05-30), http://www.open-lab.net/blog/?p=65761

According to Gartner, "Nearly half of digital workers struggle to find the data they need to do their jobs, and close to one-third have made a wrong business decision due to lack of information awareness." To address this challenge, more and more enterprises are deploying AI in customer service, as it helps to provide more efficient and information-based personalized services.

Swaroop Kumar, "Enhancing Customer Experience in Telecom with NVIDIA Customized Speech AI" (2023-05-30), http://www.open-lab.net/blog/?p=65421

The telecom sector is transforming how communication happens. Striving to provide reliable, uninterrupted service, businesses are tackling the challenge of delivering an optimal customer experience. This optimal customer experience is something many long-time customers of large telecom service providers do not have. Take Jack, for example. His call was on hold for 10 minutes…

Kristen Rumley, "How Speech Recognition Improves Customer Service in Telecommunications" (2023-05-02), http://www.open-lab.net/blog/?p=63789

The telecommunication industry has seen a proliferation of AI-powered technologies in recent years, with speech recognition and translation leading the charge. Multilingual AI virtual assistants, digital humans, chatbots, agent assists, and audio transcription are technologies that are revolutionizing the telco industry. Businesses are implementing AI in call centers to address incoming requests…

Kristen Rumley, "Workshop: How to Enable Your Product with Voice Interface" (2023-04-19), http://www.open-lab.net/blog/?p=63525

This hands-on workshop guides you through the process of voice-enabling your product, from familiarizing yourself with NVIDIA Riva to assessing the costs and resources required for your project.

Michelle Horton, "Top Conversational AI Sessions at NVIDIA GTC 2023" (2023-02-28), http://www.open-lab.net/blog/?p=61425

Learn about the latest tools, trends, and technologies for building and deploying conversational AI.

Michelle Horton, "Top Speech AI Developer Day Sessions at NVIDIA GTC 2023" (2023-02-14), http://www.open-lab.net/blog/?p=60997

Explore the latest advances in accurate and customizable automatic speech recognition, multi-language translation, and text-to-speech.

David Taubenheim, "Speech AI Spotlight: How Pendulum Nabs Harmful Narratives Online" (2023-02-08), http://www.open-lab.net/blog/?p=60694

Over 55% of the global population uses social media, easily sharing online content with just one click. While connecting with others and consuming entertaining content, you can also spot harmful narratives posing real-life threats. That's why VP of Engineering at Pendulum, Ammar Haris, wants his company's AI to help clients gain deeper insight into the harmful content being generated…

Somshubra Majumdar, "Controlled Adaptation of Speech Recognition Models to New Domains" (2023-02-03), http://www.open-lab.net/blog/?p=60523

Have you ever tried to fine-tune a speech recognition system on your accent only to find that, while it recognizes your voice well, it fails to detect words spoken by others? This is common in speech recognition systems that have been trained on hundreds of thousands of hours of speech. In large-scale automatic speech recognition (ASR), a system may perform well in many but not all scenarios.

Dima Rekesh, "Multilingual and Code-Switched Automatic Speech Recognition with NVIDIA NeMo" (2023-01-31), http://www.open-lab.net/blog/?p=60289

Multilingual automatic speech recognition (ASR) models have gained significant interest because of their ability to transcribe speech in more than one language. This is fueled by the growing multilingual communities as well as by the need to reduce complexity. You only need one model to handle multiple languages. This post explains how to use pretrained multilingual NeMo ASR models from the…
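
As a rough sketch of the workflow involved (the checkpoint name and audio paths below are placeholders, not taken from the post; check NGC for the checkpoints it actually uses), loading a pretrained NeMo ASR model and transcribing audio looks roughly like this:

```python
# Illustrative sketch only: the checkpoint name and WAV paths are placeholders.
# Requires nemo_toolkit[asr]; the model downloads from NGC on first use.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_enes_conformer_ctc_large")

# Transcribe a batch of 16 kHz mono WAV files.
transcripts = asr_model.transcribe(["sample_en.wav", "sample_es.wav"])
print(transcripts)
```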

Michelle Horton, "New Hands-on Lab: Intelligent Virtual Assistant" (2023-01-24), http://www.open-lab.net/blog/?p=60074

Learn to build an engaging and intelligent virtual assistant with NVIDIA AI workflows powered by NVIDIA Riva in this free hands-on lab from NVIDIA LaunchPad.

Aleksandr Laptev, "Entropy-Based Methods for Word-Level ASR Confidence Estimation" (2023-01-13), http://www.open-lab.net/blog/?p=59689

Once you have your automatic speech recognition (ASR) model predictions, you may also want to know how likely those predictions are to be correct. This…
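
The teaser stops before the method itself; as a hedged illustration of the general idea behind entropy-based confidence (the textbook normalized-entropy score, not necessarily the exact estimator the post proposes), a per-token confidence can be computed like this:

```python
import math

def entropy_confidence(probs):
    """Map a token probability distribution to a confidence in [0, 1].

    Uses normalized Shannon entropy: 1 - H(p) / log(V), where V is the
    vocabulary size. A peaked distribution scores near 1; a uniform one scores 0.
    """
    v = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(v)

print(entropy_confidence([0.97, 0.01, 0.01, 0.01]))  # high confidence (~0.88)
print(entropy_confidence([0.25, 0.25, 0.25, 0.25]))  # no confidence (0.0)
```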

Maggie Zhang, "Autoscaling NVIDIA Riva Deployment with Kubernetes for Speech AI in Production" (2023-01-12), http://www.open-lab.net/blog/?p=59514

Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process the audio signal and transcribe the audio to text. Speech synthesis or TTS can generate high-quality, natural-sounding audio from the text in real time. The challenge of speech AI is to achieve high accuracy and meet the latency requirements…

Sirisha Rella, "Speech AI Technology Enables Natural Interactions with Service Robots" (2022-12-17), http://www.open-lab.net/blog/?p=59175

From taking your order and serving you food in a restaurant to playing poker with you, service robots are becoming increasingly prevalent. Globally, you can find these service robots at hospitals, airports, and retail stores. According to Gartner, by 2030, 80% of humans will engage with smart robots daily, due to smart robot advancements in intelligence, social interactions…

Sirisha Rella, "Deep Learning is Transforming ASR and TTS Algorithms" (2022-12-16), http://www.open-lab.net/blog/?p=59169

Speech is one of the primary means to communicate with an AI-powered application. From virtual assistants to digital avatars, voice-based interfaces are changing how we typically interact with smart devices. Deep learning techniques for speech recognition and speech synthesis are helping improve the user experience: think human-like responses and natural-sounding tones. If you plan to…

Sven Chilton, "Reducing Development Time for Intelligent Virtual Assistants in Contact Centers" (2022-12-15), http://www.open-lab.net/blog/?p=58450

As the global service economy grows, companies rely increasingly on contact centers to drive better customer experiences, increase customer satisfaction, and lower costs with increased efficiencies. Customer demand has increased far more rapidly than contact center employment ever could. Combined with the high agent churn rate, customer demand creates a need for more automated real-time customer…

Sirisha Rella, "Speech AI Spotlight: Reimagine Customer Service with Virtual Agents" (2022-12-14), http://www.open-lab.net/blog/?p=58387

Virtual agents or voice-enabled assistants have been around for quite some time. But in the last decade, their usefulness and popularity have exploded with the use of AI. According to Gartner, virtual assistants will automate up to 75% of tasks for call center agents by 2025, up from 30% in 2021. This translates to a better experience for both contact center agents and customers.

Davide Onofrio, "Introducing NVIDIA Riva: A GPU-Accelerated SDK for Developing Speech AI Applications" (2022-12-08), http://www.open-lab.net/blog/?p=17451

This post was updated in March 2023. Speech AI is used in a variety of applications, including contact center agent assists for empowering human agents, voice interfaces for intelligent virtual assistants (IVAs), and live captioning in video conferencing. To support these features, speech AI technology includes automatic speech recognition…

Siddharth Sharma, "Explainer: What Is Conversational AI?" (2022-12-05), http://www.open-lab.net/blog/?p=54534

Real-time natural language understanding will transform how we interact with intelligent machines and applications.

Vinh Nguyen, "Making an NVIDIA Riva ASR Service for a New Language" (2022-10-28), http://www.open-lab.net/blog/?p=50426

Speech AI is the ability of intelligent systems to communicate with users using a voice-based interface, which has become ubiquitous in everyday life. People regularly interact with smart home devices, in-car assistants, and phones through speech. Speech interface quality has improved leaps and bounds in recent years, making them a much more pleasant, practical, and natural experience than just a…

Michelle Horton, "Upcoming Event: Speech AI Summit 2022" (2022-10-25), http://www.open-lab.net/blog/?p=56388

Join experts from Google, Meta, NVIDIA, and more at the first annual NVIDIA Speech AI Summit. Register now!

Tanya Lenz, "New Course: Get Started with Highly Accurate Custom ASR for Speech AI" (2022-10-24), http://www.open-lab.net/blog/?p=55846

Learn how to build, train, customize, and deploy a GPU-accelerated automatic speech recognition service with NVIDIA Riva in this self-paced course.

Aleksandra Antonova, "Building an Automatic Speech Recognition Model for the Kinyarwanda Language" (2022-10-20), http://www.open-lab.net/blog/?p=56301

Speech recognition technology is growing in popularity for voice assistants and robotics, for solving real-world problems through assisted healthcare or education, and more. This is helping democratize access to speech AI worldwide. As labeled datasets for unique, emerging languages become more widely available, developers can build AI applications readily, accurately, and affordably to enhance…

Gordana Neskovic, "Just Released: New Updates to NVIDIA Riva" (2022-09-26), http://www.open-lab.net/blog/?p=54741

Build better GPU-accelerated Speech AI applications with the latest NVIDIA Riva updates, including enterprise support.

Dave Niewinski, "Low-Code Building Blocks for Speech AI Robotics" (2022-09-22), http://www.open-lab.net/blog/?p=55065

When examining an intricate speech AI robotic system, it's easy for developers to feel intimidated by its complexity. Arthur C. Clarke claimed, "Any sufficiently advanced technology is indistinguishable from magic." From accepting natural-language commands to safely interacting in real time with its environment and the humans around it, today's speech AI robotics systems can perform tasks to…

Yang Zhang, "Text Normalization and Inverse Text Normalization with NVIDIA NeMo" (2022-09-16), http://www.open-lab.net/blog/?p=55161

Text normalization (TN) converts text from written form into its verbalized form, and it is an essential preprocessing step before text-to-speech (TTS). TN ensures that TTS can handle all input texts without skipping unknown symbols. For example, "$123" is converted to "one hundred and twenty-three dollars." Inverse text normalization (ITN) is a part of the automatic speech recognition (ASR)…
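
As a toy illustration of the written-to-spoken mapping described above (a plain-Python sketch, not the WFST-based NeMo implementation the post covers), a currency normalizer might look like this:

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TENS = ["", "ten", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]

def spell_number(n: int) -> str:
    """Verbalize 0-999, which is enough for a toy currency example."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" and " + spell_number(rest) if rest else "")

def normalize_currency(text: str) -> str:
    """Rewrite '$<number>' into its spoken form for TTS input."""
    return re.sub(r"\$(\d+)", lambda m: spell_number(int(m.group(1))) + " dollars", text)

print(normalize_currency("The ticket costs $123."))
# -> "The ticket costs one hundred and twenty-three dollars."
```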

Taejin Park, "Dynamic Scale Weighting Through Multiscale Speaker Diarization" (2022-09-16), http://www.open-lab.net/blog/?p=54785

Speaker diarization is the process of segmenting audio recordings by speaker labels and aims to answer the question "Who spoke when?" It is clearly distinct from speech recognition: before you perform speaker diarization, you know "what is spoken" but you don't know "who spoke it." Therefore, speaker diarization is an essential feature for a speech recognition…
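
To make "who spoke when" concrete, here is a small hedged sketch (not the multiscale algorithm the post describes) that attaches diarization speaker labels to word-level ASR timestamps:

```python
def label_words(words, segments):
    """Assign each ASR word a speaker label from diarization segments.

    words:    list of (word, start_sec, end_sec) from an ASR model
    segments: list of (speaker, start_sec, end_sec) from a diarizer
    """
    labeled = []
    for word, w_start, w_end in words:
        mid = (w_start + w_end) / 2.0
        speaker = next(
            (spk for spk, s_start, s_end in segments if s_start <= mid < s_end),
            "unknown",
        )
        labeled.append((speaker, word))
    return labeled

words = [("hello", 0.2, 0.5), ("there", 0.6, 0.9), ("hi", 1.4, 1.6)]
segments = [("speaker_0", 0.0, 1.0), ("speaker_1", 1.0, 2.0)]
print(label_words(words, segments))
# [('speaker_0', 'hello'), ('speaker_0', 'there'), ('speaker_1', 'hi')]
```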

Sirisha Rella, "Developing the Next Generation of Extended Reality Applications with Speech AI" (2022-09-14), http://www.open-lab.net/blog/?p=54831

Virtual reality (VR), augmented reality (AR), and mixed reality (MR) environments can feel incredibly real due to the physically immersive experience. Adding a…

Xianchao Wu, "Improving Japanese Language ASR by Combining Convolutions with Attention Mechanisms" (2022-09-12), http://www.open-lab.net/blog/?p=54745

Automatic speech recognition (ASR) research generally focuses on high-resource languages such as English, which is supported by hundreds of thousands of hours of speech. Recent literature has renewed focus on more complex languages, such as Japanese. Like other Asian languages, Japanese has a vast base character set (upwards of 3,000 unique characters are used in common vernacular)…

Aleksandr Laptev, "Changing CTC Rules to Reduce Memory Consumption in Training and Decoding" (2022-09-12), http://www.open-lab.net/blog/?p=54761

Loss functions for training automatic speech recognition (ASR) models are not set in stone. The older rules of loss functions are not necessarily optimal. Consider connectionist temporal classification (CTC) and see how changing some of its rules enables you to reduce GPU memory, which is required for training and inference of CTC-based models and more. For more information about the…
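
For readers who want to see the baseline being modified, the sketch below computes a standard CTC loss with PyTorch on random data; the post itself is about changing CTC's rules to reduce memory, which this vanilla example does not attempt:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, N, C = 50, 4, 30      # time steps, batch size, vocabulary size (incl. blank)
S = 12                   # maximum target length

# Log-probabilities over the vocabulary for every frame, as CTC loss expects.
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)

targets = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)  # 0 is the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(low=5, high=S + 1, size=(N,), dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```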

Michelle Horton, "Upcoming Event: Conversational AI Sessions at GTC 2022" (2022-09-02), http://www.open-lab.net/blog/?p=54056

Learn about the latest tools, trends, and technologies for building and deploying conversational AI.

Sunil Kumar Jang Bahadur, "Solving Automatic Speech Recognition Deployment Challenges" (2022-08-31), http://www.open-lab.net/blog/?p=54238

Successfully deploying an automatic speech recognition (ASR) application can be a frustrating experience. For example, it is difficult for an ASR system to correctly identify words while maintaining low latency, considering the many different dialects and pronunciations that exist.

David Taubenheim, "Exploring Unique Applications of Automatic Speech Recognition Technology" (2022-08-29), http://www.open-lab.net/blog/?p=54108

Automatic speech recognition (ASR) is becoming part of everyday life, from interacting with digital assistants to dictating text messages. ASR research continues to progress, thanks to recent advances. This post first introduces common ASR applications and then features two startups exploring unique applications of ASR as a core product capability.

Sirisha Rella, "Essential Guide to Automatic Speech Recognition Technology" (2022-08-08), http://www.open-lab.net/blog/?p=51263

Over the past decade, AI-powered speech recognition systems have slowly become part of our everyday lives, from voice search to virtual assistants in contact centers, cars, hospitals, and restaurants. These speech recognition developments are made possible by deep learning advancements.

Rohil Bhargava, "Building a Speech-Enabled AI Virtual Assistant with NVIDIA Riva on Amazon EC2" (2022-07-28), http://www.open-lab.net/blog/?p=50606

Speech AI can assist human agents in contact centers, power virtual assistants and digital avatars, generate live captioning in video conferencing, and much more. Under the hood, these voice-based technologies orchestrate a network of automatic speech recognition (ASR) and text-to-speech (TTS) pipelines to deliver intelligent, real-time responses.

Vinh Nguyen, "A Guide to Understanding Essential Speech AI Terms" (2022-07-26), http://www.open-lab.net/blog/?p=50343

Speech AI is the technology that makes it possible to communicate with computer systems using your voice. Commanding an in-car assistant or handling a smart home device? An AI-enabled voice interface helps you interact with devices without having to type or tap on a screen.

Ashraf Eassa, "The Full Stack Optimization Powering NVIDIA MLPerf Training v2.0 Performance" (2022-06-30), http://www.open-lab.net/blog/?p=49597

MLPerf benchmarks are developed by a consortium of AI leaders across industry, academia, and research labs, with the aim of providing standardized, fair, and useful measures of deep learning performance. MLPerf Training focuses on measuring the time to train a range of commonly used neural networks. Lower training times are important to speed time to deployment…

Mikiko Bazeley, "An Easy Introduction to Speech AI" (2022-06-23), http://www.open-lab.net/blog/?p=48941

Artificial intelligence (AI) has transformed synthesized speech from monotone robocalls and decades-old GPS navigation systems to the polished tone of virtual assistants in smartphones and smart speakers. It has never been so easy for organizations to use customized state-of-the-art speech AI technology for their specific industries and domains. Speech AI is being used to power virtual…

Siddharth Sharma, "Build Speech AI in Multiple Languages and Train Large Language Models with the Latest from Riva and NeMo Framework" (2022-03-28), http://www.open-lab.net/blog/?p=45648

Major updates to Riva, an SDK for building speech AI applications, and a paid Riva Enterprise offering were announced at NVIDIA GTC 2022 last week. Several key updates to NeMo, a framework for training large language models, were also announced. Riva offers world-class accuracy for real-time automatic speech recognition (ASR) and text-to-speech (TTS) skills across multiple…

Gordana Neskovic, "Create Speech AI Applications in Multiple Languages and Customize Text-to-Speech with Riva" (2022-02-07), http://www.open-lab.net/blog/?p=43993

This month, NVIDIA released world-class speech-to-text models for Spanish, German, and Russian in Riva, powering enterprises to deploy speech AI applications globally. In addition, enterprises can now create expressive speech interfaces using Riva's customizable text-to-speech pipeline. NVIDIA Riva is a GPU-accelerated speech AI SDK for developing real-time applications like live captioning…

Siddharth Sharma, "ICYMI: New AI Tools and Technologies Announced at NVIDIA GTC Keynote" (2021-11-09), http://www.open-lab.net/blog/?p=39300

At NVIDIA GTC this November, new software tools were announced that help developers build real-time speech applications, optimize inference for a variety of use cases, optimize open-source interoperability for recommender systems, and more. Watch the keynote from CEO Jensen Huang to learn about the latest NVIDIA breakthroughs. Today, NVIDIA unveiled a new version of NVIDIA Riva with a…

Christopher Parisien, "Building Transcription and Entity Recognition Apps Using NVIDIA Riva" (2021-11-09), http://www.open-lab.net/blog/?p=24076

In the past several months, many of us have grown accustomed to seeing our doctors over a video call. It's certainly convenient, but after the call ends, those important pieces of advice from your doctor start to slip away. What was that new medication I needed to take? Were there any side effects to watch out for? Conversational AI can help in building an application to transcribe speech as…

Nikhil Srihari, "Creating Voice-based Virtual Assistants Using NVIDIA Riva and Rasa" (2021-11-09), http://www.open-lab.net/blog/?p=24085

Virtual assistants have become part of our daily lives. We ask virtual assistants almost anything that we wonder about. In addition to providing convenience to our daily lives, virtual assistants are of tremendous help when it comes to enterprise applications. For example, we use online virtual agents to help navigate complex technical issues…

Tanay Varshney, "Speech Recognition: Deploying Models to Production" (2021-11-09), http://www.open-lab.net/blog/?p=39744

This post is part of a series about generating accurate speech transcription. For part 1, see Speech Recognition: Generating Accurate Domain-Specific Audio Transcriptions Using NVIDIA Riva. For part 2, see Speech Recognition: Customizing Models to Your Domain Using Transfer Learning. NVIDIA Riva is an AI speech SDK for developing real-time applications like transcription, virtual assistants…

Tanay Varshney, "Speech Recognition: Customizing Models to Your Domain Using Transfer Learning" (2021-11-09), http://www.open-lab.net/blog/?p=39742

This post is part of a series about generating accurate speech transcription. For part 1, see Speech Recognition: Generating Accurate Transcriptions Using NVIDIA Riva. For part 3, see Speech Recognition: Deploying Models to Production. Creating a new AI deep learning model from scratch is an extremely time- and resource-intensive process. A common solution to this problem is to employ…

Sirisha Rella, "Speech Recognition: Generating Accurate Domain-Specific Audio Transcriptions Using NVIDIA Riva" (2021-11-09), http://www.open-lab.net/blog/?p=39715

This post is part of a series about generating accurate speech transcription. For part 2, see Speech Recognition: Customizing Models to Your Domain Using Transfer Learning. For part 3, see Speech Recognition: Deploying Models to Production. Every day, millions of audio minutes are produced across several industries such as telecommunications, finance, and Unified Communications as a Service…

Sirisha Rella, "NVIDIA at INTERSPEECH 2021" (2021-08-18), http://www.open-lab.net/blog/?p=36357

Researchers from around the world working on speech applications are gathering this month for INTERSPEECH, a conference focused on the latest research and technologies in speech processing. NVIDIA researchers will present papers on groundbreaking research in speech recognition and speech synthesis. Conversational AI research is fueling innovations in speech processing that help computers…

Oleksii Kuchaiev, "Accelerating Conversational AI Research with New Cutting-Edge Neural Networks and Features from NeMo 1.0" (2021-06-08), http://www.open-lab.net/blog/?p=32233

NVIDIA NeMo is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS). The primary objective of NeMo is to help researchers from industry and academia reuse prior work (code and pretrained models) and make it easier to create new conversational AI models. NeMo is an open-source project…

Brad Nemire, "NVIDIA Releases Riva 1.0 Beta for Building Real-Time Conversational AI Services" (2021-02-25), https://news.www.open-lab.net/?p=19332

Today, NVIDIA released the Riva 1.0 Beta, which includes an end-to-end workflow for building and deploying real-time conversational AI apps, such as transcription, virtual assistants, and chatbots. Riva is an accelerated SDK for multimodal conversational AI services that delivers real-time performance on NVIDIA GPUs. This release of Riva includes new pretrained models for conversational AI and…

Raghav Mani, "Speeding Up Development of Speech and Language Models with NVIDIA NeMo" (2020-10-05), http://www.open-lab.net/blog/?p=17649

This is an updated version of Neural Modules for Fast Development of Speech and Language Models. This post upgrades the NeMo diagram with PyTorch and PyTorch Lightning support and updates the tutorial with the new code base. As a researcher building state-of-the-art speech and language models, you must be able to quickly experiment with novel network architectures.

Akhil Docca, "Empowering Smart Hospitals with NVIDIA Clara Guardian from NGC and NVIDIA Fleet Command" (2020-10-05), http://www.open-lab.net/blog/?p=21206

Hospitals today are seeking to overhaul their existing digital infrastructure to improve their internal processes, deliver better patient care, and reduce operational expenses. Such a transition is required if hospitals are to cope with the needs of a burgeoning human population, accumulation of medical patient data, and a pandemic. The goal is not only to digitize existing infrastructure but…

James Sohn, "Simplifying AI Inference with NVIDIA Triton Inference Server from NVIDIA NGC" (2020-08-25), http://www.open-lab.net/blog/?p=19889

Seamlessly deploying AI services at scale in production is as critical as creating the most accurate AI model. Conversational AI services, for example, need multiple models handling functions of automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) to complete the application pipeline. To provide real-time conversation to users…
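
As a hedged sketch of how a client queries a Triton-hosted model over HTTP (the model name, tensor names, and shapes are placeholders, not the actual pipeline from the post):

```python
# Placeholder model/tensor names; requires the tritonclient[http] package
# and a Triton Inference Server listening on localhost:8000.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# One second of 16 kHz audio as an FP32 batch of size 1.
audio = np.zeros((1, 16000), dtype=np.float32)

inputs = [httpclient.InferInput("AUDIO", list(audio.shape), "FP32")]
inputs[0].set_data_from_numpy(audio)
outputs = [httpclient.InferRequestedOutput("TRANSCRIPT")]

result = client.infer(model_name="asr_pipeline", inputs=inputs, outputs=outputs)
print(result.as_numpy("TRANSCRIPT"))
```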

Levi Barnes, "Extracting Features from Multiple Audio Channels with Kaldi" (2020-08-20), http://www.open-lab.net/blog/?p=19854

In automatic speech recognition (ASR), one widely used method combines traditional machine learning with deep learning. In ASR flows of this type, audio features are first extracted from the raw audio. Features are then passed into an acoustic model. The acoustic model is a neural net trained on transcribed data to extract phoneme probabilities from the features. A phoneme is a single…
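
As a rough illustration of the feature-extraction step (using librosa rather than Kaldi, so treat the path and parameters as placeholders), computing MFCC features from a mono recording looks like this:

```python
# Illustrative only: uses librosa instead of Kaldi's feature pipeline.
import librosa

# Load a mono recording at 16 kHz, a common rate for ASR front ends.
audio, sample_rate = librosa.load("utterance.wav", sr=16000, mono=True)

# 13 MFCCs per 25 ms window with a 10 ms hop, a typical ASR configuration.
mfcc = librosa.feature.mfcc(
    y=audio,
    sr=sample_rate,
    n_mfcc=13,
    n_fft=400,        # 25 ms at 16 kHz
    hop_length=160,   # 10 ms at 16 kHz
)
print(mfcc.shape)  # (13, number_of_frames); these frames feed the acoustic model
```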

Hugo Braun, "Integrating NVIDIA Triton Inference Server with Kaldi ASR" (2020-08-14), http://www.open-lab.net/blog/?p=19647

Speech processing is compute-intensive and requires a powerful and flexible platform to power modern conversational AI applications. It seemed natural to combine the de facto standard platform for automatic speech recognition (ASR), the Kaldi Speech Recognition Toolkit, with the power and flexibility of NVIDIA GPUs. Kaldi adopted GPU acceleration for training workloads early on.

Source

]]>
3
Christian Sch?fer <![CDATA[Creating Robust Neural Speech Synthesis with ForwardTacotron]]> http://www.open-lab.net/blog/?p=19393 2023-01-13T17:37:03Z 2020-08-07T02:24:26Z Photo by Thomas Le: https://unsplash.com/@thomasble The artificial production of human speech, also known as speech synthesis, has always been a fascinating...]]> Photo by Thomas Le: https://unsplash.com/@thomasble The artificial production of human speech, also known as speech synthesis, has always been a fascinating...

The artificial production of human speech, also known as speech synthesis, has always been a fascinating field for researchers, including our AI team at Axel Springer SE. For a long time, people have worked on creating text-to-speech (TTS) systems that reach human level. Following the field's transition to deep learning with the introduction of Google WaveNet in 2016, it has almost reached this…

Source

]]>
0
Fedor Ignatov <![CDATA[Building a Simple AI Assistant with DeepPavlov and NVIDIA NeMo]]> http://www.open-lab.net/blog/?p=19316 2023-02-13T19:09:28Z 2020-08-04T21:04:00Z In the past few years, voice-based interaction has become a feature of many industrial products. Voice platforms like Amazon Alexa, Google Home, Xiaomi Xiaz,...]]> In the past few years, voice-based interaction has become a feature of many industrial products. Voice platforms like Amazon Alexa, Google Home, Xiaomi Xiaz,...

In the past few years, voice-based interaction has become a feature of many industrial products. Voice platforms like Amazon Alexa, Google Home, Xiaomi Xiaz, Yandex Alice, and other in-home voice assistants provide easy-to-install, smart home technologies to even the least technologically savvy consumers. The fast adoption and rising performance of voice platforms drive interest in smart…

Source

]]>
0
Abhishek Sawarkar <![CDATA[Optimizing and Accelerating AI Inference with the TensorRT Container from NVIDIA NGC]]> http://www.open-lab.net/blog/?p=19032 2022-10-10T18:57:20Z 2020-07-23T17:24:26Z Natural language processing (NLP) is one of the most challenging tasks for AI because it needs to understand context, phonics, and accent to convert human...]]> Natural language processing (NLP) is one of the most challenging tasks for AI because it needs to understand context, phonics, and accent to convert human...

Natural language processing (NLP) is one of the most challenging tasks for AI because it needs to understand context, phonics, and accent to convert human speech into text. Building this AI workflow starts with training a model that can understand and process spoken language to text. BERT is one of the best models for this task. Instead of starting from scratch to build state-of-the-art…

Source

]]>
0
Nefi Alarcon <![CDATA[OpenAI Presents GPT-3, a 175 Billion Parameters Language Model]]> https://news.www.open-lab.net/?p=17148 2023-06-12T21:16:13Z 2020-07-07T19:49:00Z OpenAI researchers recently released a paper describing the development of GPT-3, a state-of-the-art language model made up of 175 billion parameters.  For...]]> OpenAI researchers recently released a paper describing the development of GPT-3, a state-of-the-art language model made up of 175 billion parameters.  For...

OpenAI researchers recently released a paper describing the development of GPT-3, a state-of-the-art language model made up of 175 billion parameters. For comparison, the previous version, GPT-2, was made up of 1.5 billion parameters. The largest Transformer-based language model was released by Microsoft earlier this month and is made up of 17 billion parameters. "GPT-3 achieves strong…

Source

]]>
0
David Williams <![CDATA[Training and Fine-tuning BERT Using NVIDIA NGC]]> http://www.open-lab.net/blog/?p=17909 2022-08-21T23:40:09Z 2020-06-16T17:25:49Z Imagine an AI program that can understand language better than humans can. Imagine building your own personal Siri or Google Search for a customized domain or...]]> Imagine an AI program that can understand language better than humans can. Imagine building your own personal Siri or Google Search for a customized domain or...

Imagine an AI program that can understand language better than humans can. Imagine building your own personal Siri or Google Search for a customized domain or application. Google BERT (Bidirectional Encoder Representations from Transformers) provides a game-changing twist to the field of natural language processing (NLP). BERT runs on supercomputers powered by NVIDIA GPUs to train its…

Source

]]>
0
Grzegorz Karch <![CDATA[How to Deploy Real-Time Text-to-Speech Applications on GPUs Using TensorRT]]> http://www.open-lab.net/blog/?p=16159 2022-08-21T23:39:43Z 2020-01-06T17:03:35Z Sign up for the latest Speech AI news from NVIDIA. Conversational AI is the technology that allows us to communicate with machines like with other people. With...]]> Sign up for the latest Speech AI news from NVIDIA. Conversational AI is the technology that allows us to communicate with machines like with other people. With...

Sign up for the latest Speech AI news from NVIDIA. Conversational AI is the technology that allows us to communicate with machines as we do with other people. With the advent of sophisticated deep learning models, human-machine communication has risen to unprecedented levels. However, these models are compute intensive and hence require optimized code for flawless interaction. In this post…

Source

]]>
0
Adriana Flores Miranda <![CDATA[How to Build Domain Specific Automatic Speech Recognition Models on GPUs]]> http://www.open-lab.net/blog/?p=16095 2022-11-08T00:27:31Z 2019-12-18T03:00:17Z In simple terms, conversational AI is the use of natural language to communicate with machines. Deep learning applications in conversational AI are growing...]]> In simple terms, conversational AI is the use of natural language to communicate with machines. Deep learning applications in conversational AI are growing...

In simple terms, conversational AI is the use of natural language to communicate with machines. Deep learning applications in conversational AI are growing every day, from voice assistants and chatbots to question answering systems that enable customer self-service. The range of industries adopting conversational AI in their solutions is wide, with diverse domains extending from finance to…

Source

]]>
0
Jocelyn Huang <![CDATA[Develop Smaller Speech Recognition Models with the NVIDIA NeMo Framework]]> http://www.open-lab.net/blog/?p=16063 2023-03-14T23:16:05Z 2019-12-10T16:00:44Z As computers and other personal devices have become increasingly prevalent, interest in conversational AI has grown due to its multitude of potential...]]> As computers and other personal devices have become increasingly prevalent, interest in conversational AI has grown due to its multitude of potential...

As computers and other personal devices have become increasingly prevalent, interest in conversational AI has grown due to its multitude of potential applications in a variety of situations. Each conversational AI framework comprises several more basic modules, such as automatic speech recognition (ASR), and the models for these need to be lightweight in order to be effectively deployed on…

Source

]]>
11
Sharath Sreenivas <![CDATA[Pretraining BERT with Layer-wise Adaptive Learning Rates]]> http://www.open-lab.net/blog/?p=15981 2022-08-21T23:39:41Z 2019-12-05T18:39:10Z Training with larger batches is a straightforward way to scale training of deep neural networks to larger numbers of accelerators and reduce the training time....]]> Training with larger batches is a straightforward way to scale training of deep neural networks to larger numbers of accelerators and reduce the training time....

Training with larger batches is a straightforward way to scale training of deep neural networks to larger numbers of accelerators and reduce the training time. However, as the batch size increases, numerical instability can appear in the training process. The purpose of this post is to provide an overview of one class of solutions to this problem: layer-wise adaptive optimizers, such as LARS, LARC…
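For intuition about what "layer-wise adaptive" means here, the sketch below applies the LARS trust-ratio idea by hand: each parameter tensor's update is rescaled by the ratio of its weight norm to its gradient norm. The function and its hyperparameter values are illustrative, not the optimizer implementation used in the post.

import torch

def lars_step(params, base_lr=0.02, trust_coef=0.001, weight_decay=1e-4):
    # Apply one LARS-style update to each parameter tensor.
    for w in params:
        if w.grad is None:
            continue
        g = w.grad + weight_decay * w.detach()   # L2-regularized gradient
        w_norm = torch.norm(w)
        g_norm = torch.norm(g)
        # Trust ratio: layers whose gradients are large relative to their weights
        # take proportionally smaller steps, which stabilizes large-batch training.
        trust_ratio = trust_coef * w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
        w.data -= base_lr * trust_ratio * g

In practice you would call lars_step(model.parameters()) after loss.backward(), or reach for a library implementation such as the LAMB and LARC optimizers shipped with NVIDIA Apex.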

Source

]]>
0
David Taubenheim <![CDATA[GPU-Accelerated Speech to Text with Kaldi: A Tutorial on Getting Started]]> http://www.open-lab.net/blog/?p=15710 2025-03-21T20:22:31Z 2019-10-17T20:59:04Z Sign up for the latest Speech AI news from NVIDIA. Recently, NVIDIA achieved GPU-accelerated speech-to-text inference with exciting performance results. That...]]> Sign up for the latest Speech AI news from NVIDIA. Recently, NVIDIA achieved GPU-accelerated speech-to-text inference with exciting performance results. That...

Sign up for the latest Speech AI news from NVIDIA. Recently, NVIDIA achieved GPU-accelerated speech-to-text inference with exciting performance results. That post described the general process of the Kaldi ASR pipeline and indicated which of its elements the team accelerated, that is, implementing the decoder on the GPU and taking advantage of Tensor Cores in the acoustic model.

Source

]]>
7
Raghav Mani <![CDATA[Neural Modules for Fast Development of Speech and Language Models]]> http://www.open-lab.net/blog/?p=15664 2022-08-21T23:39:37Z 2019-09-14T14:59:20Z [stextbox id="info"]This post has been updated with Announcing NVIDIA NeMo: Fast Development of Speech and Language Models. The new version has information...]]> [stextbox id="info"]This post has been updated with Announcing NVIDIA NeMo: Fast Development of Speech and Language Models. The new version has information...

This post has been updated with Announcing NVIDIA NeMo: Fast Development of Speech and Language Models. The new version adds sections on pretrained models in NGC and on fine-tuning models on a custom dataset, upgrades the NeMo diagram with the text-to-speech collection, and replaces the AN4 dataset in the example with the LibriSpeech dataset. As a researcher building state-of-the…

Source

]]>
0
Shar Narasimhan <![CDATA[NVIDIA Clocks World��s Fastest BERT Training Time and Largest Transformer Based Model, Paving Path For Advanced Conversational AI]]> http://www.open-lab.net/blog/?p=15430 2022-08-21T23:39:34Z 2019-08-13T13:00:23Z NVIDIA DGX SuperPOD trains?BERT-Large in just 47 minutes, and trains GPT-2 8B, the largest Transformer Network Ever with 8.3Bn parameters? Conversational AI...]]> NVIDIA DGX SuperPOD trains?BERT-Large in just 47 minutes, and trains GPT-2 8B, the largest Transformer Network Ever with 8.3Bn parameters? Conversational AI...

NVIDIA DGX SuperPOD trains BERT-Large in just 47 minutes and trains GPT-2 8B, the largest Transformer-based network ever, with 8.3Bn parameters. Conversational AI is an essential building block of human interactions with intelligent machines and applications, from robots and cars to home assistants and mobile apps. Getting computers to understand human languages, with all their nuances…

Source

]]>
3
Purnendu Mukherjee <![CDATA[Real-Time Natural Language Understanding with BERT Using TensorRT]]> http://www.open-lab.net/blog/?p=15432 2022-10-10T18:51:43Z 2019-08-13T13:00:19Z Large scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought about exciting leaps in state-of-the-art accuracy for many natural language...]]> Large scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought about exciting leaps in state-of-the-art accuracy for many natural language...

Large scale language models (LSLMs) such as BERT, GPT-2, and XL-Net have brought about exciting leaps in state-of-the-art accuracy for many natural language understanding (NLU) tasks. Since its release in October 2018, BERT (Bidirectional Encoder Representations from Transformers) remains one of the most popular language models and still delivers state-of-the-art accuracy at the time of writing.

Source

]]>
11
Hugo Braun <![CDATA[NVIDIA Accelerates Real Time Speech to Text Transcription 3500x with Kaldi]]> http://www.open-lab.net/blog/?p=13915 2022-08-21T23:39:21Z 2019-03-18T15:00:51Z Think of a sentence and repeat it aloud three times. If someone recorded this speech and performed a point-by-point comparison, they would find that no single...]]> Think of a sentence and repeat it aloud three times. If someone recorded this speech and performed a point-by-point comparison, they would find that no single...

Think of a sentence and repeat it aloud three times. If someone recorded this speech and performed a point-by-point comparison, they would find that no single utterance exactly matched the others. Similar to different resolutions, angles, and lighting conditions in imagery, human speech varies with respect to timing, pitch, amplitude, and even how base units of speech, phonemes and morphemes…

Source

]]>
5
Chip Huyen <![CDATA[Mixed Precision Training for NLP and Speech Recognition with OpenSeq2Seq]]> http://www.open-lab.net/blog/?p=12300 2022-08-21T23:39:09Z 2018-10-09T13:00:45Z The success of neural networks thus far has been built on bigger datasets, better theoretical models, and reduced training time. Sequential models, in...]]> The success of neural networks thus far has been built on bigger datasets, better theoretical models, and reduced training time. Sequential models, in...

Source

]]>
1
Bryan Catanzaro <![CDATA[Deep Speech: Accurate Speech Recognition with GPU-Accelerated Deep Learning]]> http://www.open-lab.net/blog/parallelforall/?p=4922 2022-08-21T23:37:30Z 2015-02-25T16:00:54Z Speech recognition is an established technology, but it tends to fail when we need it the most, such as in noisy or crowded environments, or when the speaker is...]]> Speech recognition is an established technology, but it tends to fail when we need it the most, such as in noisy or crowded environments, or when the speaker is...

Speech recognition is an established technology, but it tends to fail when we need it the most, such as in noisy or crowded environments, or when the speaker is far away from the microphone. At Baidu we are working to enable truly ubiquitous, natural speech interfaces. In order to achieve this, we must improve the accuracy of speech recognition, especially in these challenging environments.

Source

]]>
6