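NVIDIA NeMo is an end-to-end platform for developing multimodal generative AI models at scale, on any cloud and on-premises.
The NeMo team recently released Canary, a multilingual model that transcribes speech in English, Spanish, German, and French with punctuation and capitalization. Canary also provides bidirectional translation between English and the three other supported languages.
This post introduces the Canary model and shows how to use it.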
Canary overview
Canary ranks first on the HuggingFace Open ASR leaderboard with an average word error rate (WER) of 6.67%. It outperforms all other open-source models by a wide margin.
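For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that calculation, not the leaderboard's scoring code: the function name `word_error_rate` and the sample sentences are illustrative, and the sketch assumes whitespace tokenization with no text normalization, whereas real evaluations typically normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.2%}")  # prints "WER: 33.33%"
```

Because the denominator is the reference length, WER can exceed 100% when the output contains many inserted words.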
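Canary is trained on a combination of public and in-house data. It uses 85,000 hours of transcribed speech to learn speech recognition. To teach Canary translation, NVIDIA NeMo text translation models were used to generate translations of the original transcripts in all supported languages.
Despite using an order of magnitude less data, Canary outperforms the similarly sized Whisper-large-v3 and SeamlessM4T-Medium-v1 models on both transcription and translation tasks. On the MCV 16.1 test sets for English, Spanish, French, and German, Canary had a WER of 5.77 (Figure 1).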

(???? ??)


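You can try the canary-1b model in the Gradio demo. For more information about accessing Canary locally and building on top of it, see the NVIDIA/NeMo GitHub repository.
Canary architecture
Canary is an encoder-decoder model built on NVIDIA innovations.
The encoder is Fast-Conformer, an efficient Conformer architecture optimized for roughly 3x savings in compute and 4x savings in memory. The encoder processes audio as log-mel spectrogram features, and the decoder, a transformer decoder, generates output text tokens autoregressively. The decoder is prompted with special tokens that control whether Canary performs transcription or translation.
Canary also incorporates a concatenated tokenizer, which offers explicit control over the output token space.
The model weights are distributed under the research-friendly, non-commercial CC BY-NC 4.0 license, while the code used to train the model is available from NeMo under the Apache 2.0 license.
Transcribing with Canary
To use Canary, install NeMo as a pip package. Install Cython and PyTorch (2.0 or later) before installing NeMo.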
pip install "nemo_toolkit[asr]"
With NeMo installed, use Canary to transcribe or translate audio files as follows:
# Load Canary model
from nemo.collections.asr.models import EncDecMultiTaskModel

canary_model = EncDecMultiTaskModel.from_pretrained('nvidia/canary-1b')

# Transcribe
transcript = canary_model.transcribe(audio=["path_to_audio_file.wav"])

# By default, Canary assumes that input audio is in English and transcribes it.
# To transcribe in a different language, such as Spanish
transcript = canary_model.transcribe(
    audio=["path_to_spanish_audio_file.wav"],
    batch_size=1,
    task='asr',
    source_lang='es',  # es: Spanish, fr: French, de: German
    target_lang='es',  # should be same as "source_lang" for 'asr'
    pnc=True,
)

# To translate using Canary. For example, from English audio to French text
transcript = canary_model.transcribe(
    audio=["path_to_english_audio_file.wav"],
    batch_size=1,
    task='ast',
    source_lang='en',
    target_lang='fr',
    pnc=True,
)
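The calls above differ only in `task`, `source_lang`, and `target_lang`, so it can help to derive the task from the language pair and catch unsupported pairs before any audio is processed. The sketch below is one way to do that. `build_transcribe_kwargs` and `SUPPORTED_LANGS` are hypothetical names, not part of NeMo, and the helper assumes that translation is offered only between English and the other three languages, and that `transcribe` accepts exactly the keyword arguments used in the snippet above.

```python
# Hypothetical helper, not part of NeMo: it only assembles keyword arguments.
SUPPORTED_LANGS = {"en", "es", "de", "fr"}


def build_transcribe_kwargs(audio_paths, source_lang="en", target_lang=None,
                            pnc=True, batch_size=1):
    """Return keyword arguments for a Canary transcribe() call.

    The task is inferred from the language pair: 'asr' when source and
    target match, 'ast' when exactly one side is English.
    """
    if target_lang is None:
        target_lang = source_lang  # plain transcription by default

    for lang in (source_lang, target_lang):
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language code: {lang!r}")

    if source_lang == target_lang:
        task = "asr"
    elif "en" in (source_lang, target_lang):
        task = "ast"
    else:
        # Assumption: translation pairs must include English on one side.
        raise ValueError("translation requires English as source or target")

    return {
        "audio": list(audio_paths),
        "batch_size": batch_size,
        "task": task,
        "source_lang": source_lang,
        "target_lang": target_lang,
        "pnc": pnc,
    }


kwargs = build_transcribe_kwargs(["path_to_english_audio_file.wav"],
                                 source_lang="en", target_lang="fr")
print(kwargs["task"])  # prints "ast"

# With a loaded model, the arguments would be passed through unchanged:
# transcript = canary_model.transcribe(**kwargs)
```

Keeping the validation separate from the model call means the language checks can be exercised without downloading the model weights.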
Conclusion
The Canary multilingual model sets a new standard for accurate transcription and translation across English, Spanish, German, and French.
Canary ????? ?? ??? ??? ?? ???? ????? ????.
Try canary-1b in the Gradio demo, or access the model locally through the NVIDIA/NeMo GitHub repository. Other ASR models, such as Parakeet-CTC, are coming soon as part of NVIDIA Riva.
Experience speech and translation AI models through the NVIDIA API catalog and run them on-premises with NVIDIA NIM. NVIDIA LaunchPad provides the hardware and software stacks needed for further exploration on private hosted infrastructure.
Acknowledgments
Thanks to all the model authors who contributed to this post: Krishna Puvvada, Piotr Zelasko, He Huang (Steve), Oleksii Hrinchuk, Nithin Koluguri, Somshubra Majumdar, Elena Rastorgueva, Kunal Dhawan, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris G