Mostofa Patwary – NVIDIA Technical Blog

News and tutorials for developers, data scientists, and IT admins

Curating Trillion-Token Datasets: Introducing NVIDIA NeMo Data Curator
Mostofa Patwary | Published 2023-08-08 | Updated 2024-10-18 | http://www.open-lab.net/blog/?p=68797

The latest developments in large language model (LLM) scaling laws have shown that when scaling the number of model parameters, the number of tokens used for training should be scaled at the same rate. The Chinchilla and LLaMA models have validated these empirically derived laws and suggest that previous state-of-the-art models were undertrained with respect to the total number of tokens used…
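To make "scale tokens at the same rate as parameters" concrete, here is a rough sketch. It relies on two widely cited approximations that the summary above does not state. The first is that training compute is about C ≈ 6·N·D FLOPs for N parameters and D tokens. The second is the Chinchilla rule of thumb of roughly 20 training tokens per parameter. The helper name `compute_optimal_split` and the fixed 20:1 ratio are illustrative assumptions. Fitted scaling laws include additional terms.

```python
import math

# Assumption: Chinchilla-style rule of thumb of ~20 training tokens per parameter.
TOKENS_PER_PARAM = 20.0
# Assumption: standard approximation, training FLOPs ~= 6 * parameters * tokens.
FLOPS_PER_PARAM_TOKEN = 6.0


def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Hypothetical helper: return (parameters, tokens) that spend
    `compute_flops` under the assumptions D = 20 * N and C = 6 * N * D."""
    # C = 6 * N * (20 * N) = 120 * N^2, so N = sqrt(C / 120)
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Compute budget of a 70B-parameter model trained on 1.4T tokens.
    budget = 6 * 70e9 * 1.4e12
    n, d = compute_optimal_split(budget)
    print(f"N = {n / 1e9:.1f}B params, D = {d / 1e12:.2f}T tokens")

    # Both N and D grow as sqrt(C), so 4x the compute doubles each of them.
    n4, d4 = compute_optimal_split(4 * budget)
    print(f"4x compute -> N = {n4 / 1e9:.1f}B params, D = {d4 / 1e12:.2f}T tokens")
```

With the Chinchilla budget, the helper recovers the 70B-parameter, 1.4T-token configuration. This is expected because 1.4T / 70B = 20. Quadrupling the compute doubles both the parameter count and the token count, and that is the "same rate" scaling the summary describes.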

Scaling Language Model Training to a Trillion Parameters Using Megatron
Mostofa Patwary | Published 2021-04-12 | Updated 2023-03-22 | http://www.open-lab.net/blog/?p=24760

Natural Language Processing (NLP) has seen rapid progress in recent years as computation at scale has become more available and datasets have become larger. At the same time, recent work has shown large language models to be effective few-shot learners, with high accuracy on many NLP datasets without additional finetuning. As a result, state-of-the-art NLP models have grown at an exponential rate…

Adding External Knowledge and Controllability to Language Models with Megatron-CNTRL
Mostofa Patwary | Published 2020-10-06 | Updated 2023-03-22 | http://www.open-lab.net/blog/?p=21265

Large language models such as Megatron and GPT-3 are transforming AI. We are excited about applications that can take advantage of these models to create better conversational AI. A major problem for generative language models in conversational AI applications is their lack of controllability and of consistency with real-world facts. In this work, we try to address this by making our large…

State-of-the-Art Language Modeling Using Megatron on the NVIDIA A100 GPU
Mostofa Patwary | Published 2020-05-14 | Updated 2023-04-04 | http://www.open-lab.net/blog/?p=17320

Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as question answering, dialog systems, summarization, and article completion. However, during training, large models do not fit in the available memory of a single accelerator, requiring model parallelism to split the parameters across multiple…
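Model parallelism can take several forms. The sketch below illustrates one of them: intra-layer (tensor) parallelism for a transformer MLP block, simulated on a single machine with NumPy. The first weight matrix is split by columns and the second by rows, along the lines of the column-then-row split popularized by Megatron-LM. This sketch does not claim to reproduce the post's exact setup. The function names are hypothetical, and GeLU uses the common tanh approximation. A Python `sum` also stands in for the cross-GPU all-reduce that a real implementation would perform.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of GeLU, applied elementwise
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp_reference(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Unsplit two-layer MLP: Y = GeLU(X @ W1) @ W2."""
    return gelu(x @ w1) @ w2


def mlp_tensor_parallel(x: np.ndarray, w1: np.ndarray, w2: np.ndarray,
                        num_devices: int) -> np.ndarray:
    """Hypothetical helper simulating the MLP split across `num_devices` workers.

    W1 is split by columns, so each worker computes its own slice of the hidden
    activations and applies GeLU locally (GeLU is elementwise, so no
    communication is needed here). W2 is split by the matching rows, so each
    worker produces a partial output of full shape. Summing the partials stands
    in for the all-reduce across GPUs.
    """
    w1_shards = np.split(w1, num_devices, axis=1)  # column-parallel first GEMM
    w2_shards = np.split(w2, num_devices, axis=0)  # row-parallel second GEMM
    partial_outputs = [gelu(x @ a) @ b for a, b in zip(w1_shards, w2_shards)]
    return sum(partial_outputs)  # stand-in for the all-reduce


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, d_model, d_hidden, devices = 4, 8, 32, 4
    x = rng.standard_normal((batch, d_model))
    w1 = rng.standard_normal((d_model, d_hidden))
    w2 = rng.standard_normal((d_hidden, d_model))

    y_ref = mlp_reference(x, w1, w2)
    y_tp = mlp_tensor_parallel(x, w1, w2, devices)
    print("max abs difference:", np.max(np.abs(y_ref - y_tp)))
    # Each worker stores only 1/devices of the MLP weights.
    print("weights per worker:", (w1.size + w2.size) // devices)
```

The split works without communication between the two matrix multiplies. Each worker's hidden slice feeds only the rows of W2 it owns, so one reduction at the end is enough to recover the full output. This design also lets per-device weight memory shrink in proportion to the number of workers, which is what allows a layer too large for one accelerator to be trained at all.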
