Jared Casper – NVIDIA Technical Blog
News and tutorials for developers, data scientists, and IT admins
Feed: http://www.open-lab.net/blog/feed/

Scaling Language Model Training to a Trillion Parameters Using Megatron
http://www.open-lab.net/blog/?p=24760
Published 2021-04-12; updated 2023-03-22

Natural Language Processing (NLP) has seen rapid progress in recent years as computation at scale has become more available and datasets have become larger. At the same time, recent work has shown large language models to be effective few-shot learners, with high accuracy on many NLP datasets without additional finetuning. As a result, state-of-the-art NLP models have grown at an exponential rate…

State-of-the-Art Language Modeling Using Megatron on the NVIDIA A100 GPU
http://www.open-lab.net/blog/?p=17320
Published 2020-05-14; updated 2023-04-04

Recent work has demonstrated that larger language models dramatically advance the state of the art in natural language processing (NLP) applications such as question-answering, dialog systems, summarization, and article completion. However, during training, large models do not fit in the available memory of a single accelerator, requiring model parallelism to split the parameters across multiple…
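The excerpt above notes that large models must be split across multiple accelerators. A minimal sketch of one form of that idea, tensor-style column sharding of a single linear layer, with NumPy arrays standing in for GPU shards (sizes and names are illustrative, not Megatron's actual API):

```python
import numpy as np

# Sketch of tensor model parallelism: the weight matrix of one linear
# layer is split column-wise across "devices" (plain arrays here), each
# device computes a partial output from the same input, and an
# all-gather (concatenation here) reassembles the full activation.
# No single device ever holds the full weight matrix.
rng = np.random.default_rng(0)

hidden, out_features, num_devices = 8, 12, 4
x = rng.standard_normal(hidden)
W = rng.standard_normal((hidden, out_features))

# Shard the weights: each device stores out_features // num_devices columns.
shards = np.split(W, num_devices, axis=1)

# Each device computes its slice of the output independently.
partials = [x @ shard for shard in shards]

# Concatenating the partial results recovers the unsharded computation.
y_parallel = np.concatenate(partials)

assert np.allclose(y_parallel, x @ W)
```

In a real setup each shard lives in a different GPU's memory and the concatenation is a collective communication step; the arithmetic is identical.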

NVVL Accelerates Machine Learning on Video Datasets
http://www.open-lab.net/blog/?p=10313
Published 2018-05-02; updated 2022-10-10

Loading data onto GPUs for training has historically been a minor issue for most deep learning practitioners. Data read from a local spinning hard drive or NAS device would be preprocessed on the CPU, then shipped to the GPU for training. The data input pipeline rarely proved to be the bottleneck given the long number-crunching times involved. As GPUs improve and DL frameworks use them more…
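The pipeline described above (read and preprocess on the CPU, then ship batches to the GPU) avoids becoming a bottleneck by overlapping loading with compute. A minimal sketch of that overlap using a bounded prefetch queue; the sleeps stand in for disk/decode work and a training step, and none of this is NVVL's API:

```python
import queue
import threading
import time

# Sketch of a prefetching input pipeline: a background thread reads and
# preprocesses batches while the main loop consumes them, so data
# loading overlaps with number-crunching instead of serializing with it.

def producer(num_batches, q):
    for i in range(num_batches):
        time.sleep(0.01)  # stand-in for disk read + CPU preprocessing
        q.put(i)          # stand-in for a preprocessed batch
    q.put(None)           # sentinel: no more data

def train(num_batches=5):
    q = queue.Queue(maxsize=2)  # bounded queue caps prefetch memory use
    threading.Thread(target=producer, args=(num_batches, q), daemon=True).start()
    seen = []
    while (batch := q.get()) is not None:
        time.sleep(0.01)  # stand-in for a GPU training step
        seen.append(batch)
    return seen

print(train())  # -> [0, 1, 2, 3, 4]
```

Libraries like NVVL push the decode step itself onto the GPU, which matters once per-batch compute shrinks enough that CPU decoding becomes the slow stage.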
