
Megatron

Product
Base system (platform): Artificial intelligence (AI)
Developers: Nvidia, Microsoft
System premiere date: 2021/10/15
Technology: Speech Technology


Megatron is an open-source tool from Nvidia's research team for studying efficient training of language models at any scale.

2021

Release of the Nvidia NeMo Megatron framework

At the GTC conference in November 2021, Nvidia introduced NeMo Megatron, a framework for training large language models with trillions of parameters. NeMo Megatron is based on Megatron, an open-source project of the NVIDIA research team that studies efficient language model training at any scale. The framework hides the complexity of LLM training behind data-processing libraries that collect, process, organize, and clean data.
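The collect-process-organize-clean stages mentioned above can be illustrated with a minimal sketch. This is not the NeMo Megatron API; all function names and thresholds below are hypothetical, chosen only to show what such a data-preparation pipeline does.

```python
# Hypothetical sketch of an LLM data-preparation pipeline: normalize,
# deduplicate, and filter raw text before training.
# Function names and thresholds are illustrative, not NeMo Megatron API.

import hashlib
import re

def normalize(doc: str) -> str:
    """Strip control characters and collapse runs of whitespace."""
    doc = re.sub(r"[\x00-\x08\x0b-\x1f]", "", doc)
    return re.sub(r"\s+", " ", doc).strip()

def deduplicate(docs):
    """Drop exact duplicates via content hashing."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def quality_filter(docs, min_words=3):
    """Discard fragments too short to be useful training text."""
    return [d for d in docs if len(d.split()) >= min_words]

raw = [
    "Megatron  trains large language\tmodels.",
    "Megatron trains large language models.",   # duplicate after normalization
    "Too short.",                               # removed by the quality filter
]
clean = quality_filter(deduplicate([normalize(d) for d in raw]))
print(clean)  # → ['Megatron trains large language models.']
```

Real pipelines add fuzzy deduplication, language identification, and toxicity filtering, but the shape — a chain of pure document-level transforms — stays the same.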

NeMo Megatron is optimized for scaling on the Nvidia DGX SuperPOD accelerated computing infrastructure.

On the basis of the same project, Megatron 530B was introduced, a customizable large language model that can be trained for new subject areas and new languages.


Creation of a natural language generation model together with Microsoft

On October 15, 2021, Microsoft and Nvidia joined forces to create Megatron-Turing Natural Language Generation (MT-NLG), a natural language generation model with 530 billion parameters.

Trend in the size of modern NLP models over time

MT-NLG has three times more parameters than the previously largest model of this type and shows high accuracy on a wide range of natural language tasks, such as:

  • Predicting how a text will continue;
  • Reading comprehension;
  • Commonsense reasoning;
  • Natural language inference;
  • Disambiguating words with multiple meanings.
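The first task in the list above, text completion, can be shown in miniature. MT-NLG is a 530-billion-parameter transformer; the toy bigram model below is only a sketch of the prediction task itself, with an invented three-sentence corpus.

```python
# Toy illustration of text-completion prediction: a bigram model counts
# word transitions and completes a prompt with the most frequent successor.
# The corpus and all names here are invented for illustration.

from collections import Counter, defaultdict

corpus = ("the model reads text . "
          "the model predicts text . "
          "the model generates text").split()

# Count which word follows which.
transitions = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev][nxt] += 1

def complete(prompt_word: str) -> str:
    """Return the most likely next word after prompt_word."""
    followers = transitions[prompt_word]
    return followers.most_common(1)[0][0] if followers else "<eos>"

print(complete("the"))    # → model ("model" follows "the" every time)
print(complete("model"))  # one of the equally frequent successors
```

A transformer-based LLM replaces the transition table with a learned distribution over the whole vocabulary conditioned on the full preceding context, but the interface — context in, next-token distribution out — is the same.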

Training such a powerful model was made possible by numerous innovations. For example, NVIDIA and Microsoft combined a state-of-the-art GPU-based training infrastructure with a distributed training software stack, built natural-language datasets containing hundreds of billions of tokens, and developed training methods and recipes to improve the efficiency and stability of optimization.
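One well-known idea behind Megatron-style distributed training stacks is tensor (intra-layer) model parallelism: a weight matrix too large for one GPU is split column-wise across devices, each device computes a partial result, and the outputs are gathered. The sketch below imitates this with plain Python lists standing in for GPU tensors; the function names are illustrative.

```python
# Hedged sketch of tensor (intra-layer) model parallelism: shard a weight
# matrix column-wise across "devices", compute partial matmuls, and
# concatenate the shards. Plain lists stand in for GPU tensors.

def matmul(x, w):
    """Multiply row vector x by matrix w (given as a list of rows)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]

def split_columns(w, parts):
    """Shard a matrix column-wise across `parts` devices."""
    step = len(w[0]) // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# Each "device" computes its shard; an all-gather concatenates outputs.
shards = split_columns(w, parts=2)
partial = [matmul(x, shard) for shard in shards]
y_parallel = partial[0] + partial[1]

assert y_parallel == matmul(x, w)  # matches the single-device result
print(y_parallel)  # → [11.0, 14.0, 17.0, 20.0]
```

Because the shards are independent, the multiplications run concurrently on separate GPUs; only the final gather requires communication, which is what makes this scheme practical for models with hundreds of billions of parameters.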