
Nvidia DGX Supercomputers

Product
Developers: Nvidia
Premiere date: 2016/04/05
Latest release date: 2021/04/12
Technology: Supercomputer


Main article: Supercomputers

2021: Nvidia DGX Station 320G

On April 12, 2021, at its annual GTC conference, NVIDIA announced a new version of the DGX Station, the DGX Station 320G. It is based on Ampere-architecture GPUs, which can be partitioned into several isolated instances, and it has more memory than the previous model. Up to 28 data scientists and analysts can use one station at the same time.

According to the GTC presentation, each station delivers up to 2.5 petaflops. However, the presentation does not specify whether this figure refers to double- or single-precision operations.

According to the company, a CPU-based cluster with comparable performance would currently cost about $1 million, while a DGX Station costs $149,000.

NVIDIA announces a new version of DGX Station: DGX Station 320G

The full list of GTC 2021 announcements is available here.

2020: NVIDIA DGX Station A100

On November 16, 2020, NVIDIA introduced the NVIDIA DGX Station A100, a petaflop-class integrated server. The DGX Station A100, a second-generation artificial intelligence system, accelerates demanding machine learning and data science workloads for teams working in corporate offices, research centers, laboratories or home offices.

Delivering 2.5 petaflops of AI performance, the DGX Station A100 is a workgroup server with four NVIDIA A100 Tensor Core GPUs connected by NVIDIA NVLink and up to 320 GB of GPU memory, intended to enable new breakthroughs in data science and artificial intelligence.

The DGX Station A100 also supports NVIDIA Multi-Instance GPU (MIG) technology. With MIG, a single DGX Station A100 can provide up to 28 separate GPU instances to run parallel jobs and serve multiple users without compromising system performance.
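The figure of 28 instances follows from the per-GPU MIG limit multiplied by the four GPUs in the station. The sketch below assumes that each A100 can be split into at most seven GPU instances (a published A100 limit that this article does not state) and gives each user a dedicated instance. The constant and function names are hypothetical, and this is only arithmetic, not NVIDIA's MIG management tooling.

```python
# Sketch: where "up to 28 instances" comes from on a DGX Station A100.
# Assumption (not stated in the article): one A100 supports at most
# 7 MIG GPU instances, using the smallest MIG slice.
MAX_MIG_INSTANCES_PER_A100 = 7
GPUS_PER_DGX_STATION_A100 = 4


def mig_slots(num_gpus: int, per_gpu: int = MAX_MIG_INSTANCES_PER_A100) -> list[tuple[int, int]]:
    """Return a (gpu_index, instance_index) pair for every available MIG slot."""
    return [(gpu, inst) for gpu in range(num_gpus) for inst in range(per_gpu)]


def assign_users(users: list[str], slots: list[tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """Give each user a dedicated slot; refuse if there are more users than slots."""
    if len(users) > len(slots):
        raise ValueError(f"{len(users)} users but only {len(slots)} MIG instances")
    return dict(zip(users, slots))


slots = mig_slots(GPUS_PER_DGX_STATION_A100)
print(len(slots))  # 28

team = [f"user{i}" for i in range(28)]
placement = assign_users(team, slots)
print(placement["user0"], placement["user27"])  # (0, 0) (3, 6)
```

Because every user maps to a distinct (GPU, instance) pair, no two users share an instance; a 29th user would raise `ValueError` in this sketch.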

Organizations around the world have adopted DGX Station systems for data analytics and AI work in industries such as education, financial services, government, health care and retail. Among them:

  • BMW Group Production uses NVIDIA DGX Station systems to gain insights faster as it develops and deploys AI models that improve operations.
  • DFKI, the German Research Center for Artificial Intelligence, uses DGX Station to build models that address critical societal and industrial problems, including computer vision systems that help emergency services respond rapidly to natural disasters.
  • Lockheed Martin uses DGX Station to develop AI models that use sensor data and log files to predict maintenance needs, improving production uptime and worker safety and reducing operating costs.
  • NTT Docomo, Japan's leading mobile operator with more than 79 million subscribers, uses DGX Station to develop innovative AI-based services, such as an image recognition solution.
  • Pacific Northwest National Laboratory (PNNL) uses NVIDIA DGX Station systems to conduct federally funded national security research. PNNL focuses on technological innovation in energy resiliency and national security and is a leading U.S. center for high-performance computing, covering scientific discovery, energy resiliency, chemistry, Earth sciences and data analytics.

Although the DGX Station A100 does not require data center power or cooling, it is a server-class system with the same remote management capabilities as the data center NVIDIA DGX A100. System administrators can perform all management tasks remotely for data scientists and researchers working at home or in laboratories.

The DGX Station A100 is available with either four NVIDIA A100 80GB or four A100 40GB Tensor Core GPUs, allowing research teams to choose a configuration that matches their workloads and budgets.

The DGX Station A100 delivers more than 4 times the performance of the previous-generation DGX Station on complex conversational AI models, for example BERT Large inference, and nearly 3 times the performance for BERT Large training.

For large data center workloads, the DGX A100 will be available with NVIDIA A100 80GB GPUs, doubling GPU memory capacity to up to 640 GB per system and allowing AI teams to improve accuracy with larger datasets and models.

NVIDIA DGX A100 640GB systems can also be integrated into NVIDIA DGX SuperPOD enterprise solutions, allowing organizations to build, train and deploy massive AI models on turnkey AI supercomputers available in units of 20 DGX A100 systems.
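The memory capacities quoted for these systems are simply the GPU count multiplied by the memory per GPU. The sketch below reproduces them under one assumption not stated in the article: a DGX A100 contains eight A100 GPUs (the DGX Station A100's four GPUs are stated above). The SuperPOD line uses the 20-system unit described in the text; the variable names are hypothetical.

```python
# Total GPU memory per system = number of GPUs x memory per GPU.
# Assumption: a DGX A100 holds 8 A100 GPUs (not stated in the article);
# a DGX Station A100 holds 4, as stated in the text.
GPUS_PER_DGX_STATION_A100 = 4
GPUS_PER_DGX_A100 = 8
SYSTEMS_PER_SUPERPOD_UNIT = 20  # "units of 20 DGX A100 systems"


def total_gpu_memory_gb(num_gpus: int, gb_per_gpu: int) -> int:
    """Aggregate GPU memory of one system, in GB."""
    return num_gpus * gb_per_gpu


configs = {
    "DGX Station A100 (40GB GPUs)": total_gpu_memory_gb(GPUS_PER_DGX_STATION_A100, 40),
    "DGX Station A100 (80GB GPUs)": total_gpu_memory_gb(GPUS_PER_DGX_STATION_A100, 80),
    "DGX A100 320GB": total_gpu_memory_gb(GPUS_PER_DGX_A100, 40),
    "DGX A100 640GB": total_gpu_memory_gb(GPUS_PER_DGX_A100, 80),
}
for name, gb in configs.items():
    print(f"{name}: {gb} GB")

# One SuperPOD unit built from DGX A100 640GB systems (decimal TB: 1 TB = 1000 GB).
pod_gpus = SYSTEMS_PER_SUPERPOD_UNIT * GPUS_PER_DGX_A100
pod_memory_tb = SYSTEMS_PER_SUPERPOD_UNIT * configs["DGX A100 640GB"] / 1000
print(f"SuperPOD unit: {pod_gpus} GPUs, {pod_memory_tb} TB of GPU memory")
```

Running it prints 160, 320, 320 and 640 GB for the four configurations and "SuperPOD unit: 160 GPUs, 12.8 TB of GPU memory", which matches the "up to 320 GB" and "up to 640 GB" figures in the text.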

Among the first DGX SuperPOD systems built with the DGX A100 640GB are the Cambridge-1 supercomputer in the UK, which will accelerate health research, and the University of Florida's new HiPerGator AI supercomputer, which will support AI-driven research across the state of Florida.

Availability

NVIDIA DGX Station A100 and NVIDIA DGX A100 640GB will be available from NVIDIA partners worldwide in the fourth quarter of 2020. An upgrade option is available for owners of NVIDIA DGX A100 320GB systems.

2018: Nvidia DGX-2

At the end of March 2018, Nvidia introduced the DGX-2 supercomputer, which delivers about two petaflops and is designed for deep learning. According to the company, a single DGX-2 server can replace 300 conventional servers occupying 15 data center racks, while being 60 times smaller and 18 times more energy efficient.[1]

The supercomputer is built on 16 Tesla V100 GPU accelerators, each combining a Volta-architecture Nvidia GV100 graphics processor with 32 GB of HBM2 memory.

The GPUs communicate over the NVSwitch interconnect, which lets any two GPUs exchange data at up to 300 GB/s. Together with NVLink 2, this fabric combines all 16 Tesla V100s into a single giant accelerator with almost 82 thousand CUDA cores, more than ten thousand Tensor cores and 512 GB of HBM2 memory with an aggregate bandwidth of 14.4 TB/s.
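The aggregate DGX-2 figures are 16 times the per-GPU Tesla V100 figures. The per-GPU values in the sketch below (5,120 CUDA cores, 640 Tensor cores, 900 GB/s of HBM2 bandwidth) are published V100 specifications that this article does not itself give, so treat them as assumptions; only the 16 GPUs and 32 GB per GPU come from the text.

```python
# Sketch: DGX-2 aggregate figures = 16 x per-GPU Tesla V100 figures.
# Assumption: per-GPU values are published Tesla V100 (SXM2, 32 GB)
# specifications; apart from hbm2_gb they are not given in the article.
NUM_GPUS = 16
V100 = {
    "cuda_cores": 5120,
    "tensor_cores": 640,
    "hbm2_gb": 32,
    "hbm2_bandwidth_gb_s": 900,
}

# Multiply every per-GPU figure by the number of GPUs in the system.
totals = {key: NUM_GPUS * value for key, value in V100.items()}

print(totals["cuda_cores"])    # 81920: "almost 82 thousand"
print(totals["tensor_cores"])  # 10240: "more than ten thousand"
print(totals["hbm2_gb"])       # 512 (GB)
print(totals["hbm2_bandwidth_gb_s"] / 1000, "TB/s")  # 14.4 TB/s
```

The aggregate bandwidth is a sum of local memory bandwidths; a kernel on one GPU still sees only its own 900 GB/s when reading local memory.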

Nvidia DGX-2

The remaining specifications of the Nvidia DGX-2 include two Intel Xeon Platinum processors (the specific models were not named), up to 1.5 TB of DDR4 RAM and NVMe storage totaling 30 to 60 TB. The system also has InfiniBand and 100-gigabit Ethernet networking. Its power consumption under load is 10 kW.

The manufacturer priced the Nvidia DGX-2 at about $400,000, with deliveries scheduled to begin in the third quarter of 2018.[2]

2016: Nvidia DGX-1

On April 5, 2016, NVIDIA announced DGX-1, a system created to support development in the field of artificial intelligence.

NVIDIA DGX-1 is a system designed specifically for deep learning. It comes with the hardware, deep learning software and development tools needed to deploy it quickly. It is built on GPUs that deliver data processing throughput comparable to that of 250 x86 servers.

NVIDIA DGX-1 (2016)

GPU-accelerated computing allows data scientists to build intelligent machines that can learn, see and perceive the world as humans do. The system's computing power makes it well suited to AI applications and shortens the time researchers need to train large, complex deep neural networks.

Neural networks enable new kinds of applications that process enormous amounts of data and therefore demand much higher computing performance.

Artificial intelligence is the greatest technological breakthrough of our time. It will clearly change every industry, every company and the whole way people live, and it will give rise to new markets from which everyone will benefit. Today, data scientists and AI researchers spend too much time building "homemade" high-performance computing systems. The DGX-1 is easy to install and has only one goal: to unlock superhuman capabilities and apply them to problems that were previously considered unsolvable.

The DGX-1 software set includes:

  • NVIDIA Deep Learning GPU Training System (DIGITS), an interactive system for building deep neural networks (DNN);
  • NVIDIA CUDA Deep Neural Network library (cuDNN) version 5, a GPU-accelerated library of primitives for deep neural networks.

The system includes optimized versions of several widely used deep learning frameworks: Caffe, Theano and Torch. DGX-1 provides access to cloud-based management tools, software updates and a repository of containerized applications.

Specifications

  • Peak performance up to 170 teraflops of half-precision computing (FP16)
  • Eight Tesla P100 GPUs, with 16 GB of memory on board each GPU
  • NVLink Hybrid Cube Mesh
  • 7 TB SSD deep learning cache
  • Dual 10GbE, Quad InfiniBand 100Gb
  • 3U form factor, 3,200 W
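The 170-teraflop peak is roughly eight times the FP16 peak of a single Tesla P100. The sketch below assumes 21.2 TFLOPS FP16 per NVLink (SXM2) Tesla P100, a published P100 figure not stated in this article; the GPU count and per-GPU memory come from the list above.

```python
# Sketch: DGX-1 peak FP16 throughput and total GPU memory.
# Assumption: 21.2 TFLOPS FP16 per Tesla P100 (NVLink/SXM2 variant),
# a published P100 figure that the article does not state.
NUM_GPUS = 8
P100_FP16_TFLOPS = 21.2
P100_MEMORY_GB = 16

# Peak figures scale linearly with GPU count; real workloads reach less.
peak_fp16_tflops = round(NUM_GPUS * P100_FP16_TFLOPS, 1)
total_memory_gb = NUM_GPUS * P100_MEMORY_GB

print(peak_fp16_tflops)  # 169.6, i.e. "up to 170 teraflops"
print(total_memory_gb)   # 128
```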

The NVIDIA DGX-1 supercomputer is the world's first system designed specifically for deep learning and accelerated data analytics in artificial intelligence. It can process and analyze data 100 times faster than conventional computing systems, which can significantly reduce the cost of building and maintaining IT infrastructure.

The system is built on Tesla P100 accelerators linked by the high-speed NVIDIA NVLink interconnect, which can make communication between GPUs up to 12 times faster than over the PCI-E bus. The system software stack includes the NVIDIA DIGITS GPU training system, the NVIDIA Deep Learning SDK (cuDNN, NCCL) and NVIDIA Docker for rapidly building and training deep neural networks (DNN). In addition, the system includes optimized versions of widely used deep learning frameworks such as Caffe, Theano and Torch, among others. NVIDIA DGX-1 also has access to a cloud management system for creating and deploying containers, receiving system updates and accessing an application repository.
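NCCL, listed above, supplies collective operations such as all-reduce, which sums gradients across all GPUs during multi-GPU training; fast GPU-to-GPU links like NVLink speed up exactly these transfers. For illustration only, the following pure-Python simulation implements the textbook ring all-reduce algorithm over simulated GPU buffers. It is not NCCL's API, and NCCL's actual implementation may use different algorithms; the function name is hypothetical.

```python
# Pure-Python simulation of ring all-reduce (sum) across n simulated GPUs.
# Illustrates the algorithm only; this is not NCCL's API or implementation.


def ring_allreduce(buffers: list[list[float]]) -> list[list[float]]:
    """Return copies of the buffers in which every entry holds the element-wise sum."""
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all buffers must have the same length")
    # Split each buffer into n chunks: chunk c covers indices bounds[c]:bounds[c + 1].
    bounds = [length * c // n for c in range(n + 1)]
    data = [list(b) for b in buffers]  # simulated per-GPU memory (copies)

    def ring_step(chunk_for, accumulate: bool) -> None:
        # Snapshot every outgoing chunk first so all sends in a step are simultaneous.
        outgoing = []
        for i in range(n):
            c = chunk_for(i)
            outgoing.append((c, data[i][bounds[c]:bounds[c + 1]]))
        # GPU i delivers its chunk to its right-hand neighbour (i + 1) % n.
        for i, (c, values) in enumerate(outgoing):
            dst = data[(i + 1) % n]
            for offset, v in enumerate(values):
                k = bounds[c] + offset
                dst[k] = dst[k] + v if accumulate else v

    # Phase 1, reduce-scatter: after n - 1 steps GPU i holds the complete
    # sum for chunk (i + 1) % n.
    for step in range(n - 1):
        ring_step(lambda i: (i - step) % n, accumulate=True)
    # Phase 2, all-gather: pass the completed chunks around the ring.
    for step in range(n - 1):
        ring_step(lambda i: (i + 1 - step) % n, accumulate=False)
    return data


gpus = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], [100.0, 200.0, 300.0, 400.0]]
print(ring_allreduce(gpus)[0])  # [111.0, 222.0, 333.0, 444.0]
```

In each of the 2(n - 1) steps every GPU sends only about 1/n of its buffer to one neighbour, so per-GPU traffic stays nearly constant as GPUs are added; the same function works for the eight buffers of a DGX-1.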

Notes