Nvidia Triton Inference Server

Product
Base system (platform): Artificial intelligence (AI)
Developers: Nvidia
Last Release Date: November 2021
Industries: Electrical engineering and microelectronics

The Nvidia Triton Inference Server (formerly TensorRT Inference Server) is open-source software for deploying deep learning models in production. Triton lets teams deploy trained AI models from local storage (TensorFlow, PyTorch, TensorRT Plan, Caffe, MXNet, or custom formats), Google Cloud Platform, or AWS S3 on any GPU- or CPU-based infrastructure. The server runs multiple models concurrently on a single GPU to increase utilization, and integrates with Kubernetes for orchestration, metrics, and automatic scaling.
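Deployed models are served over Triton's standard HTTP/REST and gRPC endpoints (ports 8000 and 8001 by default). Below is a minimal client sketch using the tritonclient Python package; the model name "simple_model" and its tensor names, shape, and data type are placeholders that must match whatever the deployed model actually declares.

    import numpy as np
    import tritonclient.http as httpclient

    # Connect to a Triton server assumed to be listening on the default HTTP port.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Prepare an input tensor; name, shape, and dtype are placeholders that
    # must match the deployed model's configuration.
    data = np.random.rand(1, 3, 224, 224).astype(np.float32)
    infer_input = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
    infer_input.set_data_from_numpy(data)

    # Request a named output and run inference.
    response = client.infer(
        model_name="simple_model",
        inputs=[infer_input],
        outputs=[httpclient.InferRequestedOutput("OUTPUT__0")],
    )
    print(response.as_numpy("OUTPUT__0"))

The gRPC client in tritonclient.grpc follows the same call structure.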

2021: Multi-GPU, Multi-Node Support

At its GTC conference in November 2021, Nvidia introduced an update to the Triton Inference Server. It now supports multiple GPUs and multiple nodes, making it possible to distribute inference workloads for large language models (LLMs) across many GPUs and nodes in real time. Such models require more memory than is available on a single GPU, or even on a large server with multiple GPUs, and their inference must run quickly.
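Within a single node, Triton controls which GPUs host a model through the instance_group setting in the model's config.pbtxt file. The sketch below illustrates only that general placement mechanism; the model name, platform, and counts are placeholders, and the multi-node distribution of a single large model announced at GTC is a separate capability layered on top of this.

    name: "my_model"            # placeholder model name
    platform: "tensorrt_plan"
    max_batch_size: 8
    instance_group [
      {
        count: 1                # one copy of the model on GPU 0
        kind: KIND_GPU
        gpus: [ 0 ]
      },
      {
        count: 2                # two copies on each of GPUs 1 and 2
        kind: KIND_GPU
        gpus: [ 1, 2 ]
      }
    ]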

Megatron 530B was also introduced: a customizable large language model that can be trained for new subject areas and new languages. With the Triton Inference Server, Megatron 530B can run on two Nvidia DGX systems, cutting processing time from a minute on a CPU server to half a second. This makes it possible to deploy LLMs for real-time applications.
