
Cloud.ru: Evolution ML Inference

Product
Developers: Cloud.ru (formerly SberCloud)
System premiere date: 2025/04/17
Technology: IaaS - Infrastructure as a Service

Main article: What is IaaS

2025: Evolution ML Inference Presentation

On April 17, 2025, Cloud.ru introduced Evolution ML Inference, a ready-made cloud service for inference (serving) of large language models (LLM) with GPU sharing and a flexible approach to utilizing computing resources. In addition to the Cloud.ru GigaChat models already available within the service, businesses can launch and deploy their own AI models, as well as any open-source ML/DL models from the Hugging Face library, on cloud GPUs in a few clicks. The service is already available to users in general availability mode and will be included in Cloud.ru Evolution AI Factory, a ready-made set of tools for working with AI in the cloud.

Evolution ML Inference is aimed at companies and users who develop AI and ML solutions and want to launch their own ML model, and an end product built on it, quickly and cost-effectively. It is a fully managed service: the user only chooses the configuration, the model, and the scaling type, while Cloud.ru provides access to powerful GPUs and handles full administration and maintenance of the infrastructure.

Key benefits of the service:

  • Shared GPU - the technology allows GPU resources to be shared so that a model consumes only the amount of vRAM (video memory) it needs to run efficiently, with the ability to dynamically redistribute resources according to current customer needs. This raises capacity utilization in AI projects from 15% to 45% compared to the scenario where whole GPUs are allocated exclusively.
  • Simplicity and flexibility of management - models can be run directly from Hugging Face without building an image, or users can launch their own images with their own environment.
  • The solution makes rational use of available resources: several models can run simultaneously on one video card. This makes the technology well suited to distributed systems with heterogeneous computing infrastructure and helps scale the load efficiently.
  • Scale-to-zero mode (efficient scaling) - billing for a model starts only at the moment it is accessed.
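
The Shared GPU idea above can be illustrated with a toy packing calculation: several models share one card's vRAM instead of each reserving a whole GPU. The function, model names, and memory figures below are hypothetical, not Cloud.ru's actual scheduler:

```python
# Illustrative sketch of GPU sharing: greedily pack models onto cards
# by their vRAM requirement. Numbers are made up for the example.

GPU_VRAM_GB = 80  # e.g. one 80 GB accelerator card


def pack_models(models, gpu_vram_gb=GPU_VRAM_GB):
    """Assign (name, vram_gb) models to GPUs first-fit by size.

    Returns a list of GPUs, each a list of (name, vram_gb) tuples.
    """
    gpus = []
    for name, vram in sorted(models, key=lambda m: -m[1]):
        for gpu in gpus:
            if sum(v for _, v in gpu) + vram <= gpu_vram_gb:
                gpu.append((name, vram))
                break
        else:
            gpus.append([(name, vram)])
    return gpus


models = [("llm-13b", 28), ("llm-7b", 16), ("asr", 6), ("embedder", 2)]
gpus = pack_models(models)
# All four models fit on a single shared card (52 of 80 GB used),
# instead of reserving four whole GPUs at low utilization.
utilization = sum(v for g in gpus for _, v in g) / (len(gpus) * GPU_VRAM_GB)
```

With exclusive allocation the same workload would occupy four cards at 2.5-35% utilization each; sharing brings one card to 65%, which is the kind of gain the utilization figures above describe.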

{{quote 'author=Evgeny Kolbin, CEO of cloud and AI technology provider Cloud.ru | According to our estimates, about 70% of users utilize less than 55% of the GPU resources reserved for inference while their ML models are running. When implementing AI, the model execution environment in most cases becomes the foundation. Therefore, to save resources and optimize costs when using artificial intelligence technologies, especially GenAI, you need a productive infrastructure with flexible real-time scaling.}}

Having studied customer needs and the most frequent requests for AI infrastructure and services in depth, we presented the market with the first managed cloud service for LLM inference. With it, businesses can effectively manage computing resources in a data-intensive environment. By offering Evolution ML Inference in the cloud, companies can simplify access to AI and make AI tools easier and more convenient to use.
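
The scale-to-zero pricing listed among the benefits above can be sketched as a toy billing calculation: the service charges only for the time a model actually serves requests, not for idle reserved capacity. The function, intervals, and rates are illustrative, not Cloud.ru's actual billing model:

```python
# Hypothetical sketch of scale-to-zero billing: cost accrues only
# during the intervals when the model is handling requests.


def billed_cost(active_windows, rate_per_hour):
    """active_windows: list of (start_s, end_s) intervals, in seconds,
    during which the model was serving traffic."""
    active_seconds = sum(end - start for start, end in active_windows)
    return active_seconds / 3600 * rate_per_hour


# A model idle most of the day, active for two bursts:
windows = [(0, 1800), (43200, 46800)]   # 0.5 h + 1.0 h of activity
on_demand = billed_cost(windows, 2.0)   # 1.5 h at $2/h -> 3.0
always_on = 24 * 2.0                    # 24 h reservation -> 48.0
```

For bursty GenAI workloads, this is why pay-per-access scaling dominates an always-on GPU reservation at the same hourly rate.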