
Yandex DataSphere

Product
Developers: Yandex.Cloud
Date of the premiere of the system: 2020/05/29
Last Release Date: 2023/05/23
Branches: Information Technology
Technology: IaaS - Infrastructure as a Service

2024: Training a neural network in Yandex DataSphere to detect fetal CNS defects during ultrasound

On August 19, 2024, Yandex announced the development of a neural network that will help doctors detect signs of spina bifida, a severe congenital disease of the central nervous system in children, during ultrasound examinations of pregnant women. This congenital pathology is difficult to diagnose, as it occurs in about one in a thousand newborns, and it often leads to severe disability. With the help of the technology, medical specialists will be able to see signs of the disease earlier and refer the patient for additional examination. The solution is available free of charge to all doctors and medical experts on the website of the Spina Bifida Foundation, which initiated this project, the first of its kind in Russia. The model was trained on the cloud platform using the Yandex DataSphere service. Yandex Cloud provided computing power and the help of its architects for the project free of charge.

2023

Ability to use dedicated virtual machines for ML tasks

On May 23, 2023, Yandex Cloud announced the opening of access to an updated version of Yandex DataSphere, its full-cycle machine learning service. Developers can now use dedicated virtual machines for ML tasks in the service. This makes it easier for IT professionals who are used to working with algorithms in their own infrastructure to move ML computing to the cloud. Configuring the development environment in Yandex DataSphere has also become more convenient, which makes it possible to train models and put them into production more quickly.

The Dedicated mode of Yandex DataSphere allows users to reserve a virtual machine in the cloud for a project and work with it for as long as necessary. Working with computing resources in Dedicated mode can speed up the development of machine learning models for various data analysis tasks, for example, detecting equipment breakdowns or managing risks in a company.

In addition to Dedicated mode, Yandex DataSphere still offers Serverless mode for training models. Serverless computing technology automatically connects a virtual machine of the required type only for the duration of the actual computations (model training, launch and other calculations). In this mode the user pays for computing power only while computations are actually running, which helps keep computing costs as low as possible.

Yandex DataSphere also has an updated version of Jupyter Notebook, the most popular code editor for ML development. The updated interface and pre-installed extensions, for example, navigation within a notebook, make it easier to work with Jupyter Notebook. In addition, Yandex DataSphere can display resource usage transparently: users can monitor in real time which resources are available on the machines in use and how they are being consumed.

Yandex DataSphere has all the necessary tools for the full cycle of machine learning development, as well as integration with other services of the cloud platform: Data Proc (managed Apache Spark) and Data Transfer (a data transfer tool). An ML specialist can connect the necessary libraries inside the service for parallel data processing on Spark clusters, and can directly connect various cloud storage services for data analysis and storage. Yandex DataSphere is also well suited to teamwork: other ML developers and specialists involved in working with machine learning models can be added to projects. For example, a support engineer can adjust the settings for operating the model, and an administrator can manage access settings.

Free provision to Russian universities

On April 6, 2023, Yandex Cloud announced that it would provide Russian universities with cloud resources for AI education free of charge.

This will help educational organizations increase the number of machine learning programs and improve their quality.

With the Yandex DataSphere ML development service, teachers will be able to teach students to create basic machine learning models, check code faster, and launch educational research in the field of artificial intelligence.

Yandex DataSphere has all the necessary tools for the full cycle of machine learning development. In addition to computing resources, the service provides a preconfigured environment for working with neural networks, which can later be customized for various tasks. The Jupyter Notebook interface is also available as an environment for training models in Yandex DataSphere. Thanks to the simple and familiar tools in Yandex DataSphere, students will be able to focus on working with code and to work on models for longer without the notebook shutting down automatically.

Yandex DataSphere is a teamwork tool. Several students can be added to a project in the service at once to simulate the role-based work of an ML team. A support engineer can enter the project and adjust the settings for operating the model, and an administrator can manage access settings directly in Yandex DataSphere. Project managers and analysts can track how many resources the team spends on model development.

"It is important for us that not only companies, but also the entire scientific community, including universities, have access to machine learning technologies in the cloud. Yandex DataSphere will help teachers improve their curricula, and help students learn the basics of ML more easily and quickly," said Anna Lemyakina, director of national strategic projects at Yandex Cloud.

As of April 2023, Yandex Cloud supports 45 Russian universities with cloud expertise: HSE, RANEPA, SPbGETU "LETI", KFU, SPbPU and many others. The grant program supporting science and education in the field of Computer Science has been operating since 2021. During this time, more than 100 grants have been issued for scientific research and ML development in educational projects. Students and scientists have launched a crop monitoring system in Yandex Cloud, created an algorithm for a self-driving racing car and investigated dark matter.

2022: Yandex DataSphere 2.0 with reconfigured development environment

On September 23, 2022, the Yandex Cloud platform announced the opening of access to Yandex DataSphere 2.0, an updated version of its service for developing machine learning algorithms.

In the updated service it is more convenient to configure the development environment and connect tools for all stages of working with ML models, from data preparation to operation.

This makes it possible to train machine learning algorithms and put them into production more quickly. The updated version of Yandex DataSphere is already available to current users of the service.

All the functions needed for the full development cycle of machine learning models are included in the updated Yandex DataSphere graphical interface. An ML specialist can create a separate project in the service, connect the necessary libraries to it, load data directly from cloud storage and process it for training.

The redesigned Yandex DataSphere interface also makes it convenient to manage project access and save model versions. Jupyter Notebook, one of the most popular ML development tools, can be opened in the service with a single button. Later, Yandex DataSphere will offer alternative editors for working with code, for example, Visual Studio Code.

Yandex DataSphere has become more convenient for teamwork. Not only other ML developers can be added to projects, but also other specialists involved in working with machine learning models. For example, a support engineer can now enter the project and adjust the settings for operating the model, and an administrator can manage access settings directly in Yandex DataSphere. Project managers and analysts can track how many resources the team spends on model development.

"In machine learning, speed is important. The faster you process data and test hypotheses, the faster you put models into production and benefit the business. In this version of Yandex DataSphere, we created a full-fledged data scientist workplace. The service helps to optimize the entire development cycle and to focus on working with code and on integrating models into business processes," said Alexey Bashkeev, head of the Yandex Cloud platform.

The service uses serverless computing technology: while code is being edited and viewed, the CPU (conventional processor) or GPU (graphics processor) capacity of a virtual machine is not used, and a machine of the required type is connected only for the duration of the actual computations (model training, launch and other calculations). This allows the user to pay for computing power only while computations are actually running. Time spent editing and viewing code and accidental idle time of the machine are not counted.

2020

General availability

On October 23, 2020, the Yandex.Cloud platform announced the opening of general access to Yandex DataSphere, a service for machine learning developers. The service helps companies and individual developers reduce the cost of creating and operating machine learning models, automatically manage the volume and type of computing resources, and reduce the time lost on creating and organizing a development environment. Yandex DataSphere will be generally available from October 1.

Companies' global spending on artificial intelligence is projected by IDC to roughly double over the next four years, from $50 billion in 2020 to $110 billion in 2024. The spending of Russian companies on AI amounted to $172 million at the end of 2019, with forecast growth of 30% annually. Solutions based on machine learning are already actively used by many Russian companies, for example, in medicine to analyze images and in retail to build recommendation systems.

"Machine learning methods are becoming an increasingly popular tool for businesses around the world. But for many companies, they are still out of reach due to the high entry threshold and the cost of the necessary computing resources. To solve these problems, we created DataSphere, where you can get a ready-made ML environment at the touch of a button. Various types of computing resources are available in DataSphere, from classic capacities to GPUs and distributed computing, and you are charged only for the server capacity actually consumed while your tasks are running," commented Alexey Bashkeev, head of the Yandex.Cloud platform.

Yandex DataSphere uses serverless computing technology in the development of machine learning models. The technology automates resource management and achieves significant savings. In DataSphere, editing and viewing code does not consume CPU or GPU computing resources; a virtual machine of the required type is connected only during the actual computations (model training, launch and other calculations). As a result, the user pays only for the computing resources actually consumed. Time spent editing and viewing code, and the running time of a virtual machine accidentally left switched on, are not charged. According to the results of DataSphere testing, in which 200 users from various fields took part, computing power sits idle 50-70% of the time during machine learning development, Yandex.Cloud said. When using DataSphere, this time will not be charged.
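The effect of charging only for active computation can be illustrated with a little arithmetic. The sketch below is not Yandex's billing logic: the session length and hourly rate are hypothetical, and it assumes, purely for illustration, that the same hourly rate applies in both cases. Only the 50-70% idle share comes from the testing results cited above.

```python
def billed_hours(session_hours: float, idle_fraction: float) -> float:
    """Hours charged when only active computation is billed."""
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    return session_hours * (1.0 - idle_fraction)


def compare_costs(session_hours: float, idle_fraction: float, hourly_rate: float):
    """Return (always_on_cost, pay_per_compute_cost, savings).

    Assumes the same hypothetical hourly rate in both cases.
    """
    always_on = session_hours * hourly_rate
    pay_per_compute = billed_hours(session_hours, idle_fraction) * hourly_rate
    return always_on, pay_per_compute, always_on - pay_per_compute


if __name__ == "__main__":
    # Hypothetical 40-hour working week at a hypothetical rate of 2.0 per hour.
    for idle in (0.5, 0.7):
        always_on, pay_per_compute, saved = compare_costs(40, idle, 2.0)
        print(f"idle {idle:.0%}: always-on {always_on:.2f}, "
              f"pay-per-compute {pay_per_compute:.2f}, saved {saved:.2f}")
```

Under these assumptions the saving is simply the idle share of the bill, so the more time a developer spends reading and editing code, the larger the difference.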

Yandex DataSphere also implements seamless switching between different types of computing resources. This means that within a single model training scenario, the user can use different types of virtual machines: economical ones with conventional processors (CPU) and faster ones with graphics accelerators (GPU). Training progress is preserved when switching. In most cloud-based machine learning development environments, a model can only be trained on one type of machine.

The third feature of DataSphere is the saving of versions of model calculations, including data, code and state. This makes machine learning development more cost-effective for business: the progress made in training is not lost and can be reproduced if necessary.

Opening access to Yandex DataSphere service by pre-registration

On May 28, 2020, Yandex announced that the Yandex.Cloud platform is opening access to Yandex DataSphere, a service for machine learning development. The service operates in preview mode, and access is granted by pre-registration. Yandex DataSphere can be used free of charge.


Yandex DataSphere is a cloud environment for using machine learning tools. Developers are offered the interface of Jupyter Notebook, one of the ML development tools. At the same time, the capabilities of Jupyter Notebook are adapted to work in the cloud and expanded, Yandex noted.

As of May 2020, 19 Russian companies using machine learning in business and 31 individual developers had taken part in the closed testing of Yandex DataSphere. According to experts, the service makes it possible to save up to 70% of the resources spent on GPU computing, Yandex reported.

According to the developer, Yandex DataSphere uses serverless computing technology when working with machine learning. This means that editing and viewing code does not consume CPU or GPU computing resources. A virtual machine of the required type is connected only for the actual computations: model training, launch and other calculations. With this approach, the client pays only for the time computing resources are really in use. Editing and viewing code, and the idle time of a virtual machine accidentally left running overnight or over the weekend, are not charged.

Yandex DataSphere also implements seamless switching between different types of computing resources. In Yandex DataSphere, you can use different types of virtual machines without stopping computation and while preserving progress: cost-effective ones with CPUs (conventional processors) and fast ones with GPUs (graphics accelerators). In most cloud machine learning development environments, model calculations can be run on only one type of machine. If some of the calculations require a more expensive GPU machine, the entire project is run on it. In Yandex DataSphere, each part (cell) of the code can be executed on the desired machine type, and the results of previous calculations are saved. Switching to another type of machine does not require restarting the entire project. This, according to Yandex, speeds up development, reduces costs and optimizes the use of expensive computing resources.
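To make the cost argument concrete, here is a minimal sketch comparing a project where each cell runs on the cheapest machine type it needs with one where the whole project runs on the most expensive type any cell requires. The `Cell` class, the cell names, the durations and the hourly prices are all hypothetical; this is not the DataSphere API, only an illustration of the reasoning.

```python
from dataclasses import dataclass

# Hypothetical hourly prices per machine type (not real Yandex Cloud pricing).
PRICES = {"cpu": 1.0, "gpu": 10.0}


@dataclass(frozen=True)
class Cell:
    name: str
    hours: float   # time the cell spends computing
    machine: str   # cheapest machine type the cell can run on


def cost_per_cell(cells, prices=PRICES):
    """Each cell is billed at the rate of its own machine type."""
    return sum(c.hours * prices[c.machine] for c in cells)


def cost_single_machine(cells, prices=PRICES):
    """The whole project runs on the most expensive type any cell needs."""
    rate = max(prices[c.machine] for c in cells)
    return sum(c.hours for c in cells) * rate


# A hypothetical notebook: only the training cell needs a GPU.
NOTEBOOK = [
    Cell("load data", 1.0, "cpu"),
    Cell("clean data", 2.0, "cpu"),
    Cell("train model", 3.0, "gpu"),
    Cell("evaluate", 0.5, "cpu"),
]

if __name__ == "__main__":
    print("per-cell machine types:", cost_per_cell(NOTEBOOK))
    print("single GPU machine:    ", cost_single_machine(NOTEBOOK))
```

With these made-up numbers, the per-cell approach costs about half as much, because the data loading, cleaning and evaluation steps no longer occupy the GPU machine.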

According to the developer, when the preview stage is completed and the service is put into commercial operation, another function will become available in DataSphere: saving versions of model calculations along three dimensions, namely data, code and notebook state. The function will simplify teamwork among data specialists and make ML development a more manageable process for corporate users. It will also be possible to check the quality of the code and get recommendations on how to use computing resources optimally.
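One common way to version a calculation along these three dimensions is to fingerprint each of them separately and combine the fingerprints into a single version identifier. The sketch below shows that general idea only; it is an assumption for illustration, not a description of how DataSphere stores versions. The function names are hypothetical, and the state is assumed to be a JSON-serializable dictionary.

```python
import hashlib
import json


def _digest(payload: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def make_version(data: bytes, code: str, state: dict) -> dict:
    """Fingerprint data, code and state, then derive one version id.

    Hypothetical helper: state must be JSON-serializable.
    """
    version = {
        "data": _digest(data),
        "code": _digest(code.encode("utf-8")),
        # sort_keys makes the digest independent of dictionary ordering
        "state": _digest(json.dumps(state, sort_keys=True).encode("utf-8")),
    }
    combined = version["data"] + version["code"] + version["state"]
    version["id"] = _digest(combined.encode("ascii"))[:12]
    return version


if __name__ == "__main__":
    v1 = make_version(b"a,b\n1,2\n", "model.fit(X, y)", {"epoch": 3, "lr": 0.01})
    v2 = make_version(b"a,b\n1,2\n", "model.fit(X, y, epochs=5)", {"epoch": 3, "lr": 0.01})
    # Same data and state, different code: only the code digest and the id change.
    print(v1["id"], v2["id"], v1["data"] == v2["data"])
```

Keeping the three digests separate shows which dimension changed between two versions, which is what makes a saved result reproducible and comparable within a team.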