RT.DataLake: Storage and Processing Solution for Data of Any Volume

Product
The name of the base system (platform): Rostelecom Data Management Platform
Developers: Rostelecom
System release date: 2022/06/08
Technology: Big Data, MDM (Master Data Management), DSS

RT.DataLake is a set of Hadoop ecosystem applications combined into a ready-to-install distribution.

2023: Compatibility with "Plus 7 FormIT on Hadoop"

On October 12, 2023, DIS Group and Rostelecom signed a protocol on the compatibility of RT.DataLake and Plus7 FormIT on Hadoop products.

Using RT.DataLake together with Plus7 FormIT will significantly expand the capabilities of Rostelecom's data management platform, including data profiling and bringing data to the required level of quality. This is especially important when creating digital twins and complex predictive analytics, which require the highest level of data quality.

"Large corporate customers may need additional powerful data integration tools when creating data lakes based on RT.DataLake. In such cases, we believe that Plus7 FormIT on Hadoop will be a good solution, and we have successfully tested its compatibility with RT.DataLake," said Stanislav Lazukov, director of data management platform development at PJSC Rostelecom.

Under the protocol, the following test tasks were defined:

  • Automated installation of the RT.DataLake distribution using RT.ClusterManager;
  • Integration of the RT.DataLake platform with the Plus 7 FormIT on Hadoop component;
  • Reading, writing and processing data on the Hadoop cluster through the "Plus 7 FormIT on Hadoop" connection in Native and Spark modes;
  • Operability of the Sqoop utility, with Oracle DB and PostgreSQL tables used as source and target.
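
The Sqoop check in the last task amounts to moving a relational table into or out of the Hadoop cluster. As a rough illustration only, the sketch below assembles a `sqoop import` command line in Python without executing it. The host, database, user, table and paths are hypothetical placeholders and do not come from the test protocol.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table,
                       target_dir, mappers=1):
    """Assemble a `sqoop import` command as an argument list (nothing is run)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to copy
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


# All values below are made-up examples.
cmd = build_sqoop_import(
    jdbc_url="jdbc:postgresql://db.example.local:5432/sales",
    username="etl_user",
    password_file="/user/etl_user/.pg_password",
    table="orders",
    target_dir="/data/raw/orders",
    mappers=4,
)
print(shlex.join(cmd))
```

An export back to the database would use `sqoop export` with `--export-dir` in place of `--target-dir`.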

The testing covered all major modes of joint operation, including key data integration and data processing features of the Plus 7 FormIT on Hadoop platform running in Pushdown mode on the RT.DataLake platform.

"Confirming the compatibility of Plus 7 FormIT on Hadoop and RT.DataLake will make it possible to carry out projects to build and populate data lakes holding large amounts of unstructured information that can be used to solve a variety of business problems," said Oleg Hyatsintov, Technical Director of DIS Group.

2022

Tasks solved and features

RT.DataLake solves the problems of reliable storage of multi-format data and distributed computing.

RT.DataLake is suitable for storing streaming data from devices, logs, files and other information. Distributed storage on the HDFS file system provides high reliability: if one of the cluster nodes fails, no information is lost.
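
The reliability claim rests on HDFS keeping several replicas of each data block on different nodes. The toy simulation below illustrates the idea, assuming a replication factor of 3 (the HDFS default, not a figure stated for RT.DataLake) and purely random placement. Real HDFS placement is rack-aware, and the node names here are invented.

```python
import random


def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication))
            for block in range(num_blocks)}


def lost_blocks(placement, failed_nodes):
    """Return the blocks whose every replica was stored on a failed node."""
    failed = set(failed_nodes)
    return [block for block, replicas in placement.items()
            if replicas <= failed]


nodes = [f"node{i}" for i in range(1, 6)]  # hypothetical 5-node cluster
placement = place_blocks(num_blocks=100, nodes=nodes, replication=3)

# With 3 replicas on distinct nodes, one failed node can never hold all copies.
print(lost_blocks(placement, ["node1"]))  # prints []
```

Data becomes unavailable only if every node holding a replica of some block fails before the cluster re-replicates it.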

In RT.DataLake, you can run analytical queries and transform data: the MapReduce, Tez and Spark engines are available. The product lets you prepare data for use in machine learning models and for data exploration, profiling or analytical reporting.
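
To make the MapReduce processing model more concrete, here is a minimal single-process sketch in plain Python that counts log lines by severity. It only imitates the map, shuffle and reduce phases and is not code that runs on the cluster. The log lines and function names are invented for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (log_level, 1) pair for one log line."""
    level = record.split()[0]
    yield level, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)


logs = [
    "ERROR disk full",
    "INFO job started",
    "ERROR timeout",
    "INFO job finished",
    "WARN slow node",
]

pairs = [pair for line in logs for pair in map_phase(line)]
result = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(result)  # {'ERROR': 2, 'INFO': 2, 'WARN': 1}
```

On a real cluster the map and reduce steps run in parallel on many nodes, and engines such as Tez or Spark apply the same idea with more flexible execution plans.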

As of July 2022, RT.DataLake supports flexible settings for security and data access policies using Kerberos and Ranger technologies, which allows you to store even personal and other sensitive data in the lake without fear of information leakage.

Features:

  • Import-independent product listed in the register of domestic software;
  • Use of a secure repository with source code and support for domestic operating systems;
  • The ability to individually build versions of distribution components;
  • Built-in HAProxy load balancer, support for the Zstd compression codec, support for YARN UI v2 and for queues in the Fair and Capacity schedulers, and other improvements;
  • Automation of service functions: installation, configuration of the cluster, monitoring of health and incident resolution (mechanisms for administration and problem solving);
  • 24x7 support. Geographically distributed support team and full set of technical and operational documentation in Russian;
  • Flexibility of use. The product is available in two options: cloud and on-premises;
  • Short delivery cycle for changes (bug fixes, security fixes and new features) and for new versions of RT.DataLake, achieved by automating builds and version testing.

RT.DataLake supports current, stable versions of Hadoop ecosystem components from both the 2nd and 3rd major versions. This eliminates compatibility issues when migrating data from other lakes and data sources, and it also eases the switch from imported solutions to a domestic product.

The product can be used in conjunction with RT.ClusterManager.

Entering the Russian market

On June 8, 2022, Rostelecom announced that it was bringing to the Russian market RT.DataLake, an import-independent freeware distribution. It is a special build of one of the components of the Data Management Platform and is designed for efficient storage of big data.

RT.DataLake allows you to store and process data of any volume to solve various business problems, from reporting to creating machine learning models. Rostelecom thus gives Russian companies the opportunity to build their own data lake for free on a domestic, fully import-independent build of the Hadoop distribution, with no license fees.

The product is based on Apache Hadoop. It meets high reliability and availability requirements and has a low storage cost. Freeware users will have access to the build and to detailed instructions for installing it with the specified parameters and settings. RT.DataLake includes the following component versions: Hadoop 3.0.0, HBase 2.2.6, Hive 3.1.1, Hue 4.10, Spark 3.0.0, Zookeeper 3.7.0.
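
The component list above can serve as a simple manifest when checking an installation. The sketch below compares a set of reported versions against that list. The `installed` dictionary is a made-up example, and how the versions would actually be collected from a cluster is left out.

```python
# Component versions as listed for the freeware build of RT.DataLake.
EXPECTED_VERSIONS = {
    "Hadoop": "3.0.0",
    "HBase": "2.2.6",
    "Hive": "3.1.1",
    "Hue": "4.10",
    "Spark": "3.0.0",
    "Zookeeper": "3.7.0",
}


def find_mismatches(installed):
    """Return {component: (expected, found)} for missing or differing versions."""
    return {
        name: (expected, installed.get(name))
        for name, expected in EXPECTED_VERSIONS.items()
        if installed.get(name) != expected
    }


# Hypothetical inventory: Spark differs and Hue is missing.
installed = {
    "Hadoop": "3.0.0",
    "HBase": "2.2.6",
    "Hive": "3.1.1",
    "Spark": "2.4.8",
    "Zookeeper": "3.7.0",
}
print(find_mismatches(installed))
# {'Hue': ('4.10', None), 'Spark': ('3.0.0', '2.4.8')}
```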

Users of the freeware distribution will be able to sign a consulting agreement with Rostelecom specialists. This is a form of technical support in which the client asks for clarification on any issue that arises during installation or operation of the build. In addition, the user can always migrate to the commercial version, in which most actions are automated using Ansible scripts and RT.ClusterManager, simplifying installation and operation.

"As of June 2022, the Data Management Platform is already helping Russian companies reduce their dependence on foreign software suppliers and reduce data costs. Due to the departure of Western vendors from Russia and the need to ensure technological sovereignty, we decided to give Russian companies the opportunity to use our products for free with limited functionality. We are making the RT.DataLake product available for free, and in the near future we will release freeware versions of two more products, RT.Warehouse and RT.Streaming," said Sergei Nosov, director of data management at Rostelecom.