
ADH Arenadata Hadoop

Product
Base system (platform): Apache Hadoop
Developers: Arenadata (Arenadata Software)
Last Release Date: 2024/08/28
Technology: DBMS


Main article: Database Management Systems (DBMS)

Arenadata Hadoop (ADH) is a full-featured distribution of a distributed data storage platform based on Apache Hadoop, adapted for enterprise use. Typical use cases include:

  • Search and contextual search engines for high-load websites and online stores
  • Store, sort, and process vast amounts of unstructured data

2024

Hadoop User Experience and ACID Transaction Support

On August 28, 2024, Arenadata introduced the next version of Arenadata Hadoop (ADH), an enterprise distribution for storing, processing, and analyzing unstructured and semi-structured data. The release expands data management capabilities with ACID transaction support and the HUE web interface, which makes it easier for analysts to work with databases and data stores.

The updated version of the product includes Apache Iceberg, a high-performance open table format for managing data at the file-system level. Apache Iceberg lets you work with structured data in a data lake using SQL queries and integrates easily into existing infrastructure thanks to its compatibility with most storage technologies (from HDFS to S3) and popular compute engines such as Spark, Impala, and Hive.

Iceberg solves the problems of traditional table formats and adds new capabilities, including consistent concurrent writes to shared files in a cluster, time-travel queries against earlier versions of data, rollback of changes, schema evolution, and data partitioning. Using Iceberg tables can significantly speed up queries through incremental data processing and fast scanning that skips irrelevant data.
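The snapshot idea behind time-travel queries and rollback can be sketched with a toy model. This is a hypothetical illustration, not the Apache Iceberg API. The `SnapshotTable` class and its methods are invented for this example. The model assumes only two things: every commit creates a new immutable snapshot, and rollback moves the table's current-snapshot pointer back to an earlier snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Toy table that keeps every committed version as an immutable snapshot.

    Hypothetical illustration only: this is not the Apache Iceberg API.
    """
    snapshots: list = field(default_factory=lambda: [()])  # snapshot 0 is empty
    current: int = 0  # id of the snapshot that readers see

    def append(self, rows):
        # A commit never mutates old data: it creates a new snapshot
        # containing the current rows plus the new ones.
        new_rows = self.snapshots[self.current] + tuple(rows)
        self.snapshots.append(new_rows)
        self.current = len(self.snapshots) - 1
        return self.current

    def scan(self, snapshot_id=None):
        # Time travel: read the table as of any earlier snapshot.
        sid = self.current if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid])

    def rollback(self, snapshot_id):
        # Rollback only moves the "current" pointer; no data is rewritten.
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current = snapshot_id
```

Old snapshots are never modified. As a result, reading an earlier version or rolling back is a metadata operation rather than a data rewrite.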

This version of Arenadata Hadoop supports HUE (Hadoop User Experience), a Hadoop ecosystem web interface designed for data analysis. It connects to a DBMS, compute engine, or data store through native connectors and simplifies work with data sources. HUE is used by a wide range of users, from business analysts, data engineers, and data scientists to database administrators and SQL developers. In Arenadata Hadoop, HUE includes pre-configured SQL interpreters for Impala, Hive, Kyuubi, and Spark SQL, as well as monitoring of YARN and Impala jobs and an HDFS file browser.

"At the moment, Iceberg is supported in the Spark and Impala services, with limited (read-only) support in Hive. In the next release, we will expand Hive functionality and add Iceberg support to Flink, which will extend streaming capabilities," said Alexander Anisimov, Technical Director of the Arenadata Hadoop product. "In turn, the new HUE service will receive additional security and fault tolerance features, and the list of pre-configured interpreters will also be expanded."

The Arenadata Hadoop release includes other changes: updated versions of the Impala, Spark, Kyuubi, and Zeppelin services; LDAP authentication for Impala and Kyuubi; the Kyuubi AuthZ plugin in Spark3 to support authorization via Ranger; and simplified management of SSL encryption for cluster services. The security module has also been updated: the new version of Arenadata Platform Security adds support for the Samba domain controller and a high availability mode for Ranger KMS.

MWS Cloud Availability

On July 26, 2024, the MTS digital ecosystem announced a strategic partnership between MTS Web Services (MWS), part of the MTS group, and Arenadata. Under the agreement, MWS launched five services based on Arenadata software products, including Arenadata Hadoop (ADH).

Adding the Apache Kyuubi service

Arenadata has included Apache Kyuubi, a distributed multi-user SQL gateway for corporate data warehouses and lakes, in the Arenadata Hadoop (ADH) enterprise distribution. It enhances the fast interactive analytics capabilities in Arenadata Hadoop and provides easy and secure access to any cluster resource through a single entry point. The company announced this on July 25, 2024.

Kyuubi provides a unified interface for accessing compute engines through a single authentication and authorization system. With this service, data scientists and analysts can process data using the familiar engines supported by the product, while database administrators get a single interface for configuration, security, and data access control.

"Within the Arenadata Hadoop distribution, we have already enabled Kyuubi to work with Spark SQL and Hive, and we plan to extend this with Flink SQL support. In addition, we developed and contributed upstream support for the Impala dialect in the JDBC engine; in our product, it is available starting from ADH 3.2.4.2," said Alexander Anisimov, Technical Director of the Arenadata Hadoop product.

The service provides an SQL interface and JDBC/ODBC support, which makes it convenient for batch ETL/ELT processing, analytics, ad-hoc queries, and integration with BI systems. Kyuubi also manages the compute resources of the Spark SQL engine efficiently: it can either pool resources for a group of users in one session or guarantee dedicated, isolated resources for each user or connection.
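The trade-off between sharing and isolating engines can be sketched as follows. Kyuubi's documentation describes engine share levels (set with `kyuubi.engine.share.level`) named CONNECTION, USER, GROUP, and SERVER. The Python model below is a hypothetical simplification, not Kyuubi code. Its `engine_key` and `count_engines` helpers are invented for illustration. It shows how the chosen level determines which connections reuse one engine and which get isolated engines.

```python
from enum import Enum


class ShareLevel(Enum):
    CONNECTION = "CONNECTION"  # a dedicated engine per client connection
    USER = "USER"              # one engine shared by all sessions of a user
    GROUP = "GROUP"            # one engine shared by a group of users
    SERVER = "SERVER"          # one engine shared by every user of the server


def engine_key(level, user, group, connection_id):
    """Return the key used to look up (or launch) an engine.

    Hypothetical helper: connections with the same key reuse one engine,
    connections with different keys get isolated engines.
    """
    if level is ShareLevel.CONNECTION:
        return ("connection", connection_id)
    if level is ShareLevel.USER:
        return ("user", user)
    if level is ShareLevel.GROUP:
        return ("group", group)
    return ("server",)


def count_engines(level, connections):
    # connections: iterable of (user, group, connection_id) tuples
    return len({engine_key(level, u, g, c) for u, g, c in connections})
```

With two connections from one analyst and one from a colleague in the same group, CONNECTION isolates all three, USER yields two engines, and GROUP or SERVER pool them into one.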

Security and high availability are fundamental requirements for enterprise use. Accordingly, Kyuubi gained support for LDAP authentication of clients, and the Kyuubi AuthZ plugin for Spark 3, which provides data access control, was added; this functionality is available starting with ADH 3.2.4.3. To improve high availability, the Arenadata team developed and contributed upstream the ability to use PostgreSQL as the metadata store (metastore).

In addition to the basic use case, Kyuubi lets you extend the capabilities of the server and compute engines. The server can be given custom functionality in modules responsible for authentication, configuration, and so on. For compute engines, you can add new features by developing your own plugins or using third-party ones. "Kyuubi meets our customers' requirements for enterprise-level projects. The service extends Arenadata Hadoop's capabilities in interactive access, compute resource isolation, support for multiple workloads, and data security," said Ekaterina Ulyashova, product marketing manager at Arenadata.

Obtaining an FSTEC certificate of compliance with trust level 4 requirements and technical specifications

Arenadata Hadoop (ADH), a corporate distribution based on Apache Hadoop, received FSTEC of Russia certificate of conformity No. 4821 dated June 13, 2024. The document certifies that the product meets the requirements of trust level 4 and the technical specifications.

FSTEC trust level 4 is one of the highest trust levels for technical means of protecting confidential information. It confirms that Arenadata Hadoop is suitable for use in:

  • significant objects of critical information infrastructure of the 1st category;
  • state information systems of the 1st security class;
  • automated control systems for production and technological processes of the 1st security class;
  • personal data information systems where the 1st level of personal data protection must be ensured;
  • class 2 public information systems that process restricted information, including personal data and official, commercial, and other types of confidential information.

"This is an important event for us as the developer of Arenadata Hadoop, and for our current and potential customers whose specific requirements prescribe using only software products with such a high level of security," said Alexander Ermakov, CTO of Arenadata.

The FSTEC certificate confirms that Arenadata Hadoop is software with built-in means of protection against unauthorized access to information that does not contain information constituting a state secret, implementing the functions of identification and authentication, access control and registration of security events. The distribution meets the information security requirements established in the document "Information Security Requirements Establishing Levels of Trust in Information Security Tools and Information Technology Security Tools."

Based on the technical conclusions, Arenadata Hadoop was entered into the state register of the certification system for information security tools on June 13, 2024. The FSTEC certificate is valid until June 13, 2029.

Apache Impala compatibility with Arenadata Catalog

On May 16, 2024, Arenadata announced that DataCatalog (part of the Arenadata Group) had tested a connector that provides compatibility between the Arenadata Catalog (ADC) product and the Apache Impala service, which is part of the Arenadata Hadoop (ADH) enterprise distribution.

Adding Smart Storage Manager

Arenadata has included Smart Storage Manager (SSM), a tool for optimizing data storage and management, in the Arenadata Hadoop (ADH) enterprise distribution. The service extends the capabilities of the HDFS distributed file system and lets you choose the most efficient way to work with data, reducing storage overhead and improving query performance. The company reported this on April 2, 2024.

In a typical Hadoop installation, 80% of the processing load falls on 20% of the data. To manage data according to demand, Smart Storage Manager analyzes how often files are accessed. It then applies rules set by administrators to move data automatically: hot data goes to the cache, warm data to performance-optimized media (SSD), and cold data to an archive on capacity-optimized media (HDD). This reduces the cost of storing rarely used data, improves read performance for hot data, and optimizes hardware usage.
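The rule-based tiering described above can be sketched as a function that maps access frequency to a storage tier. The thresholds, tier names, and helper functions below are hypothetical. Real SSM rules are written in SSM's own rule syntax, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    accesses_per_day: float  # access frequency observed over a recent window


# Hypothetical administrator-defined thresholds (not SSM defaults).
HOT_THRESHOLD = 100.0   # at least this many reads/day -> cache
WARM_THRESHOLD = 1.0    # at least this many reads/day -> SSD


def choose_tier(stats):
    """Map a file's access frequency to a storage tier."""
    if stats.accesses_per_day >= HOT_THRESHOLD:
        return "cache"
    if stats.accesses_per_day >= WARM_THRESHOLD:
        return "ssd"
    return "hdd-archive"


def plan_moves(files, current_tier):
    """Return (path, from_tier, to_tier) for every file that should move."""
    moves = []
    for f in files:
        target = choose_tier(f)
        if current_tier.get(f.path) != target:
            moves.append((f.path, current_tier.get(f.path), target))
    return moves
```

Separating classification (`choose_tier`) from planning (`plan_moves`) mirrors the split between the rules an administrator writes and the actions the service executes.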

Smart Storage Manager can also configure asynchronous data replication between Hadoop clusters or between a Hadoop cluster and cloud storage. The service tracks data change operations such as create, delete, append, and rename. This keeps data synchronized in real time without the computational cost of MapReduce. Replication is easy to configure and manage, which makes it suitable for disaster recovery (DR) scenarios.

"The standard tool for replicating data between Hadoop clusters, the Distributed Copy (DistCp) command, is suitable for batch replication of large volumes of data but is not applicable in a number of other scenarios. With SSM, we have extended Arenadata Hadoop with a new Data Sync capability that performs asynchronous replication with minimal latency and minimal impact on the source cluster," said Alexander Anisimov, Technical Director of Arenadata Hadoop.

Smart Storage Manager policies and rules let you flexibly configure Erasure Coding, a fault-tolerant data distribution technology. It stores data in HDFS more compactly than replication without restricting access for external applications, which saves space in the storage subsystem.
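To see why erasure coding saves space, compare the raw storage needed for the same data under two schemes. The first is 3-way replication, the HDFS default. The second is a Reed-Solomon code with 6 data units and 3 parity units, which corresponds to the RS-6-3-1024k policy shipped with HDFS 3. The figures are illustrative arithmetic, not measurements of Arenadata Hadoop, and the helper names are invented for this sketch.

```python
from fractions import Fraction


def replication_overhead(replicas):
    """Extra raw storage per byte of user data under n-way replication."""
    return Fraction(replicas - 1)


def erasure_coding_overhead(data_units, parity_units):
    """Extra raw storage per byte of user data under a (data, parity) code."""
    return Fraction(parity_units, data_units)


def raw_storage_tb(user_tb, overhead):
    # Total raw disk needed = user data + redundancy
    return user_tb * (1 + overhead)


# 100 TB of user data: 300 TB raw with 3 replicas, 150 TB raw with RS(6,3)
print(raw_storage_tb(100, replication_overhead(3)))
print(raw_storage_tb(100, erasure_coding_overhead(6, 3)))
```

RS(6,3) still tolerates the loss of any three of its nine units. By comparison, 3-way replication tolerates the loss of two replicas, so the space saving does not come at the cost of fewer tolerated failures.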

"Because unstructured data keeps growing and is accessed unevenly, it makes no sense to optimize the entire data array. SSM collects and analyzes historical metrics and uses them to identify and predict data access patterns, so storage parameters are adjusted automatically, optimizing costs and increasing performance," said Ekaterina Ulyashova, Product Marketing Manager at Arenadata.

The service also includes a solution for reducing memory consumption when working with small files. The files can be packed into a single container file that is stored in HDFS, and the data inside it remains available to higher-level applications. This reduces overhead and improves write and read performance for small files.
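The principle of packing many small files into one container can be sketched in a few lines. The files are concatenated into a single blob, and a small index records each file's offset and length, so a reader can fetch one file with a single lookup. The container layout and the `pack` and `read` helpers are hypothetical illustrations, and SSM's actual format may differ. In HDFS, the memory saving comes from the NameNode tracking one large file instead of many small ones.

```python
def pack(files):
    """Pack {name: bytes} into one container blob plus an offset index.

    Hypothetical format for illustration; SSM's real container layout may differ.
    """
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))  # (offset, length) inside the container
        blob.extend(data)
    return bytes(blob), index


def read(blob, index, name):
    """Read one small file back without unpacking the whole container."""
    offset, length = index[name]
    return blob[offset:offset + length]
```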

For the convenience of cluster administrators, Smart Storage Manager provides a web-based interface that allows you to create rules, run actions, check their execution status, and monitor cluster statistics.

Availability in the CROC Cloud

CROC Cloud Services and Arenadata entered into a partnership agreement under which Arenadata products became part of the services provided to customers on the basis of the CROC cloud. In particular, Arenadata Hadoop (ADH) is now available to the provider's cloud customers. CROC announced this on January 17, 2024.

2023

Arenadata Hadoop 3.1.2.1.b1 with Apache Impala service

On October 25, 2023, Arenadata presented the release of Arenadata Hadoop (ADH) 3.1.2.1.b1, which includes the Apache Impala service, a distributed SQL query engine for the Hadoop ecosystem. The service is designed for interactive processing of extremely large volumes of data and enables new use cases that require fast query execution.

The new ADH functionality offers users the following advantages:

  • Query processing speed in the data lake. Impala provides low latency and a high degree of parallelism in the Hadoop ecosystem, making self-service analytics and ad-hoc queries more efficient.
  • Easy integration into the current infrastructure. Customers who already have Hive installed do not have to migrate data and tables, because Impala uses the same metadata, file formats, and connection drivers.
  • Scaling independently of the main Hadoop cluster. Arenadata Hadoop can deploy Impala outside the main cluster. This eliminates contention for hardware resources and lets the analytical workload scale separately, even in existing ADH installations.
  • Optimized use of hardware. The service helps reduce data processing costs by making better use of hardware. In addition, there is no need to spend resources on adapting SQL and retraining analysts when migrating from Cloudera Data Platform.
  • Optimization of the customer's IT landscape. Individual ad-hoc and self-service analytics scenarios that require massively parallel processing can now be run locally without loading the primary data store.

"Adding another service to Arenadata Hadoop has significantly increased product performance for a number of business scenarios. It largely meets customer requirements for query execution speed and allows new use cases to be implemented. In upcoming ADH releases, we plan to extend this service's functionality, including its security, availability, and integration with other components such as Arenadata Platform Security (ADPS)," commented Alexander Ermakov, CTO of Arenadata.

The updated Arenadata Hadoop release includes other changes: automatic high availability management for Hadoop services, an ADB Spark Connector with Spark3 support, and improved cluster kerberization functionality that allows more detailed configuration.

Availability in the Cloud.ru cloud

Cloud.ru, a provider of cloud and AI technologies, has become a strategic partner of Arenadata, a Russian vendor of data storage and processing software. Arenadata products can now be used in the Cloud.ru cloud. Arenadata announced this on August 29, 2023.

According to Arenadata representatives speaking to TAdviser, the following company products are available in the Cloud.ru cloud: ADB (Arenadata DB), ADH (Arenadata Hadoop), ADQM (Arenadata QuickMarts), and ADS (Arenadata Streaming).

Availability in beeline cloud

The Arenadata product ecosystem is now available to customers of the cloud provider beeline cloud. Arenadata announced this on August 3, 2023. As Arenadata representatives told TAdviser, the products available through beeline cloud include Arenadata DB, Arenadata Hadoop, Arenadata Streaming, Arenadata Postgres, Arenadata QuickMarts, and Arenadata Cluster Manager.

Rubbles MLOps Suite Compatibility

IT companies Rubbles and Arenadata have made their software products compatible. Now the Rubbles MLOps Suite platform and Arenadata data storage and processing solutions can work seamlessly in a single software package. Arenadata announced this on July 19, 2023.

A single software package combines Rubbles MLOps solutions and Arenadata products, including Arenadata Hadoop.

Arenadata Hadoop 2.1.7_b1 with HBase, Flink and Zookeeper components

On January 24, 2023, Arenadata announced the release of Arenadata Hadoop (ADH) 2.1.7_b1, a version of its Apache Hadoop-based distribution adapted for corporate use. The release includes updated versions of the HBase, Flink, and Zookeeper components. It also adds the Airflow2 service and the Livy 0.7.1 component with support for Spark 3.3.0 on Scala 2.13.

Arenadata Hadoop 2.1.7_b1 included the following improvements and updated functionality:

  • the ability to install the Flink 1.15.1 service and to upgrade from the previous version "by button". Users get the functionality of the new version and fixes for problems in the previous one;
  • the ability to install the HBase 2.0.4 service and to upgrade from the previous version "by button", fixing problems of the old version;
  • the ability to install the Zookeeper 3.5.10 service and to upgrade from the previous version "by button". This version fixes problems of the previous one;
  • the ability to install a new Spark3 service component, Livy 0.7.1 with support for Spark 3.3.0 on Scala 2.13, as a separate component alongside Livy for Spark 2.3.2;
  • an updated Airflow 2.3.3 service, installed in parallel with Airflow 1.10.11, which makes it possible to transfer settings and tasks from the previous version;
  • customization of krb5.conf and ldap.conf via the Arenadata Cluster Manager (ADCM) interface. Users can change the contents of these configuration files themselves in ADCM. This makes it possible to set up more complex and fine-grained kerberization and authorization configurations to suit the complexity of the infrastructure;
  • the ability to forcibly disable HDFS ACL access policies when the Ranger plugin is enabled, so that user access is organized from a single source of truth.

"Arenadata is committed to giving customers access to the latest Hadoop components. It is important that users can update them automatically, 'by button'. Specialists working on updated versions of Arenadata Hadoop make sure that the product is safe and that its components are compatible with each other and work consistently across the platform," said Alexander Bolshakov, Arenadata Product Director.

2022

Compatibility testing of GAGAR>N servers and disk arrays

The Russian server equipment manufacturer GAGAR>N, together with the data platform developer Arenadata and the IT company T1 Consulting, tested the equipment for working with Big Data. The tests confirmed that the equipment can be used effectively to build domestic big data storage and processing complexes. T1 Consulting representatives announced this on July 26, 2022.

In response to customers' growing need to build IT infrastructure on Russian hardware and software, T1 Consulting organized the development of hardware-software complexes (PAK) using Arenadata DB and Arenadata Hadoop software on the x86 server platform and GAGAR>N disk arrays. Before putting together a ready-made offering, specialists tested the compatibility of the computing equipment with the software solutions.

Kerberos authorization to prevent unauthorized access

On February 24, 2022, Arenadata announced that three of its products, Arenadata Hadoop (ADH), Arenadata Streaming (ADS), and Arenadata Platform Security (ADPS), now let users prevent unauthorized access to cluster services and data. It is now enough to enable Kerberos authorization for all components, which allows authorization data to be stored in Active Directory.

2021

Mail.ru Cloud Solutions Availability

On July 21, 2021, Mail.ru Cloud Solutions announced the launch of Hadoop 3.0 as a service based on the Arenadata solution.

Arenadata Hadoop 2.1 with Platform Security module

On January 18, 2021, Arenadata announced the completion of the final testing of the updated component to ensure centralized management of cluster security policies - the Arenadata Platform Security module.

Arenadata Platform Security

According to the company, the service will let businesses take an integrated approach to security in the following key areas: perimeter security, user authentication and authorization, auditing of user actions, and data protection. It allows a single data security system to be built for multiple installations, clusters, and heterogeneous infrastructures at the same time.

The first release of Platform Security (1.0) comes as a free addition to the Enterprise edition of the latest versions of the Arenadata Hadoop 2.1 distribution (based on Hadoop 3.x). Starting with this release, all components needed to organize and configure security will be delivered as part of a separate ADPS module.

Arenadata Hadoop is a distribution based on Apache Hadoop, adapted for corporate use as part of the single corporate platform Arenadata EDP. It is an open-source Hadoop distribution; starting with ADH 2.1.2.3, it is also available in an Enterprise edition.

The Enterprise version of Arenadata Hadoop 2.1 includes all updated releases of Apache Ranger, a component for monitoring and managing comprehensive data security on the Arenadata Hadoop platform. This service is responsible for administering security-related tasks, monitoring system access requests, and providing various authorization methods for all Hadoop components and tools.

Arenadata Hadoop 2.1 with the Platform Security module brings a number of updates:

  • Ranger update to support Apache Hadoop 3.1.
  • Ranger update to support Hive 3.0.
  • Ranger update to support HBase 2.0.
  • Ranger support for Apache Kafka 2.0.0.
  • Plug-in support for enabling, monitoring and managing Elasticsearch.
  • Security zones in Apache Ranger.
  • Support for trusted proxy.
  • Ranger update to support Ozone.
  • KeySecure HSM integration.
  • Support for user conditions at the policy level.
  • Improvements to support roles in Ranger policies.
  • Hive plugin enhancements to support SQL commands.

"The first release of Arenadata Platform Security will allow our users to centrally manage data security when working with a variety of the latest Arenadata Hadoop services and clusters. This solution will reduce operating costs and will be especially important when working with large amounts of sensitive data, namely in industries such as banking, fintech, telecom, healthcare, insurance, and retail," said Alexander Ermakov, Technical Director of Arenadata.

A key feature of any product based on the Arenadata EDP platform is native integration with other Arenadata solutions and a fully automated installation process. For Platform Security, customers receive a separate bundle that includes all security solutions. Using the standard visual interface of the Arenadata Cluster Manager orchestrator, the customer deploys it on their system. All processes related to installation, configuration, and other activities required to integrate products into the enterprise platform are fully automated.

The upcoming product development plans include connecting other components of the Arenadata EDP platform, in particular, implementing Apache Ranger support for Arenadata Streaming clusters. In the future, Arenadata Platform Security will become a separate umbrella add-on for managing the security of all components of the Arenadata Enterprise Data Platform (EDP) enterprise data collection and storage platform.

2020

Arenadata Hadoop 2.1.2.4

On October 26, 2020, Arenadata announced that the Arenadata Hadoop (ADH) enterprise distribution is now available in different versions. One is a free download. The other is a corporate version with NameNode High Availability functionality for a key system component, which improves HDFS fault tolerance.

Arenadata Hadoop is a full-fledged open-source distribution based on Apache Hadoop, adapted for enterprise use and designed to store and process both structured and unstructured data.

ADH 2.1.2.4 was released in October with the following changes:

  • separation of versions into a free download and a paid advanced corporate version (with a number of in-house developments and additional advantages);
  • added NameNode High Availability functionality. This increases the fault tolerance of HDFS, a key component of the solution. Available in the Enterprise release.

Starting with ADH 2.1.2.3, the latest Arenadata Hadoop distribution is available in two versions, Community and Enterprise. Community is a fully functional version that can be freely used for development and testing as well as for production operation, and it is open for free download. Enterprise contains advanced high availability functionality and, in the near future, information security features.

Previously, if the active NameNode server failed, the file system had to be restored manually from the SecondaryNameNode to regain access to the cluster and services. The cluster and services were unavailable while this was being done. Now all failover actions are performed automatically, and service is not interrupted.
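The automatic failover described above can be sketched as a heartbeat check that promotes the standby NameNode when the active one stops responding. This is a toy model with a hypothetical `failover_if_needed` helper and an assumed 10-second timeout. In HDFS itself, automatic failover is coordinated by ZooKeeper-based failover controllers (ZKFC) and includes fencing of the failed node, which the sketch omits.

```python
from dataclasses import dataclass


@dataclass
class NameNode:
    name: str
    state: str             # "active" or "standby"
    last_heartbeat: float  # time of the last heartbeat, in seconds


def failover_if_needed(nodes, now, timeout=10.0):
    """Return the name of the active NameNode after checking heartbeats.

    Hypothetical toy model: if the active node's last heartbeat is older
    than `timeout`, a healthy standby is promoted and the old node demoted.
    """
    active = next(n for n in nodes if n.state == "active")
    if now - active.last_heartbeat <= timeout:
        return active.name  # active node is healthy, nothing to do
    standby = next((n for n in nodes
                    if n.state == "standby" and now - n.last_heartbeat <= timeout),
                   None)
    if standby is None:
        raise RuntimeError("no healthy standby NameNode available")
    active.state = "standby"  # real HDFS would also fence the failed node
    standby.state = "active"
    return standby.name
```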

"High availability for the HDFS NameNode is an important requirement for many of our customers. Availability of this key system component will allow them to upgrade to the more advanced Hadoop 3.x, which reduces storage overhead by 50-200% and uses erasure codes to provide fault-tolerant storage. The community's attention is concentrated on Hadoop 3.x, and more and more developments from global IT vendors are going into this version of the ecosystem. At the same time, Arenadata Hadoop itself has become easier to maintain: urgent administrator intervention is no longer required to restore operation after failures," comments Aleksei Belozersky, Arenadata Hadoop product manager.

By the end of 2020, it is planned to introduce the Apache Ranger component, which will be responsible for importing users and groups from external sources and authorization in all components of the Hadoop distribution. This will allow you to create a single user authorization point, control access to data and audit access.

Where to download

Arenadata Hadoop (ADH) is a fully open-source Hadoop distribution. You can download the Community version, Arenadata Hadoop ADH 2.1.2.4, yourself, including the product's source code.

Arenadata Hadoop 2.1.2

On February 25, 2020, Arenadata introduced a minor version of Arenadata Hadoop 2.1.2 (ADH 2.1.2), including two additional Airflow and Solr services, as well as YARN on GPU support for the use of video cards for computing.

This lets Arenadata Hadoop users run tasks inside Hadoop on GPU machines. A classic example is Spark: inside it, you can write code that uses CUDA cores and the fast memory of GPU machines.

"Arenadata Hadoop version 2.1.2 will speed up Spark using graphics card hardware. At the same time, all the functionality of our product is preserved. In the updated version, we also made several fixes and changes to how services work. We added deeper service health checks, including checks of how services integrate and interact with each other," said Alexander Ermakov, Technical Director of Arenadata.

2019: Arenadata Hadoop 2.1 distribution based on Hadoop 3

In October 2019, Arenadata released an update to its distributed storage platform distribution, Arenadata Hadoop (ADH): Arenadata Hadoop 2.1, which includes components such as Hadoop 3, Spark 2, Hive 3, YARN 3, HBase 2, and Phoenix 5.

Arenadata Hadoop 2.1 includes a number of new features, among them:

  • the new Arenadata Cluster Manager management, deployment, and monitoring system, which lets you install and manage Hadoop services both on-premises and in the cloud;
  • the Erasure Coding data recovery algorithm, which can reduce disk system overhead by up to 40% compared to classic HDFS replication;
  • the Hive 3 DBMS, which lets you create relational tables, write data to them, use transactions, and create materialized views;
  • a new version of YARN Resource Manager 3 that lets you control the allocation of multiple cluster resources between competing applications (YARN Federation);
  • a static range of ports for Hadoop services;
  • the Phoenix relational database for handling streaming data writes and reads.



2018

Availability in Kazakhstan

On November 14, 2018, Arenadata, a Russian developer of a multi-purpose data platform, announced the availability on the Kazakhstan market of the Arenadata Hadoop (ADH) distribution and the massively parallel analytical DBMS Arenadata DB.

"Technical support for Arenadata solutions will be provided by DIS Group specialists," said Kanat Abirov, CEO of DIS Group KZ. "In Russia, our specialists have already gained experience with Arenadata products. At the same time, we have been working on the Kazakhstan market for many years and know the specifics of the region well."

Kazakh companies will be able to purchase Arenadata Hadoop and Arenadata DB through DIS Group KZ, the DIS Group office in Almaty. Industrial modules that extend the Arenadata platform's functionality have also become available. They cover data integration, data quality, cataloging, and business-user self-service, and they are based on Informatica tools.

Arenadata Hadoop 1.5.2 Distribution

In the second quarter of 2018, Arenadata Hadoop 1.5.2 was released.


Unlike other corporate distributions on the market, Arenadata Hadoop has a number of features:

  • all support and direct expertise is available in Russia and in Russian;
  • there is a package of utilities for full offline installation (without access to the Internet);
  • the entire assembly is based on open source Apache projects, there are no proprietary components;
  • it is Russian software;
  • support is available both remotely and on-site;
  • a set of standard packaged services is available for planning, installing, and auditing the system.

Arenadata Hadoop provides a full set of capabilities and tools to automatically deploy components on both bare-metal hardware and virtual machines (in the cloud). Cluster configuration monitoring and management tools optimize performance for all system components. Apache Ambari provides interfaces for integration with existing management systems such as Microsoft System Center and Teradata ViewPoint.

The original documentation in Russian makes it easier to plan and deploy the Hadoop cluster.

The Arenadata Hadoop distribution includes the latest versions of all the most popular tools, some of which have been significantly improved. This ensures a minimal number of software bugs, full functionality of each tool, and correct integration between tools. In addition, Arenadata Hadoop includes tools for implementing enterprise security models (Apache Knox, Apache Ranger), managing data and cluster metadata (Apache Atlas), and implementing ETL/ELT data flows (Apache Flink, Apache NiFi).

Components and versions of ADH 1.5.2:

  • Apache Ambari 2.6.1
  • Apache HDFS 2.8.1
  • Apache YARN 2.8.1
  • Apache MapReduce 2.8.1
  • Apache Zookeeper 3.4.10
  • Apache Tez 0.9.0
  • Apache Hive 2.3.0
  • Apache HBase 1.3.1
  • Apache Phoenix 4.11.0
  • Apache Pig 0.17.0
  • Apache Sqoop 1.4.6
  • Apache Flume 1.8.0
  • Apache Oozie 4.3.0
  • Apache Atlas 0.8.1
  • Apache NiFi 1.3.0
  • Apache Apex 3.6.0
  • Apache Flink 1.3.2
  • Apache Kafka 1.0.0
  • Apache Knox 0.12.0
  • Apache Mahout 0.13.0
  • Apache Ranger 0.7.1
  • Apache Ranger KMS 0.7.1
  • Apache Solr 6.6.0
  • Apache Spark 2.2.0
  • Apache Zeppelin 0.7.3
  • Apache Giraph 1.1.0
  • Apache Slider 0.92.0

Additional components included in the distribution:

  • Hue 3.11.0
  • Bigtop-groovy 2.4.10
  • Bigtop-jsvc 1.10.15
  • Bigtop-tomcat 6.0.45
  • Bigtop-utils 1.3.0
  • extjs 2.2
  • fping 3.10
  • grafana 4.3.1
  • libconfuse 2.7
  • lzo 2.06
  • lzo-devel 2.06
  • lzo-minilzo 2.06
  • mysql-connector-java 5.1.25
  • net-tools 2.0
  • numactl-libs 2.0.9
  • pdsh 2.3.1
  • perl-Crypt-DES 2.05
  • perl-Net-SNMP 6.0.1
  • rrdtool 1.4.8
  • rrdtool-devel 1.4.8
  • snappy 1.1.0
  • snappy-devel 1.1.0

2016: Open Data Platform Initiative Standards Certification

In 2016, the Arenadata Hadoop 1.3.2 distribution was certified and confirmed as fully compliant with the Open Data Platform Initiative (ODPi) standards. ODPi is a global community of open-source big data project developers under the auspices of the Linux Foundation.