Data storage infrastructure
The data storage industry is rapidly shedding its reputation as a conservative segment untouched by the winds of radical change. Questions of drive capacity and data access speed in the data center are receding somewhat into the background, giving way to questions of flexible and intelligent management. This article is part of the Technologies for Data Centers overview.
The present belongs to Big Data and always-on services. Ever-multiplying volumes of data, plus IT services expected to respond at the speed of a click, define the main requirements for advanced storage technologies in modern data centers.
"To address Internet of Things workloads and the coming use of 5G networks, a new class of edge server devices is gaining popularity, and increasingly powerful storage systems are being built to serve mobile applications," is how Vladimir Leonov, technical director of AMT Group, describes the situation. "The demands of the digital economy require reliable, secure data storage and enormous computing power. On top of that, all of it has to be cost-efficient and manageable."
"Solutions often hinge on precise calculations of data storage volumes, storage system performance and its scaling. The problem has become even more acute in recent years with the arrival of Big Data."
The economics of data center storage
"In 2019, sales of hyperconverged systems grew by 25% and accounted for nearly a third of storage system sales. The storage industry is doing fine, but some makers of classical storage arrays are clearly nervous and are trying to jump onto the last car of the departing train."
In terms of storage economics, deduplication and compression deliver an excellent effect; both technologies continue to develop actively, helped along by the growing computing power of the processors used in storage systems.
"A simple calculation shows that raising the data compression ratio from 2:1 to 3:1 improves storage economics by half as much again. And, incidentally, this is one of the areas where classical storage arrays clearly beat hyperconverged solutions," Pavel Karnaukh emphasizes.
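The arithmetic behind that claim is easy to check. A minimal sketch; the raw capacity and price below are illustrative placeholders, not vendor figures:

```python
# Back-of-the-envelope check: going from 2:1 to 3:1 data reduction yields
# 1.5x more usable capacity per raw terabyte, i.e. roughly a third lower
# cost per usable terabyte.
RAW_CAPACITY_TB = 100        # raw flash in the array (placeholder)
COST_PER_RAW_TB = 500.0      # hypothetical dollars per raw TB

def usable_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Effective capacity after inline compression/deduplication."""
    return raw_tb * reduction_ratio

for ratio in (2.0, 3.0):
    usable = usable_capacity_tb(RAW_CAPACITY_TB, ratio)
    cost_per_usable_tb = RAW_CAPACITY_TB * COST_PER_RAW_TB / usable
    print(f"{ratio:.0f}:1 reduction -> {usable:.0f} TB usable, "
          f"${cost_per_usable_tb:.0f} per usable TB")
# 2:1 -> 200 TB usable, $250/TB;  3:1 -> 300 TB usable, ~$167/TB
```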
Ultimately, however, data center owners are buying not advanced technology for its own sake but the expected functionality, manageability and flexibility of the storage system. The fight to lower the total cost of ownership therefore runs along the line of a compromise between capacity and performance, with the price of that compromise steadily being pushed down.
Alexander Sysoyev, head of computing infrastructure at Croc, lists some of the techniques used to reach that compromise:
"Not so long ago we used only hard drives, smoothing over bottlenecks with a large cache, for example the shared data bus in an array that had exhausted the capabilities of back-end transfer protocols such as SCSI and SAS. In practice, performance was raised by increasing the number of spindles handling the data or by using faster disks of smaller capacity; that was how the compromise between capacity and performance was sought."
Today hard drives have given way to fast solid-state drives, which are already nearly level with HDDs on price. New media standards (NVMe and SCM drives) have been developed for tasks demanding very high performance, and array architectures are being optimized for the new drives.
"All this delivers excellent operation speeds and low response times," notes Alexander Sysoyev, adding that the data center context imposes additional requirements: "Placement density and power consumption are also becoming important priorities: the volume of stored information keeps growing, and space in the data center, unfortunately, does not."
To optimize a storage system, different types of disks are traditionally combined. For example, keeping archives on modern fast disks is impractical and expensive; for that there are cheaper distributed archive stores of large capacity built on SATA disks, which handle the task perfectly well.
3data combines media types in another way, offering a hierarchical storage service (ArcTape). Under this approach, tape libraries are used alongside disks to store clients' corporate data, and the data is distributed across the different media according to various parameters.
"It allows storage costs to be cut significantly while also increasing reliability. The combination of disk and tape is an optimal solution for the reliable storage of data of any volume," says Ilya Hala, CEO of the 3data network of data centers.
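The idea of hierarchical storage can be illustrated with a toy tiering rule; the tier names, age thresholds and DataSet structure below are assumptions for the sketch, not the ArcTape implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DataSet:
    name: str
    size_gb: float
    last_access: datetime

def choose_tier(ds: DataSet, now: datetime) -> str:
    """Hot data stays on SSD, warm data on SATA disks, cold data goes to tape."""
    age = now - ds.last_access
    if age < timedelta(days=7):
        return "ssd"
    if age < timedelta(days=90):
        return "sata"
    return "tape"

now = datetime.now()
for ds in (DataSet("erp-db", 800, now - timedelta(days=1)),
           DataSet("q3-reports", 1200, now - timedelta(days=45)),
           DataSet("backups-2018", 9000, now - timedelta(days=700))):
    print(ds.name, "->", choose_tier(ds, now))   # ssd, sata, tape
```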
So far, however, the head of 3data sees no great demand for this type of storage. He attributes this to the fact that the mass customer does not yet think much about efficiency and sticks to the usual storage formats.
"We put this down, among other things, to the fact that customers have not yet accumulated such large volumes of data," the expert believes.
The main battles over Big Data support in storage systems are probably still ahead.
The battle of the drives: no rest in sight
"The economics of the trend are such that as volumes grow, the unit price falls. Data volumes are increasing by orders of magnitude every day, and that is an incentive to drive down the cost of storage. The whole industry, from disks to complete storage systems, lowers the cost of a gigabyte or terabyte year after year because volumes keep growing," is how Yury Novikov, head of cloud computing development at Softline, explains the influence of data volumes on storage economics.
According to the expert, one of the clearest trends is an almost complete transition to SSD media, which are now comparable in price to spindle HDDs.
"It is most likely an irreversible trend: data centers are moving entirely to SSD formats, which are faster and better suited to data center infrastructure," Yury Novikov believes.
Not all industry leaders agree with this thesis, however. According to experts at Kingston Technology, traditional magnetic drives are not sinking into oblivion at all but are going through a technological renaissance: modern HDDs offer 16 TB of capacity, and within the next five years that will reach 32 TB.
At the same time, hard disk drives will remain the most affordable random-access storage, and their lead in cost per gigabyte of disk space will stay unassailable for many years to come, Kingston Technology is certain. This growth rests on a number of modern technologies:
- Helium-filled drives: helium reduces aerodynamic drag and turbulence, allowing more magnetic platters to be fitted into a drive without increasing heat output and power consumption.
- Thermomagnetic (HAMR) drives, which use heat-assisted recording: a laser heats a section of the disk, which is then remagnetized. Their market debut is expected next year.
- Drives based on shingled magnetic recording (SMR), in which data tracks are laid down like overlapping roof tiles, giving a higher recording density.
Helium drives are expected to be particularly in demand in cloud data centers, while SMR HDDs will serve for storing large archives and other rarely accessed data.
The second direction of HDD development is hybrid solutions that combine the advantages of classical hard drives with non-volatile NAND flash memory. Such solutions are popular today because the "mechanical" part offers plenty of capacity at low cost.
Pure Storage went on the offensive on this front, unveiling in August the second generation of its FlashArray//C disk arrays, which it dubbed the killer of hybrid systems.
These arrays, built heavily on QLC NAND, are aimed at Tier 2 data center scenarios. A single module holds up to 24.7 TB, the maximum raw capacity in nine rack units reaches 1.8 PB, and deduplication and compression can raise the effective figure to 5.2 PB. This year the company promises 49 TB modules, with which 10.4 PB will be achievable after compression and deduplication. According to the company, the specific cost of storage comes to less than $0.50 per usable gigabyte.
Instead of standard SSDs, this proprietary Pure Storage solution uses DirectFlash modules run by the company's own Purity operating system. The company claims it monitors system reliability down to the individual NAND die, collects telemetry for every block, and that the array constantly "learns" using predictive analytics algorithms.
"New storage systems are far smarter than their predecessors. They can balance their own load and predict problems, right up to an actual system failure. Their software uses artificial intelligence and machine learning algorithms to manage resource utilization, and new, efficient deduplication and compression algorithms can double the effective capacity of a storage system or better," comments Alexander Sysoyev.
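The "smart" behavior described here ultimately comes down to analyzing telemetry streams. A toy sketch of the idea, not any vendor's algorithm: flag a drive whose recent I/O latency drifts well above its own historical baseline:

```python
from statistics import mean, stdev

def is_degrading(latencies_ms: list[float], window: int = 10, k: float = 3.0) -> bool:
    """Flag a device whose recent latency exceeds its baseline by more than k sigma."""
    if len(latencies_ms) <= window + 1:
        return False
    baseline, recent = latencies_ms[:-window], latencies_ms[-window:]
    mu, sigma = mean(baseline), stdev(baseline)
    return mean(recent) > mu + k * max(sigma, 0.01)

healthy = [0.5 + 0.05 * (i % 3) for i in range(100)]   # stable latency telemetry
failing = healthy + [2.5] * 10                         # latency suddenly jumps
print(is_degrading(healthy), is_degrading(failing))    # False True
```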
Toward the convergence of flash and RAM
High-speed Storage Class Memory (SCM) devices use non-volatile NVDIMM memory, which implements a hybrid storage model combining the advantages of working DRAM with NAND flash. SCM devices are often called a revolutionary step forward in the evolution of computing architecture.
Indeed, a non-volatile NVDIMM module fits into a standard DIMM slot intended for ordinary RAM, giving the multi-core processor direct access to data with minimal delays and low I/O overhead compared with the traditional PCIe channel. Storage built on such modules delivers up to 5 million IOPS with latency falling to 200 microseconds. According to the SNIA (Storage Networking Industry Association), NVDIMM raises performance (IOPS) 34-fold and bandwidth 16-fold while cutting latency 81-fold compared with standard flash devices. SCM memory can also play the role of both RAM and solid-state drive: offering DRAM-level performance, it is non-volatile, which makes it an effective buffer for SSD devices.
The main obstacle to the revolution is the high price of SCM devices: an Intel Optane SCM device with 512 GB of capacity costs about 8 thousand dollars today. Market experts, however, expect prices to fall quickly thanks to powerful competitors, above all Micron and the South Korean giant SK Hynix.
The Russian Trinity FlexApp storage system from Trinity uses persistent memory based on non-volatile NVDIMM-N modules from Micron.
Persistent memory (PMEM) combines the advantages of traditional storage devices with the large capacity of DRAM. An important property of PMEM is byte addressing (load/store), which, unlike traditional block access, operates at DRAM speed. After an emergency power loss, NVDIMM-N modules flush their data from DRAM to NAND on their own within a minute. Once the transfer is complete, the modules can be pulled from the failed controller and placed in a working one, like ordinary DIMM modules.
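Byte-addressable (load/store) access of this kind can be illustrated with an ordinary memory-mapped file; on a real system the file would live on a DAX-mounted NVDIMM or Optane namespace, so the path below is an assumption and any POSIX file works for the demo:

```python
import mmap, os

PATH = "/tmp/pmem-demo.bin"   # stand-in for a DAX-mounted PMEM file, e.g. /mnt/pmem/log
SIZE = 4096

with open(PATH, "w+b") as f:
    f.truncate(SIZE)
    with mmap.mmap(f.fileno(), SIZE) as mm:
        record = b"txn:42 committed"
        mm[0:len(record)] = record      # a CPU store, not a block-layer write
        mm.flush()                      # on real PMEM this maps to cache-line flushes
        print(mm[0:len(record)])        # b'txn:42 committed'

os.remove(PATH)
```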
The company notes that Trinity FlexApp storage running the Russian RAIDIX software can be used as a "boxed" high-performance cluster (cluster-in-a-box). Heterogeneous clusters are supported in Active-Active mode, so the system can be scaled vertically without interrupting data access, with controllers swapped quickly for more modern and powerful ones. The solution is used, for example, to support high-performance computing in the largest scientific cluster in Japan. Trinity notes that the software-defined RAIDIX technology sustains undiminished computation speed (up to 25 GB/s per processor core) and provides fault tolerance at RAID levels 6, 7.3 and N+M.
"This is practically a mandatory requirement if you seriously want to compete with hyperconverged systems," the expert notes.
The NVMe era
As of 2020, solid-state drives with the NVMe interface accounted for more than half of all SSDs on the market, and that growing popularity has brought a serious drop in NVMe SSD prices. New prospects open up with the arrival of NVMe SSDs supporting PCIe 4.0, which will raise data processing speeds in the data center. Kingston and Samsung are talking about speeds of 4.8 GB/s, and the next generation of Kingston NVMe PCIe 4.0 SSDs promises throughput of around 7 GB/s.
At the end of August 2020, Samsung announced the imminent launch of the 980 Pro, the first consumer SSD with PCIe 4.0 support, offering read speeds of up to 7000 MB/s and write speeds of up to 5000 MB/s, provided the motherboard supports PCIe 4.0. Three capacities are expected: 250 GB, 500 GB and 1 TB.
Network World's Andy Patrizio says the crucial step for the data center industry was the emergence of NVMe over Fabrics (NVMe-oF): SSDs sitting in one server gained the ability to exchange data with any other drives connected to the network. Organizations will be able to build high-performance storage area networks with minimal delays. This poses serious competition to direct-attached storage (DAS), since I/O operations over NVMe-oF are handled far more efficiently and the latency is comparable to DAS systems. Analysts predict that deployment of systems running the NVMe-oF protocol will accelerate rapidly in 2020.
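Attaching a remote NVMe-oF namespace typically looks like the following sketch using nvme-cli on Linux; the target address, port and NQN are placeholders, and the transport (tcp, rdma or fc) depends on the fabric actually deployed:

```python
import subprocess

TARGET_ADDR = "192.0.2.10"                       # placeholder target IP
TARGET_NQN = "nqn.2020-06.example.com:storage1"  # placeholder subsystem NQN

def run(cmd: list[str]) -> None:
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Discover the subsystems exported by the target, then connect to one of them.
run(["nvme", "discover", "-t", "tcp", "-a", TARGET_ADDR, "-s", "4420"])
run(["nvme", "connect", "-t", "tcp", "-a", TARGET_ADDR, "-s", "4420",
     "-n", TARGET_NQN])
# The remote namespace then appears as a local block device, e.g. /dev/nvme1n1.
```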
"Yes, a corporate system may not support NVMe-oF or Storage Class Memory out of the box right now, but when the customer is ready for these technologies in two or three years, it must have the option of adding them," comments Pavel Karnaukh of Dell Technologies in Russia.
Software-defined storage
In line with the general trend toward "software-defined everything" (SDE) is software-defined storage (SDS). The key idea of SDS is to separate the hardware from the software by virtualizing the storage functions.
"With the emergence of software-based storage systems, selecting the most effective solution for virtually every specific functional requirement of the customer becomes considerably simpler," emphasizes Dmitry Chindyaskin of AiTeco.
According to a market survey conducted by IHS Markit in 2018, 53% of respondents intend to increase investment in software-defined storage, 52% in NAS and 42% in SSD.
As the expert notes, there is now a fairly large number of both general-purpose and highly specialized SDS solutions, which often already compete seriously with the hardware products of the world's leading vendors.
"SDS is already a fairly mature technology. The SDS approach makes it possible to build storage systems on ordinary x86 hardware, even in geo-distributed configurations with several sites far apart from one another," Dmitry Chindyaskin emphasizes.
"SDS technology lets you build low-cost distributed storage with plenty of headroom for upgrades even five to seven years out," confirms Alexey Malyshev, founder and CEO of SONET.
In the Russian market, however, SDS is still more popular with small businesses than with large companies. Comparing the Russian data with IHS Markit's global estimates, Xelent experts noted that large domestic enterprises use SDS mainly as a development and testing environment. By Xelent's 2018 estimates, investment in software-defined storage will account for no more than 15% of total purchases compared with other solutions.
The arms race between cloud storage and the data center
The storage requirements of the owner of an in-house data center and of a commercial one differ somewhat. One dividing line is the cost of storage: it matters more to commercial operators, who are after a higher margin when reselling storage resources to their customers. Alexey Kazmin, product manager at Hewlett Packard Enterprise in Russia, compares this battle on the financial front, under whose banners cloud providers have been winning the market, to an arms race.
"The competition to lower the specific cost of data storage, imposed by the 'arms race' between global cloud providers and 'traditional' data centers, has now brought us to the point where in-house IT infrastructures can deliver savings comparable to the cloud," says Alexey Kazmin, giving examples: "The global HPE InfoSight knowledge base, for instance, helps prevent complex problems at the junctions between different data center components and gives practical advice on how to raise performance and cope with data growth. It helps cut downtime and plan resources more accurately, and therefore save. Vendor guarantees, such as the 100% data availability guarantee for HPE Primera storage, serve the same purpose: to keep services running smoothly and so prevent losses for the business."
His colleague Pavel Karnaukh of Dell Technologies in Russia makes a similar point when listing the most significant storage technologies of the near future:
"Simplicity of installation, configuration, operation and service. There is a wealth of room for innovation here, from cloud-based predictive analytics systems that gather and process information from thousands of installed systems, to the use of artificial intelligence to automate routine tasks."
Storage with support for cloud environments
A major trend in this segment is storage integrated with hybrid clouds. According to a storage market study published by Hitachi in August of last year, the global cloud storage market will grow to 106 billion dollars by 2024, an annual growth rate of 23.76%, while managed data center services will grow by 17.67% a year.
The idea of hybrid clouds looks very promising for the industry as a whole. For storage, the hybrid cloud format makes it possible to combine local and public cloud environments, providing seamless mobility when working with corporate data along with great flexibility and scalability of storage resources. In practice, however, combining "the best of both worlds" has proved far from simple, because local and cloud storage essentially run at different speeds. The deduplication, thin provisioning and compact snapshot mechanisms that make data processing so efficient on the local platform, say, cannot simply be copied into a cloud environment. The cloud infrastructure is more convenient than the local one, while the local one is faster and safer than the cloud, corporate customers typically reason as they weigh the pros and cons of each approach.
"The main problem with any hybrid cloud system is the gap in objectives between the public and private 'arms'," explains Pyotr Dubenskov, director of product development and production at Rubytech.
On the one hand, the customer chooses the private deployment model mainly for reasons of security, regulatory requirements, readiness for capital expenditure and the predictability of variable costs. On the other hand, by moving to public clouds the customer accepts the constraints of not owning the infrastructure in exchange for elasticity and relatively high variable costs.
"By going hybrid, the customer can lose the advantages of one model entirely without gaining the advantages of the other," Pyotr Dubenskov reflects. "At the same time, the hybrid model can show some attractive technical and economic traits."
He explains that these show up above all where the public cloud provider offers a low price thanks to economies of scale, for example for "cold" storage of backup copies or for spot virtual machines.
"But on the whole we do not yet see noticeable demand for hybrid solutions in our market," the expert concludes.
In anticipation of future demand, the leading storage vendors are developing their own tools for unified management of hybrid environments. The HPE GreenLake Central solution, for example, can flexibly and even automatically decide where storage resources should come from, the customer's own infrastructure or the cloud, applying policies on performance, data protection and cost when making the decision. Add-ons to public clouds such as HPE Cloud Volumes let data be stored so that it is available to services running in the clouds but without the clouds' drawbacks: no charges for data egress or for moving data between regions.
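A policy-driven placement decision of this kind might, in the simplest case, look like the hedged sketch below; the thresholds and the Workload structure are illustrative assumptions, not the GreenLake Central logic:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    max_latency_ms: float      # performance requirement
    regulated_data: bool       # data-protection / residency requirement
    monthly_budget_usd: float

def place(w: Workload, cloud_cost_usd: float, onprem_cost_usd: float) -> str:
    """Pick a storage location by weighing performance, protection and cost."""
    if w.regulated_data or w.max_latency_ms < 1.0:
        return "on-premises"
    if cloud_cost_usd <= min(w.monthly_budget_usd, onprem_cost_usd):
        return "public cloud"
    return "on-premises"

print(place(Workload("oltp-db", 0.5, True, 10_000), 4_000, 6_000))       # on-premises
print(place(Workload("cold-archive", 50.0, False, 2_000), 800, 3_000))   # public cloud
```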
Pure proposed its own answer to the hybrid storage problem in August, creating a joint hybrid storage solution optimized for the Amazon Web Services (AWS) platform. The idea is to build a unified storage architecture that brings application deployment in the local environment and in the cloud together, so that the data forms a single space equally accessible from different platforms.
Pure Storage Cloud Data Services is a set of services for data located both locally and in the cloud, including in particular:
- Cloud Block Store for AWS: a production-grade block storage system optimized to run in the AWS cloud.
- CloudSnap for AWS: cloud data protection built directly into the Pure FlashArray. The service can send snapshots from FlashArray to AWS S3 object storage, enabling flexible data recovery both in the cloud and locally.
- StorReduce: a cloud deduplication technology designed for fast cloud backup to AWS S3 object storage, paired with a local flash array for rapid recovery.
The object-storage deduplication mechanism implemented in StorReduce allows organizations to replace specialized backup appliances with flash arrays and to use Amazon S3 object storage to keep data off site. The company calls this architecture "flash-to-flash-to-cloud" and regards it as the next generation of backup and disaster recovery technology built on flash storage and the public cloud.
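The content-addressed deduplication behind such "flash-to-flash-to-cloud" backup can be sketched as follows: chunks are fingerprinted, and only chunks not yet present in the object store are uploaded. The bucket name, chunk size and in-memory index are assumptions; this is not the StorReduce implementation:

```python
import hashlib
import boto3   # pip install boto3; AWS credentials are taken from the environment

BUCKET = "backup-dedup-demo"     # placeholder bucket name
CHUNK_SIZE = 4 * 1024 * 1024     # 4 MiB chunks

s3 = boto3.client("s3")
seen: set[str] = set()           # in a real product this index is persisted and shared

def backup(path: str) -> None:
    """Upload only chunks whose fingerprint has not been stored yet."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            key = hashlib.sha256(chunk).hexdigest()
            if key in seen:
                continue         # duplicate chunk: stored once, referenced many times
            s3.put_object(Bucket=BUCKET, Key=f"chunks/{key}", Body=chunk)
            seen.add(key)

backup("/var/backups/db-dump.bin")   # placeholder path
```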
Storage as a service
The modern term Storage as a Service (STaaS) means consuming on-premises storage resources in the style of a public cloud.
Pure Storage, for example, offers a family of services implementing the STaaS concept, including:
- A block storage service, in high-performance or high-capacity tiers.
- On-premises object storage as a service, with capacity monitoring.
- An on-premises file service.
The services include subscription and storage management through a single toolkit, including virtual machine analytics.
"Storage as a service" from Hitachi Vantara is built on the idea of a public cloud fully managed by Hitachi, with end-to-end change and request management, host resource provisioning, and incident resolution from the host down to the storage system. From the client's point of view, STaaS includes service-level agreements, monitoring, reporting and architecture management, while responsibility for provisioning storage resources and managing changes, requests and configurations stays within the organization. Storage is consumed elastically, with capacity available as needed, a mode well suited to modern real-time analytics and to scaling the Internet of Things.
Object storage
A separate challenge of the Big Data era is the storage and processing of unstructured data.
Hitachi Vantara offers an ecosystem of solutions built around the Hitachi Content Platform object cloud storage system, which makes it possible to structure unstructured data and work with it effectively. Its capabilities are complemented by a gateway between users and cloud storage systems, the HCP Anywhere universal remote data access tool, and a platform for intelligent data exploration used both to prepare data for analytical processing and to monitor storage system performance.
IBM's "storage as a service" offering, IBM Cloud Object Storage, is likewise aimed at storing large volumes of unstructured data, which can sit on local systems as well as in public and private clouds. In it, IBM applies its own SecureSlice technology, which combines encryption with erasure coding to address both availability and data security.
When data enters the IBM Cloud Object Storage system, each data segment is first automatically encrypted, then split into fragments by the erasure-coding mechanism and distributed across the system. The encrypted data can later be decrypted and reassembled with SecureSlice at several public and private IBM Cloud endpoints, which may sit in data centers other than the one where the data was originally received or processed. The company notes that the technology avoids expensive data replication, which makes the service more cost-effective for the customer.
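The encrypt-then-fragment pattern can be illustrated with a minimal sketch; Fernet encryption and a single XOR parity fragment stand in for SecureSlice and real erasure coding (which uses Reed-Solomon-style codes across many sites), so this shows only the principle:

```python
import functools, operator
from cryptography.fernet import Fernet   # pip install cryptography

def encode(segment: bytes, key: bytes, k: int = 4) -> list[bytes]:
    """Encrypt a segment, split it into k equal fragments, append one XOR parity."""
    blob = Fernet(key).encrypt(segment)
    frag_len = -(-len(blob) // k)                    # ceiling division
    frags = [blob[i * frag_len:(i + 1) * frag_len].ljust(frag_len, b"\0")
             for i in range(k)]
    parity = bytes(functools.reduce(operator.xor, col) for col in zip(*frags))
    return frags + [parity]                          # fragments go to different sites

key = Fernet.generate_key()
frags = encode(b"segment of unstructured customer data", key)
lost = 2                                             # any single lost fragment is recoverable
rebuilt = bytes(functools.reduce(operator.xor, col)
                for col in zip(*(f for i, f in enumerate(frags) if i != lost)))
assert rebuilt == frags[lost]
```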
Elastic compute and storage
Elastic storage is a concept that captures the most essential modern requirement for corporate storage systems: providing resources for storing and processing data on demand.
In Oracle's understanding, for example, elastic compute and storage let customers run any workload in the cloud, using virtual environments, applications, network configurations and the tools to manage them. Oracle Compute Cloud Service provides scalable compute, block storage and network services within the Oracle Cloud family. Customers can choose between multi-tenant elastic compute and a dedicated compute service; in the latter case a computing environment with isolated resources is provided, and thanks to network isolation all computing resources are at the customer's complete disposal.
The Russian company Arsientek has implemented Resilient Cloud Storage, an elastic distributed storage system. Its elasticity rests on a storage mechanism of its own: data is broken into fixed-size objects that are evenly distributed across all media in the pool in proportion to their capacity, with each piece of information (object) stored in at least three copies. The rules for choosing where the copies go define a so-called fault-tolerance domain, the failure of which leaves the storage system fully functional. Such a domain can be a single drive, a server, a rack, a group of racks or even an entire data center.
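A hedged sketch of that placement logic: fixed-size objects are spread over carriers in proportion to their capacity, with the three copies landing in distinct fault domains (racks in this example). The carrier names and weights are illustrative, not Resilient Cloud Storage internals:

```python
import hashlib, random

# carrier -> (fault domain, capacity weight); three racks, two disk sizes
CARRIERS = {
    "disk-a1": ("rack-1", 4), "disk-a2": ("rack-1", 8),
    "disk-b1": ("rack-2", 4), "disk-b2": ("rack-2", 8),
    "disk-c1": ("rack-3", 4), "disk-c2": ("rack-3", 8),
}
REPLICAS = 3

def place(object_id: str) -> list[str]:
    """Pick REPLICAS carriers, weighted by capacity, one per fault domain."""
    rng = random.Random(hashlib.sha256(object_id.encode()).digest())  # deterministic per object
    chosen, used_domains = [], set()
    while len(chosen) < REPLICAS:
        names = [n for n, (dom, _) in CARRIERS.items() if dom not in used_domains]
        weights = [CARRIERS[n][1] for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        used_domains.add(CARRIERS[pick][0])
    return chosen

print(place("object-000042"))   # e.g. ['disk-a2', 'disk-c2', 'disk-b1']
```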
Resilient Cloud Storage has a horizontally scalable architecture with considerable headroom: a small system of a few modules can be grown fairly easily into one with several thousand storage modules holding several hundred petabytes in total.
Notably, Resilient Cloud Storage has no traditional centralized controllers; all interactions go directly to the storage modules. The modular architecture and even distribution of data across the modules make it possible to build a massively parallel system in which I/O operations run in a distributed environment. Adding storage modules therefore increases not only the capacity of the system but also, linearly, its overall performance.
Modern storage systems are thus losing their familiar outlines, turning into a pool of resources spread across different cloud environments and geographically dispersed local sites. The concept of rigid resource configuration is gradually losing its meaning, giving way to the far more flexible concept of provisioning resources on demand, which goes hand in hand with intelligent data management technologies.