2020/04/23 15:31:57

Storage systems

A storage system (SS) is a combination of specialized hardware and software intended for storing and transferring large volumes of information. It organizes data storage on disk platforms with optimal resource allocation.

A directory of storage solutions and projects is available on TAdviser.


"Physics" of storage

Perhaps the most fascinating part of computer history is the chronicle of storage systems, because this area has seen enormous variety both in physics and in system organization, and for many years everything in it was quite visual. Computers lost their visual appeal soon enough: beautiful and varied vacuum tubes and discrete semiconductor components (triodes and diodes) gave way to identical integrated circuits and microprocessors, which we can now tell apart only by the markings on packages of different sizes and pin counts. The physics of semiconductor innovation ultimately comes down to finding scientific and technological solutions that increase the density of transistors on a substrate. These major achievements have no outward appearance and, for the consumer, reduce to the figures 0.18, 0.13, 0.11 ... Today, however, the same can be said of disks: externally they are boxes of a few standard sizes, differing only in their contents.

Over 60-70 years, storage evolved from the simplest punched cards and tapes used to hold programs and data to solid-state drives. Along the way a set of devices quite unlike one another was created: magnetic tapes, drums, disks, and optical disks. Some of them remain in the past — perforated media, magnetic drums, floppy disks, and optical disks — while others live on and will live long. What has left the stage can be viewed, with some nostalgia, in the Museum Of Obsolete Media. And some of what seemed doomed remains. The end of magnetic tape was predicted in its time, yet today nothing threatens its existence; exactly the same applies to rotating hard disk drives (HDD): the prophecies of their demise are groundless, for they have reached such a level of refinement that a niche of their own will remain to them, whatever the innovations.

Today's multilevel storage landscape includes tape libraries for backup and archiving, fast and slow HDDs, solid-state drives (SSD) on flash memory that imitate HDDs (in interface and form factor) primarily for compatibility with existing software and enclosures, and the latest flash drives in card format connected via the NVMe interface. This picture took shape under the influence of several factors, among them the von Neumann scheme, which divides memory into primary memory, directly accessible to the processor, and secondary memory, intended for data storage. The division grew stronger after ferrite-core memory, which retains its state without power, was succeeded by semiconductor memory, which requires programs to be loaded before work can begin. And, of course, there is the unit cost of storage: the faster the device, the higher that cost, so in the near future there will be a place both for tapes and for disks. For more detail on the evolution of storage, read here.

How data was stored in the past

Perforated data media

Punched cards

Long before computers, for centuries the simplest program-controlled devices (weaving looms, street organs, carillon clocks) used perforated media of the most varied formats and sizes, as well as drums with pins. Keeping this recording principle, Herman Hollerith, founder of TMC (which later became part of IBM), made a discovery: in 1890 he realized that punched cards could be used to record and process data. He implemented the idea while processing statistical data gathered during the population census, and later carried it over to other applications, securing IBM's prosperity for decades ahead.

Why cards? They can be sorted and given, figuratively speaking, "direct access": on a special device, the tabulator, following a simple program, data processing could be partially automated.

The card format changed over time, and from the 1920s the 80-column card became the international standard. Until the beginning of the 1960s IBM held a monopoly on them.

These simple pieces of cardboard with rectangular holes remained the dominant data medium for several decades, produced by the billions. The scale of card consumption can be judged from a single example, the center for interpreting German radiograms at Bletchley Park: a week of work consumed two million cards — an average-sized truckload! Post-war business also kept its data on cards. Speaking of punched cards, it must be remembered that in Germany they were used to collect data on people slated for extermination.

Punched tapes

Punched tape would seem a more practical medium, yet it saw almost no business use, although its input and output devices were significantly simpler and lighter. Its spread was hindered by sequential access, smaller capacity, low input and output speeds, and the difficulty of archiving. Narrow 5-column punched tape had been used since 1857 for preparing data and then transmitting it by telegraph, so as not to limit input speed to the physical abilities of the operator and thus make better use of channel capacity. Wide 24-column punched tape was created in 1937 to record programs for the electromechanical Harvard Mark I calculator. As a medium immune to stray electromagnetic and gamma radiation, punched tape was widely used in onboard devices, and it is still used in some defense systems.


Magnetic tapes

A method of recording sound on a magnetic medium, at first on wire, was proposed in 1928. A tape recorder of this kind was used in UNIVAC-1. The history of computer magnetic tape is considered to begin with IBM Model 726, which was part of the IBM Model 701 computer. The tape width for Model 726 and other devices of that time was one inch, but such tapes were inconvenient in operation: because of their great weight, powerful drives were required. They were soon succeeded by half-inch "open reel" tapes, rewound from one reel to another (reel-to-reel), with recording densities of 800, 1600, and 6250 bits per inch. Such tapes, with removable write-protect rings, remained the standard for data archiving until the end of the 1980s.

Model 726 used reels of one-inch tape with a reel diameter of 12 inches. It could store 1.4 MB of data; recording density was 100 characters per inch, and with the tape moving at 75 inches per second, 7,500 bytes per second were transferred to the computer. The magnetic tape for Model 726 was developed by 3M (now Imation).

Inch-wide tapes were soon abandoned: because of their weight, start-stop operation required overly powerful drives and vacuum columns. For a long period, almost exclusive dominance belonged to half-inch "open reel" tapes, rewound from one reel to another (reel-to-reel). Recording density grew from 800 to 1600 and even 6250 bits per inch. These tapes, with removable write-protect rings, were popular on ES EVM and SM EVM computers.

The incentive for further development was that by the mid-1980s the capacities of hard drives came to be measured in hundreds of megabytes or even gigabytes, so backup drives of corresponding capacity were needed. The inconveniences of open reels were clear; even in everyday life, cassette recorders quickly displaced reel-to-reel ones. The natural transition to cartridges proceeded along two paths: one was to create specialized devices oriented toward computers (linear technology); the other was to adopt technologies invented for video and audio recording with rotating heads (helical-scan technology). A split into two camps followed, which gives the tape-drive market its unique character.

Over thirty years, several dozen cartridge standards were developed, the most widespread today being LTO (Linear Tape-Open). Along the way cartridges were improved: their reliability, capacity, transfer rate, and other operational properties grew. The modern cartridge is a complex device equipped with a processor and flash memory.

The transition to cartridges was helped by the fact that tapes now operate only in streaming mode. Cartridges are used either in standalone drives or as part of tape libraries. The first robotic library, holding 6,000 cartridges, was released by StorageTek in 1987.

Analysts and disk manufacturers have predicted the death of tape more than once — the slogan "Tape must die" is well known — but tape is alive and will live long, because it is suited to long-term storage of large archives. The business of producing tape drives, tapes, and tape libraries was estimated at roughly $5 billion in 2017. And the greater the volume of information that can be kept on hard drives, the greater the need for archiving and backup. On what? On tape, of course: no alternative to magnetic tape that is economically justified in storage cost has yet been found. The current 8th generation of the LTO standard stores up to 12 TB natively and 30 TB in compressed mode, and these figures will grow; with each new generation, not only the quantitative indicators but other operational properties improve as well.


Magnetic drum

An interim way of resolving the conflict between sequential tape recording and the need for direct access to data on an external device was the magnetic drum — more precisely, a cylinder with stationary heads. It was invented by the Austrian Gustav Tauschek in 1932.

It is not a drum, whose working surface, as we know, is the bottom, but a cylinder with a ferrimagnetic coating applied to its side surface, divided into tracks, which in turn are divided into sectors. A read/write head is placed over each track, and all heads can work simultaneously, i.e. read/write operations are performed in parallel.

Drums were used not only as a peripheral device. Before the transition to ferrite cores, RAM was extremely expensive and unreliable, so in some cases drums played the role of RAM; there were even computers called drum machines. Magnetic drums were usually used for operational (frequently changing) or important information requiring quick access. Given the limits on RAM size imposed by its high cost, a copy of the operating system was stored on them and intermediate results of program execution were written there. It was on drums that swapping was first implemented — a virtualization of memory at the expense of space on a drum, and later on a disk.

Magnetic drum drives had less capacity than disks but worked faster, because unlike disks their heads do not move, which eliminates the time needed to seek to the required track.

Drums were actively used until the early 1980s, for some time living in parallel with disks. Drums equipped the BESM-6 computer and its contemporaries. From open sources it is known that the last drums remained in Minuteman missile control systems until the mid-1990s.


Floppy disks

The active life of floppy disks lasted about 30 years, from the end of the 1970s to the end of the 1990s. They proved extremely popular because PCs appeared before users gained the ability to transfer data over a network. In these conditions floppies served not only their intended purpose — storing backup copies — but perhaps even more for exchanging data between users, which is why they are also called sneakers, after the typical footwear of programmers. By exchanging floppies, users created a network of sorts: the sneakernet.

There were three main disk types and many modifications. Floppy disks 8 inches in diameter were created at IBM in 1967; they were conceived as a bootstrap device for IBM/370 mainframes, replacing the more expensive non-volatile read-only memory that had equipped the previous generation, IBM/360. Realizing the commercial value of the innovation, IBM turned the floppy into an independent product in 1971, and in 1973 the head of development, Alan Shugart, founded Shugart Associates, which became the leading producer of 8-inch disks with a maximum capacity of 1.2 MB. These large disks were used on the PCs issued before the IBM XT appeared. This type of diskette gained particular popularity thanks to Gary Kildall's CP/M operating system.

As for diskettes 5.25 inches in diameter, their appearance recalls the joke about Nicholas II that rather peculiarly explains why the Russian railway gauge is wider than the European one. In our case An Wang, owner of Wang Laboratories, met in a bar with people from Shugart Associates, who offered to make a cheaper disk drive for his computers but could not settle on a specific diameter. Wang then took a cocktail napkin and said the size should be about like that. Five-inch disks with capacities of 360 and 720 KB were produced until the end of the 1990s; they were contemporaries of the IBM XT and IBM AT computers and the MS-DOS and DR-DOS operating systems, faithfully serving the formation of the new industry.

The alternative cartridge offered by Sony in 1983 measured 90.0 mm × 94.0 mm, but by tradition it came to be called 3.5-inch. In American professional circles it is called stiffy (stiffy disk — when translating, it is worth consulting a dictionary). After a number of improvements, the 3.5-inch HD (High Density) industry standard with a capacity of 1.44 MB was adopted in 1987. The IBM PS/2 and Macintosh IIx were the first to ship with such drives, and later it became the universal standard for PC and Macintosh. Attempts in the second half of the 1990s to market higher-capacity disks — Extended Density (ED) at 2.88 MB, as well as the seemingly promising magneto-optical Floptical (25 MB), SuperDisk (120-240 MB), and HiFD (150-240 MB) — did not succeed.


Why storage systems became necessary

It follows from IDC Perspectives research that data storage occupies second place among IT expenses, accounting for about 23% of all spending. According to TheInfoPro's Wave 11, "the growth of storage spending in the average Fortune 1000 company exceeds 50% a year."

Analysts agree that in organizations worldwide the volume of stored and processed information grows every minute. Unique information becomes more expensive, its volume multiplies every year, and its storage requires outlays. Organizations therefore aim not only to develop their storage infrastructure, but also to find ways to improve its cost efficiency: reducing energy consumption, maintenance expenses, total cost of ownership, and purchases of backup and storage systems.

Growing data volumes and increased requirements for storage reliability and data-access performance make it necessary to separate storage into a distinct subsystem of the computing system. Access to data and their management is a necessary condition for carrying out business processes. Irretrievable loss of data exposes a business to serious danger. Lost computing resources can be restored, but lost data, in the absence of a competently designed and implemented restoration system, cannot be recovered.

There is notable growth in demand among corporate clients not only for acquiring storage systems, but also for strict accounting, audit, and monitoring of the use of these expensive resources. There is nothing worse than a halt of business processes caused by the inability to obtain needed data in time (or by their complete loss), and the effects can be irreversible.

Factors contributing to the development of data warehouses

The pacing factor was the growth of competition and the complication of its character in all segments of the market. In Western Europe these phenomena could be observed earlier; in Eastern Europe, over the last five years. Five years ago a mobile operator had 25-35 million registered SIM cards, and today 50-70 million. Mobile service from these companies thus covers practically every resident of the country, and there are also regional operators. Here is the real level of competition: no one is left in the market who does not have a mobile phone. Operators can no longer grow extensively by selling their products to those who do not yet have them; they need the clients who work with competitors, and they must understand how to win them over — understand their behavior and what they want. To extract useful information from the available data, it must be placed in a warehouse[1].

Another factor is the appearance on the market of many companies offering solutions to support enterprise business: ERP, billing systems, decision support systems, and so on. All of them make it possible to collect detailed data of the most varied character in huge volumes. If the organization has a developed IT infrastructure, these data can be gathered together and analyzed.

The next factor is technological in character. Until a certain time, application producers independently developed different versions of their solutions for different server platforms or offered open solutions. An important technology trend for the industry was the creation of adaptable platforms for various analytical tasks, combining a hardware component and a DBMS. Users no longer care who made the processor or RAM in their computer — they regard the data warehouse as a kind of service. And this is a major shift in consciousness.

Technologies have appeared that allow data warehouses to be used to optimize operational business processes practically in real time — not only by highly skilled analysts and top managers, but also by front-office employees, in particular the staff of sales offices and contact centers. Decision making is delegated to employees on the lower rungs of the corporate ladder. The reports they need are, as a rule, simple and short, but many of them are required, and generation time must be short.

Application areas of data warehouses

Traditional data warehouses can be found everywhere. They are intended for reporting, helping to understand what has happened in the company. But this is only the first step, the foundation.

Knowing what happened is not enough for people; they want to understand why it happened. Business intelligence tools are used for this purpose — they help understand what the data are saying.

After that comes the use of the past to predict the future — the building of predictive models: which clients will stay and which will leave; which products will succeed and which will not, and so on.

Some organizations are already at the stage where data warehouses are used to understand what is happening in the business right now. The next step is therefore the "activation" of front-end systems by means of solutions based on data analysis, often in automatic mode.

Volumes of digital information are growing like an avalanche. In the corporate sector this growth is driven, on the one hand, by tightening regulation and the requirement to keep ever more information related to the business. On the other hand, tougher competition demands ever more precise and detailed information about the market, clients, their preferences, orders, the actions of competitors, etc.[2]

In the public sector, the growth of stored data is supported by the universal transition to interdepartmental electronic document management and the creation of departmental analytical resources built on various primary data.

No less powerful a wave is created by ordinary users, who post their photos and videos on the Internet and actively exchange multimedia content on social networks.

Requirements for storage systems

Which criterion for selecting disk storage is most important for you? Survey results from www.timcompany.ru, February 2012

In 2008 the TIM group surveyed its clients to find out which characteristics matter most to them when choosing a storage system[3]. The top positions went to the quality and functionality of the proposed solution. At the same time, calculating total cost of ownership is atypical for the Russian consumer. Customers most often do not fully realize what costs await them — for example, the costs of renting and equipping premises, electric power, air conditioning, training, and the salaries of skilled staff, and so forth.

When the need to purchase a storage system arises, the most a buyer usually estimates are the direct costs, passing through accounting, of acquiring the equipment. Price, however, ranked ninth out of ten in importance. Naturally, customers weigh the possible difficulties connected with servicing the equipment. The extended-warranty support packages usually offered in projects help avoid them.

The practice of AvroRAID shows that a number of problems push consumers toward purchasing a new storage system or upgrading an existing one.

Reasons for acquiring storage systems

File:AvroRAID.png

Source: AvroRAID, 2010.

What a storage system consists of

As a rule, the storage system contains the following subsystems and components:

A storage system typically mounts in a standard 19-inch rack and contains hard drives, external interfaces for connecting hosts (servers), and several power supplies. Inside are processor units, disk controllers, input/output ports, cache memory, and other necessary components.

File:structure.gif

An example storage system block diagram. 1. The controller, including one or more central processors, interfaces to the hard drives and external ports, and cache memory. 2. External interface, in this case Fibre Channel. 3. Hard drives — capacity is extended with additional shelves. 4. Cache memory, usually mirrored so that data are not lost if one module fails.

The disks in a system can be divided into groups and combined into RAID arrays of different levels. The resulting array is divided into logical units (LUNs) — hosts access them and "see" them as local hard drives. The composition of RAID groups and LUNs, the cache logic, and the availability of LUNs to specific servers are configured by the administrator.
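The administrator's tasks described above can be sketched in code. The following is a minimal illustration (all names and capacity formulas are simplified assumptions, not any vendor's API): physical disks are grouped into a RAID group, LUNs are carved out of its usable capacity, and LUN masking controls which hosts may see each LUN.

```python
# Sketch of RAID group -> LUN -> host masking, as configured by an administrator.

class RaidGroup:
    def __init__(self, name, disks_gb, level):
        self.name, self.level = name, level
        n = len(disks_gb)
        size = min(disks_gb)              # groups are built from equal-size disks
        # usable capacity models the redundancy overhead of common RAID levels
        if level == "RAID10":
            self.usable_gb = size * n // 2    # everything is mirrored
        elif level == "RAID5":
            self.usable_gb = size * (n - 1)   # one disk's worth of parity
        elif level == "RAID6":
            self.usable_gb = size * (n - 2)   # two disks' worth of parity
        self.free_gb = self.usable_gb
        self.luns = {}

    def carve_lun(self, lun_id, size_gb):
        # a LUN is a logical slice of the group's usable space
        assert size_gb <= self.free_gb, "not enough free space in group"
        self.free_gb -= size_gb
        self.luns[lun_id] = size_gb

# LUN masking: which hosts are allowed to "see" which LUNs
masking = {}
def present(lun_id, host):
    masking.setdefault(lun_id, set()).add(host)

rg = RaidGroup("RG1", disks_gb=[600] * 8, level="RAID6")  # 8 x 600 GB disks
rg.carve_lun("LUN0", 1000)
rg.carve_lun("LUN1", 2000)
present("LUN0", "dbserver")
present("LUN1", "mailserver")

print(rg.usable_gb)   # 3600 GB usable out of 4800 GB raw
```

The point of the sketch is the indirection: "dbserver" works with LUN0 as if it were a local 1000 GB disk, while the physical layout (eight spindles, double parity) stays hidden inside the array.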

Disk arrays

In the late 1980s two innovations met successfully: the scientific groundwork of RAID and mass-produced Winchester drives. Put together, they made it possible to create a commercial cluster-type drive capable of competing with IBM's disks on technical merits at a significantly lower price.

The gigantism of the disks issued before the Winchester era contradicted the plain logic of these devices. Their logic was primitive, corresponding almost entirely to the physical infrastructure (sectors and tracks), and as limited, specialized products they were expensive.

In 1988 Michael Ruettgers, who later became EMC's chief strategist, proposed developing a disk system built from Winchester drives and supplying it for IBM-compatible mainframes and for the AS/400. Another — perhaps the most successful — storage specialist, Moshe Yanai, put forward the cache-memory ideology of the Integrated Cached Disk Array (ICDA); the result was the progenitor of EMC's Symmetrix disk clusters.

In the fall of 1990 EMC introduced Symmetrix, which became a legend among disk arrays: the ICDA model 4200 had a capacity of 24 GB, 256 MB of cache memory, and a controller based on a 32-bit processor. Within a few years Symmetrix made the company the leading supplier of drives for mainframes. According to IDC, its share of the mainframe drive market grew from 1% (in 1990) to 42.5% (in 1996).

Symmetrix was cheap by mainframe standards but too expensive for Unix servers, and all the more so for x86 servers, so many companies rushed into the open market segment, offering products inferior to Symmetrix but not as expensive. Subsequently a multitude of disk array models of the most varied purpose appeared on the market.

Key requirements for storage systems

In practice, not one server but many tens or hundreds are connected to a storage system. This dictates a number of key requirements for such systems[4]:

Reliability and fault tolerance. Storage systems provide full or partial redundancy of all components: power supplies, access paths, processor modules, disks, cache, etc. A monitoring system and notification of possible and existing problems are obligatory.

Data availability. Ensured by well-thought-out functions for preserving data integrity (use of RAID technology, creation of complete and instant snapshot copies of data within the disk rack, replication of data to a remote storage system, etc.) and by the ability to add or replace equipment and software in hot mode, without stopping the complex.

Management and control tools. A storage system is managed via a web interface or the command line; there are monitoring functions and several ways of notifying the administrator of malfunctions. Hardware-based performance diagnostics are available.

Performance. Determined by the number and type of drives, the amount of cache memory, the computing power of the processor subsystem, the number and type of internal and external interfaces, and the possibilities of flexible configuration and tuning.

Scalability. A storage system usually allows growing the number of hard drives and the amount of cache memory, upgrading hardware, and expanding functionality with special software. All of these operations are performed without significant reconfiguration or loss of functionality, which allows a flexible, as-needed approach to designing the IT infrastructure.

Types of storage systems

Disk storage systems

Used for operational work with data and for creating intermediate backup copies.

There are the following types of disk storage systems[5]:

  • storage for working data (high-performance equipment);
  • storage for backup copies (disk libraries);
  • storage for long-term archives (CAS systems).

Tape storage systems

Intended for creating backup copies and archives.

There are the following types of tape storage systems:

  • standalone drives;
  • autoloaders (one drive and several tape slots);
  • tape libraries (more than one drive, many tape slots).

Storage connection options

Different internal interfaces are used to connect devices and hard drives within a single storage system:

The most widespread external interfaces for connecting storage systems:

The popular internode cluster interconnect InfiniBand is now also used for access to storage systems.

Storage topology options

The traditional approach to storage is direct connection of servers to the storage system: DAS (Direct Attached Storage). Besides DAS, there are storage devices attached to the network — NAS (Network Attached Storage) — and components of storage area networks — SAN (Storage Area Networks). Both NAS and SAN systems appeared as alternatives to the DAS architecture. Each solution was developed in response to growing requirements for storage systems and drew on the technologies available at the time.

Network storage architectures were developed in the 1990s, and their task was to eliminate the main shortcomings of DAS. Broadly, network storage solutions had to accomplish three things: reduce the cost and complexity of data management, reduce local network traffic, and raise data availability and overall performance. The NAS and SAN architectures solve different aspects of this common problem; the result has been the simultaneous coexistence of the two network architectures, each with its own advantages and functionality.

Direct attached storage (DAS)

Main article: DAS

DAS devices, also known as SAS (Server Attached Storage), i.e. systems connected directly to the server, were developed many years ago to expand the storage capacity of existing servers. In those days, when the number of volumes serving applications had to grow, new disks were added to the server or a new server was purchased. Given the technological limits of the time (narrow transmission bands, slow networks, expensive microprocessors) and the relatively low requirements for capacity and access time, DAS systems were quite an adequate solution.

DAS is, in essence, an extension of the disk subsystem of a single server. Clients get access to the data by addressing this server through the network: the server has block-level access to the storage, while the clients use file-level access.

Image:DAS.gif

Network attached storage (NAS)

Main article: Network Attached Storage

The main objective of NAS systems is to simplify file sharing. At a basic level, a NAS device is equipment connected directly to the local network. This is its main difference from systems of individual servers with directly attached, isolated drives.

Storage area networks (SAN)

Main article: Storage Area Networks

A SAN is a separate storage network with high performance and scalability; it can grow both vertically (by adding disks and expansion shelves to a single disk array) and horizontally (by adding new arrays to the network infrastructure). Servers access the disk storage over the SAN without loading the local network. If necessary, data transport can be organized between storage networks.

These solutions answered not only the shortcomings of DAS and NAS systems but, more importantly, the problems of channel overload and latency in local IP networks (10/100 Mbit/s). The SAN concept was first proposed in 1998. Like many other modern computer technologies, it was borrowed from the mainframe world, where it had been applied, for example, in data centers to connect computers to storage systems and distributed networks.

Multilevel data storage

Multilevel data storage (data tiering) can be considered one component of the older, broader concept of memory virtualization.

The term "virtual" in relation to memory and storage arose in 1959 to denote external bulk storage, virtual in essence, on disks used to extend main memory, which at that time was assembled from magnetic cores and was by definition very small yet extremely expensive. The small, expensive memory was supplemented, transparently to the processor, by cheaper disk memory of incomparably larger size. For modern storage systems it is more accurate to talk about storage consolidation, about replacing physical addresses and device numbers with logical addresses and logical device numbers, and about more effective management methods.

The emergence of SSDs gave a new impulse to virtualization work. The current stage is called Automated Tiered Storage (AST): data tiering, i.e. moving data between storage levels, is performed automatically.

The need for data migration stems from the nature of the data itself. The curve of access frequency over time resembles a Gaussian: the number of accesses to fresh data requiring quick access is small, it grows as the data matures, then falls, and for archived data on slow devices the number of accesses is far below the peak. This property of data motivates multilevel storage; at the current level of technology a four-level model can be implemented: at level 0 — SSDs, holding the most demanded data; at level 1 — fast SAS disks; at level 2 — slow SAS or SATA disks; at level 3 — tapes. The previously accepted three-level scheme of SAS disks, SATA disks and tapes has become outdated.

AST can be considered a development of the earlier Hierarchical Storage Management (HSM), created in 1974 for the IBM 3850 disk library, which together with disks made it possible to form a unified data space for the first time. The new name probably reflects the acceleration of migration towards real time, which SSDs make possible.

AST is a process of continuous data movement between devices of different cost according to the "temperature" of the data: the hotter the data, the more expensive and faster the device it belongs on, i.e. an SSD, while cold data can be moved to tape. To this end, AST periodically scans the data according to configured algorithms and relocates it, guided by its temperature.
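The tiering pass described above can be sketched as follows. The thresholds, the Extent structure and the tier names are illustrative assumptions for this sketch, not part of any real AST product:

```python
from dataclasses import dataclass

# The four-level model from the text: tier 0 = SSD (hot data),
# tier 1 = fast SAS, tier 2 = slow SAS/SATA, tier 3 = tape (cold data).
TIERS = ["SSD", "fast SAS", "slow SAS/SATA", "tape"]

@dataclass
class Extent:
    name: str
    accesses_per_day: float  # the measured "temperature"
    tier: int = 2            # start on a capacity tier

def target_tier(accesses_per_day: float) -> int:
    """Map temperature to a tier; the thresholds are invented for illustration."""
    if accesses_per_day >= 1000:
        return 0
    if accesses_per_day >= 100:
        return 1
    if accesses_per_day >= 1:
        return 2
    return 3

def rebalance(extents):
    """One periodic AST pass: move each extent to the tier its temperature demands."""
    moves = []
    for e in extents:
        t = target_tier(e.accesses_per_day)
        if t != e.tier:
            moves.append((e.name, TIERS[e.tier], TIERS[t]))
            e.tier = t
    return moves
```

A second pass over the same data produces no moves, mirroring how AST converges between scans until temperatures change.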

The AST functions should be distinguished from the role played by a flash cache connected over NVMe. The principle of a cache is simpler than AST: any cache is a tool into which a fragment of slower memory is temporarily copied. A cache is a simple accelerator; AST optimizes the use of storage resources.
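The contrast can be made concrete with a minimal read-through LRU cache: the slow backend keeps its data and the cache only holds temporary copies, whereas AST would actually relocate the data. The ReadCache class below is an invented sketch, not a model of any specific product:

```python
from collections import OrderedDict

class ReadCache:
    """Minimal read-through cache: fragments of slower storage are copied into
    fast memory for a while and evicted in LRU order. Unlike AST, nothing is
    moved — the original always stays on the slow device."""
    def __init__(self, backend, capacity=2):
        self.backend = backend      # the slower storage (here, a plain dict)
        self.capacity = capacity
        self.cache = OrderedDict()  # key order doubles as LRU order

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # refresh LRU position on a hit
            return self.cache[key]
        value = self.backend[key]         # slow path: fetch from the backend
        self.cache[key] = value           # temporary copy, original untouched
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used copy
        return value
```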

Work with corporate data is one of the most important components of digital transformation in companies. It requires effective tools supporting the interface between hierarchically organized, multi-layer storage systems and the analytical and other technologies that directly serve business purposes. Such an interface makes it possible to turn passively stored data into a key enterprise asset, from which knowledge useful for decision-making can be extracted. As data volumes grow and Big Data emerges, the value of the link between data and business increases many times over.

Responding to the demands of modern business, CROC offered its own "Smart data storage" concept, under which data is stored with its further use in mind and with the possibility of extracting the maximum of useful information from it. Implementing "Smart data storage" yields business benefits through more effective use of corporate information. The technologies behind the concept extend both to structured data stored in relational DBMSs and to the rapidly growing volumes of unstructured data. Together with its partner Dell EMC, CROC offers the means to build productive data storage infrastructure based on the Dell EMC Unity array line. Thanks to flexible and simple management, cloud environments and the capabilities of all-flash and hybrid storage can easily be combined for transition to a new level of digital transformation.

"Smart data storage" increases the cost efficiency of working with information by distributing it across storage systems according to demand while preserving data availability for analytical systems. In addition, supporting workflows with "Smart data storage" makes them more reliable, as data is stored and processed in a shared, failure-protected environment. Using Dell EMC technologies, the new approach to data storage can be adopted as quickly as possible and without capital costs under the Hardware as a Service model.

Read more about the evolution of storage systems here.

Program and hardware RAID

All existing storage systems are divided into those using hardware RAID and those using specialized software to compute RAID — software RAID[6]. The latter systems are more economical. Many processing and storage tasks are now solved much more effectively within storage systems with software RAID, for example providing redundancy for system disks and virtual machines, storing and processing video, and working with large files in workflow systems.
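The "computation of RAID" that software RAID performs on the host CPU (and hardware RAID offloads to a controller) is, at its core, parity arithmetic. A minimal sketch of RAID 5-style XOR parity and single-block recovery; this is an illustration of the principle, not any vendor's implementation:

```python
def raid5_parity(blocks):
    """XOR parity over equal-sized data blocks — the core per-stripe
    computation a software RAID runs on the host processor."""
    parity = bytes(len(blocks[0]))
    for b in blocks:
        parity = bytes(x ^ y for x, y in zip(parity, b))
    return parity

def recover(surviving_blocks, parity):
    """Rebuild one lost block: XOR the parity with all surviving blocks."""
    return raid5_parity(surviving_blocks + [parity])
```

Because XOR is its own inverse, losing any single block in a stripe is recoverable; N+2 and N+3 schemes mentioned below extend this idea with additional, more complex parity codes.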

The leadership of software RAID in the early 1990s gave way to hardware RAID, which until recently dominated the storage market, with software RAID relegated to inexpensive hobbyist and home storage systems. Now there is a class of tasks for which the software RAID provided directly by Windows, Unix and other operating systems is sufficient, and storage systems with software RAID have moved from the entry-level category into the business market.

File:RAID.gif

Components of program RAID

The development of storage systems with software RAID is largely defined by the companies releasing standard components: processors with new built-in instructions, and switches and enclosures supporting more productive data transfer protocols. New-generation server components at attractive prices, together with innovative computation algorithms, have allowed storage systems with software RAID to surpass their hardware RAID analogs in performance.

Producers of storage with software RAID use the full power of the new generation of hardware components and are one to two years ahead of hardware RAID producers in releasing new models. While hardware RAID producers need to upgrade their production process, for storage with software RAID it is enough to test a new enclosure or processor — and the new model is ready for delivery.

Among the advantages of software RAID are high performance on the x86-64 platform; inexpensive, available and interchangeable server components; and an attractive cost of processing and storing data. The cost of upgrading a system is also quite low thanks to component-wise updating of hardware and software and their considerably greater functionality. Software RAID allows encryption to be implemented at the level of processor instructions (for example, on an Intel Core i7). Such systems offer increased fault tolerance of N+2 and even N+3.

The interest of Russian consumers in storage based on software RAID is shown by a number of factors. Large Russian integrators have included storage systems based on software RAID in their offers. In the price lists of Russian assemblers of servers and storage systems, such systems account for about 20-30%. Data center owners place resources on software RAID in accordance with the practice of multilevel data storage (see below).

World market of DWH

Main article: DWH (world market)

Russian market of DWH

Main article: The market of DWH in Russia

In recent years the Russian storage market has been developing and growing successfully. At the end of 2010, the revenue of producers of storage systems sold on the Russian market exceeded $65 million — 25% more than in the second quarter of the same year and 59% more than in 2009. The total capacity of storage sold was about 18 thousand terabytes, a growth rate of more than 150% per year.

The Russian market of storage systems is developing extremely dynamically because it is still very young. Legacy equipment has no considerable impact on it: due to the explosive growth of data volumes, old systems simply do not meet customer requirements and are "washed out" much faster than, for example, old servers and workstations.

The rapid growth of data volumes increasingly forces domestic companies to purchase external disk storage systems. This is also promoted in no small measure by the traditional downward trend in the cost of IT components. Whereas external storage used to be perceived as an attribute of large organizations only, now even small companies do not deny the need for such systems[7].

Main stages of projects of creation of data warehouses

A data warehouse is a very complex object. As of 2011, the acquisition of storage is becoming an integral part of implementing complete infrastructure solutions. As a rule, this involves substantial investments over 3-5 years, and customers expect the system to fully meet business requirements throughout its life.

Next, data warehouse construction technologies are needed. If you begin to build a warehouse and develop a logical model for it, you should have a dictionary defining all basic concepts. Even such ordinary concepts as "client" and "product" have hundreds of definitions. Only after understanding what particular terms mean in a given organization can you identify the sources of the necessary data to be loaded into the warehouse.

Now the creation of a logical data model can begin. This is a crucial stage of the project. All participants of the data warehouse project must reach consensus on the relevance of the model. When this work is completed, it becomes clear what the client actually needs, and only then does it make sense to talk about technological aspects, for example the size of the warehouse. The client ends up face to face with a huge data model containing thousands of attributes and relations.

It must always be remembered that a data warehouse should not be a toy for the IT department and a cost item for the business. First of all, a data warehouse should help clients solve their most critical problems — for example, help telecommunication companies prevent customer churn. To solve such a problem, certain fragments of the large data model must be filled, after which the applications that will help solve it are selected. These can be very simple applications, say Excel. It is worth trying to solve the main problem with these tools first; trying to fill the whole model at once and use all data sources is a big mistake. Data in the sources must be analyzed carefully to ensure its quality. After successfully solving one or two problems of paramount importance, in the course of which the quality of the data sources needed for them is ensured, one can move on to the following problems, gradually filling other fragments of the data model and reusing the fragments filled earlier.

A number of Russian companies supplying and implementing storage systems and providing related services are listed in the TAdviser directory. It is worth noting that in a number of large projects some vendors — first of all HP and IBM — may participate directly. Some customers feel more confident in this case, relying entirely on the service support of the world's leading manufacturers; naturally, the cost of ownership then increases considerably.

Trends and perspectives

2020: Western Digital: five trends in data storage that will define the development of the industry

On April 21, 2020, Western Digital shared with TAdviser an overview of global data storage trends that, according to the company, deserve attention in 2020. According to Darrah O'Tul, senior product marketing manager at Western Digital in the EMEA region, these trends will define the development of the storage industry in 2020 and beyond.

Western Digital: five trends in data storage that will define the development of the industry. Photo: alter-ego-media.de.

1. The number of local data centers will grow, and new architectures will appear

According to the company, although the pace of cloud migration is not slowing, two factors support the further growth of local (or micro) data centers. First, updated regulatory requirements for data storage remain on the agenda. Many countries adopt laws on data storage conditions, so companies are forced to keep data close in order to correctly assess and mitigate the potential security and confidentiality risks of the data they hold. Second, cloud repatriation is being observed: large companies aim to keep their data in their own ownership; by moving away from cloud leasing they can reduce costs and control various parameters at their discretion, including protection means, latency and data access. This approach increases demand for local storage systems.

In addition, new data center architectures will appear to process the ever-increasing volume and variety of data. In the zettabyte era, because of the growing volume and complexity of workloads, applications and AI/IoT data sets, the architecture of storage infrastructure must change. Updated logical structures will consist of several storage levels optimized for different workloads, and the approach to system software will change as well. The open-source Zoned Storage initiative for zoned data storage will help clients realize the full potential of zoned block storage devices — both HDDs with SMR (shingled magnetic recording) and SSDs with ZNS — for workloads with sequential writes and a dominance of read operations. Such a unified approach allows naturally serialized data to be managed at scale and provides predictable performance.
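The zoned devices mentioned above (SMR HDDs, ZNS SSDs) accept writes only sequentially, at a per-zone write pointer; random overwrites require resetting the zone. A minimal model of that constraint — the Zone class and its methods are invented for illustration, not an actual ZNS or SMR API:

```python
class Zone:
    """Sketch of a sequential-write zone: writes land only at the write
    pointer; reclaiming space requires resetting the whole zone."""
    def __init__(self, size: int):
        self.size = size
        self.write_pointer = 0
        self.data = bytearray(size)

    def append(self, payload: bytes) -> int:
        """Write at the current write pointer; return the offset written."""
        if self.write_pointer + len(payload) > self.size:
            raise IOError("zone full")
        offset = self.write_pointer
        self.data[offset:offset + len(payload)] = payload
        self.write_pointer += len(payload)
        return offset

    def write_at(self, offset: int, payload: bytes) -> int:
        """Reject any write that is not at the write pointer."""
        if offset != self.write_pointer:
            raise IOError("writes must be sequential at the write pointer")
        return self.append(payload)

    def reset(self):
        self.write_pointer = 0  # the only way to make the zone writable again
```

This is why the text calls the data "naturally serialized": software above the device must produce sequential streams per zone, which is a good match for log- and read-dominated workloads.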

2. Standardization of AI for simpler deployment of peripheral devices

Analytics is a good competitive advantage, but the amount of data companies collect and process for the sake of insights is simply too big. Therefore, as of April 2020, in a world where everything is connected to everything, certain workloads are shifting to the edge, which creates the need to teach tiny terminal devices to run and analyze ever-growing amounts of data. Because of the small dimensions of such devices and the need to put them into operation quickly, they will evolve towards greater standardization and compatibility.

3. Data-use devices are expected to be separated into tiers, while innovation in media and fabrics gains momentum rather than slowing down

The stable exabyte-scale growth of read-dominated applications in data centers will continue and will give rise to requirements for the performance, capacity and economic efficiency of storage tiers, as companies increasingly differentiate the services implemented on their storage infrastructure. To meet these requirements, data center architecture will gravitate towards a storage model in which data is provisioned and accessed over a fabric, with a reference storage platform and devices supporting a whole set of service level agreements (SLAs) matched to the specific requirements of applications. An increase in the number of solid-state drives for processing fast data is expected and, along with it, continued relentless demand for exabytes of economically efficient, scalable storage, which will sustain the stable capacity growth of the corporate HDD fleet for Big Data storage.

4. Fabrics as a solution for unifying shared access to storage

Against the background of exponential data growth and further diversification of workloads and IT infrastructure requirements, companies must offer clients ever faster and more flexible solutions while reducing time to market. Ethernet fabrics are becoming the "universal unifying backplane" of the data center, unifying shared access, provisioning and management at scale to meet the requirements arising from the greater variety of applications and workloads. Composable infrastructure is an architectural approach in which NVMe-over-Fabrics is used to radically improve the utilization, performance and flexibility of computing power and storage in the data center. It allows storage to be disaggregated from computing systems, letting applications use a shared storage pool; data can easily be shared between applications, and the required capacity can be dynamically allocated to an application regardless of location. In 2020, composable disaggregated storage solutions, which scale effectively over Ethernet fabrics and realize the full potential of NVMe devices for the most diverse data center applications, will spread further.

5. HDD drives for data centers will continue to develop at a high rate

Despite the fact that many have been predicting the decline of HDDs for several years, as of April 2020 there is simply no adequate replacement for corporate HDDs: they not only continue to satisfy the requirements associated with data growth but also show cost efficiency in terms of total cost of ownership (TCO) when scaling for hyperscale data centers.

As the analytical company TRENDFOCUS notes in its report "Cloud, Hyperscale, and Enterprise Storage Service", corporate HDDs are in steadily high demand: exabytes of devices will be brought to market for corporate needs, with annual growth of 36% over the five calendar years from 2018 to 2023. Moreover, according to IDC, by 2023 103 zettabytes of data will be generated, of which 12 zettabytes will be stored, and 60% of that will go to core/edge data centers. Driven by the insatiable growth of data created by both people and machines, this fundamental technology will see new data placement techniques, higher recording density, innovations in mechanics, smart data storage and the invention of new materials. All of this will, in the near future, lead to increased capacity and optimized total cost of ownership (TCO) at scale.

Given their fundamental role in warehousing and managing data of critical value to companies, HDD and flash technology will remain among the fundamental pillars of successful and safe business operations, irrespective of an organization's size, type or industry. Investments in a complete data storage infrastructure will allow companies to strengthen their positions and cope more easily with data growth in the long term, without worrying that the system they have built will fail under the load of modern, high-tech business processes.

2018

A diverse storage infrastructure has become problem No. 1 for most large corporate customers: organizations often have to support dozens of storage systems of different classes and generations from different producers, since different applications impose different storage requirements. Thus, crucial transactional systems (billing, processing, ERP, etc.) require the high reliability and performance inherent in upper-price-segment storage. Analytical systems need high performance and a low cost per storage unit, so storage with solid-state disks (SSD) is reserved for them. And, for example, working with files requires functionality and low cost, so traditional disk arrays are used there. In a diverse infrastructure the utilization level of storage is low, the total cost of ownership (TCO) is unreasonably high, and manageability is weak; moreover, the complexity of such a storage infrastructure, as a rule, only grows[8]

Another serious problem is storage upgrades. Storage purchased three to five years ago often no longer copes with the growing volumes of data and the access speed requirements, so a new system is purchased and data is transferred to it from the old one. In effect, customers pay repeatedly for the storage volumes required to hold the data and, in addition, incur the expense of installing the new system and migrating data to it. At the same time, the former storage is, as a rule, not so outdated as to be abandoned completely, so customers try to adapt it to other tasks.

2009

Rapid evolution makes serious changes to the main trends of storage development every year. Thus, in 2009 the ability to distribute resources economically (thin provisioning) was regarded as paramount, while the last several years have passed under the sign of storage working in the "cloud". The range of offered systems is diverse: a huge number of models, various options and combinations from entry-level to Hi-End class solutions, turnkey systems and component-wise assembly using the most modern parts, and hardware-software solutions from Russian producers.

The aspiration to reduce IT infrastructure costs requires a permanent balance between the cost of storage resources and the value of the data stored on them at a given moment. To decide how best to place resources on hardware and software, data center specialists are guided not only by the ILM and DLM approaches but also by the practice of multilevel data storage. Each unit of information subject to processing and storage is assigned certain metrics, among them the degree of availability (speed of providing the information), importance (the cost of data loss in case of hardware or software failure), and the period after which the information passes to the next stage.

An example of separating storage systems according to storage and processing requirements using the multilevel data storage technique.

At the same time, requirements for the performance of transactional systems have grown, which implies more disks in a system and, accordingly, the choice of higher-class storage. In response, producers equipped storage systems with new solid-state disks exceeding previous ones in performance by more than 500 times on "short" read/write operations (characteristic of transactional systems).

The promotion of the cloud paradigm increased the requirements for storage performance and reliability, since in the case of failure or data loss it is no longer just one or two directly connected servers that suffer — service fails for all users of the cloud. The same paradigm produced a tendency to consolidate devices of different producers into a federation, which creates an integrated pool of resources provided on demand, with the possibility of dynamically moving applications and data between geographically separated sites and service providers.

A certain shift occurred in 2011 in the field of Big Data management. Earlier, such projects were at the discussion stage; now they have entered the implementation phase, having passed the whole way from sale to implementation.

A breakthrough is expected in the storage market — one that has already happened in the server market — and perhaps in 2012 we will see, in the mass segment, storage supporting deduplication and oversubscription technology. As a result, as with server virtualization, this will provide large-scale utilization of storage capacity.

Further development of storage optimization will consist in improving data compression methods. For unstructured data, which accounts for 80% of the total volume, the compression ratio can reach several orders of magnitude. This will significantly reduce the specific cost of data storage on modern SSD media while providing maximum capacity.
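A quick way to see why repetitive unstructured data compresses so well is to measure the ratio directly. This sketch uses Python's standard zlib; the log sample is invented for illustration:

```python
import zlib

def compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by compressed size (higher = more compressible)."""
    return len(data) / len(zlib.compress(data, level))

# Highly repetitive unstructured data (e.g. machine logs) compresses by
# orders of magnitude, which is what makes per-terabyte costs fall so sharply.
logs = b"2020-04-21 INFO request served in 12 ms\n" * 10000
```

On data like this, a general-purpose codec such as DEFLATE easily achieves a ratio well above 50:1, while already-dense data (media, encrypted blobs) stays near 1:1 — which is why the text's "several orders" claim applies specifically to unstructured data.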

See Also

Sources

Notes

  1. Data warehouses have to bring income
  2. CNews round table: "The DWH market: realities and perspectives"
  3. DWH: trends and perspectives
  4. Modern storage systems
  5. The Storage System (SS)
  6. DWH in Russia: software RAID returns
  7. DWH for mid-size companies
  8. In June 2018, Pure Storage told participants and guests of the "Infrastructure 2018" forum how the technologies built into Pure arrays provide a unique combination of high reliability, performance and cost efficiency of DWH.