
RAID (Redundant Array of Inexpensive Disks)

Literally "excess array from inexpensive disks". The main objective of RAID - using a disk set to receive a virtual volume of the bigger size and/or bigger reliability.


Today RAID is used everywhere as the core technology of storage systems in modern data centers. Most likely this will remain true in the near future, given the development of cloud computing in data centers built on RAID technology and the massive volumes of data generated by the phenomenon of social networks, smart client devices, the mobile Internet, and the spread of video on consumer and corporate platforms. Corporate adoption of flash-based solutions is the next logical step in the evolution of storage technology, since the gap between data growth and investment in IT infrastructure widens every day, creating bottlenecks (in performance or I/O) in mission-critical applications.

How are companies to avoid these gaps? Many believe the answer is solid-state drives, which can close the gap between computing power and the storage system: their random access times and transfer rates are far faster, with low latency for sequential reads, compared with traditional hard drives. But replacing the entire data storage infrastructure with solid-state drives makes no economic sense; the more profitable option is to develop methods of integrating flash technology into corporate system architecture to obtain substantial performance improvements.

History of RAID

Mainframes and minicomputers used 14-inch disks, and later Winchester drives. It is hard to believe, but such a disk with a capacity of 600 MB weighed several tens of kilograms. Later, in contrast to RAID, such disks came to be called "single large expensive disks" (Single Large Expensive Disk, SLED): yet another retronym.

RAID has certainly come a long way since 1978, when Norman Ken Ouchi of IBM was granted a patent titled "System for Recovering Data Stored in Failed Memory Unit", which in essence describes what would later become RAID 5. Nine years later, three computer scientists at the University of California, Berkeley had the idea of combining several hard drives into a single logical one. In 1988 they published the paper "A Case for Redundant Arrays of Inexpensive Disks", which clearly laid out the main objective: solving the performance problem posed by I/O. By a twist of fate, the technology they first proposed spread widely more for its ability to improve the reliability of data storage than for its performance.

Thus the concept of RAID (redundant array of independent disks) was first presented in 1987 by the Berkeley researchers David Patterson, Garth Gibson and Randy Katz. In June 1988 the scientists presented their paper, "A Case for RAID", at the SIGMOD conference. The first levels of the RAID specification became RAID 1 (the mirrored disk array), RAID 2 (reserved for arrays using the Hamming code), RAID 3 and 4 (striped disk arrays with a single dedicated parity disk) and RAID 5 (a striped disk array without a dedicated parity disk).

The cluster of disks was arranged so that from the outside it appeared to be a single disk. But low cost is not the main advantage of RAID; more important is that a cluster of independent disks can be combined according to different schemes (the so-called RAID "levels") that provide a greater or lesser degree of data safety through redundancy. That is why the letter I in the abbreviation also came to be read as Independent. Configurations built on ATA disks can still be considered inexpensive, but corporate arrays more often use SCSI or Fibre Channel disks.

Another remarkable quality of RAID is its many-fold higher transfer rate, which grows linearly with the number of spindles. Parallelization thus solves the bottleneck problem on the disk-to-computer channel. But simply piling on more disks reduces the reliability of the system: the more disks there are, the more likely it is that at least one of them fails.
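
As a rough illustration of this effect, here is a minimal sketch assuming independent disk failures and a hypothetical 3% annual failure rate per disk (real disks in one chassis share vibration, temperature and power, so actual correlation is higher):

# Sketch: probability that at least one of n independent disks fails
# within a year, for an assumed per-disk annual failure rate.

def array_failure_probability(n_disks: int, p_disk: float) -> float:
    """P(at least one of n disks fails) = 1 - P(all survive)."""
    return 1.0 - (1.0 - p_disk) ** n_disks

if __name__ == "__main__":
    P_DISK = 0.03  # assumed 3% annual failure rate per disk (illustrative)
    for n in (1, 2, 4, 8, 16, 32):
        print(f"{n:2d} disks: {array_failure_probability(n, P_DISK):.1%}")

With these assumptions a single disk fails with 3% probability per year, while a 32-disk stripe set without redundancy loses data with over 60% probability, which is exactly why the redundant levels exist.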

Data are distributed across the disks in stripes ranging from several kilobytes to several megabytes in size. The second technique, used to increase reliability, is mirroring. RAID arrays differ in how they combine these two techniques. There is a choice between access speed and reliability, and "compromise" options are also possible.

There are two approaches to implementing RAID: hardware and software. The former does not take resources away from the processor and is more reliable, but it is more expensive. Software RAID is cheaper.

The AST (automated storage tiering) functions should be distinguished from the role played by a flash cache attached over NVMe. The principle of a cache is simpler than AST: any cache is a tool into which a fragment of slower memory is temporarily copied. A cache is a simple accelerator, whereas AST optimizes the use of storage resources.

Read more about the evolution of storage systems here.

Years later, a number of standard RAID schemes evolved and came to be called "levels". RAID 0 increased performance and added capacity, but fault tolerance was lost. RAID 1 wrote mirrored data identically to two disks. RAID 2 and RAID 3 synchronized the disks' spindle rotation and stored sequential bits and bytes with parity on a dedicated disk. With RAID 4, files were spread across different disks that handled operations independently, allowing I/O requests to be served in parallel; but since all the parity data remained on one disk, bottlenecks kept appearing. RAID 5 distributed parity together with the data: after a failure, subsequent reads could be computed from the distributed parity. RAID 6 provides fault tolerance against two failed disks, making large RAID groups more practical for high-availability systems.
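
The parity recovery that RAID 4 and 5 rely on reduces to a XOR identity: the parity block is the XOR of the data blocks in a stripe, so any single lost block equals the XOR of the survivors. A minimal sketch (block contents are made up for illustration, not any controller's actual on-disk format):

# Sketch: XOR parity as used conceptually by RAID 4/5.
# parity = d0 ^ d1 ^ ... ^ dk, so a single lost block is the XOR
# of the surviving blocks plus the parity block.

def xor_blocks(blocks: list[bytes]) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]      # three data blocks in one stripe
parity = xor_blocks(data)               # stored on the parity disk

# Disk holding data[1] fails: rebuild its block from the survivors.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
print("recovered block:", recovered)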

Distribution of data across the devices of a RAID system can be implemented in software or in hardware. Software-based RAID is usually provided by the OS: server-class operating systems that offer logical volume management typically support RAID, and many operating systems provide basic RAID functionality. Some advanced file systems are designed to organize data across multiple storage devices directly; ZFS, for example, supports all RAID levels and any nested combinations.

At the hardware level, RAID controllers can support a wide range of operating systems, since they present the RAID array as just another logical drive. They include a read/write cache and can therefore improve performance. If the read/write cache is non-volatile, pending writes will not be lost on power failure, provided the cache is protected by a backup mechanism. Hardware RAID delivers guaranteed performance and adds no extra computation on the host computer; but because controllers use their own proprietary data formats, interoperability between controllers from different vendors is impossible.


The declining role of RAID 5 where HDD hard drives are used. The reasons for this trend are quite clear in the context of SAS technology. Briefly: SAS controllers support all types of disks, including disks for desktop computers (since SAS controllers also support SATA disks), and these belong to different reliability classes. There has also been a significant increase in the number of disks, both per controller (up to 256 disks per controller for Adaptec series 2, 5, 6 and 7 RAID controllers) and per volume (up to 32 disks in a RAID 5, up to 128 disks in a RAID 50), especially compared with the era when SCSI technology dominated in servers. On top of this, disk capacities have grown significantly: today the compatibility lists of Adaptec series 6 and 7 RAID controllers include 4 TB disks.[1]

As the use of RAID 5 in projects declines (in some cases it is outright forbidden), RAID controller stacks offer new levels, such as 5EE, 6 and 1E, for building basic user volumes instead of traditional RAID 5. Note that it is not only the new RAID levels that make abandoning RAID 5 painless; other solutions can be found in this area as well. For example, the bad stripe feature preserves access to a RAID volume's data in critical situations that previously led to an outright loss of access.

Restrictions on the use of desktop and nearline hard drives. The appearance of such restrictions should not be particularly puzzling. Supporting all classes of disks in all kinds of volumes is difficult, since in some states (for example, a degraded RAID 5 built from desktop disks) a volume has extremely poor reliability properties. In such a case the RAID 5 will be several times less reliable than an ordinary standalone desktop disk. Violating the restrictions leads to a situation where using the RAID controller is not only pointless but also carries serious risks for data access and data integrity.

The growing popularity of hybrid volumes. In the broad sense, a hybrid volume is any volume that uses both traditional hard drives (HDD) and solid-state drives (SSD) at the same time. By this definition, SSD caching is also one way of implementing a hybrid volume. In Adaptec RAID controllers, the Hybrid Volume function is a special mode for RAID 1 and 10 volumes that use both HDD and SSD disks.

The growing popularity of hybrid volumes is easy to explain. SSDs in their pure form have not yet found broad application, since a number of their server-grade properties are still maturing, and their price remains quite high. At the same time, SSD solutions offer unique performance. Hybrid volumes make it possible to take reliability and capacity from the HDDs and performance from the SSDs, while optimizing the price of the solution.

The ability to build RAID volumes from SSDs. A good example of a controller already on the market that was designed with wide SSD use in mind is the Adaptec series 7. The controller core delivers more than 500,000 IOPS for random traffic (fig. 1) and about 6 GB/s for sequential traffic patterns. Such figures allow mid-range SSDs (the most popular models) to be used in quantities at least equal to the number of ports on the controller (8-16 SSDs).

The internal architecture, up to 24 ports per RAID controller, PCIe v3 support and so on create a certain headroom for the growing popularity of "pure" SSD volumes on RAID controllers.

Quite obviously, the use of SAS3 technology (12 Gbps) will be the next step in supporting SSD volumes.

It should be emphasized that a number of factors still hinder the wide adoption of SSD RAID volumes as replacements for HDD RAID volumes and hybrid volumes. These include the high cost of SSDs, their low capacity, the guaranteed write cutoff after a certain amount of data has been written, caching algorithms in controllers still being optimized for SSD workloads, TRIM command support in RAID controllers being only in its infancy, and so on.

The gap in the data avalanche

So far RAID has evolved in step with the requirements placed on it, but consider the numbers. According to Cisco research, network traffic was expected to show a compound annual growth rate of roughly 32% between 2011 and 2015. IDC forecast a required compound annual growth rate in storage capacity of 50% over the same period. Gartner put the compound annual growth rate of IT spending at 5%, with spending on telecommunications equipment at 7%.

At the same time, given the growth of data created and used worldwide, the performance gap requires RAID to solve problems of maintaining performance levels and accelerating data processing. Broadly, processors achieve access times of about 1 nanosecond to the L1 cache, 10 nanoseconds to L2 and 100 nanoseconds to main memory. That is far faster than the roughly 10 milliseconds of tier 1 storage or 20 milliseconds of tier 2 and nearline storage: moving down the memory hierarchy costs a factor of about 100,000 in latency (10 ms / 100 ns).

The combination of this performance gap with the explosive growth of data and network traffic is likely to overload RAID-based storage infrastructure, creating bottlenecks, wasting application clock cycles and making it harder for companies to extract everything they need from their data. The modern world is preoccupied with this problem; speed of access to data is among the most important aspects of our lives, and everyone wants access to everything, immediately.

Redefining data center performance

Using flash storage alongside existing storage can save a significant amount of money, since tiered storage arrays with flash behind a RAID controller (together with intelligent software) can replace the huge number of disks needed to sustain the same I/O levels in traditional disk-based storage arrays. The combined approach lets companies use flash storage and their existing hard drives together intelligently, providing an optimal price/performance ratio in a tiered storage environment.

This approach has also been taken by LSI Corporation, a market and technology leader in RAID hardware and high-performance storage controllers. LSI recently introduced the Nytro™ product portfolio, which combines PCIe flash technology with intelligent caching and management software. According to LSI, this combination delivers impressive acceleration that depends on the configuration and the application, but easily multiplies the performance of various HDDs. Application performance frequently improves 5-6 times, and in some cases, according to users, up to 30 times.

New solutions such as the LSI® Nytro™ product line of PCIe® flash storage adapters offer different capacities and meet different requirements. High-capacity PCIe flash solutions can be used as primary storage, delivering the greatest value at a reasonable price. Mid-capacity flash solutions, combining PCIe flash technology with intelligent caching software, can accelerate SAN- and DAS-attached storage and strike a balance between cost and result. Low-capacity flash solutions, which combine a RAID controller card with built-in flash and intelligent caching software to accelerate DAS-attached storage, bring the advantages of flash "to the masses".

Intelligent implementation of flash technology can help RAID evolve and solve the problems associated with the gap in the data avalanche.

RAID 0, striping

Also called striping. This storage scheme requires at least 2 disks of identical size. A distinguishing feature of RAID 0 is the ability to obtain a virtual volume larger than any of its constituent disks. Data are written to the disks sequentially, in equal-sized stripe units: the first unit of data goes to the first disk, the second to the second, and so on until the physical media run out (the first row of writes). After that, writing returns to the first disk, but to its next data unit, then to the second unit of the second disk, and so on (subsequent rows). The drawback of this organization is the absence of duplication: failure of even one disk leads to complete data loss. However, with the right choice of stripe unit size, this organization yields higher virtual-volume performance than a single physical disk.
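
A minimal sketch of the address arithmetic just described, with an assumed stripe unit of one block (real arrays use units of several kilobytes to several megabytes):

# Sketch: RAID 0 striping. Logical block b on an n-disk array lands on
# disk (b mod n) at per-disk offset (b div n); rows fill left to right.

def raid0_locate(logical_block: int, n_disks: int) -> tuple[int, int]:
    return logical_block % n_disks, logical_block // n_disks

N = 4
for lb in range(8):
    disk, offset = raid0_locate(lb, N)
    print(f"logical block {lb} -> disk {disk}, offset {offset}")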

RAID 1, mirroring

Also called mirroring. This storage scheme requires at least 2 disks of identical size. With mirroring, data blocks are written simultaneously to two or more physical devices, so the virtual volume contains two or more copies of the data. RAID 1 helps when one or more disks fail, but it has a significant shortcoming: this organization of storage requires a large amount of redundant physical media (at least twice as much).
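
A toy sketch of the scheme: every write goes to all copies, and a read is served by any surviving copy (the class and its two-copy default are illustrative):

# Sketch: RAID 1 mirroring. Each block is written to every member disk,
# so a read succeeds while at least one copy survives.

class Mirror:
    def __init__(self, n_copies: int = 2):
        self.disks = [dict() for _ in range(n_copies)]  # block -> data
        self.alive = [True] * n_copies

    def write(self, block: int, data: bytes) -> None:
        for i, disk in enumerate(self.disks):
            if self.alive[i]:
                disk[block] = data

    def read(self, block: int) -> bytes:
        for i, disk in enumerate(self.disks):
            if self.alive[i]:
                return disk[block]
        raise IOError("all mirror copies failed")

m = Mirror()
m.write(0, b"payload")
m.alive[0] = False          # simulate failure of the first disk
print(m.read(0))            # still served from the surviving copy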

RAID 4

Striping with checksum recording. This storage scheme requires at least 3 disks of identical size. When a RAID 4 virtual volume is organized, data are written to the physical media as in RAID 0, but a checksum, computed with the simplest parity (XOR) algorithm over the preceding blocks of the same row, is written to the last disk. If one of the disks fails, the data can be recovered using the redundant disk with the checksums. The drawback of this organization is a "hot spot": the last disk, which stores the checksums, since any write operation requires reading the checksum, recalculating it and writing it back.
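
The hot spot can be seen in the small-write arithmetic: a write to one data disk must read the old data and old parity, then write back both the new data and the new parity. A sketch of the standard identity, new parity = old parity XOR old data XOR new data (block values are illustrative):

# Sketch: RAID 4 small-write (read-modify-write) parity update. Every
# write to any data disk costs an extra read and write on the single
# parity disk, which is why that disk becomes the bottleneck.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

old_data, new_data = b"AAAA", b"ZZZZ"
other_data = b"BBBB"                      # untouched block in the stripe
old_parity = xor(old_data, other_data)

new_parity = xor(xor(old_parity, old_data), new_data)
assert new_parity == xor(new_data, other_data)  # same as full recompute
print("parity updated without reading the other data disks")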

RAID 5

Data striping with rotated checksums. When a RAID 5 virtual volume is organized, the first row of data is striped as in RAID 4, but then the schemes diverge: the second row of data uses the next-to-last disk to store its checksum, and so on. Once all the disks have taken a turn, the checksum is written to the last disk again and the cycle repeats. RAID 5 thus "spreads" the hot spot, the checksum, across all the physical media and eliminates the defect inherent in RAID 4.
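
A sketch of the rotation just described: the parity disk shifts back one position each row, spreading parity writes across all members (this particular rotation is one common convention; actual layouts vary by implementation):

# Sketch: RAID 5 rotating parity placement. For row r on n disks, parity
# sits on disk (n - 1 - r) mod n; data blocks fill the remaining disks.

def raid5_row(row: int, n_disks: int) -> list[str]:
    p = (n_disks - 1 - row) % n_disks
    layout, d = [], 0
    for disk in range(n_disks):
        if disk == p:
            layout.append("P")
        else:
            layout.append(f"D{row * (n_disks - 1) + d}")
            d += 1
    return layout

for r in range(4):
    print(f"row {r}:", raid5_row(r, 4))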

RAID 4 and RAID 5 protect a virtual volume against the failure of one physical disk.

RAID 6

Also known as Advanced Data Guarding (ADG). Writes are organized as in RAID 5, except that the checksum is written in duplicate to different disks. The minimum number of disks for such an array is 4. RAID 6 has the following characteristics:

  • High read speed;
  • High availability: RAID 6 protects a virtual volume against the failure of up to two physical disks;
  • Better disk utilization than RAID 1+0, since storing the redundant information requires space equivalent to only two physical disks (see the sketch after this list);
  • Lower write speed than RAID 5, since two write operations are needed to store the checksums.
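
To make the capacity comparison in the list above concrete, a small sketch computing usable capacity for RAID 6 versus RAID 1+0 on the same set of equal-sized disks:

# Sketch: usable capacity of RAID 6 vs RAID 1+0 on n equal disks.
# RAID 6 spends two disks' worth of space on parity; RAID 1+0 spends
# half of all space on mirror copies.

def usable_raid6(n_disks: int, disk_tb: float) -> float:
    return (n_disks - 2) * disk_tb      # two disks' worth of parity

def usable_raid10(n_disks: int, disk_tb: float) -> float:
    return n_disks * disk_tb / 2        # every block stored twice

for n in (4, 8, 16):
    print(f"{n} x 4 TB: RAID 6 = {usable_raid6(n, 4.0):.0f} TB, "
          f"RAID 1+0 = {usable_raid10(n, 4.0):.0f} TB")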

Combinations of different RAID levels are also possible, for example:

RAID-0+1

This storage scheme requires at least 4 physical disks, and the total number of disks must be even. The disks are split into 2 groups of equal size, RAID 0 is organized within each group, and the groups are then mirrored against each other.

RAID-1+0

As with RAID 0+1, at least 4 physical disks are required, and the total number must be even. The disks are grouped into pairs, a mirror (RAID 1) is organized within each pair, and striping (RAID 0) is then organized across the mirrors.
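
A compact sketch of this composition, pairing disks into mirrors and striping logical blocks across the pairs (the disk numbering is illustrative):

# Sketch: RAID 1+0. Disks are paired into mirrors, logical blocks are
# striped across the pairs, and each write goes to both disks of a pair.

def raid10_locate(logical_block: int, n_disks: int) -> tuple[list[int], int]:
    n_pairs = n_disks // 2                      # n_disks must be even
    pair = logical_block % n_pairs
    offset = logical_block // n_pairs
    return [2 * pair, 2 * pair + 1], offset     # both disks of the mirror

for lb in range(6):
    disks, off = raid10_locate(lb, 4)
    print(f"logical block {lb} -> disks {disks}, offset {off}")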

Other RAID levels and their combinations also exist, but they have not been widely adopted.

RAID 7.3

Initially RAID was an exclusively hardware technology. A physical RAID controller can support several arrays of different levels at the same time, yet a more flexible implementation of RAID is possible with software components (drivers). The Linux kernel, for instance, allows RAID devices to be managed flexibly. Taking Linux kernel modules and error-correcting coding techniques as a basis, the developers of the RAIDIX software technology managed to create a solution for building high-performance, fault-tolerant storage from standard components.

RAIDIX software works with arrays of levels RAID 0, RAID 5, RAID 6 and RAID 10. Among RAIDIX's patented algorithms are the unique RAID 7.3 and N+M levels.

RAID 7.3 is an analog of double-parity RAID 6, but with a higher degree of reliability. RAID 7.3 is a block-interleaved level with triple parity distribution that can recover data after the failure of up to three disks in an array and achieve high throughput without additional processor load. RAID 7.3 significantly reduces the probability of data loss from disk failures without sacrificing performance or cost, and is often used for large arrays of more than 32 TB.

RAID N+M

The other patented technology, RAID N+M, is a block-interleaved level with arbitrary parity distribution that lets the user independently choose how many disks are allocated to checksum storage. This unique RAIDIX algorithm can recover data after the failure of up to 32 disks (depending on the number of parity disks).

The RAIDIX research laboratory develops new algorithms and advanced technologies based on artificial intelligence and machine learning. The company holds more than 10 technology patents in Russia and the USA. Key research directions in the laboratory include a gradual move away from old write schemes that introduce additional delays, the development of predictive analytics, on-the-fly tuning of storage parameters, and more.

Additional functions of RAID volumes

Copy Back Hot Spare

Between the options "a RAID volume without a Hot Spare disk" and "a RAID volume with a Hot Spare disk" lies the case of assigning one Hot Spare disk to several volumes. It adds reliability compared with the first option and gives some savings compared with the second. Even greater savings come from the Copy Back Hot Spare solution implemented, for example, in Adaptec series 5, 6 and 7 RAID controllers.

With this function, volumes on SATA disks can be covered by a SATA Hot Spare disk with Copy Back, and volumes on SAS disks by a SAS Hot Spare disk with Copy Back. But the maximum effect in reliability and savings comes when there are several, say five, RAID 5 volumes built on SAS disks, and a single high-capacity SATA Hot Spare disk in Copy Back mode is assigned to all five volumes, large enough to replace a failed disk in any of the five working volumes.

If one of the disks in a working volume fails, a portion of the SATA disk immediately begins to be rebuilt into its place (via the build operation). Of course, it is not ideal for a volume to contain both SAS and SATA disks, but in this case the controller chooses the lesser evil. Rather than staying in an extremely dangerous state (especially in our case, since the failure of a single disk puts the RAID 5 volume into degraded status), the controller rebuilds a portion of the SATA disk into the volume, returning it to its initial state of reliability in the minimum possible time. Volume performance does drop (a SATA disk is now working in it). The administrator only needs to order and install a new SAS disk; if the new disk is inserted into the same slot where the failed one sat, all the necessary blocks are computed and written to it, and only then is the portion of the SATA disk detached from the volume, ready to serve other volumes again.

Protecting the RAM cache with the AFM module

It is well known that for almost all traffic patterns, caching through RAM gives a substantial performance advantage. To use this function it is enough to enable the read and write caches, but such a scheme requires mandatory protection. It is important to note that a UPS does not solve the problem completely. For example, the loss of the PDB (power distribution board), the board with the control logic for redundant power supplies, or of the server's single power supply where power redundancy is not used, will inevitably lead to the loss of cache data even with a UPS present.

Earlier RAID controllers used a special battery (battery backup unit, BBU) whose task was to power the RAM for some time so that the cache contents would not be wiped, which could lead to data loss on the volume. The battery had a number of shortcomings: charging and testing it takes about a day; it sustains the cache for 24 hours in theory (8 hours on average); its service life is about 2 years with a 1-year warranty; temperature has an extremely negative impact on its lifetime; replacing it requires stopping the server; and so on.

Given these shortcomings, modern Adaptec controllers protect the RAM cache with so-called flash modules (Adaptec Flash Module, AFM). The principle of operation is extremely simple: the logic board is powered by a supercapacitor, which manages to charge fully while the server boots and the OS loads; if power fails, its charge is enough to copy the cache contents to flash memory, where they can be stored safely for years without power. Today such modules carry a 3-year warranty, and compared with a BBU their lifetime depends far less on temperature. The technology keeps improving: sizes are shrinking, the logic is being built into the controller chip, and the temperature dependence keeps getting better, which may allow the warranty period to grow in the coming years.

SSD caching

SSD caching is intended to boost traffic patterns where caching through RAM gives no significant performance gain, first of all random reads. Since an SSD cache is orders of magnitude larger than RAM, the caching mechanism works as follows. A special algorithm marks blocks of a RAID volume (a RAID 5 volume, for example) as "hot" if accesses to them exceed a certain threshold of requests per unit of time. When a block becomes "hot", it is copied to the SSD cache (the latter is either a simple volume or, in recent implementations, some RAID volume). Subsequent requests are then served from the SSD volume, which increases read performance many times over. In practice this not only gives a significant performance gain but also affects project cost.
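
A toy sketch of the promotion logic described above: count accesses per block and copy a block into the SSD cache once it crosses a request threshold (the threshold and the absence of eviction are simplifications; actual controller algorithms are proprietary):

# Sketch: threshold-based "hot block" promotion, in the spirit of SSD
# read caching. Threshold and policy are illustrative only.
from collections import Counter

class SsdReadCache:
    def __init__(self, hdd: dict, threshold: int = 3):
        self.hdd = hdd                  # block -> data (slow tier)
        self.ssd = {}                   # promoted hot blocks (fast tier)
        self.hits = Counter()
        self.threshold = threshold

    def read(self, block: int) -> tuple[bytes, str]:
        if block in self.ssd:
            return self.ssd[block], "ssd"
        self.hits[block] += 1
        if self.hits[block] >= self.threshold:      # block became "hot"
            self.ssd[block] = self.hdd[block]       # copy to SSD cache
        return self.hdd[block], "hdd"

cache = SsdReadCache({0: b"cold", 1: b"hot"})
for _ in range(4):
    data, tier = cache.read(1)
    print(tier)          # hdd, hdd, hdd, then ssd once promoted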

Note that if a volume has no "hot" data, i.e. the number of requests per unit of time is roughly the same for all blocks; or if the pattern of hot data changes faster than the SSD caching algorithm can copy it into the SSD cache; or if there is too little hot data, then SSD read caching will not give the desired performance gain.

Suppose the load on 1U servers in a data center has grown and such a server no longer has enough capacity. Simply adding a disk to the server solves the problem only partially, since free slots will soon run out. Further performance growth then requires either adding another server or installing an external disk enclosure. An identical new server doubles performance; an external 1U enclosure raises it at most 3 times. But installing a single SSD and enabling SSD caching can raise performance 5-8 times, with no extra rack space or external devices required.

Losing an SSD does not lead to loss of access or loss of data: the cache is simply switched off (performance drops), and once the SSD is replaced, the map of "hot" data is rebuilt quickly enough and performance returns.

An SSD write cache, as mentioned in the table, requires cache protection (fig. 2). For this purpose the SSD cache pool is created as a RAID volume with redundancy (Adaptec RAID controllers use RAID 1, 1E, 10 or 5).

In the Adaptec product line, SSD caching is supported only by controllers with the Q index.

Note that SSD caching is configured in two stages. In the first stage, SSD disks or RAID volumes on SSD disks are moved into the cache pool; after this, the SSDs cease to be available as ordinary disks and receive a special cache designation in the management system. The exception among Adaptec products is the 7Q series: a portion of the required capacity is carved off the SSD disks and moved to the SSD cache, and the remaining capacity, if any, can be used to create a data volume. In the second stage, the read cache and the write cache are enabled for each volume that needs SSD caching (SSD caching is configured individually per volume, just like caching through RAM). Volume settings can be changed dynamically at any moment the need arises.

Power Management function

This function is extremely simple in nature, with very simple settings. The essence of the solution: first, the rotation speed of a hard drive is roughly halved if the system receives no I/O requests for a specified period (this stage works with hard drives that support a standby mode), and after another specified period without I/O commands, the disks are switched off entirely.
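
A minimal sketch of these two timeouts as a state machine (the timeout values here are made up for illustration; in Adaptec controllers they are configurable settings):

# Sketch: disk power management as a two-timeout state machine:
# full power -> reduced rpm after the first idle period -> spun down
# after the second.

def power_state(idle_minutes: int, slow_after: int = 10,
                off_after: int = 30) -> str:
    if idle_minutes >= off_after:
        return "spun down"
    if idle_minutes >= slow_after:
        return "reduced rpm (standby)"
    return "full power"

for idle in (0, 5, 15, 45):
    print(f"idle {idle:2d} min -> {power_state(idle)}")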

This yields direct savings on power and indirect savings on the power needed to cool working hard drives, and disk wear can drop considerably. For example, if a backup is made once a month from a working RAID 5 volume to a RAID 0 volume, the idea of enabling the Power Management function on the RAID 0 volume looks extremely tempting: for the whole month, except the one or two days when copying takes place, the disks of that volume will be switched off.

The same scheme can also be applied to system volumes: individual I/O operations land in the cache (even an idle OS issues requests to its volumes), and only after some threshold is exceeded will the disks of the volume spin up and enter Full Power mode.

Note that all Adaptec controllers of series 2, 5, 6 and 7 support the Power Management function.

Management

Management systems are an important factor in ensuring the reliability of RAID volumes. The basic rule requires that all user volumes be brought under the monitoring loop.

The management utility for Adaptec controllers is called ASM (Adaptec Storage Manager); in the latest versions it is maxView Storage Manager. It is distributed free of charge (available on the manufacturer's website). The utility allows all volumes on all controllers in all servers on the local network to be managed from a single administrator machine, and in recent implementations from any device equipped with a network interface and a browser.

For proactive monitoring of volume status (it is critically important not to miss the moment a disk or disks drop out of a RAID volume, a Hot Spare disk kicks in, and so on) the utility can send individual or batched messages by e-mail. The administrator can thus receive a message from the RAID controller about a change in volume state or any other critical event at any time. The utility makes it easy to configure e-mail delivery, can send test messages for debugging, lets the level of detail be set, and can forward all messages from the RAID controller's OS log and other logs, for example the Java log or the ASM agent log.

Notes