Data Storage Systems (DSS)
A data storage system (DSS) is a combination of specialized hardware and software designed to store and transfer large amounts of information. It allows information to be kept on disk resources with optimal allocation of those resources.
A catalog of DSS solutions and projects is available on TAdviser.
"Physics" of storage
Perhaps the most fascinating part of computer history is the chronicle of data storage systems (DSS), because in this area there was great variety both in the physics and in the system organization, and for many years everything here was very tangible. Computers themselves soon lost their visual appeal, replacing attractive and diverse vacuum tubes and discrete semiconductor components (triodes and diodes) with faceless integrated circuits and microprocessors. Now the things enclosed in packages of different sizes, differing in the number of pins, can be told apart only by their markings. The physics of semiconductor innovation ultimately boils down to finding scientific and technological solutions that increase the density of transistors on the substrate. These important achievements have no outward appearance and, for the consumer, are reduced to the numbers 0.18, 0.13, 0.11... Today, however, the same can be said about disks: outwardly they are boxes of a few standard sizes, differing only in their contents.
Over 60-70 years, DSS evolved from the simplest punched cards and punched tapes used to store programs and data to solid-state drives. Along the way, many devices unlike one another were created: magnetic tapes, drums, disks and optical discs. Some of them are now in the past - perforated media, magnetic drums, flexible (floppy) disks and optical discs - while others live on and will live for a long time. What is gone can be viewed with nostalgia at the Museum of Obsolete Media. And some technologies that seemed doomed remain. At one time the end of magnetic tape was predicted, but today nothing prevents its existence; the same applies to rotating hard disk drives (HDD) - prophecies about their demise are groundless, they have reached such a level of perfection that their niche will remain theirs regardless of any innovations.
The current tiered DSS landscape includes tape libraries for backup and archiving, fast and slow HDDs, flash-based solid-state drives (SSDs), HDD mimics (matching HDD interfaces and form factors) intended primarily for compatibility with existing software and enclosures, and the latest flash drives in the NVMe card format. This picture was shaped by several factors, including the John von Neumann scheme, which divides memory into main memory, directly available to the processor, and secondary memory, designed for storing data. This division was reinforced after ferrite-core memory, which retains its state without power, was replaced by semiconductor memory, which requires programs to be loaded before work can begin. And, of course, the unit cost of storage matters: the faster the device, the higher this cost, so for the foreseeable future there will be room for both tapes and disks. Read more about DSS evolution here.
How data was stored before
Media using perforation
Punched cards
For centuries before the advent of computers, the simplest program-controlled devices (looms, barrel organs, carillon clocks) used perforated media of various formats and sizes, as well as pin drums. Keeping this recording principle, Herman Hollerith, founder of the Tabulating Machine Company (TMC), which later became part of IBM, made his discovery: in 1890 he realized how punched cards could be used to record and process data. He implemented this idea in processing the statistics obtained during the census and later transferred it to other applications, which ensured IBM's prosperity for decades to come.
Why cards? They can be sorted and, relatively speaking, given "direct access," so that on a special tabulating machine, following a simple program, data processing could be partially automated.
The card format changed over time, and from the 1920s the 80-column card became the international standard. IBM held a monopoly on these cards until the early 1960s.
These simple pieces of cardboard with rectangular holes remained the dominant data carrier for several decades and were produced in the billions. The scale of card consumption can be judged from a single example of the German radiogram decryption center at Bletchley Park: a week of work consumed 2 million cards - a medium-sized truckload! Post-war business was also built on storing data on cards. Speaking of punched cards, it should also be remembered that in Germany they were used to collect data on people destined for extermination.
Punched tapes
Punched tape would seem to be a more practical medium, but it was hardly used in business, although its input and output devices were much simpler and cheaper. Its spread was hindered by sequential access, lower capacity, low input and output speeds, and the difficulty of archiving. Narrow 5-column punched tape had been used since 1857 for preparing and then transmitting data by telegraph, so as not to limit input speed to the physical capabilities of the operator and thereby make better use of channel capacity. Wide 24-column punched tape was created to record programs for the Harvard Mark I electromechanical calculator in 1937. As a medium unaffected by electromagnetic and gamma radiation, punched tape was widely used in onboard devices and is still used in some defense systems.
Read more about DSS evolution here.
Magnetic tapes
A method of recording sound on a magnetic medium wound on a reel - at first on wire - was proposed in 1928. A tape drive of this kind was used in UNIVAC-1. The history of computer magnetic tape proper is considered to begin with the IBM Model 726, which was part of the IBM Model 701 computer. The tape for the IBM Model 726 and other devices of that time was one inch wide, but such tapes were inconvenient to use: their large mass required powerful drives, so they were soon replaced by half-inch "open reel" tapes, rewound from one reel to another (reel-to-reel). They had three recording densities: 800, 1600 and 6250 bits per inch. Such tapes, with removable write-protect rings, remained the standard for data archiving until the late 1980s.
The Model 726 used reels borrowed from motion-picture film, which is why the tape was one inch wide and the reel diameter was 12 inches. The Model 726 could store 1.4 MB of data; the density of 9-track recording was 800 bits per inch, and when the tape moved at 75 inches per second, 7,500 bytes per second were transferred to the computer. The magnetic tape itself for the Model 726 was developed by 3M (now Imation).
Inch-wide tapes were abandoned quite soon: because of their weight, working in start-stop mode required too-powerful drives and vacuum columns, and for a long period the half-inch "open reel," rewound from one reel to another (reel-to-reel), held an almost monopoly position. The recording density increased from 800 to 1600 and even 6250 bits per inch. These tapes, with removable write-protect rings, were popular on computers such as the ES and SM series.
An incentive for further development was that by the mid-80s the capacity of hard drives was already measured in hundreds of megabytes or even gigabytes, so backup drives of corresponding capacity were needed. The inconvenience of open reels was obvious - even in everyday life, cassette recorders quickly replaced reel-to-reel ones. The natural transition to cartridges took place along two paths: one was to create specialized devices oriented toward computers (using linear recording); the other was to adopt technologies invented for video and audio recording with rotating heads (helical-scan recording). Since then there has been a division into two camps, which gives the storage market a unique specificity.
Over thirty years, several dozen cartridge standards have been developed - the most common today is LTO (Linear Tape-Open) - during which cartridges were improved and their reliability, capacity, transfer speed and other performance characteristics increased. A modern cartridge is a complex device equipped with a processor and flash memory.
The transition to cartridges was facilitated by the fact that now tapes work exclusively in streaming mode. Cartridges are used in either standalone devices or tape libraries. The first robotic library with 6 thousand cartridges was released by StorageTek in 1987.
Analysts and disk manufacturers have repeatedly predicted the death of tape. The slogan "Tape must die" is well known, but tape is alive and will live for a long time, because it is designed for long-term storage of large archives. The business associated with the production of tape drives, tapes and tape libraries was estimated at about $5 billion in 2017. And the greater the amount of information stored on hard drives, the greater the need for archiving and backup. On what? Of course, on tape: no alternative to magnetic tape that is economically viable in terms of storage cost has yet been found. The current 8th generation of the LTO standard can store up to 12 TB natively, or 30 TB in compressed mode; in the future these figures will grow by an order of magnitude or more, and with each generation not only the quantitative indicators but also other operational characteristics improve.
Read more about DSS evolution here.
Magnetic drum
A temporary way to resolve the contradiction between sequential recording on tape and the need for direct access to data on an external device was the magnetic drum - more precisely, a cylinder with fixed heads. It was invented by the Austrian Gustav Tauschek in 1932.
Strictly speaking, the device is not a drum - in a real drum the working surface is the end membrane - but a cylinder with a ferrimagnetic coating applied to its side surface, divided into tracks, which in turn are divided into sectors. Each track has its own read/write head, and all heads can work simultaneously, that is, read/write operations are carried out in parallel.
Drums were used not only as peripheral devices. Before the transition to ferrite cores, RAM was extremely expensive and unreliable, so in some cases drums played the role of main memory; there were even computers built around drums. Magnetic drums were typically used for operational (often changing) or important information that required quick access. Under the constraints on RAM size imposed by its high cost, a copy of the operating system was kept on them and intermediate results of programs were recorded. It was on drums that swapping was first implemented - virtualizing memory using space on the drum, and later on the disk.
Magnetic drum drives had a smaller capacity than disks but worked faster, because, unlike in disks, their heads are fixed, which eliminates the time needed to seek to the desired track.
Drums were actively used until the early 80s, and for some time they coexisted with disks. The BESM-6 computer and its contemporaries were equipped with drums. It is known from open sources that the last drums served in Minuteman missile control systems until the mid-90s.
Read more about DSS evolution here.
Floppy disks
The active life of floppy disks stretched over 30 years, from the late seventies to the late nineties. They turned out to be extremely popular because PCs appeared before users had the ability to transfer data over a network. Under these conditions, floppies served not only their intended purpose of storing backups but, perhaps even more, the exchange of data between users - hence the term sneakernet: by exchanging floppies carried in sneakers, the typical footwear of programmers, users created a network of sorts.
There were three main types of floppy disks and many different modifications. Floppy disks with a diameter of 8 inches were created in 1967 at IBM; they were conceived as an initial boot device for IBM/370 mainframes to replace the more expensive permanent memory (non-volatile read-only memory) with which the previous generation, the IBM/360, was equipped. However, realizing the commercial value of the novelty, IBM turned the floppy into an independent product in 1971, and in 1973 development manager Alan Shugart founded Shugart Associates, which became the leading manufacturer of 8-inch drives with a maximum capacity of 1.2 MB. These large disks were used on PCs produced before the advent of the IBM XT. This type of floppy gained particular popularity thanks to Gary Kildall's CP/M operating system.
As for floppy disks with a diameter of 5.25 inches, the story of their appearance resembles the anecdote about Nicholas II that jokingly explains why the Russian railway gauge is wider than the European one. In our case, An Wang, the owner of Wang Laboratories, met in a bar with people from Shugart Associates, who proposed making a cheaper drive for his computers but could not settle on a specific diameter. Wang then picked up a cocktail napkin and said that, it seemed to him, the size should be like that. Five-inch disks with capacities of 360 and 720 KB were produced until the end of the nineties; they were contemporaries of the IBM XT and IBM AT computers and of the MS-DOS and DR-DOS operating systems, faithfully serving the formation of a new industry.
The alternative cartridge proposed by Sony in 1983 measured 90.0 mm × 94.0 mm but was traditionally called the 3.5-inch disk. In American professional slang it was called the stiffy disk (the translation is worth looking up in a dictionary). After a number of improvements, the industry standard 3.5-inch HD (High Density) with a capacity of 1.44 MB was adopted in 1987. The IBM PS/2 and Macintosh IIx were the first to be equipped with such drives, and later it became a universal standard for PCs and Macintoshes. Attempts in the second half of the nineties to make higher-capacity disks - Extended Density (ED) at 2.88 MB, as well as the promising magneto-optical Floptical disk (25 MB), SuperDisk (120-240 MB) and HiFD (150-240 MB) - were not a market success.
Read more about DSS evolution here.
Why DSS Are Needed
The IDC Perspectives study shows that data storage ranks second among IT costs and accounts for approximately 23% of all costs. According to The InfoPro, Wave 11 "increases in DSS costs in the average Fortune 1000 company exceed 50% per year."
According to analysts, the volume of information stored and processed by organizations around the world grows every minute. Unique information becomes ever more valuable, its volume multiplies every year, and its storage is costly. In view of this, organizations strive not only to shape the development of their storage infrastructure, but also to find ways to improve the economic efficiency of DSS: to reduce energy consumption, maintenance costs, and the total cost of owning and purchasing backup and storage systems.
The growth of data volumes and increased requirements for storage reliability and data access speed make it necessary to separate storage into a dedicated subsystem of the computing complex. The ability to access and manage data is a prerequisite for running business processes. Irretrievable data loss puts a business at serious risk: lost computing resources can be restored, but lost data, in the absence of a well-designed and implemented backup system, cannot.
Corporate customers increasingly need not only to purchase DSS, but also to strictly account for, audit and monitor the use of these expensive resources. There is nothing worse than a halt in business processes because the necessary data cannot be obtained in time (or is lost completely), which can lead to irreversible consequences.
Factors that contribute to the development of DSS
The main factor has been the growth of competition and its increasing complexity in all market segments. In Western Europe these phenomena could be observed earlier, and in Eastern Europe - over the last five years. Five years ago a mobile operator had about 25 million registered SIM cards, and today 50-70 million. Thus, these companies provide mobile communications to almost every resident of the country, and there are also regional operators. Here is the real level of competition: there is no one left on the market who does not have a mobile phone. Operators can no longer grow extensively by selling their products to those who do not yet have them. They need the customers who work with competitors and must understand how to win them over - which means understanding their behavior and what they want. To extract useful information from the available data, it must be placed in a data warehouse[1].
Another factor is the emergence on the market of many companies that offer their solutions to support the business of enterprises: ERP, billing systems, decision support systems, etc. All of them allow you to collect detailed data of a very different nature in huge volumes. If your organization has a well-developed IT infrastructure, you can gather this data together and analyze it.
The next factor is technological. Until recently, application vendors developed separate versions of their solutions for different server platforms or offered open solutions. An important technology trend for the industry has been the creation of adaptable platforms for various analytical problems that include both the hardware component and the DBMS. Users no longer care who made the processor or RAM in their computer - they regard the data warehouse as a kind of service. And this is the most important shift in consciousness.
Technologies have appeared that make it possible to use data warehouses to optimize operational business processes in near real time - not only for highly qualified analysts and top managers, but also for front-office employees, in particular staff of sales offices and contact centers. Decision-making is delegated to employees on the lower rungs of the corporate ladder. The reports they need are usually simple and concise, but many of them are required, and the time to produce them must be short.
DSS Applications
Traditional data warehouses can be found everywhere. They are designed to generate reports that help understand what has happened in a company. However, this is only the first step, the foundation.
Knowing what happened is no longer enough: people want to understand why it happened. For this, business intelligence tools are used that help understand what the data is saying.
Next comes the use of the past to predict the future - the construction of predictive models: which customers will stay and which will leave; which products will succeed and which will not, and so on.
Some organizations are already at the stage where data warehouses begin to be used to understand what will happen to the business next. The next step, therefore, is to "activate" front-end systems with solutions based on data analysis, often in automatic mode.
The volume of digital information is growing like an avalanche. In the corporate sector, this growth is driven on the one hand by tighter regulation and the requirement to preserve ever more information related to the business, and on the other hand by tougher competition, which requires ever more accurate and detailed information about the market, customers, their preferences, orders, competitors' actions, etc.[2]
In the public sector, the growth of stored data is driven by the widespread transition to interdepartmental electronic document management and the creation of departmental analytical resources based on a variety of primary data.
An equally powerful wave is created by ordinary users who post their photos, videos on the Internet and actively exchange multimedia content on social networks.
DSS Requirements
In 2008, TIM conducted a survey among customers to find out which characteristics matter most to them when choosing a DSS[3]. The top positions went to the quality and functionality of the proposed solution. At the same time, calculating the total cost of ownership is not typical for the Russian consumer: customers most often do not fully realize what costs await them, for example the cost of renting and equipping premises, electricity, air conditioning, training and salaries of qualified personnel, etc.
When a DSS purchase becomes necessary, the most a buyer usually estimates is the direct cost, passed through the accounting department, of purchasing the equipment. Yet price ranked only ninth out of ten in importance. Of course, customers take into account the possible difficulties associated with maintaining the equipment; extended warranty support packages, which are usually offered in projects, help to avoid them.
The practice of AvroRAID shows that a number of problems push consumers to buy new DSS or upgrade existing ones.
Reasons to Purchase DSS
Source: AvroRAID, 2010.
What does DSS consist of?
Typically, a storage system contains the following subsystems and components:
- storage devices (disk arrays, tape libraries)
- storage access infrastructure
- backup and data archiving subsystem
- storage management software
- management and monitoring system
A DSS is usually mounted in a standard 19-inch cabinet and contains hard drives, external interfaces for connecting hosts (servers), and several power supplies. Inside are processor modules, disk controllers, I/O ports, cache memory and other necessary components.
An example of a DSS block diagram: 1. A controller, including a central processor (or several), interfaces for communicating with the hard drives and external ports, and cache memory. 2. An external interface, in this case Fibre Channel. 3. Hard drives - capacity is expanded with additional shelves. 4. Cache memory, usually mirrored so that data is not lost if one of the modules fails.
The drives available in the system can be combined into RAID groups at various levels. The resulting capacity is divided into logical units (LUNs), which hosts access and "see" as local hard drives. The number of RAID groups and LUNs, the cache logic, and the availability of LUNs to specific servers are configured by the administrator.
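As a rough illustration of these relationships (RAID group → LUN → host masking), here is a minimal Python sketch; the class and field names are hypothetical and not tied to any vendor's API.

```python
from dataclasses import dataclass, field

@dataclass
class RaidGroup:
    name: str
    level: int                 # e.g. 5 for RAID 5, 6 for RAID 6
    disks: list                # physical drive identifiers
    capacity_gb: int           # usable capacity after parity overhead

@dataclass
class Lun:
    lun_id: int
    size_gb: int
    raid_group: RaidGroup
    allowed_hosts: set = field(default_factory=set)  # LUN masking

    def present_to(self, host: str) -> None:
        """Make the LUN visible to a host; the host sees it as a local disk."""
        self.allowed_hosts.add(host)

# Example configuration an administrator might describe:
rg = RaidGroup(name="RG1", level=5, disks=["d0", "d1", "d2", "d3"], capacity_gb=3000)
lun = Lun(lun_id=0, size_gb=500, raid_group=rg)
lun.present_to("app-server-01")
print(lun.allowed_hosts)   # {'app-server-01'}
```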
Disk arrays
In the late 1980s, two developments came together: the scientific groundwork on RAID and mass-produced hard drives. Put together, they made it possible to create a commercial cluster-type storage device that could compete technically with IBM disks at a significantly lower price.
The disk units produced before the advent of hard drives were gigantic, and their logic was primitive, corresponding almost entirely to the physical structure (sectors and tracks). And, as low-volume, specialized products, they were expensive.
In 1988, Michael Ruettgers, who later became EMC's chief strategist, proposed developing a disk system made of hard drives and supplying it for IBM-compatible mainframes and the AS/400. Another, perhaps the most successful, DSS specialist, Moshe Yanai, put forward the Integrated Cached Disk Array (ICDA) concept, from which the progenitor of the EMC Symmetrix disk arrays was born.
In the fall of 1990, when EMC introduced Symmetrix, that legend among disk arrays, the 4200 ICDA had a capacity of 24 GB, a 256 MB cache and a controller based on a 32-bit processor. Within a few years Symmetrix established the company as a leading supplier of mainframe storage: according to IDC, its share of the mainframe storage market grew from 1% in 1990 to 42.5% in 1996.
Symmetrix was cheap by mainframe standards but too expensive for Unix servers, and even more so for x86 servers, so many companies rushed into the newly opened market segment, offering products inferior to Symmetrix in quality but not as expensive. Subsequently, many models of disk arrays for various purposes appeared on the market.
Key DSS Requirements
In practice, not one server but many tens or even hundreds connect to a DSS. This dictates a number of key requirements for systems of this kind[4]:
Reliability and fault tolerance. DSS provide full or partial redundancy of all components - power supplies, access paths, processor modules, disks, cache, etc. A system for monitoring and reporting possible and existing problems is mandatory.
Data availability. Ensured by well-thought-out data integrity features (RAID technology, full and instantaneous (snapshot) copies of data inside the disk array, replication to a remote DSS, etc.) and by the ability to add or update hardware and software in hot mode without stopping the complex.
Management and monitoring. DSS are managed through a web interface or command line; there are monitoring functions and several ways of notifying the administrator about problems. Hardware-based performance diagnostics technologies are available.
Performance. Determined by the number and type of drives, the amount of cache memory, the processing power of the processor subsystem, the number and type of internal and external interfaces, and the options for flexible configuration and tuning.
Scalability. DSS usually allow the number of hard drives and the amount of cache memory to be increased, hardware to be upgraded and functionality to be expanded with special software. All these operations are performed without significant reconfiguration or loss of functionality, which saves money and allows a flexible approach to IT infrastructure design.
DSS types
Disk DSS
It is used for operational work with data, as well as for creating intermediate backups.
There are the following types of disk DSS[5]:
- Working Data DSS (HHV)
- Backup DSS (Disk Library)
- DSS for long-term storage of archives (CAS system).
Tape DSS
Designed for creating backups and archives.
The following types of tape DSS exist:
- separate drives;
- autoloaders (one drive and several tape slots);
- tape libraries (more than one drive, many tape slots).
DSS Connection Options
Various internal interfaces are used to connect devices and hard drives within a single storage system, and a number of external interfaces are used to connect the DSS to hosts.
The popular Infiniband inter-node cluster interconnect is now also used to access DSS.
DSS Topology Options
The traditional approach is Direct Attached Storage (DAS), in which storage connects directly to servers. In addition to DAS, there are storage devices connected to the network - NAS (Network Attached Storage) - as well as components of storage networks - SAN (Storage Area Network). Both NAS and SAN systems appeared as alternatives to the DAS architecture. Each solution was designed as a response to the growing requirements for storage systems and was based on the technologies available at the time.
Networked storage architectures were developed in the 1990s to eliminate the main disadvantages of DAS. Broadly, networked storage solutions pursued three goals: to reduce the cost and complexity of data management, to reduce LAN traffic, and to improve data availability and overall performance. NAS and SAN architectures address different aspects of this common problem, and the result has been the coexistence of two network architectures, each with its own advantages and functionality.
Direct Attached Storage (DAS)
Direct Attached Storage (DAS) devices, also known as SAS (Server Attached Storage), i.e. systems that connect directly to a server, were developed many years ago to expand the storage capacity of existing servers. At the time, when the number of volumes associated with applications needed to grow, new disks were added to the server or a new server was purchased. Given the technological limitations of the time (narrow bandwidth, slow networks, expensive microprocessors) and the relatively modest requirements for capacity and access time, DAS systems were quite an adequate solution.
DAS is essentially an extension of the disk subsystem of a single server. Clients access the data through that server over the network: the server has block-level access to the data on the DSS, while clients use file-level access.
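To make the block-versus-file distinction concrete, here is a minimal Python sketch; the device path and NFS mount point are hypothetical, and the block-device read requires appropriate privileges.

```python
import os

# Block-level access, as the server sees a DAS/SAN volume:
# read 4 KiB starting at logical block address 2048 (512-byte blocks assumed).
def read_block(device="/dev/sdb", lba=2048, block_size=512, length=4096):
    fd = os.open(device, os.O_RDONLY)
    try:
        os.lseek(fd, lba * block_size, os.SEEK_SET)
        return os.read(fd, length)      # raw bytes, no notion of files
    finally:
        os.close(fd)

# File-level access, as a client of that server sees the same data
# through a network file system (e.g. an NFS or SMB mount):
def read_file(path="/mnt/share/report.csv"):
    with open(path, "rb") as f:
        return f.read()                 # named file, directories, permissions
```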
Network Attached Storage Devices (NAS)
The main task of NAS systems is to simplify file sharing. At the most basic level, a NAS device is hardware that connects directly to the LAN - this is its main difference from DAS systems, in which isolated drives are attached directly to an individual server.
Storage Area Networks (SANs)
A SAN is a dedicated, high-performance, scalable storage network that can expand vertically (by adding disk drives and expansion shelves to a single storage system) or horizontally (by adding new storage systems to the network infrastructure). Servers access the disk drives via the SAN and do not load the LAN. If necessary, data transport can be organized between storage networks.
These solutions were a response not only to the shortcomings of DAS and NAS systems, but, more importantly, to the problems of communication channel congestion and delay in local IP networks (10/100-Mbps). The SAN concept was first proposed in 1998. Like many other modern computer technologies, it was borrowed from the world of mainframe computers, where it was used, for example, in data centers to connect computers to storage systems and distributed networks.
Tiered Storage
Data tiering can be considered one component of the broader, long-standing concept of storage virtualization.
The term virtual, in relation to memory and DSS, originated in 1959 to refer to inherently virtual external memory on disks, used to expand internal memory, which at that time was assembled from magnetic cores. That memory was by definition very small and at the same time extremely expensive, so it was supplemented, in a way transparent to the processor, by cheaper disk memory of incomparably larger size. Modern storage systems take virtualization further, replacing physical addresses and device numbers with logical addresses and logical device numbers and adding more efficient management methods.
The emergence of SSDs gave a new impetus to this virtualization work; the current stage, called Automated Tiered Storage (AST), automatically executes data tiering procedures, that is, moving data across storage tiers.
The need to migrate data stems from the nature of data itself. The distribution of accesses over a data item's lifetime resembles a Gaussian curve: freshly created data that needs fast access receives relatively few accesses, the number grows as the data matures, and then falls off, so that archived data on slow devices receives far fewer accesses than at the peak. This property prompts the creation of tiered DSS; at the current level of technology, a four-level model can be implemented: level 0 - SSDs storing the most frequently accessed data; level 1 - fast SAS drives; level 2 - slow SAS or SATA drives; level 3 - tapes. The previously adopted three-level scheme of SAS, SATA and tape drives is outdated.
AST can be considered a development of the previously known Hierarchical Storage Management (HSM), created in 1974 for the IBM 3850 library, which, together with disks, for the first time made it possible to form a single data space. The new name probably reflects the acceleration of migration processes toward real time, which the use of SSDs makes possible.
AST is the process of continuously moving data between devices of different cost according to the data's "temperature": the hotter the data, the more expensive and, accordingly, faster the device it warrants (i.e., SSD), while cold data can be moved to tape. To do this, AST periodically scans the data according to specified algorithms and moves it, guided by its temperature.
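As a rough sketch of such a temperature-driven policy (not any vendor's actual AST algorithm), the following Python fragment assigns a tier based on recent access counts; the thresholds, tier names and sample objects are illustrative assumptions.

```python
# Hypothetical access counters collected over the last observation window.
access_counts = {"invoice_2024.db": 5200, "q3_report.docx": 140, "backup_2019.tar": 2}

# Illustrative thresholds: hot data -> tier 0 (SSD), warm -> tiers 1/2, cold -> tier 3 (tape).
TIERS = [
    (1000, "tier0-ssd"),
    (100,  "tier1-fast-sas"),
    (10,   "tier2-sata"),
    (0,    "tier3-tape"),
]

def choose_tier(accesses: int) -> str:
    """Return the fastest tier whose access threshold the data still meets."""
    for threshold, tier in TIERS:
        if accesses >= threshold:
            return tier
    return TIERS[-1][1]

for obj, count in access_counts.items():
    print(f"{obj}: move to {choose_tier(count)}")
```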
AST should be distinguished from the role played by an NVMe flash cache. The principle of a cache is simpler: a cache is a tool into which a fragment of slower memory is temporarily copied. A cache is a simple accelerator, whereas AST optimizes the use of DSS resources.
Working with corporate data is one of the most important components of companies' digital transformation. It requires effective tools that provide an interface between hierarchically organized tiered storage systems, analytics and other technologies that directly serve business goals. This interface turns passively stored data into a critical enterprise asset, making it possible to extract decision-making knowledge from it. With the growing volume of data and the emergence of big data, the importance of the relationship between data and business increases many times over.
Responding to the needs of modern business, CROC has proposed its own Smart Data Storage concept, under which data storage is organized with its further use in mind and with the possibility of extracting the maximum useful information from the data. Implementing Smart Data Storage lets a company gain business benefits by using enterprise information more efficiently. The technologies behind the concept cover both structured data stored in relational DBMSs and rapidly growing volumes of unstructured data. Together with its partner Dell EMC, CROC can build a powerful storage infrastructure based on the Dell EMC Unity array line; its flexibility and ease of management make it possible to combine cloud environments, all-flash capabilities and hybrid DSS on the way to the next level of digital transformation.
"Smart data storage" improves the cost-effectiveness of working with information by distributing it to storage, based on demand, while maintaining the availability of data for analytical systems. In addition, support for workflows with Smart Storage improves their reliability because data is stored and processed in a shared and crash-proof environment. Move to a new approach to storage with Dell EMC technologies as quickly and cost-effectively as possible with Hardware as a Service.
Read more about DSS evolution here.
Software and hardware RAID
All existing DSS are divided into those using hardware RAID and those using specialized software to calculate RAID - software RAID[6]. The latter systems are more economical. Many data storage and processing tasks are now solved much more efficiently in DSS with software RAID - for example, backing up system disks and virtual machines, storing and processing video, and working with large files in document management systems.
Software RAID led in the early nineties but was then displaced by hardware RAID, which until recently dominated the DSS market, while software RAID was relegated to inexpensive amateur and home storage. Now there is a class of tasks for which the software RAID provided directly by Windows, Unix and other operating systems is quite sufficient, and DSS with software RAID have moved from the entry-level category into the corporate market.
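To show what "calculating RAID in software" means at its simplest, here is a minimal RAID 5-style parity sketch in Python (XOR parity over data blocks); real implementations such as Linux md are far more elaborate, so this is only an illustration under those assumptions.

```python
from functools import reduce

def xor_parity(blocks: list[bytes]) -> bytes:
    """Compute the XOR parity block over equally sized data blocks (RAID 5 style)."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks))

def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Reconstruct the single missing block from the surviving blocks and parity."""
    return xor_parity(surviving + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]       # stripe of three data blocks
parity = xor_parity(data)                # what the CPU computes on every write

# Simulate losing the second disk and rebuilding its block:
restored = rebuild([data[0], data[2]], parity)
assert restored == b"BBBB"
```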
Software RAID Components
The development of DSS with software RAID is largely determined by the vendors of standard components: processors with new built-in instructions, switches and drive enclosures that support more efficient data transfer protocols. New-generation server components at attractive prices and innovative computation algorithms have allowed DSS with software RAID to surpass the characteristics of their hardware RAID counterparts.
Software RAID DSS manufacturers exploit the power of each new generation of hardware components and are one to two years ahead of hardware RAID manufacturers in releasing new models: while hardware RAID manufacturers have to retool their production process, for a DSS with software RAID it is enough to test a new enclosure or processor - and the new model is ready for delivery.
The advantages of software RAID include high performance on the x86-64 platform, inexpensive, readily available and interchangeable server components, and attractive processing and storage costs. The cost of upgrading such a system is also quite low thanks to component-by-component updates of hardware and software, and the functionality is significantly greater. Software RAID allows encryption at the processor instruction level (for example, on an Intel Core i7). Such systems achieve increased fault tolerance of N+2 and even N+3.
A number of factors indicate the interest of Russian consumers in DSS based on software RAID. Large Russian integrators have included software RAID-based storage systems in their offerings. In the price lists of Russian server and storage assemblers, such systems occupy approximately 20-30%. Data center owners place resources on software RAID in accordance with tiered storage practices (see below).
Global DSS Market
Main Article: DSS (Global)
Russian DSS Market
Main Article: Russian DSS Market
Over the past few years, the Russian DSS market has been developing and growing successfully. Thus, at the end of 2010, revenue from storage systems sold on the Russian market exceeded $65 million, 25% more than in the second quarter of the same year and 59% more than in 2009. The total capacity of DSS shipped was approximately 18 thousand terabytes, which corresponds to growth of more than 150% per year.
The Russian storage market is developing extremely dynamically because it is still very young. Legacy equipment does not weigh on it significantly, since, due to explosive data growth, old systems simply stop meeting customer requirements and are "washed out" much faster than, for example, old servers and workstations.
The rapid growth of data volumes increasingly forces domestic companies to purchase external disk storage systems. This is largely facilitated by the traditional trend of falling prices for IT components. Whereas external DSS used to be perceived as an attribute of large organizations only, today the need for these systems is not denied even by small companies[7].
Data Warehousing Project Milestones
A data warehouse is a very complex object. As of 2011, DSS procurement has become an integral part of implementing integrated infrastructure solutions. As a rule, these are sizeable investments over 3-5 years, and customers expect the system to fully meet business requirements throughout its entire service life.
Next, you need to have technologies for creating data warehouses. If you have started creating a repository and are developing a logical model for it, then you must have a dictionary that defines all the basic concepts. Even common concepts such as "client" and "product" have hundreds of definitions. Only once you have an idea of what certain terms mean in your organization can you identify the sources of data you need to load into the vault.
You can now start creating a logical data model. This is a critical phase of the project: it is necessary to obtain agreement from all participants in the data warehouse project that the model is adequate. Upon completion of this work, it becomes clear what the client really needs, and only then does it make sense to talk about technological aspects, such as the size of the storage. The client comes face to face with a giant data model containing thousands of attributes and links.
Keep in mind that the data warehouse should not be a toy for the IT department and a cost burden for the business; first of all, it should help customers solve their most critical problems - for example, help telecom companies prevent customer churn. To solve such a problem, certain fragments of the large data model must be populated, and then applications selected to help solve it. These can be very simple applications, say Excel; the first step is to try to solve the main problem with these tools. Trying to populate the entire model at once, using all data sources, would be a big mistake. Data in the sources must be carefully analyzed to ensure its quality. After successfully solving one or two issues of primary importance, during which the quality of the necessary data sources is ensured, you can move on to the next problems, gradually filling in other fragments of the data model and also reusing previously filled fragments.
The TAdviser catalog lists a number of Russian companies related to the supply and implementation of DSS and the provision of related services. At the same time, it should be understood that in a number of large projects, some vendors can participate directly, first of all, HP and IBM. In this case, some customers feel more confident, relying entirely on the service support of the world's leading manufacturers. Of course, the cost of ownership in this case increases markedly.
Trends and prospects
2024: Prospects and challenges in data storage in Russia by 2030
Every year data volumes continue to grow, and by 2025 the world will generate up to 160 zettabytes of information annually. But what awaits us by 2030? What will storage look like in the future? What technologies and approaches will be key for business? In October 2024, Anton Aplemakh, an expert in enterprise data storage, presented his vision of the future of data storage in Russia and shared with TAdviser his views on decentralization, quantum technologies and ways of optimizing data storage amid the avalanche-like growth of information. Read more here.
2020: Western Digital: Five storage trends that will drive industry development
On April 21, 2020, Western Digital shared with TAdviser an overview of global data storage trends that, according to the company, deserve attention in 2020. According to Darrah O'Toole, Western Digital's senior product marketing manager for the EMEA region, these trends will shape the industry's development in 2020 and beyond.
1). The number of local data centers will increase, new architectures will appear
According to the company, although the pace of the move to the cloud is not slowing, two factors support the further growth of local (or micro) data centers. First, updated storage regulations remain on the agenda: many countries are passing data retention laws, so companies are forced to retain data for long periods and must properly assess and mitigate the potential risks associated with securing retained data and maintaining its privacy. Second, there is cloud repatriation: large companies prefer to keep data on infrastructure they own rather than in a leased cloud, which can reduce costs and lets them control various parameters at their own discretion, including protection measures, latency and data access. This approach leads to increased demand for local DSS.
In addition, new data center architectures will appear to process an increasing volume and variety of data. In the zettabyte era, due to the growing volume and complexity of workloads, AI/IoT applications and data sets, the architecture of the storage infrastructure will have to change. Updated logical structures will consist of several tiers of DSS optimized for different workloads; in addition, the approach to system software will change. The open-source Zoned Storage initiative will help customers fully unlock the potential of managing zoned block storage devices - both HDDs with shingled magnetic recording (SMR) and ZNS SSDs - for sequential-write and read-dominated workloads. This unified approach allows naturally serialized data to be managed at scale and provides predictable performance.
2). AI standardization for easier edge deployments
Analytics is a good competitive advantage, but the amount of data that companies collect and process for the sake of insights is simply enormous. Therefore, as of April 2020, in a world where everything is connected to everything, the execution of certain workloads is shifting to the edge, which makes it necessary to teach these tiny endpoints to run and analyze an increasing amount of data. Because of the small size of such devices and the need to put them into operation quickly, they will evolve toward greater standardization and compatibility.
3). Data devices are expected to stratify into tiers, with innovation in media and fabrics gaining momentum rather than slowing
The steady exabyte growth of read-dominated applications in the data center will continue and will drive demands on storage-tier performance, capacity and cost-effectiveness as companies increasingly differentiate the services delivered through their storage infrastructure. To meet these requirements, data center architectures will further gravitate toward a model in which storage is provided and accessed on top of a fabric, with a core storage platform and devices that implement a set of service level agreements (SLAs) to meet specific application requirements. The number of SSDs for fast data processing is expected to grow, and at the same time the relentless demand for exabytes of cost-effective, scalable storage will continue, sustaining stable growth in the capacity of the corporate HDD fleet.
4). Fabrics as a solution for unified shared storage access
Against the background of exponential data growth, further diversification of workloads and growing requirements for IT infrastructure, companies must offer customers faster and more flexible solutions while reducing time to market. Ethernet fabrics are becoming the "universal backplane" of the data center, unifying sharing, provisioning and management at scale to meet the needs of an even greater variety of applications and workloads. Composable infrastructure is an architectural approach that uses NVMe-over-Fabrics to dramatically improve the utilization, performance and flexibility of computing power and DSS in the data center. It disaggregates storage from computing systems, allowing applications to use a shared storage pool, data to be easily shared between applications, and the required capacity to be dynamically allocated to an application regardless of location. In 2020, composable disaggregated DSS solutions that scale efficiently over Ethernet fabrics and unlock the full operational potential of NVMe devices will become more widespread across a wide variety of data center applications.
5). HDD drives for data centers will continue to develop at a high pace
Despite the fact that many have predicted a decline in the popularity of HDDs for several years, as of April 2020 there is simply no adequate replacement for enterprise HDDs: they not only continue to meet the needs associated with data growth, but also show economic efficiency in terms of total cost of ownership (TCO) when scaling for hyperscale data centers.
As the analytical company TRENDFOCUS notes in its report "Cloud, Hyperscale, and Enterprise Storage Service," enterprise HDDs are in steady demand: exabyte shipments for enterprise needs will grow at 36% annually over the five calendar years from 2018 to 2023. Moreover, according to IDC, 103 ZB of data will be generated in 2023 and 12 ZB will be saved, of which 60% will go to core and edge data centers. Driven by the insatiable growth of data generated by both humans and machines, this fundamental technology will be advanced by new data placement techniques, higher recording density, innovations in mechanics, smart data storage and new materials. All this will, in the foreseeable future, lead to increased capacity and optimization of total cost of ownership (TCO) at scale.
Given their fundamental role in warehousing and managing data of critical importance to companies, HDD and flash technologies will remain one of the fundamental pillars of successful and secure business operations, regardless of the size of the organization, its type, or the industry in which it operates. Investing in a comprehensive storage infrastructure will make it easier for companies to strengthen their position and cope with data growth for many years without worrying that the system they build will not cope with the load associated with the implementation of modern and high-tech business processes.
2018
The No. 1 problem for most large enterprise customers today is a heterogeneous storage infrastructure. Organizations often have to support dozens of DSS of different classes and generations from different manufacturers, since different applications have different storage requirements. Thus, critical transactional systems (billing, processing, ERP, etc.) require the high reliability and performance inherent in upper-price-segment DSS. Analytical systems require high performance and a low cost per unit of storage, so DSS with solid-state drives (SSDs) are reserved for them. And, for example, working with files requires functionality and low cost, so traditional disk arrays are used here. In a heterogeneous infrastructure, DSS utilization is low, the total cost of ownership (TCO) is prohibitive, manageability is weak, and the complexity of such a storage infrastructure is usually great[8].
Another major challenge is DSS upgrades. Often, DSS purchased three to five years ago can no longer cope with growing data volumes and access-speed requirements, so a new system is purchased and data is migrated to it from the previous one. In effect, customers pay again for the capacity required to host the data and also incur the cost of installing the new DSS and migrating data to it. At the same time, the previous DSS are usually not yet so outdated as to be abandoned completely, so customers try to adapt them to other tasks.
2009
Rapid evolution brings major changes to the main DSS development trends every year. In 2009, the ability to allocate resources economically (thin provisioning) came to the fore, and the last few years have passed under the banner of DSS in the "cloud." The range of systems on offer is diverse: a huge number of models, various options and combinations of solutions from entry level to Hi-End class, turnkey solutions and component-based assembly using the most modern parts, and software and hardware solutions from Russian manufacturers.
The desire to reduce IT infrastructure costs requires a constant balance between the cost of DSS resources and the value of the data stored on them. To decide how best to place resources on software and hardware, data center professionals are guided not only by ILM and DLM approaches but also by tiered storage practices. Specific metrics are assigned to each unit of information to be processed and stored: the degree of availability (how quickly the information must be provided), the importance (the cost of losing the data in the event of a hardware or software failure), and the period after which the information moves to the next stage.
Example of how to separate storage systems according to storage and information processing requirements using tiered storage techniques.
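As a rough illustration of such metric-driven placement (a simplified sketch; the metric values, thresholds and tier names are illustrative assumptions, not a specific ILM product):

```python
from dataclasses import dataclass

@dataclass
class InfoUnit:
    name: str
    required_access_ms: int    # degree of availability: how fast it must be served
    loss_cost: int             # importance: cost of losing it, in arbitrary units
    age_days: int              # time since the information entered its current stage

def assign_tier(u: InfoUnit) -> str:
    """Map the three metrics onto a storage tier (illustrative thresholds)."""
    if u.required_access_ms <= 5 or u.loss_cost >= 1000:
        return "tier0-ssd"            # must be served fast or is too costly to lose
    if u.age_days <= 90:
        return "tier1-disk"
    return "tier2-tape-archive"       # aged data moves to the next stage

units = [
    InfoUnit("billing-db", required_access_ms=2, loss_cost=5000, age_days=1),
    InfoUnit("scanned-contracts", required_access_ms=500, loss_cost=200, age_days=400),
]
for u in units:
    print(u.name, "->", assign_tier(u))
```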
At the same time, the performance requirements of transactional systems have grown, which implies more disks in the system and, accordingly, a higher class of DSS. In response to this challenge, manufacturers have equipped storage systems with solid-state drives that, on "short" read-write operations (characteristic of transactional systems), exceed previous performance by more than 500 times.
The popularization of the cloud paradigm has contributed to increased DSS performance and reliability requirements, since in the event of a data failure or loss, not one or two directly connected servers will suffer - there will be a denial of service for all cloud users. Due to the same paradigm, there was a tendency to unite devices from different manufacturers into a federation. It creates a pooled pool of resources that are provided on demand with the ability to dynamically move applications and data between geographically dispersed sites and service providers.
A certain shift was noted in 2011 in the field of Big Data management. Previously, such projects were only under discussion; now they have entered the implementation stage, having gone all the way from sale to deployment.
The market is approaching the breakthrough that has already happened in the server market: perhaps in 2012 we will see mass-segment DSS supporting deduplication and over-subscription technology. As a result, as with server virtualization, this will ensure much better utilization of DSS capacity.
Storage optimization will develop further through improved data compression techniques. For unstructured data, which accounts for 80% of the total volume, the compression ratio can reach several orders of magnitude. This will significantly reduce the unit cost of storing data on modern SSD media while providing maximum performance.
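As a rough illustration of how block-level deduplication and compression reduce the stored volume (a simplified sketch, not any product's implementation; the block size and sample data are arbitrary assumptions):

```python
import hashlib
import zlib

def dedup_and_compress(data: bytes, block_size: int = 4096) -> dict:
    """Split data into fixed-size blocks, keep only unique blocks, compress each once."""
    store = {}                                     # fingerprint -> compressed block
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        fp = hashlib.sha256(block).hexdigest()     # content fingerprint
        if fp not in store:                        # duplicate blocks are stored only once
            store[fp] = zlib.compress(block)
    return store

# Highly redundant "unstructured" data: the same log line repeated many times.
data = b"2012-01-01 INFO backup completed successfully\n" * 10_000
store = dedup_and_compress(data)

stored_bytes = sum(len(b) for b in store.values())
print(f"original: {len(data)} bytes, stored: {stored_bytes} bytes, "
      f"ratio: {len(data) / stored_bytes:.0f}:1")
```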
Notes
- ↑ [http://www.osp.ru/cw/2011/07/13007618/ Store data in the Data Warehouse and generate revenue]
- ↑ CNews Round Table: "Storage Market: Realities and Prospects"
- ↑ [http://www.timcompany.ru/article47.html DSS: trends and prospects]
- ↑ Modern storage systems
- ↑ [http://sss.incom.ua/content/view/371402/19/ Storage System (DSS)]
- ↑ DSS in Russia: software RAID returns
- ↑ DSS for "middle peasants"
- ↑ Pure Storage in June 2018 told the participants and guests of the Infrastructure 2018 forum how the technologies built into Pure arrays provide a combination of high reliability, performance and cost-effectiveness.