2011/06/20 00:30:41

Data deduplication

For almost three years, deduplication has been one of the most discussed technologies in the storage industry. During that time it has sparked marketing wars, driven industry consolidation, and provoked criticism and disputes among vendors. IT administrators at most mid-sized data centers typically have a small staff and few backup specialists, and it can be hard for them to see how deduplication could be used in that situation. Below are some important questions that IT administrators need answered before deploying deduplication in a mid-sized data center.


What is data deduplication?

Deduplication is a technology that finds repeated data at the file level and replaces each repeated copy with a pointer to the stored data. It can be used to reduce the disk space needed to store data and the bandwidth needed to transmit it.
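The idea of replacing repeated data with a pointer can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular product works: the class `FileDedupStore`, its methods and the file names are hypothetical, and SHA-256 is assumed as the way to recognize identical content.

```python
import hashlib


class FileDedupStore:
    """Toy file-level deduplicating store (illustrative only)."""

    def __init__(self):
        self.blobs = {}    # digest -> file content, stored once
        self.catalog = {}  # file name -> digest (the "pointer")

    def put(self, name: str, content: bytes) -> bool:
        """Store a file; return True if its content was already present."""
        digest = hashlib.sha256(content).hexdigest()
        duplicate = digest in self.blobs
        if not duplicate:
            self.blobs[digest] = content
        self.catalog[name] = digest
        return duplicate

    def get(self, name: str) -> bytes:
        """Follow the pointer back to the single stored copy."""
        return self.blobs[self.catalog[name]]

    def stored_bytes(self) -> int:
        """Bytes actually kept on disk after deduplication."""
        return sum(len(blob) for blob in self.blobs.values())


store = FileDedupStore()
report = b"quarterly report " * 1000
store.put("mon/report.doc", report)
store.put("tue/report.doc", report)      # same content: only a pointer is added
store.put("tue/notes.txt", b"new notes")

logical_bytes = 2 * len(report) + len(b"new notes")
print("written by backups:", logical_bytes)
print("actually stored:   ", store.stored_bytes())
```

The second copy of the report costs only a catalog entry, so the store holds roughly half of what the backups wrote.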

There are several different and acceptable methods of performing deduplication. In most cases it is performed at the level of data blocks, although some solutions can detect differences between files down to a single byte. The methods differ in their impact on performance, the amount of RAM they need, their software support, and how easy replication is to set up.
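To show why block-level deduplication finds more repetition than comparing whole files, here is a minimal sketch. It assumes fixed-size 4 KiB blocks and SHA-256 digests; both are choices made for this example, and the function names are hypothetical. Two versions of a file that differ in one byte are different files, yet they still share all but one block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup_blocks(files: dict):
    """Return (unique block store, per-file recipes of block digests)."""
    blocks = {}
    recipes = {}
    for name, data in files.items():
        recipe = []
        for block in split_blocks(data):
            digest = hashlib.sha256(block).digest()
            blocks.setdefault(digest, block)  # keep the first copy only
            recipe.append(digest)
        recipes[name] = recipe
    return blocks, recipes


def restore(name: str, blocks: dict, recipes: dict) -> bytes:
    """Rebuild a file by following its recipe of block pointers."""
    return b"".join(blocks[d] for d in recipes[name])


# Four distinct 4 KiB blocks, then a second version with one byte changed.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
changed = bytearray(original)
changed[5000] ^= 0xFF                      # falls inside the second block
modified = bytes(changed)

blocks, recipes = dedup_blocks({"v1": original, "v2": modified})
print("blocks written:", 2 * 4, "unique blocks stored:", len(blocks))
```

Fixed-size blocks only line up when data is overwritten in place. An insertion shifts every later block boundary, which is one reason the methods differ in what they can detect.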

Is deduplication a popular technology?

Deduplication has finally moved from the category of experimental technologies into the category of popular ones. According to analysts, more than 30% of IT departments in Western countries now apply it to at least part of their data. The products and solutions on the market are already a couple of generations old and have been optimized for simple installation that does not disrupt other applications.

Nevertheless, this does not mean that the solutions from different vendors are identical. Most deduplication vendors are still accumulating technical experience, so when evaluating solutions it is advisable to find out the company's level of expertise, ask for references, and learn about its technical support.

Which problems is deduplication best suited to?

The most common use of deduplication is data backup. This is natural, because backup copies contain more repeated data than any other data set and are also kept for longer. Deduplication rates are high for most common types of office data, including e-mail, databases and flat files.
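A small calculation shows why repeated, long-retained backups deduplicate so well. The figures here are assumptions chosen for illustration and are not measurements: each nightly full backup consists of 100 blocks of 1 KiB, and exactly one block changes per night. The helper names are hypothetical.

```python
import hashlib

BLOCK_BYTES = 1024  # assumed block size


def make_block(index: int, version: int) -> bytes:
    """Hypothetical block content that changes when its version changes."""
    return f"block-{index}-v{version}".encode().ljust(BLOCK_BYTES, b".")


def nightly_backups(nights: int, blocks_per_backup: int = 100):
    """Yield one full backup per night; one block changes each night."""
    versions = [0] * blocks_per_backup
    for night in range(nights):
        if night > 0:
            versions[night % blocks_per_backup] += 1
        yield [make_block(i, v) for i, v in enumerate(versions)]


def dedup_ratio(nights: int) -> float:
    """Bytes written by all backups divided by unique bytes stored."""
    seen = set()
    logical = 0
    for backup in nightly_backups(nights):
        for block in backup:
            logical += len(block)
            seen.add(hashlib.sha256(block).digest())
    return logical / (len(seen) * BLOCK_BYTES)


for nights in (1, 7, 30):
    print(nights, "nights retained, ratio", round(dedup_ratio(nights), 2))
```

Under these assumptions a single backup gains nothing, while 30 retained backups occupy the space of 129 blocks instead of 3000. The longer the retention, the higher the ratio.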

Quantum recently surveyed users of its DXi-Series devices to obtain quantitative data on the effect of adding deduplication to their backup strategies. Compared with conventional storage systems, backup speed increased by 125% on average and the number of failed backups fell by 87%. Users also reported a dramatic change in recovery: procedures that took several hours or days before deduplication could usually be cut to minutes. Costs often fell significantly as well. According to users, total removable media costs fell by half on average, spending on tape media was cut by 97%, and the time needed to manage backups was reduced by 63%.

Users who rely on remote replication to protect against data loss in accidents and natural disasters report a growing number of recovery points with systems that automate remote replication and remove the need for small offices to use and administer tapes.

Does it matter which backup software is used?

Most deduplication vendors have tested their systems with a range of backup programs and achieved good results. Some vendors can even support more than one backup application on a single storage system. It is therefore worth asking the deduplication vendor which backup software is supported. Be sure to check that the backup software's specialized interfaces are supported. For example, Symantec developed the OpenStorage interface, which gives backup systems additional operational benefits such as higher performance, better replication management, and even direct, autonomous creation of backup copies. Ask deduplication vendors about their strategic relationships with backup software vendors. You should understand how closely they cooperate, and find out their plans for future interaction and integration.

What is the easiest way to implement deduplication?

Most IT departments face a choice: either install dedicated deduplication appliances or perform deduplication in the backup software. There is no definite answer as to which approach is simpler. Nevertheless, several recommendations can be made.

With dedicated deduplication appliances, currently the most common method, all backup data is sent to the appliance and deduplicated there. Users can replace or supplement their installed backup systems with minimal changes to their overall backup methodology. Because deduplication runs on the dedicated device, it adds no load to backup clients or media servers, and it simplifies operations such as replication. This method is not only the most common but also the most mature. Using it means faster installation and lower maintenance requirements.

With software-based deduplication, the backup system adds deduplication to the other tasks it runs on either the backup clients or the media servers. Deduplicating data before it is sent to the target system reduces the amount of data that must be transferred over the network. The idea is similar to compressing data in software and, incidentally, data deduplication almost always includes data compression. Because deduplication is a fairly resource-intensive operation, it may slow down backup operations, so new servers or dedicated storage systems may have to be added. This can increase the cost of the system and the complexity of integration.
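The network saving from deduplicating before sending can be sketched as follows. This is a simplified model under stated assumptions, not the protocol of any real backup product: `BackupTarget` and `backup` are hypothetical names, chunks are a fixed 4 KiB, SHA-256 identifies chunks, `zlib` stands in for the compression step, and the small cost of exchanging digests is ignored.

```python
import hashlib
import zlib

CHUNK = 4096  # assumed chunk size


class BackupTarget:
    """Hypothetical target that stores compressed chunks by digest."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        """Tell the client which chunks the target does not hold yet."""
        return {d for d in digests if d not in self.chunks}

    def upload(self, digest: bytes, compressed: bytes):
        self.chunks[digest] = compressed


def backup(data: bytes, target: BackupTarget) -> int:
    """Send only chunks the target lacks; return payload bytes sent."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    digests = [hashlib.sha256(c).digest() for c in chunks]
    needed = target.missing(digests)
    sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in needed:
            payload = zlib.compress(chunk)  # compress what must travel
            target.upload(digest, payload)
            needed.discard(digest)          # do not resend a repeated chunk
            sent += len(payload)
    return sent


def sample_chunk(label: str) -> bytes:
    """Deterministic 4 KiB chunk used as example data."""
    return hashlib.sha256(label.encode()).digest() * 128


target = BackupTarget()
monday = b"".join(sample_chunk(str(i)) for i in range(8))
tuesday = monday[:-CHUNK] + sample_chunk("new")   # only the last chunk changed

first = backup(monday, target)
second = backup(tuesday, target)
print("first backup sent:", first, "second backup sent:", second)
```

The hashing and compression run on the client or media server, which is the extra load described above. In exchange, the second backup sends one chunk instead of eight.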

Under certain circumstances either of the approaches described above may be suitable. To decide which fits your situation better, identify the critical elements of your system, the utilization of your media servers, and the level of integration that is justified in your case.

Is it worth abandoning tape drives altogether?

Although most end users who adopt deduplication reduce their use of removable media, very few give it up entirely. There is a good reason for this. Users' backup needs can usually be divided into three levels: daily backup and recovery, short-term protection against data loss in accidents or natural disasters, and long-term data storage. It makes sense to use a different technology at each level.

Daily backup and recovery: for many users, the read and write profile of disk makes it suitable for daily backup and recovery. Deduplication lets them keep data on disk for longer, so they can take advantage of that profile for more of their recoveries.

Short-term protection against data loss in accidents or natural disasters: the replication features included in deduplication technology allow users with several sites to replace removable media with remote replication for disaster protection. As a result, they get more recovery points, reduce costs, and automate operations that most users are forced to perform manually.

Long-term data storage: removable media remain an economical and safe solution. They consume less electricity, take up less space, and require less cooling than disk-based storage systems, which makes them the preferred means of long-term data storage. New tape drive technologies, including encryption and data integrity checking, have made them more secure and reliable.
