RSS
Логотип
Баннер в шапке 1
Баннер в шапке 2

RS: Data warehouse

Product
Developers: RedSys (Redsis)
Technology: DWH

Storage of binary objects in tables of a relational database is a reality which often it is necessary to face at maintenance, modification and operation of systems. Most often binary data (the scanned images of documents, photos of goods, etc) are stored in the same tables, as accounting items. So it turns out for various reasons. Sometimes insufficient understanding of effects of such solution, i.e. costs which will arise in connection with such storage becomes the reason that the base for 70% consists of binary objects. Also the explosive growth of amount of transactions (record count) which nobody expected at initial design can be the cause.

But anyway, such storage to become this headache at those who are forced to support and develop a system which consists of huge static content now: cloning of base for detailed analysis or experiments, backup, preparation of the stand for developers — everything turns into the administrator's nightmare.

Even when there is an understanding as well as where such data should be stored, transition to new architecture is often not trivial: most likely change of the applied code will be required that in legacy systems can turn into very long and expensive process requiring a stop and regression tests of all system modules. Besides, it is necessary to notice that transfer of binary objects on the file system will create additional "seam" and a point of failure, i.e. except the database, let slow, at an operation command appears now "undressed with pictures \documents" which though it is rare, but changes, i.e. it it is also necessary to include in backup procedure, in a monitoring system and other.

Some DBMS already solved such problems and propose the solutions on methods of storage and algorithms of processing of binary objects — as in the selected DB specialized areas (a striking example — the Secure Files mechanism in Oracle DBMS), and methods of storage of binary objects in the file system and links to the corresponding files directly in tables.

How to be if you have a PostgreSQL database, and in your system binary objects are stored in the same tables, as accounting items, and alteration of an applied part long difficult and risky transaction? At the same time it would be desirable to receive methods of migration of binary objects directly "on the fly", i.e. at least new objects should remain somewhere in external storage, and the administrator had an opportunity to transfer old objects during a minimum load and if at the same time for binary objects all transactions are also tranzaktsionna...

The solution RedSys"RS: The data warehouse" — uses possibilities of the PostgreSQL expansion by means of external modules (extension), our expansion announces the new access method, implements own index on a basis PostgreSQL hash index. For external storage of big binary objects the file system is used, but also there is an opportunity to save big objects in a distributed file system open source of Ceph.

Ceph is the software-defined distributed file system open source deprived of bottlenecks and uniform points of failure which represents the cluster of the nodes performing different functions which is easily scaled to the petabyte sizes, providing storage and replication of data and also load distribution that guarantees high availability and reliability. A system is free though developers can provide paid support. No special equipment is required.

The RedSys expansion delivers content from the file system or from Ceph in the application which "does not know" that it works with remote storage or with the file system, and works with binary contents as if it was in the database. In PostgreSQL expansion provides the RBYTEA type, by analogy to the standard BYTEA type which is used for storage of binary objects in PostgreSQL. Only in difference from standard type our RBYTEA saves in database blocks only descriptors of the small size (no more than 100 bytes) each such descriptor represents the description of location and a status of a binary object. All transactions with new type of a tranzaktsionna, i.e. a descriptor physically is removed from files of the database together with the data from remote storage or from the file system according to a context of transaction and on rules determined by the PostgreSQL DBMS internal mechanisms. Except type, expansion represents a small set of application functions for viewing values of a descriptor, its creation and filling with binary contents. Now it is possible to provide seamless migration, having created in the table with binary data of a field of "new" type and having defined rules for filling of fields. Besides, the administrator can make the backup copy or clone the database — only descriptors which require significantly less place will be copied.