Planning storage systems for the OpenStack platform
When designing the infrastructure of a corporate or public cloud built on the OpenStack platform, one question always comes up: which storage system to use? This article describes the different ways of building such storage that ICL Services experts have implemented in practice.
Relevance of the problem
Today storage systems (DWH) account for a substantial share of the budgets of many large Russian companies. Although the cost of storing data is slowly falling, the volume of data is growing rapidly. According to Oracle forecasts, by 2020 IT departments will have to manage infrastructures holding 50 times more data and 75 times more files, with only 1.5 times more staff.
Keeping up with such growth is impossible without fundamental changes in IT, such as automation, self-service, and cloud services. One of the best-known and fastest-growing platforms for building a private cloud infrastructure is OpenStack, which implements the concept of a software-defined data center. Such a solution makes it possible to:
- provide virtual resources on demand through a self-service portal;
- scale applications as the load grows;
- speed up the deployment of end-user applications;
- give users a more convenient way to store data (a corporate Dropbox);
- ultimately, reduce IT costs considerably.
Storage in OpenStack
Many OpenStack components use data storage:
- OpenStack Glance: storage of VM images
- OpenStack Nova: ephemeral VM disks
- OpenStack Cinder: persistent disks
- OpenStack Swift: object storage
- OpenStack Manila: shared file systems
Any guest OS uses a block device to hold its file systems. OpenStack distinguishes between "ephemeral" and persistent storage.
"Ephemeral" disks exist only as long as the VM does: when the VM is deleted, its ephemeral disks are deleted with it. These are usually root disks with the operating system and swap disks. When only OpenStack Nova is deployed, users of the platform have no access to any form of persistent storage by default. Persistent block devices, by contrast, are not deleted together with the virtual server and can be attached to another server if necessary. Operations for creating and deleting such devices, and for managing their attachments, are of course available. The table below gives a comparative analysis of the storage components available in OpenStack.
Possible scenarios
The chart below shows the results of an OpenStack user survey (October 2015): which low-level implementations respondents use to store persistent disks.
OpenStack Nova: Local file system on hypervisors
The standard architecture, and the one closest to the OpenStack ideology, uses the local file system on the hypervisors to hold the ephemeral disks of virtual servers (for example, in RDO OpenStack this is /var/lib/nova/instances). This approach keeps I/O operations local, scales well, and allows QCOW2 copy-on-write clones to use disk space efficiently (in a clone, real space is taken only by the data blocks that differ from the base image). A local file system also makes effective use of the standard file system caching mechanisms. The obvious drawback is that if a hypervisor becomes unavailable, the VM instance cannot be started on another hypervisor, because its disks are local and inaccessible to other hosts.
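The copy-on-write behaviour described above can be illustrated with a toy model. This is a minimal sketch, not the QCOW2 on-disk format: the class names `BaseImage` and `CowClone` are hypothetical, and a "block" is just a Python bytes value. It only shows why a clone consumes space in proportion to the blocks that have changed.

```python
class BaseImage:
    """Read-only base image: a fixed sequence of data blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def __len__(self):
        return len(self._blocks)

    def read(self, index):
        return self._blocks[index]


class CowClone:
    """Clone that stores only the blocks that differ from the base image."""

    def __init__(self, base):
        self._base = base
        self._changed = {}  # block index -> new contents

    def read(self, index):
        # Changed blocks come from the clone, all others from the base image.
        if index in self._changed:
            return self._changed[index]
        return self._base.read(index)

    def write(self, index, data):
        if not 0 <= index < len(self._base):
            raise IndexError("block index out of range")
        # The base image is never modified; the new block lives in the clone.
        self._changed[index] = data

    def allocated_blocks(self):
        """Number of blocks that actually take space in the clone."""
        return len(self._changed)


base = BaseImage([b"boot", b"kernel", b"libs", b"empty"])
vm_a = CowClone(base)
vm_b = CowClone(base)
vm_a.write(3, b"app-data")
```

In this sketch `vm_a` allocates one block and `vm_b` none, while both still read the unchanged blocks from the shared base image. That is the property that makes mass launches of identical VMs cheap in disk space.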
Because this option does not require buying or building a dedicated storage system, it is one of the cheapest ways to store VM disks for OpenStack.
ICL Services experts note that this storage option suits VMs that serve distributed, cloud-type applications whose fault tolerance is provided by the application rather than the infrastructure and that do not demand high IOPS. It can also be used for short-lived lab stands and for development and test environments. The main use case is the mass launch of large numbers (hundreds, thousands, or tens of thousands) of identical VMs.
When LVM is used with OpenStack Nova, the ephemeral disks of virtual servers are also created locally on the hypervisors. Each disk is a logical volume in a preconfigured LVM volume group. When a VM instance starts, OpenStack creates a logical volume and copies the image data into it. This solution likewise offers high scalability and local I/O.
The scenarios described above also apply to this storage option.
OpenStack Cinder: Linux LVM
The LVM + iSCSI combination is the reference implementation for OpenStack Cinder. Block devices are carved out as LVM logical volumes and then exported over iSCSI to the hypervisor, which passes the block device through to the VM. Because the data is stored locally, it becomes unavailable if the storage node hosting the Cinder disk goes down. Despite these obvious shortcomings, ICL Services architects note that this scenario is the best fit for getting acquainted with the OpenStack platform and for running non-critical services.
Its main advantage is the low cost of implementing persistent storage. Because data is transported over iSCSI, the scenario places certain requirements on the data network. It scales easily both vertically, by growing the disk group in a host, and horizontally, by adding new hosts to the resource pool.
OpenStack Nova: shared file system
Ephemeral disks are placed on a shared file system (most often NFS or GlusterFS). From the platform's point of view this scenario is the same as the first one (the same directory holds the server disks), so no special integration setup is required: it is enough to mount the file system at the right place on every hypervisor. At the same time, it allows a VM instance to be restarted on another host if the hypervisor currently serving it becomes unavailable.
Almost all enterprise NAS systems support NFS, which makes it easy to integrate existing storage with OpenStack. GlusterFS makes it possible to use the hypervisors' local disks effectively to build storage volumes for VM disks. This scenario allows entry-level fault-tolerant systems to be built and high-availability mechanisms for VM instances to be implemented.
Distributed Ceph storage
This is one of the most popular ways of building storage for the OpenStack platform: it accounts for more than a third of all installations worldwide.
Ceph stores data as objects on the local disks of servers and groups them into data pools. Fault tolerance is provided by redundant storage. The level of redundancy is configurable and is typically 3 copies of each object, distributed across different servers. Ceph scales very well but places high demands on network bandwidth and some demands on CPU, because the CRUSH pseudo-random data distribution algorithm computes an object's location on every access. Co-locating the Ceph client (the hypervisor) with storage nodes is not recommended.
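The idea that an object's location is computed rather than looked up, and the cost of keeping 3 copies, can be sketched in a few lines. This is an illustration only: it uses rendezvous (highest-random-weight) hashing as a simple stand-in and is not the CRUSH algorithm, which also takes into account placement groups, device weights, and the failure-domain hierarchy. The function names and server names are hypothetical.

```python
import hashlib


def place_object(object_name, servers, replicas=3):
    """Pick `replicas` distinct servers for an object by computation alone.

    Stand-in for CRUSH: every client that knows the server list computes
    the same placement, so no central lookup table is needed.
    """
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replica count")

    def score(server):
        digest = hashlib.sha256(f"{object_name}:{server}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The servers with the highest scores for this object hold its copies.
    return sorted(servers, key=score, reverse=True)[:replicas]


def usable_capacity_tb(raw_tb, replicas=3):
    """With N full copies of every object, usable space is raw space / N."""
    return raw_tb / replicas


servers = ["node1", "node2", "node3", "node4", "node5"]
placement = place_object("vm-disk-0001", servers)
```

Under these assumptions, 300 TB of raw disk with 3 copies leaves 100 TB of usable capacity, and because the placement is recomputed on each access, the CPU cost mentioned above falls on every client.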
Integration with OpenStack is performed at the level of block devices, which the KVM hypervisor supports directly. Ceph can self-heal data when a node drops out of the cluster. OpenStack Nova, OpenStack Glance, and OpenStack Cinder have built-in integration with distributed Ceph storage, which can be used for both ephemeral and persistent disks.
This scenario is already used for a broad range of tasks, including solutions that must provide a high level of VM availability and solutions where performance must scale flexibly along with capacity. Most low-cost infrastructure designs use this approach.
Integration with enterprise-class storage systems
OpenStack Cinder integrates flexibly with a large number of enterprise storage systems: products from EMC, IBM, HP, and others. When a disk is requested, OpenStack performs the required operations on the storage system through its API, which results in a block device being created on the storage system side. OpenStack thus acts as a high-level layer between the low-level components and the user. Many storage systems can be integrated, but all operations look the same in the user interface: OpenStack takes on the full complexity of the integration.
This approach makes it possible to use the full efficiency and performance of enterprise storage and to use Fibre Channel (FC) as the transport network. It allows fault-tolerant, high-performance systems to be built.
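The "high-level layer" role can be sketched as a common interface with interchangeable back ends. This is a simplified illustration, not Cinder's actual driver API: the classes `StorageBackend`, `LvmIscsiBackend`, `EnterpriseArrayBackend`, and `BlockStorageService`, and the returned device strings, are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Hypothetical driver interface; not Cinder's real driver API."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        """Create a block device and return a backend-specific identifier."""


class LvmIscsiBackend(StorageBackend):
    def create_volume(self, name: str, size_gb: int) -> str:
        # A real driver would create a logical volume and export it over iSCSI.
        return f"iscsi://storage-node/{name}"


class EnterpriseArrayBackend(StorageBackend):
    def create_volume(self, name: str, size_gb: int) -> str:
        # A real driver would call the array's management API here.
        return f"fc://array/{name}"


class BlockStorageService:
    """Single entry point: the caller never talks to a back end directly."""

    def __init__(self, backends: Dict[str, StorageBackend]) -> None:
        self._backends = backends

    def create_volume(self, backend: str, name: str, size_gb: int) -> dict:
        if size_gb <= 0:
            raise ValueError("size_gb must be positive")
        device = self._backends[backend].create_volume(name, size_gb)
        # The result has the same shape whichever back end served the request.
        return {"name": name, "size_gb": size_gb, "device": device}


service = BlockStorageService(
    {"lvm": LvmIscsiBackend(), "array": EnterpriseArrayBackend()}
)
vol_a = service.create_volume("lvm", "db-data", 100)
vol_b = service.create_volume("array", "db-logs", 50)
```

The design choice worth noting is that adding another storage system means adding another back-end class, while the request and the result seen by the user stay unchanged.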
Object storage OpenStack Swift
Swift, an Amazon S3-compatible object storage, stores user objects (files, backup copies, and so on). As in Ceph, fault tolerance is provided by keeping copies of the data distributed across the file systems of different servers. The standard configuration uses the local file systems of the storage nodes. In addition, Swift can use GlusterFS. Integration with OpenStack Glance is also possible, which allows virtual server images to be stored in Swift.
Conclusions
OpenStack makes it possible to build geographically distributed infrastructure and to use a large number of storage systems at the same time. For example, one availability zone can use Ceph as its primary storage, a second EMC, and a third the local file system. Depending on requirements, the user selects an availability zone, which automatically determines the storage used for VM instances.
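The relationship between availability zones and storage can be written down as a simple mapping. This is a sketch under stated assumptions: the zone names and the `survives_host_failure` property are illustrative, chosen to mirror the trade-offs described in the scenarios above, and do not come from any OpenStack configuration format.

```python
# Hypothetical zone catalogue: each availability zone is tied to one storage type.
ZONES = {
    "zone-ceph": {"storage": "Ceph", "survives_host_failure": True},
    "zone-enterprise": {"storage": "enterprise array", "survives_host_failure": True},
    "zone-local": {"storage": "local file system", "survives_host_failure": False},
}


def storage_for_zone(zone):
    """Choosing a zone implicitly chooses the storage behind it."""
    return ZONES[zone]["storage"]


def zones_meeting(need_host_failover):
    """List the zones whose storage satisfies the availability requirement."""
    return sorted(
        name
        for name, props in ZONES.items()
        if props["survives_host_failure"] or not need_host_failover
    )


candidates = zones_meeting(need_host_failover=True)
```

A user whose application handles its own fault tolerance could pick any of the three zones, while one who needs an instance restarted after a hypervisor failure would be limited to the zones returned in `candidates`.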
Choosing the right storage system for a cloud platform, taking into account all requirements for functionality, performance, and scalability, and then designing and implementing a turnkey technical solution, is a task for highly skilled cloud computing experts. Solving it well makes it possible to build an effective platform for a business of any scale.