VMware Tanzu Greenplum

Product
Base system (platform): PostgreSQL DBMS
Developers: VMware
First release: 2005
Last release date: 2015/10/28
Technology: DBMS

Main article: Database Management System (DBMS)

Greenplum Database is a massively parallel processing (MPP) database server with an architecture designed specifically to manage large-scale analytical data warehouses and business intelligence workloads. MPP refers to systems with two or more processors that cooperate to perform an operation, each processor having its own memory, operating system, and disks. Greenplum uses this high-performance architecture to distribute the load of multi-terabyte data warehouses and can use all system resources in parallel to process a query.

2022: The foundation of Greenplum is PostgreSQL

Greenplum Database is based on open source PostgreSQL technology. It is essentially several PostgreSQL database instances acting together as one cohesive database management system (DBMS). Greenplum (GP) is a relational DBMS with a massively parallel processing, shared-nothing architecture.[1]

2020: Commercialization under the brand name VMware Tanzu Greenplum

In 2020, VMware acquired Pivotal, which had been the Greenplum vendor since 2012. Since then, the open source MPP DBMS has been sold commercially under the VMware Tanzu Greenplum trademark.

2018: Integration with the domestic Luxms BI data visualization and analysis platform

In 2018, the Luxms BI platform was integrated with the massively parallel open source Greenplum DBMS. The connection to Greenplum is provided by a high-speed bidirectional FDW connector.

2015: Greenplum source code opened

On October 28, 2015, the source code of the Greenplum Database (GPDB) was opened. The product was announced as a fully functional open source data warehouse built on the free PostgreSQL DBMS.[2]

Greenplum is a DBMS created by the company of the same name. EMC Corporation bought the company in 2010, and in 2013 the product was transferred to Pivotal Software.

Pivotal announced in February 2015 that it would open the GreenplumDB (GPDB) code, and this has now happened: the project has its own website, and the source code has been published on GitHub under the free Apache License v2. Greenplum provides powerful and fast analytics across vast amounts of data and, according to the developers, uses "the world's most advanced cost-based query optimizer."

GPDB is built on the free PostgreSQL DBMS and extends its functionality with:

  • a massively parallel processing architecture (automatic parallelization of all data and queries),
  • MPP technologies for high performance at petabyte scale,
  • an innovative query optimizer (its analytical capabilities scale to large datasets without sacrificing performance or throughput),
  • polymorphic (column-oriented or row-oriented) storage and data processing,
  • advanced machine learning based on the Apache MADlib library.
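
The difference between row-oriented and column-oriented storage mentioned in the list can be illustrated with a small Python sketch. It is only a toy model of the two layouts, not Greenplum's storage format; the sample rows and all function names are invented for the illustration, and every row is assumed to have the same columns.

```python
# Toy illustration of row-oriented vs column-oriented layouts.
# All names and data here are hypothetical, not Greenplum internals.

SAMPLE_ROWS = [
    {"id": 1, "region": "eu", "amount": 10.0},
    {"id": 2, "region": "us", "amount": 2.5},
    {"id": 3, "region": "eu", "amount": 5.0},
]


def to_columns(rows):
    """Convert a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_oriented(rows, column):
    """Row layout: every full row is visited to read a single field."""
    return sum(row[column] for row in rows)


def total_column_oriented(columns, column):
    """Column layout: only the one requested column is read."""
    return sum(columns[column])
```

Both functions return the same total for the same data. The column layout touches only the values of the aggregated column, which is why it tends to suit analytical queries over a few columns, while the row layout keeps each record together, which suits reads and writes of whole rows.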

The Greenplum cluster consists of a master server, which stores only metadata, and many "segment" servers, where all user data is located. All servers use the same database schema.
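
The split between a metadata-only master and data-holding segments can be sketched in Python as follows. This is a minimal toy model under stated assumptions: the class `ToyCluster`, the `distributed_by` parameter, the CRC32 hash, and the segment count are all hypothetical choices for the illustration and do not reproduce Greenplum's actual hashing or catalog.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment index with a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


class ToyCluster:
    """Toy model: the master keeps metadata only, segments hold the rows."""

    def __init__(self, num_segments=NUM_SEGMENTS):
        self.num_segments = num_segments
        # Master catalog: table name -> (column names, distribution key).
        self.catalog = {}
        # One store per segment: table name -> list of rows.
        self.segments = [defaultdict(list) for _ in range(num_segments)]

    def create_table(self, name, columns, distributed_by):
        """Register the schema on the master; no user data is stored here."""
        if distributed_by not in columns:
            raise ValueError("distribution key must be one of the columns")
        self.catalog[name] = (tuple(columns), distributed_by)

    def insert(self, name, row):
        """Route a row to the segment chosen by hashing its key value."""
        _columns, key = self.catalog[name]
        seg = segment_for(row[key], self.num_segments)
        self.segments[seg][name].append(row)
        return seg
```

For example, after `cluster = ToyCluster()` and `cluster.create_table("sales", ["id", "amount"], distributed_by="id")`, each call to `cluster.insert("sales", {"id": i, "amount": 1.0})` places the row on exactly one segment, and the same `id` always lands on the same segment. The master's `catalog` never contains rows, only the schema that every segment shares.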

2012: Acquisition by Pivotal Corporation

In 2012, Pivotal acquired the EMC Greenplum Community Edition, continuing to develop it under its own brand.

2011

EMC Greenplum Community Edition

In 2011, EMC released the free Greenplum Community Edition for public use.

EMC announced a free Community Edition of the EMC Greenplum Database, a massively parallel processing (MPP) DBMS, together with free analytical algorithms and data mining tools. The announcement was made at the 2011 O'Reilly Strata Conference (February 1-3, 2011) in Santa Clara, California, by Scott Yara, vice president of the EMC Data Computing Products Division. The free versions can be downloaded at: http://community.greenplum.com.

Building on the success of Greenplum's earlier data-intensive products, such as the EMC Greenplum Data Computing Appliance, the EMC Greenplum Community Edition removes the cost barriers that keep powerful big-data tools out of the hands of many developers, researchers, and other data professionals. The free toolkit lets this community understand and visualize data better and also contribute to the development of next-generation tools and solutions. With the Community Edition software stack, developers can build sophisticated applications that collect, analyze, and use large amounts of data, relying on best-in-class big-data tools, including Greenplum Database with its analytical processing capabilities.

The free EMC Greenplum Community Edition includes:

  • Greenplum Database CE - an industry-leading DBMS with massively parallel processing (MPP) for large-scale analytics and next-generation data warehouses;
  • MADlib - a library of open source analytical algorithms that implement parallel mathematical, statistical, and machine learning methods for structured and unstructured data;
  • Alpine Miner - a promising third-party analytical tool with an intuitive visual environment for data mining modeling; it lets users move quickly from model to scoring, uses analytics built into the database, and is designed specifically for applications that work with large amounts of data.

For the community

This initial version of EMC Greenplum Community Edition is designed for both new users and experienced Greenplum customers. First-time users get a complete, purpose-built business intelligence environment in which they can view, modify, and extend the demo data files included in the product and experiment with big-data analytical tools in the Greenplum DBMS. Existing users can download an updated version of Greenplum Database CE and the analytics tools and integrate them with their development and research environments.

The Community Edition can be downloaded either as a preconfigured VMware virtual machine for use on laptops or desktops, or as a suite of development packages for users' own machines. All users can join the new Greenplum Community Forums for free to get support, collaborate with colleagues, publish their ideas, and test improvements developed independently by other users.

Product Release Dates

Starting February 1, 2011, the EMC Greenplum Community Edition is available for free download at http://community.greenplum.com. Regular Community Edition updates will also be available online. The Community Edition is intended only for experimentation, development, and research. Users of the current Single-Node Edition can deploy the new Community Edition in their single-node environment. Using the code for internal data processing or for any commercial or production purpose requires purchasing commercial Greenplum licenses.

Modular Data Computing Appliance

EMC Greenplum created the Modular Data Computing Appliance (DCA), announced in September 2011, which can work with structured and unstructured big data at the same time, using both the relational processing of the parallel Greenplum DBMS and the Apache Hadoop open source platform. The new Modular DCA devices will include high-performance modules that run the SAS Institute's In-Memory Analytics package, which processes data in RAM in parallel. Using SAS software makes it possible to place both structured and unstructured data on several cluster nodes at once. The company considers parallel processing to be the main advantage of Greenplum systems. At the time of the announcement, the modules were being tested and were expected to go on sale by the end of the year. EMC also introduced the Greenplum Analytics Workbench, a test cluster of more than 1000 nodes designed for integration testing of Apache Hadoop software.

The EMC Greenplum Database uses a shared-nothing massively parallel processing (MPP) architecture: the full data set is divided into separate segments that can be processed simultaneously. The architecture was originally designed for business intelligence and analytical processing on commodity hardware. Data is automatically distributed among multiple segment servers, each of which owns and manages its own part of the overall data set. Because nothing is shared, all communication goes over a network interconnect, so there is no disk contention and there are no addressing conflicts. More information about Greenplum Database can be found at: www.greenplum.com/products/greenplum-database.
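
The pattern described above, where each segment processes only its own part of the data and the results are then combined, can be sketched in Python. This is a simplified illustration under stated assumptions: the data, the function names, and the use of a thread pool as a stand-in for separate segment servers are all hypothetical, and Python threads do not give the true hardware parallelism of a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of rows owned by one segment.
SEGMENT_DATA = [
    [("eu", 10.0), ("us", 5.0)],
    [("eu", 2.5), ("asia", 7.0)],
    [("us", 1.5)],
]


def local_sum(rows):
    """Runs on one segment: aggregate only the rows that segment owns."""
    partial = {}
    for region, amount in rows:
        partial[region] = partial.get(region, 0.0) + amount
    return partial


def parallel_sum_by_region(segment_data):
    """Send the same work to every segment, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(segment_data)) as pool:
        partials = list(pool.map(local_sum, segment_data))
    total = {}
    for partial in partials:
        for region, amount in partial.items():
            total[region] = total.get(region, 0.0) + amount
    return total
```

Calling `parallel_sum_by_region(SEGMENT_DATA)` returns the per-region totals for the whole data set. No segment reads another segment's rows; only the small partial results travel to the coordinating step, which mirrors the role of the network interconnect in a shared-nothing design.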

2010: Acquisition of Greenplum by EMC

EMC acquired Greenplum and continued to work on the project.[3]

2005: Greenplum Release

The first version of the technology was released by the company of the same name, based in California (USA).

Notes

  1. Greenplum DB, the National Electronic Library named after N.E. Bauman.
  2. The source code of the Greenplum database, an advanced warehouse based on PostgreSQL, was opened. http://www.nixp.ru/news/13630.html
  3. Greenplum