2017/12/19 11:33:06

Compute accelerators: graphics processors have a serious alternative

At the beginning of the 21st century, pairing Intel Xeon CPUs with Nvidia GPUs for compute acceleration came to be regarded as the de facto standard. But times change, and the emergence of user-programmable gate arrays, FPGAs (Field-Programmable Gate Array), as an alternative to GPUs has become one of several signs of a shifting situation.

The FPGA is much older than the GPU: the segment took shape in the late 1980s and has been nearly monopolized by four vendors. The two recognized leaders are Altera, founded in 1983 and acquired by Intel in 2015, and Xilinx, founded two years later. As of 2017 they hold 31% and 36% of the FPGA market respectively. There are two more large vendors: Microsemi, and Actel, which Microsemi acquired in 2010.

Ross Freeman, the inventor of the FPGA (on the right in the picture), originated both the idea of programming finished chips and the model of the fabless semiconductor company, one without fabrication plants of its own


Historically, FPGAs were preceded by Programmable Logic Arrays (PLA) and Complex Programmable Logic Devices (CPLD). FPGAs differ from PLAs and CPLDs both quantitatively and qualitatively: the earlier devices lacked sufficient flexibility.

A traditional FPGA consists of a set of fairly simple interconnected blocks, each capable of performing particular logic operations (say, AND or XOR), and routing channels that carry signals between the blocks. Both the blocks and the routing are programmable; programming in this context comes down to filling in the truth tables (LUTs) that define each block's logic and laying out the routes over which the blocks exchange data.

In the canonical design, an FPGA consists of millions of identical blocks whose behavior is captured in lookup tables (LUT, Lookup Table). Each LUT occupies a small programmable fragment of memory that stores the block's logic function. A controller ties the blocks and the routing into a single system.
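The LUT idea is simple enough to sketch in a few lines of code. Below is a minimal illustration (not vendor code): a 3-input logic block is just eight stored bits, and the inputs select which bit to output. "Programming" the block means writing a different truth table into it.

```python
# A sketch of an FPGA logic block modeled as a lookup table (LUT).
# A 3-input LUT stores 8 bits; the inputs form an index into them.

def make_lut(truth_bits):
    """truth_bits: 8 output bits, indexed by (a, b, c) read as a 3-bit number."""
    def block(a, b, c):
        index = (a << 2) | (b << 1) | c
        return truth_bits[index]
    return block

# "Program" the block as XOR of all three inputs by writing
# the parity of each input combination into the table.
xor3 = make_lut([0, 1, 1, 0, 1, 0, 0, 1])

print(xor3(1, 0, 1))  # 1 XOR 0 XOR 1 = 0
```

Reprogramming the same block as, say, a 3-input AND is just a matter of loading a different 8-bit table, which is exactly why one uniform array of blocks can implement arbitrary logic.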

Fig. 1 FPGA blocks and a LUT with three inputs


As FPGAs evolved, the block architecture grew more complex: the number of inputs rose to 8 and beyond, allowing more intricate logic to be programmed. In the newest FPGAs the blocks are more elaborate still, implementing not simple logic but specialized functions. Such complex blocks are called slices. Blocks of this type work faster than an assembly of their simplest counterparts. Examples of specialized blocks include multipliers and DSP signal processors. Where multiplying 32-bit numbers on simple blocks requires about 2,000 operations, a specialized block needs only one. The range of specialized slices keeps expanding.
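The roughly 2,000-operation figure can be made concrete with a sketch. Below, 32-bit multiplication is built out of 1-bit gate evaluations (a shift-and-add multiplier over ripple-carry adders), counting every gate use; the numbers here are illustrative, not taken from any vendor datasheet.

```python
# Shift-and-add multiplication built from 1-bit logic, counting gate
# evaluations, to show why a dedicated DSP block (one operation) wins.

def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def multiply_gates(x, y, width=32):
    gate_ops = 0
    acc = 0
    for i in range(width):
        partial = (x << i) if (y >> i) & 1 else 0
        # Add the partial product with a 2*width-bit ripple-carry adder.
        carry, result = 0, 0
        for bit in range(2 * width):
            a = (acc >> bit) & 1
            b = (partial >> bit) & 1
            s, carry = full_adder(a, b, carry)
            gate_ops += 1
            result |= s << bit
        acc = result
    return acc, gate_ops

product, ops = multiply_gates(1234, 5678)
print(product, ops)  # 7006652 2048
```

The count comes out to 32 x 64 = 2048 full-adder evaluations, in line with the "about 2,000" the text cites, versus a single operation on a DSP slice.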


Fig. 2 Specialized FPGA blocks


FPGAs, on a par with GPUs, can be used to accelerate image processing, cloud computing, broadband communications, Big Data, robotics and more. Many materials pitting GPU against FPGA can be found online. Results of numerous comparative tests have been published, but it is hard to be sure those results are entirely objective.

As accelerators, both GPUs and FPGAs have their own advantages and shortcomings, with historical and architectural roots. Graphics processors come from gaming computers; as consumer goods long produced in large volumes, they enjoy a relatively low price. Programmable arrays were most often used in military applications, where price is not the main criterion. But FPGAs use their transistor budget more efficiently. It follows that the GPU's advantage is higher performance per unit of price, while FPGAs have significantly better power efficiency.

An objective performance comparison of GPU and FPGA is further complicated by the fact that they differ in nature and are measured with different benchmarks: GPU results are expressed in the well-known FLOPS, while FPGA results use the less familiar MACS (Multiply-Accumulate Operations per Second). A broader, composite assessment across nine parameters (Fig. 4) gives a 5:4 score in favor of the GPU, but this score can be interpreted in different ways.
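To put the two units on a rough common footing, a frequently used convention (an assumption here, not something the article states) counts one multiply-accumulate as two floating-point operations, a multiply plus an add:

```python
# Hedged sketch: converting MACS figures to a FLOPS-equivalent.
# Assumption: 1 MAC = 1 multiply + 1 add = 2 floating-point operations.

FLOPS_PER_MAC = 2

def macs_to_flops(macs):
    return macs * FLOPS_PER_MAC

# Hypothetical ratings for illustration: an FPGA at 1.5 TMACS
# against a GPU at 4 TFLOPS.
fpga_flops_equiv = macs_to_flops(1.5e12)
print(fpga_flops_equiv / 1e12)  # 3.0 (TFLOPS-equivalent)
```

Such a conversion is only a first approximation: it says nothing about precision (fixed vs. floating point), utilization, or memory bandwidth, which is part of why head-to-head numbers remain hard to interpret.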

Fig. 3 Comparison of GPU and FPGA efficiency in computation per watt and computation per euro


Fig. 4 Comparison of GPU and FPGA efficiency across nine parameters


Until sufficient experience accumulates, assessments will have to rely on speculative comparisons. Among the FPGA's pluses is hardware implementation of algorithms: FPGAs are faster and have lower latency, measured in nanoseconds rather than the microseconds typical of GPUs. A GPU follows the traditional execution cycle: fetch an instruction and data from memory, queue it, execute it, and write the data back to memory. FPGAs also score with significantly smaller physical size, lower energy consumption, and a wider variety of interfaces.

GPU advantages include greater flexibility, better suitability for floating-point operations, and support for older software versions.

Nevertheless, despite the absence of strict argumentation, Microsoft and Intel favor FPGAs. It may be that the scales tip toward FPGAs because of the significant drop in their cost observed recently. FPGAs are becoming affordable enough to install in mass-produced products. Previously, because of their high cost, FPGAs were used only in the development of new systems, at the prototyping stage; the algorithms implemented in them were then transferred to application-specific integrated circuits, ASICs (Application-Specific Integrated Circuit). Production units shipped with the ASICs, but thereby lost the ability to be reprogrammed "in the field". That was the case earlier; as of 2017, however, there is no reason not to equip production devices with FPGA modules as well. It is easy to imagine what retaining reprogrammability will make possible, for example in computers oriented toward machine learning and similar applications.

Recognizing the value of FPGAs, Intel acquired Altera in 2015, after which its vision of processors came to look roughly as shown in Fig. 5. In 2017 the new Xeon Scalable processor family and the Purley platform appeared, the QPI bus was replaced by the new UPI, and new memory arrived as well, but the essence remained unchanged: there is a CPU, and there are FPGAs serving as accelerators.


Fig. 5 Intel processors with FPGA accelerators


Microsoft shows a particular attachment to FPGAs: since 2010 it has been developing a project with the ambitious name Catapult. The target of this projectile is the creation of a Configurable Cloud, and the means of reaching that goal is a hyperscale acceleration fabric. "Fabric" here means a structure, not a factory.

FPGAs in the Configurable Cloud are used in two roles. The first is as channel accelerators, so-called bump-in-the-wire devices, placed between the network interface cards (NIC) and the top-of-rack (ToR) switches. In the second role, FPGAs serve as accelerators in servers intended for a particular class of tasks, such as bioinformatics, search ranking, deep learning, or heavy computation. These tasks use identical servers, but with different firmware in the FPGAs.

Fig. 6 Configurable Cloud from Microsoft


Unlike Altera, which merged with Intel, Xilinx maintains an independent position: it supplies large hyperscalers (Huawei, Baidu) with a product stack (the FPGA-powered Xilinx Reconfigurable Acceleration Stack) supporting services that go by the name FPGA-as-a-Service.

Fig. 7 The place of the Xilinx Reconfigurable Acceleration Stack in a quadrant with the axes "Application breadth" (variety of applications) and "Accelerator utilization" (use of the accelerator)

The services include libraries, framework integrations, developer boards, and OpenStack support.