
Tectonic shift: Intel has developed a new architecture to replace x86

The computer industry is slowly but surely approaching the historical moment when the period of unconditional dominance of the x86 architecture comes to an end. x86 crowded out everything else, so that by 2018 only three RISC architectures had survived from all the former processor variety alongside it: ARM in mobile devices, Power in IBM servers, and SPARC in Fujitsu products, with the last two having limited distribution. How Intel intends to achieve a multiple improvement in processor performance with a new architecture is the subject of this article, prepared especially for TAdviser by journalist Leonid Chernyak.

The surprising longevity of x86 has been sustained by two circumstances: the continuing growth of transistor density on a chip under Moore's law and permanent hi-tech tuning of an architecture originally created for the PC. But sooner or later everything comes to an end. In the case of x86 this is being brought about by the emergence of new types of workloads, such as machine learning and other areas of artificial intelligence (AI), work with large volumes of data, and high-performance computing. Systems based on x86 are too complex and too inefficient from the power standpoint.

The end will not come at once; it may stretch out over many years. It is possible that it will not affect a number of applications at all, for example the PCs with which the x86 era began. As for machine learning, at the stage of training deep neural networks the x86 architecture has already shown its inadequacy, which is why graphics processors used for general-purpose computing (GPGPU) serve as a temporary replacement, a stopgap. More than a hundred companies, from fresh startups to the largest vendors, are searching for an alternative to the CPU with its von Neumann architecture in anticipation of the coming tectonic shifts, and Intel, naturally, is among them.

In 2015 the corporation purchased Altera, the second-largest producer of field-programmable gate arrays (FPGAs), for $17 billion. It did so counting on the use of Altera technologies in reconfigurable processors intended first of all for machine learning. At the end of 2017 Intel announced the neuromorphic Loihi processor, designed specifically for modeling brain processes. The Loihi processor is built on an asynchronous scheme[1]. Remarkably, it consists of 130,000 neurons and 130 million synapses connecting them, capable of forming spatial, hierarchical and recurrent topologies in which each neuron can be connected to thousands of other neurons. It is planned that in 2018 Loihi processors will be delivered to university research laboratories.
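
For a feel of what such a chip computes, here is a minimal sketch of a leaky integrate-and-fire spiking network, the general class of model that neuromorphic processors like Loihi are built to run natively; the network size, parameters and update rule are purely illustrative and are not taken from Intel's design.

import numpy as np

# Minimal leaky integrate-and-fire (LIF) sketch of a spiking network.
# All sizes and constants are illustrative, not Loihi's.
rng = np.random.default_rng(0)
n_neurons = 1000                                        # toy network, far below Loihi's 130,000
weights = rng.normal(0, 0.05, (n_neurons, n_neurons))   # the "synapses" connecting neurons

v = np.zeros(n_neurons)                                 # membrane potentials
threshold, leak = 1.0, 0.9

spikes = rng.random(n_neurons) < 0.05                   # initial random activity
for step in range(100):
    # each neuron integrates the spikes arriving over its incoming synapses
    v = leak * v + weights @ spikes
    spikes = v > threshold                              # fire when the potential crosses the threshold
    v[spikes] = 0.0                                     # reset the neurons that fired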

Intel created the neuromorphic Loihi processor for modeling brain processes


Awareness of the limitations of x86 led Intel to the decision that it needed to substantially change its technical policy when developing computers with performance above one exaflop. Aurora A21 should become the first of them, to be followed by Frontier and El Capitan. All three will be installed in national laboratories conducting research connected with the use of nuclear energy.

Here is what Al Gara, who leads the creation of exascale computers at Intel, said about it in 2015:

"We consider Aurora a turning point, and not only in relation to HPC (High Performance Computing) and High Performance Technical Computing (HPTC). The experience gained can be extended to systems of small and medium scale, and in the long term it will have an impact on all computing."

These and similar promises attracted the attention of the serious computing public and the press. They also dispelled the doubts over why, in recent years, Intel has had no noticeable presence at the top of the TOP500 list of the world's supercomputers.

Al Gara


Intel clearly does not aim to participate in a race for core counts measured in millions or for power consumption measured in tens of megawatts. Going its own way, it joined the Exascale Computing Project (ECP), part of the National Strategic Computing Initiative adopted in 2015. As a comprehensive program, ECP includes the development of new architectures, new hardware, system and application software, as well as the training of specialists. As a result of implementing these measures, a tenfold advantage in performance compared with the traditional evolutionary path should be achieved over the next decade.

The Exascale Computing Project (ECP) should deliver a tenfold advantage in performance compared with the traditional evolutionary path


In March 2017 Paul Messina, the head of ECP, disclosed some details in a presentation[2]. He said that ECP offers a rare opportunity to improve the entire ecosystem, because all of its components are being designed from scratch.

ECP offers an opportunity to improve the entire ecosystem by designing all of its components from scratch


Then a general view of the design of the compute node of the future was published, consisting of three-dimensional 3D Stacked Memory coupled with an array of accelerators built on basic (thin) cores (Thin Cores/Accelerators), and conventional DRAM serving ordinary (thick) cores. The central part is the array assembled from the thin cores, here also called processing elements (Processing Element, PE). Looking at this scheme, one recalls the words of MIT professor Anant Agarwal: "The processor is the transistor of the present day."

General view of the design of the compute node of the future


Within ECP, Intel is developing a new architectural concept, CSA (Configurable Spatial Accelerator), that implements this design. The name can be translated as "configurable spatial accelerator". But the word "accelerator" should not mislead: this is not an accelerator in the traditional sense, serving as an add-on to a CPU, nor any kind of coprocessor, but a completely independent device for processing data streams (a dataflow engine) assembled from PEs.
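
To make the contrast with a conventional von Neumann CPU concrete, the following minimal sketch shows the dataflow execution model in miniature: an operation fires as soon as its operands arrive, with no program counter and no instruction order. The tiny graph and the function names are assumptions for illustration, not part of CSA itself.

# Minimal sketch of dataflow-style execution: a node "fires" as soon as all
# of its operands are available. The graph below computes (a + b) * (a - b).
graph = {
    "add": {"op": lambda x, y: x + y, "inputs": ["a", "b"]},
    "sub": {"op": lambda x, y: x - y, "inputs": ["a", "b"]},
    "mul": {"op": lambda x, y: x * y, "inputs": ["add", "sub"]},
}

def run(graph, values):
    # repeatedly fire any node whose inputs have all arrived
    pending = dict(graph)
    while pending:
        for name, node in list(pending.items()):
            if all(i in values for i in node["inputs"]):
                values[name] = node["op"](*(values[i] for i in node["inputs"]))
                del pending[name]
    return values

print(run(graph, {"a": 7, "b": 3})["mul"])   # -> 40, in whatever order operands became ready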

CSA has historical predecessors. Something similar can be found in research carried out at the University of Edinburgh, where the CFA (Configurable Flow Accelerator) was created, and also in the work of the British company Graphcore, which created the IPU (Intelligent Processing Unit) intended for machine learning. In the IPU, training and inference are combined.

Unlike the experimental CFA and IPU, which remain research objects, the CSA architecture is being developed for a specific task set by the US Department of Defense. The patent application for CSA was filed in December 2016 and granted on July 5, 2018. There has been no public announcement of CSA yet, but on August 3, 2018 the Nextplatform portal published an overview of the CSA patent.

The CSA project is twofold. Two intertwined ideas are implemented in it: one is the possibility of configuring a multiprocessor system for the task being solved; the other is extracting the dataflow graph from the task and transferring it onto the prepared configuration. It is difficult to separate them.

A question arises: reconfiguration is also performed in FPGAs, so how does this process differ in CSA? The point is that an FPGA is inherently lower-level. As follows from the name field-programmable gate array, it is an array consisting of gates programmed in the field using such tools as Verilog or VHDL. In CSA the element is the PE processor, and reconfiguration is carried out by reprogramming the switches that interconnect the PEs. By analogy, such a scheme could be called an FPPA, a field-programmable PE array, with gates replaced by PEs. The PE plays the role of the transistor here, as Anant Agarwal predicted. In CSA, for the first time, a composite model of a processor is created that consists not of transistors but of lower-level processors. Unlike uniform transistors, PEs can be integer processors, floating-point processors, or specialized processors of some kind.
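
As a purely conceptual illustration of this FPPA analogy, the sketch below models the fabric as an array of typed PEs and treats "configuration" as assigning operations to PEs and recording the switch settings that wire them together; the PE types, class names and the tiny configuration are assumptions made for illustration, not Intel's design.

from dataclasses import dataclass
from typing import Optional

@dataclass
class PE:
    kind: str                      # "int", "float" or a specialized unit
    op: Optional[str] = None       # operation assigned by the current configuration

# a toy fabric of four heterogeneous processing elements
fabric = [PE("int"), PE("int"), PE("float"), PE("float")]

def configure(fabric, assignment, wiring):
    """Assign operations to PEs and record the switch settings (wiring)."""
    for idx, op in assignment.items():
        fabric[idx].op = op
    return wiring                  # list of (producer PE index, consumer PE index) links

# map a small floating-point pipeline onto the two float PEs:
links = configure(fabric,
                  assignment={2: "mul", 3: "add"},
                  wiring=[(2, 3)])  # the output of PE 2 feeds PE 3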

Then, onto the ready configuration, the dataflow graph is mapped using the technology of explicit data graph execution, EDGE (Explicit Data Graph Execution). The EDGE technology has been known since the early 2000s; it serves to split the source code into hyperblocks of several thousand instructions each, with subsequent static placement. This procedure resembles assembly, with the difference that hyperblocks are executed not sequentially but in parallel. The EDGE idea could not be realized until the appearance of the CSA architecture. The idea of explicit support for data streams is now experiencing a rebirth; for example, it is implemented in Microsoft's E2 EDGE processor.
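
A rough sketch of the hyperblock idea, under the simplifying assumption that a "program" is just a list of named instructions: the code is cut once into fixed-size blocks ("at compile time"), and independent blocks are then dispatched concurrently rather than one instruction after another. The block size and names are illustrative; real EDGE hyperblocks hold thousands of instructions.

from concurrent.futures import ThreadPoolExecutor

program = [f"instr_{i}" for i in range(32)]
BLOCK = 8

# static partitioning into hyperblocks, done once ahead of execution
hyperblocks = [program[i:i + BLOCK] for i in range(0, len(program), BLOCK)]

def execute(block):
    # stand-in for running one hyperblock on one dataflow engine
    return f"ran {len(block)} instructions: {block[0]}..{block[-1]}"

# independent hyperblocks are dispatched in parallel, not sequentially
with ThreadPoolExecutor() as pool:
    for result in pool.map(execute, hyperblocks):
        print(result)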

The first processors based on CSA will equip the Aurora supercomputer being created for the Argonne National Laboratory of the U.S. Department of Energy, with a planned performance of 1.2 exaflops. Its launch is scheduled for 2022.

Notes