
Intel Nervana Neural Network Processor

Product
Developers: Intel
System premiere date: 2017/10/17
Last release date: 2019/11/13
Technology: Processors

Content

2019

Announcement of NNP-T1000 and NNP-I1000

On November 13, 2019, Intel introduced the Intel Nervana Neural Network Processor (NNP) accelerators for neural network training (NNP-T1000) and for inference (NNP-I1000). These are Intel's first dedicated chips (ASICs) for complex machine learning tasks with good scalability and efficiency, intended for customers building cloud computing and data processing centers. In addition, Intel presented a new generation of Intel Movidius Myriad Vision Processing Units (VPUs) for media and data processing on edge devices, for building autonomous computer vision systems, and for inference.

Intel presented the Intel Nervana Neural Network Processors (NNP) for neural network training
"Dedicated chips and solutions, such as the Intel Nervana neural network accelerators and the Movidius Myriad visual accelerators, are necessary for further progress in artificial intelligence. The use of more advanced forms of system-level artificial intelligence will help us move from turning data into information to turning information into knowledge," said Naveen Rao, corporate vice president of Intel and head of Intel's artificial intelligence research.

According to the company, these products strengthen Intel's portfolio of AI solutions, whose sales are forecast to exceed 3.5 billion US dollars in 2019.

As of November 2019, the Intel Nervana neural network accelerators are already in production and being delivered to customers.

The Intel Nervana NNP-T neural network accelerators help reach the necessary balance between compute power, communication, and memory, providing nearly linear, energy-efficient scaling from small clusters to the largest supercomputers. The Intel Nervana NNP-I accelerators are efficient in terms of power consumption and cost and are well suited to a variety of intensive real-world inference workloads in different form factors. Both products were developed for the tasks faced by leading AI customers such as Baidu and Facebook.

"We are glad to work with Intel on deploying faster and more efficient computing based on the Intel Nervana neural network accelerators (NNP-I) and to expand support for our modern deep learning compiler Glow, which we will now combine with NNP-I," said Mischa Smelyansky, director of Facebook AI System Co-Design.

In addition, the next-generation Intel Movidius VPU accelerator, whose release is planned for the first half of 2020, incorporates highly efficient architectural features that are expected to deliver leading performance: computing speed will increase more than tenfold compared with the previous generation, while in energy efficiency the processor exceeds competing solutions by a factor of six. Intel also announced the Intel DevCloud for the Edge, which, together with the Intel Distribution of OpenVINO toolkit, addresses a key problem for developers by letting them try, prototype, and test AI solutions on a range of Intel processors before buying hardware.

Announcement of Nervana NNP-I (Springhill), the first Intel processor for machine learning tasks

On August 20, 2019, Intel presented the first processor in its lineup designed for machine learning. The solution, named Nervana NNP-I (Springhill), is intended for use in large data centers and, according to the company, will promote the widespread adoption of artificial intelligence.

The chip, developed at Intel's laboratories in Haifa (Israel), is based on a modified 10-nanometer Ice Lake processor and can handle intensive workloads with minimal energy consumption, Reuters reports.

Intel presented its first processor for machine learning

The processor sits on a printed circuit board that plugs into an M.2 port. Intel's idea is to offload inference tasks from standard Xeon processors so that they can concentrate on more general computing.

Analyst Holger Mueller of Constellation Research says that a couple of decades ago such a design would have been called a coprocessor. It does offload the base Xeon processors, but it is still not clear how it will compete with newer solutions that have more tailored architectures built on graphics processors, the expert noted.

The Springhill die comes with two LPDDR4X memory controllers connected to on-board memory, providing bandwidth of up to 4.2 GT/s (68 GB/s) and supporting in-band ECC.
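A quick back-of-the-envelope check of how the quoted figures relate, assuming (this is an assumption, not a published specification) that the two controllers together drive a 128-bit LPDDR4X interface:

# Sketch of the bandwidth arithmetic: GT/s per pin times bus width in bytes.
transfer_rate_gt_s = 4.2          # giga-transfers per second, as quoted above
total_bus_width_bits = 128        # assumed: 2 controllers x 64 bits each
bandwidth_gb_s = transfer_rate_gt_s * total_bus_width_bits / 8
print(f"~{bandwidth_gb_s:.1f} GB/s")   # -> ~67.2 GB/s, close to the quoted 68 GB/s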

"To achieve the goal of making AI ubiquitous, we have to solve the problem of the enormous volume of data being generated and make sure organizations are equipped with everything they need to use data effectively and process it as it arrives," says Naveen Rao, head of the Intel Artificial Intelligence Products Group. "Computers need acceleration for complex artificial intelligence applications."

Facebook became one of the first Intel customers to use Nervana NNP-I.[1]

2018: How Intel is challenging Nvidia's leadership in processors for artificial intelligence

As of 2018, virtually all applications connected in any way with neural networks run on Nvidia servers, and if on others, then still on Nvidia GPUs. As the report testifies[2], there are no alternatives. But there is a serious chance that Nvidia's monopoly will be broken by Intel's efforts. The competitor capable of pressing, and perhaps even displacing, the GPU from its leading position is the new Intel Nervana Neural Network Processor (NNP), which has no direct analogs. As the name suggests, it implements the intellectual property Intel acquired together with the Nervana company in 2016.

The Intel Nervana Neural Network Processor (NNP)

The seriousness of the intentions of Chipzilla (as Intel is nicknamed for its gigantic size) is demonstrated by an interview CEO Brian Krzanich gave to the Wall Street Journal in October 2017. In it he said:

"These days any company and any serious application is connected in one way or another with artificial intelligence (AI). For example, we carried out joint work with the jeans maker Levi Strauss. Using AI methods we were able to determine buyers' preferences, and now Levi can more selectively produce the goods that are in demand. This, like much else we observe, prompted us to take specialized processors for neural networks seriously. We have set ourselves the task of surpassing, by 2020, those who lead in this market segment today by two orders of magnitude. We are aiming first of all at hyperscalers such as Google, Facebook, Amazon, and Alibaba."

Nvidia's good fortune

To appreciate the advantages of the solutions embodied in the NNP, a few words are needed about how and why Nvidia's GPUs reached their current privileged position. Previously Nvidia, along with ATI (whose brand survives in the name of AMD's Radeon processors), was one of the two leading manufacturers of gaming processors, that is, processors focused on graphics. Its position changed considerably in 2006 with the start of production of the GeForce 8 series. The first of these GPUs, the G80, and subsequent models not only retained the shader architecture required for animation with support for DirectX 10, OpenCL, and Shader Model 4.0, but were also able to support CUDA (Compute Unified Device Architecture), a hardware-software architecture for parallel computing. Thus a new branch of computing appeared: GPGPU, or General Purpose GPU. GPUs and CUDA make it possible to significantly accelerate parallel computation, since GPUs, unlike classical CPUs, consist of a larger number of small and relatively simple cores.

The combination of the GeForce 8 with CUDA gave Nvidia the opportunity to change its reputation from a maker of gaming hardware to the more prestigious position of a supplier of hardware for GPU-accelerated computing. At first there was no talk of machine learning: at that time it did not yet exist as a mass phenomenon, but by the time radical changes arrived in the then-nascent AI field, Nvidia was ready. From 2011 Nvidia was able to extend its area of interest to AI as well, where, in the absence of competitors, it became the sole leader.

The debut of the GPU in AI applications was truly sensational. It happened like this. For the first widely known Google Brain project, which set itself the task of learning to recognize cats and people in YouTube videos, classical CPU-based servers were used for lack of anything else. More than 2,000 of them were required, so it is not hard to imagine what the experiment cost. It was successful, but unique: few besides Google can afford such luxury. However, the situation changed considerably when a mixed team of researchers from Nvidia Research and Stanford University achieved the same result literally for pennies: they needed only 12 GPUs. This unexpected democratization of the platform opened qualitatively new prospects for the development of machine learning, and as a result Nvidia GPUs became the de facto standard for years.

The Google Brain project set the task of learning to recognize cats and people in YouTube videos. Classical CPU-based servers were used for this purpose.


Paying tribute, the journal Nature wrote: "The first real achievements in the field of machine learning became possible thanks to graphics processors; they opened the opportunity to increase the speed of training networks by 10 to 20 times." A few years later, according to Nvidia data published in 2016, the speed of network training had increased 50-fold. In the coming years it is expected to grow by another order of magnitude.

Nvidia was extraordinarily lucky with the unexpected interest in its products from the AI field. A favorable combination of circumstances put it in a winning position: as the saying goes, the right goods appeared at the right time and in the right place. Without Nvidia's GPUs, there would have been no rapid start to the machine learning boom we are observing. But, as always, there is a "but": GPUs are only adapted to machine learning tasks. The specifics of AI applications were not originally taken into account in this type of processor. So it is not surprising that dozens of companies have taken up the development of specialized processors for machine learning.

Nvidia's success has triggered an active race to catch the leader. There are many designs for AI processors, ranging from neuromorphic processors that model the behavior of the brain to GPUs that are refinements of classical CPUs. The list of participants in the race is extremely varied: recognized processor vendors, hyperscaler giants, and numerous Chinese companies. There are many variants of solutions, but the most important thing is the traditional problem of engineering compromise: choosing an approach conservative enough to be implementable and economically justified, yet revolutionary enough to provide qualitative and quantitative advantages. Intel is coping with this problem successfully and clearly holds first place among the pursuers.

Intel's approach

Intel, with its resources, immediately defined three directions for itself, differing in their degree of radicalism. The first, the nearest and promising the fastest results, is the specialized processor for neural networks, the Intel Nervana Neural Network Processor (NNP). The second is aimed at the medium term: the neuromorphic Loihi processor, billed as a first-of-its-kind self-learning chip. Incidentally, Loihi is an active underwater volcano located 35 kilometers southeast of the island of Hawaii. The third, expected over a more distant horizon, is the development of a quantum chip jointly with the Dutch company QuTech.

The Lake Crest technology was acquired by Intel together with the Nervana company

In Intel's announced program of AI technology development, the key place is occupied by the Lake Crest technology acquired in 2016 together with the Nervana company. A surprising disproportion: Intel, with nearly 100 thousand employees (and that with the degree of automation that distinguishes semiconductor manufacturing), and Nervana, with a staff of fewer than 50 people, purchased for a "ridiculous" $360 million.

An analyst from Moor Insights & Strategy characterized the meaning of the deal in the following words:

"This is not at all an attempt to compete directly with Nvidia, which has become the leader in the use of GPUs for neural networks and deep learning. Intel has the multi-core Xeon Phi processors, it has FPGAs, but it has no GPU. The Nervana acquisition is an entirely different way of entering the market of deep learning technologies, one that does not copy the idea of applying general-purpose GPUs to these purposes but focuses on a new type of coprocessor created specifically for such tasks. Deep learning requires less precision than general-purpose GPUs provide, so a specialized processor can theoretically be more productive."

Two years later, in 2018, the promises of the 2016 acquisition took on visible shape. Naveen Rao, the founder of Nervana and now head of the Intel Artificial Intelligence Products Group, characterized the Intel Nervana NNP as follows:

"This is a chip built for machine learning. Our goal was an architecture that would combine flexibility with maximum utilization of all components. We designed the Intel Nervana NNP from scratch, free of any constraints imposed by existing architectures, solely for AI purposes. Special attention was paid to the two main types of matrix operations, multiplications and data transfers, whose demands on the processor differ significantly from traditional workloads: their behavior and data movements are difficult to predict. For this reason the Intel Nervana NNP has no standard cache hierarchy, and its on-chip memory is managed differently."

The new Flexpoint data representation format and the Spatial Architecture design free the Intel Nervana NNP from two inherited shortcomings of GPUs and CPUs, which, first, compute with unnecessary precision, wastefully spending transistors and energy, and, second,[3] remain faithful to the von Neumann scheme with its "bottleneck" between processor and memory,[4] which limits parallelism.

Local revolution

For decades the development of microprocessors was accompanied by growth in word length. Intel's very first processor, the 4004, was only 4-bit; then 8-, 16-, 32-bit (single-precision float, FP32) and 64-bit (double-precision float, FP64) processors appeared. Accelerators for HPC, including GPU-based ones, work with 128 and even 256 bits. Such precision is required by many applications, but not, in most cases, by machine learning. Nvidia therefore offered a shortened FP16 format (half precision), which is sufficient at the network training stage, while at the production stage (inference) 8 bits are generally enough[5].

Comparing the floating-point formats FP32 and FP16, and likewise the integer formats Int32 and Int16, shows that the longer formats consume on the order of two orders of magnitude more energy and require roughly ten times more die area than the shorter ones. The shorter the word, the faster matrix operations execute, the more operations complete per unit of time, and the lower the energy cost per operation. Therefore, for machine learning, where raising processor performance requires a high degree of parallelism, processors that support short formats are preferable. Nvidia's newer models also support 16- and 8-bit formats, and corresponding versions of CUDA exist.
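As a minimal illustration of the storage side of this argument, the following NumPy sketch compares the same matrix held in FP32 and FP16. The array names and sizes are arbitrary; the energy and die-area ratios quoted above are hardware properties that a software sketch cannot reproduce.

import numpy as np

# The same data in FP32 and FP16: half the bytes, so twice as many operands
# fit in the same on-chip memory and twice as many move per unit of bandwidth.
x32 = np.random.rand(1024, 1024).astype(np.float32)
x16 = x32.astype(np.float16)

print(x32.nbytes // 1024, "KiB in FP32")   # 4096 KiB
print(x16.nbytes // 1024, "KiB in FP16")   # 2048 KiB

# Reduced-precision arithmetic still approximates the FP32 result; print the
# largest element-wise difference to see the precision cost.
diff = np.abs(x32 @ x32 - (x16 @ x16).astype(np.float32)).max()
print("max abs difference of FP16 matmul vs FP32:", diff)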

If the appearance of shortened formats can be called an evolutionary step, then the innovations implemented in the NNP are nothing short of a local revolution: first, a new data format in addition to the two usual ones, fixed point and floating point; second, execution of instructions not in the sequence prescribed by the program counter.

In the Intel Nervana NNP, the developers went further. They proposed a fundamentally new format called Flexpoint. As the name suggests, it is a flexible, adaptive format, designed to optimize the operation of neural networks. Its essence is not simple to grasp. For those who are interested, there is a very detailed paper, "Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks." A more popular exposition of the Flexpoint philosophy can be found in the article "Flexpoint: numerical innovation underlying the Intel Nervana Neural Network Processor."

Simplifying greatly, one may say that Flexpoint is a qualitatively new tensor format. It builds on earlier research carried out at Cornell University, which proposed the technique behind the Autoflex formats. Flexpoint combines the properties of an integer format and a floating-point format, which makes it more efficient than, for example, FP16. For the narrow circle of specialists who deal with internal machine data representation, where everything settled long ago and nothing has changed for decades, the appearance of a new format is a sensation. One can say with confidence that people with a mathematical mindset will enjoy puzzling over Flexpoint's essence; the user only needs to trust its efficiency.
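To give a rough feel for the idea, the sketch below implements a generic block floating-point encoding in NumPy: one exponent shared by the whole tensor and a 16-bit integer mantissa per element. It is a simplified illustration of the concept described in the Flexpoint paper, not Intel's actual implementation, and the function names are invented for this example.

import numpy as np

def to_shared_exponent(x, mantissa_bits=16):
    """Encode a float tensor as (int16 mantissas, one shared exponent)."""
    max_abs = float(np.abs(x).max())
    # Pick the exponent so the largest value just fits the signed mantissa range.
    exp = int(np.ceil(np.log2(max_abs))) - (mantissa_bits - 1) if max_abs > 0 else 0
    scale = 2.0 ** exp
    lo, hi = -2 ** (mantissa_bits - 1), 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(x / scale), lo, hi).astype(np.int16)
    return mant, exp

def from_shared_exponent(mant, exp):
    """Decode back to float for inspection."""
    return mant.astype(np.float32) * (2.0 ** exp)

x = np.random.randn(4, 4).astype(np.float32)
mant, exp = to_shared_exponent(x)
print("shared exponent:", exp)
print("max reconstruction error:", np.abs(x - from_shared_exponent(mant, exp)).max())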

The Flexpoint format combined with Autoflex opens the possibility of automatically tuning the processor to the specific data representation format that is optimal for the data at hand. The figure shows the flowchart of Autoflex operation, which is divided into two stages: in the first, the preferred format is selected; in the second, work proceeds with that format. A sketch of this two-stage flow follows the figure caption below.

Flowchart of Autoflex operation
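The following hypothetical sketch mirrors that two-stage flow: a calibration stage picks an initial shared exponent from sample statistics, and a work stage encodes incoming tensors with it, widening the exponent whenever values approach the top of the representable range. The names, thresholds, and headroom policy are invented for illustration and do not reproduce the published Autoflex algorithm.

import numpy as np

MANTISSA_BITS = 16
HEADROOM = 0.9   # widen the exponent when values exceed 90% of the range

def calibrate_exponent(sample_tensors):
    """Stage 1: choose an initial shared exponent from sample data."""
    max_abs = max(float(np.abs(t).max()) for t in sample_tensors)
    return int(np.ceil(np.log2(max_abs))) - (MANTISSA_BITS - 1)

def run_with_format(tensors, exp):
    """Stage 2: operate in the chosen format, widening the exponent on demand."""
    limit = 2 ** (MANTISSA_BITS - 1) - 1
    for t in tensors:
        # If incoming values would land too close to the top of the range,
        # give the format more range before encoding this tensor.
        while float(np.abs(t).max()) / 2.0 ** exp > HEADROOM * limit:
            exp += 1
        mant = np.clip(np.round(t / 2.0 ** exp), -limit - 1, limit).astype(np.int16)
        yield mant, exp

warmup = [np.random.randn(8, 8) for _ in range(4)]
stream = [np.random.randn(8, 8) * (1 + 0.1 * i) for i in range(20)]
encoded = list(run_with_format(stream, calibrate_exponent(warmup)))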

Overcoming "bottle throat Neumann's background"

Between the brain and the computer there is, among many others, one more major distinction. In a biological brain there is no separation into processors and memory, while in the computer the processor and memory are two of the main components of the von Neumann scheme, the third being input-output devices. The existence of a channel between processors and memory imposes insurmountable restrictions on the efficiency of this scheme; it is known as the von Neumann "bottleneck"[6][7]. To overcome it, modern processors have multi-level cache memory designed to compensate for this restriction. The capabilities of caches suffice for ordinary computational tasks, but they do not cope with the demands of machine learning, where significantly larger volumes of data must be handled. For this reason, GPUs, which implement the von Neumann scheme, have no long-term prospects.

Recognizing the limitations of processors built according to the von Neumann scheme, designers have since the 1990s developed brain-like schemes that combine both memory and processors on a single die. One of the first solutions of this type, for example, was the Asynchronous Array of Simple Processors (AsAP). The Intel Nervana NNP is built on a modern scheme of this kind, called Coarse-Grained Spatial Architectures, in which small processing elements (PEs) form a spatial structure. The architecture is too complex to describe in a short overview; readers who want to understand it can consult several published articles. A generic sketch of the spatial-array idea is given below.
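For readers who want at least an intuition of what a spatial array of processing elements does, here is a plain-Python simulation of a generic output-stationary systolic matrix multiply: each PE holds one accumulator while operands sweep across the grid in skewed waves. This is a textbook-style illustration of the general idea, not a description of the NNP's actual PE organization.

import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary grid of processing elements (PEs).

    Each PE (i, j) keeps one accumulator for C[i, j]; rows of A stream in from
    the left and columns of B from the top, skewed in time so that matching
    operands meet at the right cycle.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)      # one accumulator per PE
    for t in range(n + m + k - 2):           # enough cycles for the wavefront
        for i in range(n):
            for j in range(m):
                step = t - i - j             # which partial product arrives now
                if 0 <= step < k:
                    C[i, j] += A[i, step] * B[step, j]
    return C

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)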

On October 17, 2017, when Intel CEO Brian Krzanich announced the release[8], he promised that the product would transform the IT industry and the whole world.

Intel noted that the new processor should bring about a revolution in artificial intelligence (AI) across the most diverse industries. Using Intel Nervana, companies will be able to develop entirely new classes of AI applications that process even more data and transform business.




Notes