Developers: Groq
System premiere date: February 2024
Branches: Electrical engineering and microelectronics
Technology: Processors
2024: Product Announcement
In late February 2024, the startup Groq unveiled the LPU (Language Processing Unit), a specialized processor designed to accelerate inference for large language models (LLMs). The product is expected to revolutionize the artificial intelligence market.
The Groq LPU is based on the Tensor Streaming Processor (TSP) architecture. The chip carries 230 MB of local SRAM with 80 TB/s of bandwidth. Claimed performance reaches 750 TOPS on INT8 operations and 188 TFLOPS on FP16. Running the Mixtral 8x7B model, the Groq LPU accelerator delivers an inference rate of up to 480 tokens per second, one of the best figures in the industry as of late February 2024. On Llama 2 70B with a context length of 4,096 tokens the chip sustains 300 tokens per second, while on the smaller Llama 2 7B with a 2,048-token context the inference rate reaches 750 tokens per second.
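To give a feel for why on-chip memory bandwidth dominates these token rates, the sketch below applies a standard back-of-envelope model: at batch size 1, generating each token requires streaming all model weights once, so bandwidth divided by weight size bounds the decode rate. This is an illustrative estimate under stated assumptions, not Groq's published methodology; the model sizes and FP16 byte counts are assumptions.

```python
# Back-of-envelope, memory-bandwidth-bound ceiling on decode speed.
# Assumption: each generated token streams all model weights once
# (the classic memory-bound regime for batch-1 autoregressive decoding).

def tokens_per_second(bandwidth_bytes_per_s: float,
                      param_count: float,
                      bytes_per_param: float) -> float:
    """Upper bound on decode rate when weight streaming is the bottleneck."""
    weight_bytes = param_count * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes

SRAM_BW = 80e12  # 80 TB/s on-chip SRAM bandwidth, per Groq's claims

# Illustrative model sizes, assuming FP16 weights (2 bytes per parameter).
for name, params in [("Llama 2 7B", 7e9), ("Llama 2 70B", 70e9)]:
    rate = tokens_per_second(SRAM_BW, params, 2.0)
    print(f"{name}: ~{rate:,.0f} tokens/s memory-bound ceiling (single chip)")
```

Under these assumptions the ceiling for Llama 2 70B comes out near 570 tokens per second, and the reported 300 tokens per second sits plausibly below it. In practice a 70B model far exceeds a single chip's 230 MB of SRAM, so weights are sharded across many LPUs and the real bandwidth picture is more involved than this single-chip estimate.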
Overall, according to these claims, the Groq LPU accelerator outperforms competing products from NVIDIA, AMD, and Intel. In effect, it represents a rethinking of the efficiency of AI computing: the Groq LPU is not just a chip but a harbinger of a new era in which AI integrates easily into everyday life, overcoming the latency barriers that make it difficult for systems to interact with users in real time.
Unlike GPUs, LPUs use a simplified, deterministic execution approach that eliminates the need for complex scheduling hardware and delivers consistent latency and high throughput. In addition, the new product is highly energy efficient, which reduces the total cost of operating AI systems.[1]
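The latency and throughput claims can be checked firsthand by timing a request against GroqCloud's OpenAI-compatible endpoint. Below is a minimal sketch, assuming the `openai` Python package (v1+), a GroqCloud API key in the `GROQ_API_KEY` environment variable, and the base URL Groq documented at the time; the model identifier is illustrative and should be checked against Groq's current model list.

```python
import os
import time

from openai import OpenAI  # pip install openai (v1+)

# Assumption: GroqCloud's OpenAI-compatible base URL as documented by Groq.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

start = time.perf_counter()
response = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # illustrative model id
    messages=[{"role": "user", "content": "Explain LPUs in one paragraph."}],
)
elapsed = time.perf_counter() - start

completion_tokens = response.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"-> {completion_tokens / elapsed:,.0f} tokens/s (end-to-end)")
```

Note that this end-to-end measurement includes network round-trip and prompt processing time, so it will understate the pure decode rate the per-model figures above describe.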