DeepSeek Coder

Product
Developers: DeepSeek
Premiere date: June 2024
Industry: Information Technology
Technology: Application Development Tools

2024: Product Announcement

In mid-June 2024, the Chinese artificial intelligence startup DeepSeek announced DeepSeek Coder V2, an open programming model. It is claimed to outperform closed counterparts such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro.

The first version of DeepSeek Coder had 33 billion parameters, supported 86 programming languages, and had a context window of 16 thousand tokens. DeepSeek Coder V2 surpasses the first-generation solution in key characteristics: it supports 338 programming languages, and the context window has been increased to 128 thousand tokens.

DeepSeek Coder V2 screenshot

When tested on the MBPP+, HumanEval, and Aider benchmarks, which are designed to evaluate the ability of large language models (LLMs) to generate code and solve problems, DeepSeek Coder V2 scored 76.2, 90.2, and 73.7 points, respectively, ahead of most other models, including GPT-4 Turbo, Claude 3 Opus, Gemini 1.5 Pro, Codestral, and Llama-3 70B. It also demonstrates high performance in tests designed to evaluate the mathematical capabilities of a model (MATH and GSM8K). The only model that managed to surpass DeepSeek Coder V2 was GPT-4o, which received higher scores in HumanEval, LiveCodeBench, MATH, and GSM8K.

DeepSeek was able to achieve such high performance thanks to the Mixture of Experts (MoE) approach, in which only part of the overall model is activated for each request rather than the whole network. In addition, the DeepSeek V2 base model was further trained on 6 trillion tokens, including program code and mathematical data from GitHub and CommonCrawl. As a result, the model with 16 or 236 billion parameters activates only 2.4 or 21 billion "expert" parameters to solve a task effectively, as illustrated by the sketch below.[1]
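A minimal sketch of how top-k expert routing keeps most parameters inactive per request (a simplified, assumed illustration in Python with NumPy; the layer sizes, router, and expert networks are invented for demonstration and do not reflect DeepSeek Coder V2's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (not DeepSeek Coder V2's real configuration).
D_MODEL = 8        # hidden size of a token representation
D_FF = 16          # hidden size inside each expert
N_EXPERTS = 6      # total number of experts in the layer
TOP_K = 2          # experts actually activated per token

# Each expert is a tiny two-layer feed-forward network.
experts = [
    {
        "w1": rng.standard_normal((D_MODEL, D_FF)) * 0.1,
        "w2": rng.standard_normal((D_FF, D_MODEL)) * 0.1,
    }
    for _ in range(N_EXPERTS)
]

# The router is a single linear layer producing one score per expert.
router_w = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def moe_layer(token):
    """Run one token through the MoE layer, computing only TOP_K experts."""
    scores = token @ router_w               # one score per expert
    top = np.argsort(scores)[-TOP_K:]       # indices of the chosen experts
    weights = softmax(scores[top])          # normalize over the chosen experts only

    out = np.zeros(D_MODEL)
    for w, idx in zip(weights, top):
        e = experts[idx]
        hidden = np.maximum(token @ e["w1"], 0.0)  # ReLU feed-forward
        out += w * (hidden @ e["w2"])              # weighted expert output
    return out, top


token = rng.standard_normal(D_MODEL)
output, used = moe_layer(token)
print("experts activated for this token:", sorted(used.tolist()))
print("output vector shape:", output.shape)
```

In a full MoE transformer, a router of this kind sits inside each MoE layer, so the bulk of the "expert" weights is never touched for any single token; this is what allows a 236-billion-parameter model to activate only about 21 billion parameters per request.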

Notes