
Qwen

Product
Developer: Alibaba Group
Launched: August 2023
Industry: Information Technology


2025

The Qwen3 model family is introduced

On April 28, 2025, Alibaba introduced the Qwen3 family of models. According to synthetic benchmarks, the family breaks into the leading group; it is not the overall leader across the full set of parameters, but the bid is a very strong one.

Overall, Qwen3 competes with Gemini 2.5 Flash on price/quality/performance and is ahead of GPT o3 and o4-mini thanks to better availability, though it loses on quality when the goal is the best possible response or solution.

There are many models in the family, but the flagship is the Qwen3-235B-A22B, which activates 22 billion of its 235 billion available parameters per token, cutting compute requirements by roughly 85% while maintaining the quality of the output tokens.
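The arithmetic behind that claim is easy to check. A minimal sketch, assuming the common rule of thumb of roughly 2 FLOPs per active parameter per generated token (an approximation, not Alibaba's own figure):

```python
# Rule of thumb: ~2 FLOPs per ACTIVE parameter per generated token.
def flops_per_token(active_params):
    return 2 * active_params

dense_235b = flops_per_token(235e9)  # a dense 235B model fires every parameter
moe_a22b = flops_per_token(22e9)     # Qwen3-235B-A22B activates ~22B per token

ratio = moe_a22b / dense_235b
print(f"MoE compute per token: {ratio:.1%} of dense")  # 9.4%
```

The per-token compute is under 10% of a dense model of the same size; the memory footprint shrinks less dramatically, since all 235B parameters must still be stored.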

Qwen3 is based on the Mixture-of-Experts (MoE) architecture, a machine-learning approach that divides the model into specialized subnetworks ("experts") activated dynamically for each input request. The key idea is conditional computation: only part of the parameters is used to process a given input, so the model can raise efficiency and quality, generating tokens faster and more cheaply without losing quality.
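The routing idea can be shown with a toy sketch. This is illustrative NumPy code, not Qwen's actual implementation; all sizes, weights, and the linear "experts" here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 8, 4, 2  # hidden size, number of experts, experts per token

# Toy parameters (hypothetical shapes): a linear router plus one linear expert each.
router_w = rng.normal(size=(D, N_EXPERTS))
expert_w = rng.normal(size=(N_EXPERTS, D, D))

def moe_layer(x):
    """Route each token to its TOP_K experts; only those experts run."""
    logits = x @ router_w                           # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]   # chosen expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = logits[t, top[t]]
        gate = np.exp(sel - sel.max())
        gate /= gate.sum()                          # softmax over selected experts only
        for g, e in zip(gate, top[t]):
            out[t] += g * (x[t] @ expert_w[e])      # conditional computation
    return out

tokens = rng.normal(size=(3, D))
y = moe_layer(tokens)
print(y.shape)  # (3, 8): same shape as the input, but only 2 of 4 experts ran per token
```

At Qwen3's scale the same mechanism selects a small subset of experts per token, which is how roughly 22B of 235B parameters end up active.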

To gauge the architecture's efficiency: Qwen3-4B (4 billion parameters) outperforms Qwen2.5-72B-Instruct on general-understanding tasks, and the MoE version Qwen3-30B-A3B beats QwQ-32B while activating only 10% of its parameters.

For users, this means very powerful and productive models can be deployed locally on limited resources, literally on home computers, Spydell Finance wrote.

  • Resource efficiency: 235B-A22B requires 65-70% less VRAM than Llama 4 Maverick 402B.
  • Output speed: 320 tokens/sec on an RTX 4090 versus 180 for DeepSeek-R1.
  • Long-context accuracy: 98% on a 32k context window versus 75-95% for the latest competing models. This parameter measures how well the model holds the context window; older LLMs "drifted" on a large window, losing the narrative thread and details.

Qwen3 was designed from the start as a thinking model (Thinking Mode), whereas in Qwen2.5 this capability was bolted on as a workaround.

The volume of training data has been tripled, to 36 trillion tokens, with an emphasis on STEM disciplines (35% of the data) and synthetic reasoning datasets.

  • +42% accuracy on mathematical benchmarks (MATH, AIME25)
  • +37% efficiency on programming tasks (LiveCodeBench)
  • Support for 119 languages versus 32 in Qwen2.5.

Compared with the previous version and its main competitors, Qwen3 demonstrates a breakthrough in resource efficiency while holding a leading position on mathematical and coding problems.

Multimodal capabilities (video and image processing) have been significantly improved; the model is claimed to ingest video up to 1 hour long with second-level accuracy and no loss of detail.

A preliminary assessment suggests that Qwen3-235B-A22B now sits in third place, behind the world's best LLMs, GPT o3 and the closely placed Gemini 2.5 Pro, while beating Grok 3 and clearly ahead of DeepSeek R1, which made a splash in January-February.

A very worthy answer from the Chinese developers. DeepSeek R2 is expected on May 15-25, and in early May Elon Musk promised to introduce Grok 3.5. The competition is escalating.

Qwen3 is available for free on the official website.

Announcement of Qwen 2.5-Max

On January 29, 2025, Alibaba Cloud, the cloud division of the Chinese corporation Alibaba, introduced the Qwen 2.5-Max large language model. The company claims this neural network surpasses the powerful open-source AI model DeepSeek V3, which in turn is ahead of most open and closed counterparts, including ChatGPT.

Qwen 2.5-Max also uses the Mixture-of-Experts (MoE) architecture, which employs many submodels (experts), each specializing in different aspects of the input data or types of tasks. This approach significantly increases speed and improves the quality of both request processing and generated results.

Alibaba Cloud has released a free neural network that is more powerful than DeepSeek

The Qwen 2.5-Max neural network was pretrained on more than 20 trillion tokens, followed by supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The model is claimed to outperform DeepSeek V3 on benchmarks such as Arena-Hard, LiveBench, LiveCodeBench and GPQA-Diamond, while also performing competitively on others, including MMLU-Pro.

"Qwen 2.5-Max surpasses GPT-4o, DeepSeek V3 and Llama-3.1-405B in almost all indicators. Our base models have shown significant advantages in most tests, and we are optimistic that advancements in post-training techniques will take the next version of Qwen to the next level," Alibaba says.

The Qwen 2.5-Max model is available through the Qwen Chat service, where users can test the neural network's capabilities and assess its effectiveness. In the future, Alibaba Cloud plans to integrate Qwen 2.5-Max into its cloud services, expanding their functionality.[1]

2023: Neural Network Launch

On August 25, 2023, the Chinese corporation Alibaba introduced two artificial intelligence models, Qwen-VL[2] (Qwen Large Vision Language Model) and Qwen-VL-Chat, which provide advanced capabilities for image analysis and natural-language dialogue.

The released solutions are open source, meaning independent researchers, scientific organizations and companies around the world can use them to build their own AI applications without training systems from scratch. This saves hardware resources, time and money, and speeds the final products' entry into the commercial market.

Chinese corporation Alibaba unveils two models of artificial intelligence

The Qwen-VL model recognizes images and text. The algorithm can process requests related to graphic files, generate responses, caption images, and so on. The Qwen-VL-Chat model, in turn, is designed for more complex interaction: it can compare several graphic files, answer a series of questions, and generate narratives. The algorithms can also form images based on user-provided photographs and solve mathematical problems shown in a picture. For example, one can ask the AI where a particular company is located by uploading a photo of its signage.

The announced AI models are said to be designed to improve user interaction by providing more accurate and up-to-date information. At the same time, experts note privacy concerns: AI algorithms capable of visual localization could, in theory, determine the location of people captured in photographs, and this information could be used for surveillance or criminal purposes.[3]

Notes