Qwen 3.5 Medium Model Series Released
On February 24, 2026, Alibaba’s Qwen team announced the release of the Qwen 3.5 Medium Model Series, marking a significant shift in their approach to AI model development. The new lineup prioritizes architectural efficiency and high-quality data over traditional scaling methods.
New Models Available
The release includes three new models:
- Qwen3.5-122B-A10B - A large-scale Mixture-of-Experts (MoE) model with 122B total parameters and 10B active parameters
- Qwen3.5-27B - A dense mid-sized model for balanced performance and efficiency
- Qwen3.5-35B-A3B - A highly efficient MoE model with 35B total parameters and only 3B active parameters
The series also includes Qwen3.5-Flash, a lightweight model aimed at fast inference.
Key Features
The Qwen 3.5 medium models introduce several architectural improvements:
- Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
- Architectural Efficiency: The series demonstrates that smaller AI models can be smarter through optimized architecture rather than pure parameter scaling.
- Production Ready: These models are designed for real-world deployment with lower computational requirements while maintaining high intelligence levels.
Comprehensive Benchmark Comparison
All Qwen 3.5 Models
| Model | Total Params | Active Params | Architecture | MMLU-Pro | LiveCodeBench | AIME26 | GPQA | Context | Best For |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3.5-397B-A17B | 397B | 17B | MoE | 87.8 | 83.6 | 91.3 | 78.2 | 256K | Flagship reference |
| Qwen3.5-122B-A10B | 122B | 10B | MoE | 83.5 | 78.4 | 86.7 | 74.5 | 128K | Large-scale tasks |
| Qwen3.5-35B-A3B | 35B | 3B | MoE | 79.2 | 72.1 | 81.4 | 69.8 | 64K | Ultra-efficient single-GPU |
| Qwen3.5-27B | 27B | 27B | Dense | 77.8 | 70.5 | 79.2 | 68.3 | 64K | Balanced performance |
| Qwen3.5-Flash | ~5B | ~5B | Dense | 72.4 | 65.8 | 74.1 | 62.5 | 32K | Fast inference |
Qwen 3.5 vs. GPT-5.2
| Model | MMLU | GSM8K | AIME26 | LiveCodeBench | SWE-bench | Context |
|---|---|---|---|---|---|---|
| GPT-5.2 | 91.2 | 95.8 | 89.4 | 82.1 | 86.4 | 1M |
| Qwen3.5-397B | 87.8 | 92.4 | 91.3 | 83.6 | 79.2 | 256K |
| Qwen3.5-122B | 83.5 | 88.7 | 86.7 | 78.4 | 74.1 | 128K |
| Qwen3.5-35B | 79.2 | 84.2 | 81.4 | 72.1 | 68.5 | 64K |
Qwen 3.5 vs. Claude 4.5 Opus
| Model | MMLU | GSM8K | AIME26 | LiveCodeBench | SWE-bench | Context |
|---|---|---|---|---|---|---|
| Claude 4.5 Opus | 89.8 | 94.2 | 87.6 | 80.4 | 84.7 | 200K |
| Qwen3.5-397B | 87.8 | 92.4 | 91.3 | 83.6 | 79.2 | 256K |
| Qwen3.5-122B | 83.5 | 88.7 | 86.7 | 78.4 | 74.1 | 128K |
| Qwen3.5-35B | 79.2 | 84.2 | 81.4 | 72.1 | 68.5 | 64K |
Qwen 3.5 vs. Gemini 3 Pro
| Model | MMLU | GSM8K | AIME26 | LiveCodeBench | SWE-bench | Context |
|---|---|---|---|---|---|---|
| Gemini 3 Pro | 90.4 | 93.8 | 88.2 | 81.7 | 85.1 | 1M |
| Qwen3.5-397B | 87.8 | 92.4 | 91.3 | 83.6 | 79.2 | 256K |
| Qwen3.5-122B | 83.5 | 88.7 | 86.7 | 78.4 | 74.1 | 128K |
| Qwen3.5-35B | 79.2 | 84.2 | 81.4 | 72.1 | 68.5 | 64K |
Qwen 3.5 vs. MiniMax M2.5
| Model | MMLU | GSM8K | AIME26 | LiveCodeBench | SWE-bench | Context |
|---|---|---|---|---|---|---|
| MiniMax M2.5 | 88.4 | 93.6 | 89.8 | 84.2 | 82.7 | 256K |
| Qwen3.5-397B | 87.8 | 92.4 | 91.3 | 83.6 | 79.2 | 256K |
| Qwen3.5-122B | 83.5 | 88.7 | 86.7 | 78.4 | 74.1 | 128K |
| Qwen3.5-35B | 79.2 | 84.2 | 81.4 | 72.1 | 68.5 | 64K |
Qwen 3.5 vs. GPT-OSS 120B
| Model | Total Params | Active Params | MMLU | AIME26 | LiveCodeBench | Context |
|---|---|---|---|---|---|---|
| GPT-OSS 120B | 116.8B | 5.1B | 90.0 | 90.2 | 81.5 | 128K |
| Qwen3.5-122B | 122B | 10B | 83.5 | 86.7 | 78.4 | 128K |
| Qwen3.5-35B | 35B | 3B | 79.2 | 81.4 | 72.1 | 64K |
Performance Summary
Qwen 3.5 excels in:
- Multimodal understanding: Native vision-language training that outperforms the previous Qwen3-VL models
- Agent capabilities: Strong performance on complex multi-step tasks
- Coding: Competitive with GPT-5.2 and MiniMax M2.5
- Math reasoning: AIME26 scores rival proprietary models
Competitive advantages:
- vs GPT-5.2: Closer performance on AIME26 (91.3 vs 89.4), better context window
- vs Claude 4.5 Opus: Superior on AIME26 (91.3 vs 87.6), longer context
- vs Gemini 3 Pro: Competitive on AIME26 (91.3 vs 88.2), more efficient MoE
- vs MiniMax M2.5: Slightly better on AIME26 (91.3 vs 89.8), native multimodal
- vs GPT-OSS 120B: More active params (10B vs 5.1B), better agent performance
Significance
This release signals a strategic shift in Alibaba’s Qwen approach, prioritizing:
- Architectural efficiency over traditional scaling
- High-quality training data
- Production-ready model sizes
- Lower computing power requirements
The 35B-A3B model, in particular, represents an ultra-sparse MoE architecture that could enable deployment on single GPUs while maintaining competitive performance.
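The single-GPU claim comes down to simple arithmetic: per-token compute scales with the 3B active parameters, while weight memory scales with the full 35B. The sketch below works that out; the bytes-per-parameter figures are standard format sizes (bf16, ~4-bit quantization), not published Qwen deployment numbers.

```python
# Back-of-the-envelope economics for Qwen3.5-35B-A3B. Illustrative arithmetic
# only: byte-per-parameter values are generic format sizes, not vendor figures.

TOTAL_PARAMS = 35e9   # all experts must stay resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used per token (drives compute)

# Per-token compute vs. a dense 35B model scales with the active fraction:
compute_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"compute per token vs dense 35B: {compute_fraction:.1%}")

# Weight memory scales with *total* params, so sparsity does not shrink VRAM:
GB = 1024**3
bf16_gb = TOTAL_PARAMS * 2 / GB      # 2 bytes per parameter in bf16
int4_gb = TOTAL_PARAMS * 0.5 / GB    # ~0.5 bytes per parameter at 4-bit
print(f"weights: {bf16_gb:.0f} GB in bf16, {int4_gb:.1f} GB at 4-bit")
```

Roughly 8.6% of the compute of a dense 35B model per token, and about 16 GB of weights at 4-bit quantization, which is why a single 24 GB consumer GPU is plausible for this model, assuming quantized weights and leaving headroom for the KV cache.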
Availability
The models are now available on Hugging Face and other platforms. GGUF versions from Unsloth and other community projects are expected to follow shortly, making these models accessible for local deployment.
Published: February 24, 2026