Qwen3.5 GGUF Quantization Crisis: MXFP4 Bug Fixed & New Benchmarks

Feb 27, 2026

Qwen3.5 GGUF Quantization

The MXFP4 Crisis: What Went Wrong?

In late February 2026, the local LLM community discovered a critical bug affecting Qwen3.5 GGUF quantizations from Unsloth. The issue centered around the use of MXFP4 (Matrix Floating Point 4-bit) quantization layers, which caused severe degradation in model performance.

Symptoms of the Bug

Users reported several alarming issues:

Garbled text generation - models producing nonsensical output
Repetitive patterns - models stuck in loops
Q3-level quantization failures - particularly severe in larger models
Performance degradation - significantly worse than expected for 4-bit quantizations

The problem was most critical in:

Qwen3.5-35B-A3B (Medium series)
Qwen3.5-122B-A10B (100B+ scale model)

Community Discovery

The issue was first identified on r/LocalLLaMA and subsequently tracked through Hugging Face discussions:

The Official Fix

On February 27, 2026, Unsloth announced the official resolution:

“Qwen3.5 is now updated with improved tool-calling & coding performance! We also benchmarked GGUFs & removed MXFP4 layers from 3 quants.”

What Changed

✅ MXFP4 layers completely removed from affected quantizations
✅ Improved tool-calling capabilities
✅ Enhanced coding performance
✅ Comprehensive benchmarking to validate improvements

New Benchmark Results

Following the fix, the community conducted extensive benchmarking to compare the fixed quantizations against previous versions and alternatives.

MMLU Performance Comparison

The Unsloth Dynamic 2.0 documentation reveals significant improvements:

Quantization	MMLU Score	KL Divergence
Fixed Q4_K_M	Improved	Reduced
Fixed Q3_K_S	Fixed	Stable
Fixed Q3_K_L	Fixed	Stable

Key Benchmark Findings

According to detailed evaluations by the community:

5-shot MMLU scores now match full-precision expectations
KL divergence measurements show minimal accuracy loss
Generation quality restored to expected levels
Tool-calling and coding tasks show marked improvement

Performance Validation

The benchmarking framework built by Unsloth enabled apples-to-apples comparisons:

# Example evaluation methodology
from unsloth import evaluate_gguf

results = evaluate_gguf(
    model="unsloth/Qwen3.5-35B-A3B-GGUF",
    datasets=["MMLU", "GSM8K"],
    metrics=["accuracy", "KL_divergence"]
)

Recommendations for Users

✅ Safe to Use Now

The following quantizations are now safe and recommended for production use:

unsloth/Qwen3.5-35B-A3B-GGUF (all variants)
unsloth/Qwen3.5-122B-A10B-GGUF (all variants)
unsloth/Qwen3.5-27B-GGUF (all variants)

⚠️ Avoid These Versions

Check your GGUF files for the following problematic patterns:

Files dated before February 27, 2026
Quantizations explicitly mentioning MXFP4 in metadata
Any files showing garbled text in testing

🔄 Migration Guide

If you’re currently using older quantizations:

Download fresh GGUF files from Hugging Face
Verify file dates - should be post-February 27, 2026
Test with benchmark questions before production use
Monitor generation quality for any remaining issues

Technical Details

Why MXFP4 Failed

MXFP4 (Matrix Floating Point 4-bit) was designed to improve quantization accuracy by using matrix-level precision. However, for Qwen3.5’s architecture:

Layer mismatch - certain transformer layers incompatible with MXFP4
Precision loss - critical for reasoning and tool-calling tasks
Activation quantization - caused instability in attention mechanisms

Dynamic 2.0 Quantization

Unsloth’s improved approach uses:

Intelligent layer selection - MXFP4 only where beneficial
Hybrid quantization - combines INT4 and FP4 optimally
Benchmark-driven optimization - validated against real-world tasks

Community Response

The r/LocalLLaMA community has been highly responsive:

Rapid bug identification - discovered within 48 hours
Transparent communication - ongoing HF discussions
Extensive testing - multiple independent evaluations
Clear documentation - comprehensive guides for users

Looking Ahead

Future Improvements

Extended quantization formats - more precision options
Automated benchmarking - continuous quality assurance
Better documentation - clear version recommendations
Community validation - independent testing protocols

Best Practices

For local LLM deployment:

Always verify file dates before production use
Test with diverse prompts across different domains
Monitor generation quality continuously
Stay updated with community discussions

Resources

Official Links

Community Discussions

Benchmarking Tools

Conclusion

The Qwen3.5 GGUF MXFP4 crisis has been successfully resolved. The community’s rapid response, combined with Unsloth’s transparent communication and swift fixes, demonstrates the strength of the open-source local LLM ecosystem.

Key takeaways:

✅ MXFP4 bug is fixed as of February 27, 2026
✅ New benchmarks show excellent performance
✅ Community-driven validation ensures quality assurance
✅ Safe to deploy in production environments

For the latest updates, stay connected with the r/LocalLLaMA community and follow Unsloth’s official announcements.

About the Author: This analysis was compiled based on community discussions, official announcements, and independent benchmarking. For the most current information, always check the official Hugging Face repositories and r/LocalLLaMA discussions.