Qwen3.5 GGUF Quantization Crisis: MXFP4 Bug Fixed & New Benchmarks

The MXFP4 Crisis: What Went Wrong?
In late February 2026, the local LLM community discovered a critical bug affecting Qwen3.5 GGUF quantizations from Unsloth. The issue centered on the use of MXFP4 (microscaling 4-bit floating point) quantization layers, which caused severe degradation in model output quality.
Symptoms of the Bug
Users reported several alarming issues:
- Garbled text generation - models producing nonsensical output
- Repetitive patterns - models stuck in loops
- Q3-level quantization failures - particularly severe in larger models
- Performance degradation - significantly worse than expected for 4-bit quantizations
The problem was most critical in:
- Qwen3.5-35B-A3B (Medium series)
- Qwen3.5-122B-A10B (100B+ scale model)
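The looping behavior described above is easy to screen for automatically. The sketch below is a generic heuristic (not an official diagnostic from Unsloth or Qwen): it measures what fraction of n-word windows in a generation are duplicates.

```python
from collections import Counter

def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-gram windows that are repeats; values near 0 are
    healthy, values approaching 1 indicate the model is stuck in a loop."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)

# A looping output scores high; varied text scores near zero.
looped = "the cat sat on the mat " * 10
print(repetition_ratio(looped))  # ~0.89
```

A threshold around 0.3–0.5 on held-out prompts is a reasonable starting point for flagging a broken quant before deploying it.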
Community Discovery
The issue was first identified on r/LocalLLaMA and subsequently tracked through Hugging Face discussions.
The Official Fix
On February 27, 2026, Unsloth announced the official resolution:
“Qwen3.5 is now updated with improved tool-calling & coding performance! We also benchmarked GGUFs & removed MXFP4 layers from 3 quants.”
What Changed
- ✅ MXFP4 layers completely removed from affected quantizations
- ✅ Improved tool-calling capabilities
- ✅ Enhanced coding performance
- ✅ Comprehensive benchmarking to validate improvements
New Benchmark Results
Following the fix, the community conducted extensive benchmarking to compare the fixed quantizations against previous versions and alternatives.
MMLU Performance Comparison
The Unsloth Dynamic 2.0 documentation reveals significant improvements:
| Quantization | MMLU Score (5-shot) | KL Divergence vs. full precision |
|---|---|---|
| Q4_K_M (fixed) | Improved over pre-fix builds | Reduced |
| Q3_K_S (fixed) | Restored to expected levels | Stable |
| Q3_K_L (fixed) | Restored to expected levels | Stable |
Key Benchmark Findings
According to detailed evaluations by the community:
- 5-shot MMLU scores now match full-precision expectations
- KL divergence measurements show minimal accuracy loss
- Generation quality restored to expected levels
- Tool-calling and coding tasks show marked improvement
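KL divergence, as used in these evaluations, compares the next-token probability distribution of the quantized model against the full-precision model; lower is better. A minimal sketch of the computation (the toy distributions below are illustrative, not measured values):

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(P || Q) in nats for two probability distributions over the same
    token vocabulary; lower means the quant tracks full precision closely."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions: full precision vs. a 4-bit quant.
full  = [0.70, 0.20, 0.08, 0.02]
quant = [0.66, 0.23, 0.09, 0.02]
print(kl_divergence(full, quant))  # small value -> minimal accuracy loss
```

In practice this is averaged over many token positions on a reference corpus, which is what makes it a sensitive detector of quantization damage even when benchmark accuracy looks unchanged.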
Performance Validation
The benchmarking framework built by Unsloth enabled apples-to-apples comparisons:
```python
# Example evaluation methodology (illustrative API)
from unsloth import evaluate_gguf

results = evaluate_gguf(
    model="unsloth/Qwen3.5-35B-A3B-GGUF",
    datasets=["MMLU", "GSM8K"],
    metrics=["accuracy", "KL_divergence"],
)
```
Recommendations for Users
✅ Safe to Use Now
The following quantizations are now safe and recommended for production use:
- unsloth/Qwen3.5-35B-A3B-GGUF (all variants)
- unsloth/Qwen3.5-122B-A10B-GGUF (all variants)
- unsloth/Qwen3.5-27B-GGUF (all variants)
⚠️ Avoid These Versions
Check your GGUF files for the following problematic patterns:
- Files dated before February 27, 2026
- Quantizations explicitly mentioning MXFP4 in metadata
- Any files showing garbled text in testing
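GGUF files begin with the ASCII magic `GGUF` and store their metadata key-value section near the start of the file, so a crude check for the second pattern above is to scan the header region for the string `mxfp4`. This is a heuristic sketch only; a robust check should use a real GGUF parser (e.g. the `gguf` Python package that ships with llama.cpp).

```python
def mentions_mxfp4(path: str, header_bytes: int = 1 << 20) -> bool:
    """Crude heuristic: scan the first ~1 MB of a GGUF file (where the
    metadata lives) for the string 'mxfp4'. Not a substitute for a parser."""
    with open(path, "rb") as f:
        head = f.read(header_bytes)
    if head[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    return b"mxfp4" in head.lower()
```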
🔄 Migration Guide
If you’re currently using older quantizations:
- Download fresh GGUF files from Hugging Face
- Verify file dates - should be post-February 27, 2026
- Test with benchmark questions before production use
- Monitor generation quality for any remaining issues
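Step 2 above can be scripted for files you already have on disk. Note the caveat baked into the sketch: a local file's modification time reflects when you downloaded or copied it, so treat this as a hint and confirm the actual upload date on the Hugging Face repo when in doubt.

```python
import os
from datetime import datetime, timezone

# Date of the fixed uploads, per Unsloth's announcement.
CUTOFF = datetime(2026, 2, 27, tzinfo=timezone.utc)

def is_post_fix(path: str) -> bool:
    """True if the local GGUF file's mtime is after the fix date.
    mtime reflects local download/copy time, so this is only a hint;
    verify the upload date on Hugging Face when in doubt."""
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return mtime > CUTOFF
```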
Technical Details
Why MXFP4 Failed
MXFP4 (microscaling 4-bit floating point) was designed to improve quantization accuracy by pairing 4-bit FP4 values with a scaling factor shared across each block of values. However, for Qwen3.5’s architecture:
- Layer mismatch - certain transformer layers incompatible with MXFP4
- Precision loss - critical for reasoning and tool-calling tasks
- Activation quantization - caused instability in attention mechanisms
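The precision-loss point can be made concrete. In MXFP4, each value is stored in E2M1 format (1 sign, 2 exponent, 1 mantissa bit, giving only eight representable magnitudes) with one power-of-two scale shared per block. The simplified simulation below illustrates the failure mode: a single large value sets the block scale, and nearby small values get crushed to zero. This is a sketch of the general mechanism, not the exact OCP MX spec or Unsloth's implementation.

```python
import math

# The eight representable E2M1 magnitudes (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values):
    """Quantize one block to FP4 with a shared power-of-two scale,
    chosen so the block's largest magnitude fits the top code (6.0)."""
    amax = max(abs(v) for v in values) or 1.0
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    out = []
    for v in values:
        mag = min(abs(v) / scale, 6.0)
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest code
        out.append(math.copysign(q * scale, v))
    return out

# One outlier (0.25) forces the small values toward zero:
print(quantize_block([0.013, -0.041, 0.25, 0.0007]))
# -> [0.0, -0.03125, 0.25, 0.0]
```

When a layer's weight distribution has heavy outliers, as attention projections often do, this block-wide scale sharing is exactly where FP4 loses the information that reasoning and tool-calling depend on.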
Dynamic 2.0 Quantization
Unsloth’s improved approach uses:
- Intelligent layer selection - MXFP4 only where beneficial
- Hybrid quantization - combines INT4 and FP4 optimally
- Benchmark-driven optimization - validated against real-world tasks
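"Intelligent layer selection" boils down to mapping tensor names to quantization types, keeping sensitive layers at higher precision. The sketch below is a hypothetical plan in that spirit; the patterns follow common GGUF tensor naming, but the specific assignments are illustrative, not Unsloth's actual configuration.

```python
import fnmatch

# Hypothetical per-layer plan: sensitive tensors keep higher precision,
# feed-forward weights drop to 4-bit. Assignments are illustrative only.
QUANT_PLAN = [
    ("*.attn_*.weight", "Q6_K"),    # attention: most quantization-sensitive
    ("token_embd.weight", "Q8_0"),  # embeddings: keep near full precision
    ("*.ffn_*.weight", "Q4_K"),     # feed-forward: tolerates 4-bit well
]
DEFAULT_TYPE = "Q4_K"

def quant_type_for(tensor_name: str) -> str:
    """Return the quant type for a tensor, first matching pattern wins."""
    for pattern, qtype in QUANT_PLAN:
        if fnmatch.fnmatch(tensor_name, pattern):
            return qtype
    return DEFAULT_TYPE

print(quant_type_for("blk.0.attn_q.weight"))  # Q6_K
print(quant_type_for("blk.0.ffn_up.weight"))  # Q4_K
```

The "benchmark-driven" part then iterates on a plan like this: re-run the evaluation suite after each change and keep only the precision reductions that leave MMLU and KL divergence within tolerance.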
Community Response
The r/LocalLLaMA community has been highly responsive:
- Rapid bug identification - discovered within 48 hours
- Transparent communication - ongoing HF discussions
- Extensive testing - multiple independent evaluations
- Clear documentation - comprehensive guides for users
Looking Ahead
Future Improvements
- Extended quantization formats - more precision options
- Automated benchmarking - continuous quality assurance
- Better documentation - clear version recommendations
- Community validation - independent testing protocols
Best Practices
For local LLM deployment:
- Always verify file dates before production use
- Test with diverse prompts across different domains
- Monitor generation quality continuously
- Stay updated with community discussions
Conclusion
The Qwen3.5 GGUF MXFP4 crisis has been successfully resolved. The community’s rapid response, combined with Unsloth’s transparent communication and swift fixes, demonstrates the strength of the open-source local LLM ecosystem.
Key takeaways:
- ✅ MXFP4 bug is fixed as of February 27, 2026
- ✅ New benchmarks show excellent performance
- ✅ Community-driven validation ensures quality assurance
- ✅ Safe to deploy in production environments
For the latest updates, stay connected with the r/LocalLLaMA community and follow Unsloth’s official announcements.
About the Author: This analysis was compiled based on community discussions, official announcements, and independent benchmarking. For the most current information, always check the official Hugging Face repositories and r/LocalLLaMA discussions.