# Qwen3.5-9B
> [!NOTE]
> This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
> These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
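Since the card names Hugging Face Transformers compatibility, the snippet below is a minimal text-generation sketch using the standard Transformers causal-LM API. The repository ID `Qwen/Qwen3.5-9B` is an assumption for illustration, not a path confirmed by this card.

```python
# Minimal sketch, assuming the standard Hugging Face Transformers causal-LM API.
# The repository ID "Qwen/Qwen3.5-9B" is a hypothetical placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3.5-9B"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Build a chat-formatted prompt and generate a continuation.
messages = [{"role": "user", "content": "Give me a short introduction to hybrid-attention language models."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, dropping the prompt.
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

For serving with vLLM or SGLang, the same repository ID would be passed to those engines' usual model-loading entry points; consult their documentation for engine-specific flags.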
## Qwen3.5 Highlights
Qwen3.5 features the following enhancements:
- **Unified Vision-Language Foundation**: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.
- **Efficient Hybrid Architecture**: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.
- **Scalable RL Generalization**: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.
- **Global Linguistic Coverage**: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.
- **Next-Generation Training Infrastructure**: Near-100% multimodal training efficiency compared to text-only training, plus asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

## Model Overview
- Type: Causal Language Model with Vision Encoder
- Training Stage: Pre-training & Post-training
- Language Model:
  - Number of Parameters: 9B
  - Hidden Dimension: 4096
  - Token Embedding: 248,320 (padded)
  - Number of Layers: 32
  - Hidden Layout: 8 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN)); see the sketch after this list
  - Gated DeltaNet:
    - Number of Linear Attention Heads: 32 for V and 16 for QK
    - Head Dimension: 128
  - Gated Attention:
    - Number of Attention Heads: 16 for Q and 4 for KV
    - Head Dimension: 256
    - Rotary Position Embedding Dimension: 64
  - Feed-Forward Network:
    - Intermediate Dimension: 12288
  - LM Output: 248,320 (padded)
  - MTP: trained with multiple steps
- Context Length: 262,144 tokens natively, extensible up to 1,010,000 tokens
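
The hidden layout repeats an 8-block pattern, so the 32 layers decompose into 24 Gated DeltaNet layers and 8 Gated Attention layers. The sketch below simply enumerates that pattern and checks the arithmetic against the layer count above; the labels `gated_deltanet` and `gated_attention` are hypothetical and are not identifiers from the actual implementation.

```python
# Illustrative sketch of the 32-layer hybrid layout described above.
# Pattern: 8 blocks, each with 3 (Gated DeltaNet -> FFN) sublayers followed
# by 1 (Gated Attention -> FFN) sublayer. Labels are hypothetical, not the
# identifiers used by any real implementation.

NUM_BLOCKS = 8
DELTANET_PER_BLOCK = 3

layout = []
for _ in range(NUM_BLOCKS):
    layout += ["gated_deltanet"] * DELTANET_PER_BLOCK  # linear-attention layers
    layout += ["gated_attention"]                      # full-attention layer

assert len(layout) == 32                     # matches "Number of Layers: 32"
assert layout.count("gated_deltanet") == 24  # 8 blocks x 3
assert layout.count("gated_attention") == 8  # 8 blocks x 1
print(layout)
```

Interleaving a full-attention layer after every three linear-attention layers is what lets most of the stack run at linear cost while the periodic Gated Attention layers retain precise global token access.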



