全部模型

Qwen3.5-9B

GGUF视觉模型深度思考

3 Download options available

README

Qwen3.5-9B

[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.

These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

Qwen3.5 Highlights

Qwen3.5 features the following enhancement:

  • Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

  • Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

  • Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

  • Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

  • Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

Benchmark Results

Model Overview

  • Type: Causal Language Model with Vision Encoder
  • Training Stage: Pre-training & Post-training
  • Language Model
    • Number of Parameters: 9B
    • Hidden Dimension: 4096
    • Token Embedding: 248320 (Padded)
    • Number of Layers: 32
    • Hidden Layout: 8 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
    • Gated DeltaNet:
      • Number of Linear Attention Heads: 32 for V and 16 for QK
      • Head Dimension: 128
    • Gated Attention:
      • Number of Attention Heads: 16 for Q and 4 for KV
      • Head Dimension: 256
      • Rotary Position Embedding Dimension: 64
    • Feed Forward Network:
      • Intermediate Dimension: 12288
    • LM Output: 248320 (Padded)
    • MTP: trained with multi-steps
  • Context Length: 262,144 natively and extensible up to 1,010,000 tokens.
OmniMind

万象智维

Omni Studio 公众号二维码

公众号

Omni Studio 小红书二维码

小红书

© 2025 万象智维科技有限公司. All rights reserved.

京ICP备2025136340号-1