
Qwen3.5-0.8B

[!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.

These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

In light of its parameter scale, the intended use cases are prototyping, task-specific fine-tuning, and other research or development purposes.
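Since the card names Hugging Face Transformers as the primary format, a minimal loading sketch may help. The repo id `Qwen/Qwen3.5-0.8B` and the auto classes used here are assumptions, not confirmed by this card; since this is a vision-language checkpoint, the actual card may recommend a different model class.

```python
def load_model(model_id: str = "Qwen/Qwen3.5-0.8B"):
    """Sketch: load the checkpoint with Hugging Face Transformers.

    The model id and auto classes are assumptions; check the model card
    for the exact repo id and recommended classes.
    """
    # Imported lazily so the sketch is readable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # place weights automatically across devices
    )
    return processor, model
```

For serving, the same weights should work with the other listed backends (vLLM, SGLang, KTransformers) using their standard model-loading paths.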

Qwen3.5 Highlights

Qwen3.5 features the following enhancements:

  • Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

  • Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

  • Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

  • Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

  • Next-Generation Training Infrastructure: Multimodal training at near-100% of text-only throughput, plus asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.
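The Gated Delta Networks mentioned above maintain a fast-weight state updated by a gated delta rule. A minimal sketch of one common formulation of that recurrence, written in NumPy for illustration only (the model's exact implementation is not described in this card):

```python
import numpy as np

def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of a gated delta rule recurrence (simplified sketch).

    S:     (d_v, d_k) fast-weight state matrix
    q, k:  (d_k,) query and key; k is assumed L2-normalized
    v:     (d_v,) value
    alpha: decay gate in (0, 1], forgets old state uniformly
    beta:  write-strength gate in (0, 1]
    Returns the updated state and the readout for this step.
    """
    d_k = k.shape[0]
    # Decay the state, then erase the old association along direction k.
    S = alpha * S @ (np.eye(d_k) - beta * np.outer(k, k))
    # Write the new (k, v) association.
    S = S + beta * np.outer(v, k)
    # Read out with the query.
    return S, S @ q
```

With `alpha = beta = 1` and a unit-norm key, writing a pair `(k, v)` into an empty state and querying with `q = k` recovers `v` exactly, which is the associative-memory behavior the delta rule is built around.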

Model Overview

  • Type: Causal Language Model with Vision Encoder
  • Training Stage: Pre-training & Post-training
  • Language Model
    • Number of Parameters: 0.8B
    • Hidden Dimension: 1024
    • Token Embedding: 248320 (Padded)
    • Number of Layers: 24
    • Hidden Layout: 6 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
    • Gated DeltaNet:
      • Number of Linear Attention Heads: 16 for V and 16 for QK
      • Head Dimension: 128
    • Gated Attention:
      • Number of Attention Heads: 8 for Q and 2 for KV
      • Head Dimension: 256
      • Rotary Position Embedding Dimension: 64
    • Feed Forward Network:
      • Intermediate Dimension: 3584
    • LM Output: 248320 (Tied to token embedding)
    • MTP (Multi-Token Prediction): trained with multiple prediction steps
  • Context Length: 262,144 tokens natively
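The listed dimensions can be sanity-checked with a back-of-the-envelope parameter count. The sketch below assumes a SwiGLU-style FFN with three weight matrices and no biases, and ignores norms, gating parameters, convolutions, the vision encoder, and the MTP head, so it is a lower bound rather than the exact 0.8B figure:

```python
# Rough parameter estimate from the dimensions listed in the overview.
hidden = 1024
vocab = 248320          # padded token embedding, tied to the LM output
layers = 24
ffn_inter = 3584

# Embedding (counted once, since the output head is tied).
embed = vocab * hidden

# FFN per layer: gate, up, and down projections (SwiGLU-style assumption).
ffn = 3 * hidden * ffn_inter * layers

# Gated DeltaNet layers: 18 of the 24 layers (6 blocks x 3 per block).
dn_dim = 16 * 128       # 16 linear-attention heads x head dim 128
deltanet = 18 * (3 * hidden * dn_dim + dn_dim * hidden)   # q, k, v, out

# Gated attention layers: 6 of the 24 (GQA: 8 Q heads, 2 KV heads, dim 256).
q_dim, kv_dim = 8 * 256, 2 * 256
attn = 6 * (hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden)

total = embed + ffn + deltanet + attn
print(f"~{total / 1e6:.0f}M parameters")   # prints "~701M parameters"
```

The ~0.7B estimate is broadly consistent with the stated 0.8B once the omitted components (norms, gates, vision encoder, MTP head) are added back.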