
qwen3-vl-235b-a22b

Input: $75/M
Output: $300/M
Context: 2M
Max output: 30K
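The listed rates can be turned into a per-request estimate. The sketch below assumes that "/M" means per 1,000,000 tokens, that "2M" and "30K" are token counts, and that the context window must hold both input and output tokens. None of these units are stated on the card. `estimate_cost` is a hypothetical helper and is not part of any provider API.

```python
# Rates and limits as listed on the card. Units are ASSUMED:
# "/M" = per 1,000,000 tokens; context and max output measured in tokens.
INPUT_RATE_PER_M = 75.0
OUTPUT_RATE_PER_M = 300.0
CONTEXT_LIMIT = 2_000_000  # "2M" context (assumed tokens)
MAX_OUTPUT = 30_000        # "30K" max output (assumed tokens)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated charge for one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT:
        raise ValueError("output_tokens exceeds the listed max output")
    # Assumption: the context window covers input plus output tokens.
    if input_tokens + output_tokens > CONTEXT_LIMIT:
        raise ValueError("request exceeds the listed context window")
    # Each side is billed at its own per-million rate.
    return (input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens would cost about $1.35. Output tokens are priced at four times the input rate, so long generated answers dominate the bill.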
qwen3-vl-235b-a22b is a multimodal model that combines strong text generation with visual understanding of images and videos. Its Instruct variant is tuned for instruction following on general multimodal tasks. It excels at perceiving a wide range of real-world and synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, and it achieves competitive results on multimodal benchmarks.
Commercial use