Qwen: Qwen3 VL 30B A3B Instruct

Name: Qwen: Qwen3 VL 30B A3B Instruct
Brand: qwen
Price: 130 USD
Rating: 3.9 (1 review)

Qwen3-VL-30B-A3B-Instruct is a multimodal model that combines text generation with visual understanding of both images and video. The Instruct variant is tuned for instruction following across general multimodal tasks, including perception of real-world and synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension. It achieves competitive results on leading multimodal benchmarks.

The model is also well suited to agentic applications. It handles multi-image, multi-turn instructions, video timeline alignment, and GUI automation, and it supports visual coding workflows that go from sketches to debugged UI. Its text performance rivals flagship Qwen3 models, which makes it a fit for document AI, OCR, UI assistance, spatial tasks, and agent research.

It has a context window of 131K tokens and a maximum output of 4K tokens. Pricing is $0.15 per 1M input tokens and $0.60 per 1M output tokens, and the model is available in the STARTER access tier.
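To make the per-token pricing concrete, here is a minimal sketch of a cost estimate for a single request. The two rates are taken from the listing above; the function name `estimate_cost_usd` and the example token counts are illustrative assumptions, and actual billing (for example, how image or video inputs are counted as tokens) may differ.

```python
# Rates quoted in the listing (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.15
OUTPUT_USD_PER_MILLION = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Rough cost of one request; hypothetical helper, not a provider API."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed example: a 100,000-token prompt with a 2,000-token reply.
    print(f"{estimate_cost_usd(100_000, 2_000):.4f}")  # prints 0.0162
```

At these rates the prompt dominates the cost for long-context requests, since the output is capped at roughly 4K tokens per response.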

Tags: Multimodal AI, Vision AI, Instruction Following, Video Analysis, Text Generation

Quality: 78%

Context Window: 131K

Speed: 70%