Q3
Balance

Qwen: Qwen3 VL 30B A3B Instruct

by qwen

Qwen3-VL-30B-A3B-Instruct is a cutting-edge multimodal AI model designed to unify robust text generation with sophisticated visual understanding across both images and videos. This Instruct variant is specifically optimized for following instructions across a wide array of general multimodal tasks, demonstrating exceptional performance in perception of real-world and synthetic categories, precise 2D/3D spatial grounding, and comprehensive long-form visual comprehension. It consistently achieves competitive results on leading multimodal benchmarks. Beyond its core capabilities, Qwen3-VL-30B-A3B-Instruct is highly suitable for agentic applications. It adeptly handles multi-image, multi-turn instructions, facilitates video timeline alignments, supports GUI automation, and can even generate visual coding from sketches to debugged UI. Its text performance rivals flagship Qwen3 models, making it ideal for document AI, OCR, UI assistance, spatial tasks, and advanced agent research. With a context window of 131K tokens and a max output of 4K tokens, it offers extensive processing power. Pricing is $0.15/$0.60 per 1M tokens (input/output) and it's available in the STARTER access tier.

Multimodal AIVision AIInstruction FollowingVideo AnalysisText Generation
78%Quality
131KContext Window
70%Speed
Category
Economy
API access
Unified context
RAG + Knowledge Base
24/7 Support
Try This ModelCompare models

Best For

Chat
Code Generation
Math

🚀 Capabilities

Long context
Vision
Structured output
JSON mode
Functions
Code
Streaming

Limitations

No image generation
No internet access

Specifications

Providerqwen
Context Window131,072 tokens
Max Output32,768 tokens
Minimum PlanBalance

Pricing

Input Price$0.1300 / 1M tokens
Output Price$0.5200 / 1M tokens

💡 With PRO subscription, cost is reduced by 20%

Ready to try Qwen: Qwen3 VL 30B A3B Instruct?

Get 1,000 tokens free on signup

Start for free