
Qwen: Qwen2.5-VL 7B Instruct

by qwen

Qwen2.5-VL 7B Instruct, from the Qwen Team, is a multimodal large language model designed for visual understanding. It achieves state-of-the-art results on visual benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA, and handles images of varying resolution and aspect ratio.

Beyond static images, the model can understand videos longer than 20 minutes, which enables video-based question answering, dialogue, and content creation. Its reasoning and decision-making capabilities allow it to act as an agent, operating mobile devices or robots based on the visual environment and text instructions. It also reads text inside images in many languages, including most European languages, Japanese, Korean, Arabic, and Vietnamese.

The model has a 32K-token context window and a 4K-token maximum output, and is priced at $0.20 per 1M tokens for both input and output. It can be tried for free on Multi AI. Usage of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.

Vision AI, Multimodal, Video Analysis, Agentic AI, Free Tier
Quality: 67%
Context Window: 32K
Speed: 75%
Category: Economy
API access
Unified context
RAG + Knowledge Base
24/7 Support

Best For

Chat
Code
Math

🚀 Capabilities

Vision
Streaming

Limitations

No image generation
No internet access

Specifications

Provider: qwen
Context Window: 32,768 tokens
Max Output: 4,096 tokens
Minimum Plan: Economy
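
The two token limits above can be combined into a rough prompt budget. The sketch below assumes that generated tokens count against the same 32,768-token context window as the prompt, which is a common convention but is not stated on this page. The function names are illustrative only, and how images or video frames are converted to tokens is not covered here.

```python
# Limits taken from the Specifications list above.
CONTEXT_WINDOW = 32_768
MAX_OUTPUT = 4_096


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Prompt tokens left after reserving room for the reply.

    Assumption: prompt and output share one context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 0 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt leaves enough room for the reserved output."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())      # budget when the full 4,096 output is reserved
    print(fits(30_000, 1_024))      # a long prompt with a short reply
```

Reserving fewer output tokens frees up more of the window for the prompt, which matters most for long documents or multi-image requests.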

Pricing

Input Price: $0.2000 / 1M tokens
Output Price: $0.2000 / 1M tokens

💡 With a PRO subscription, the cost is reduced by 20%.
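
As a rough illustration of how these prices translate into a per-request cost, the sketch below multiplies token counts by the listed rates. It assumes the 20% PRO reduction applies equally to input and output tokens, which the page does not spell out, and the function name is hypothetical. Actual billing (rounding, minimum charges, how images are counted) may differ.

```python
from decimal import Decimal

# Prices from the Pricing section above, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.20")
OUTPUT_PRICE_PER_M = Decimal("0.20")
# Assumption: the PRO reduction applies uniformly to both prices.
PRO_DISCOUNT = Decimal("0.20")


def estimate_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    ) / Decimal(1_000_000)
    if pro:
        cost *= 1 - PRO_DISCOUNT
    return cost


if __name__ == "__main__":
    # A request that fills the prompt budget and the full output limit.
    print(estimate_cost(28_672, 4_096))
    print(estimate_cost(28_672, 4_096, pro=True))
```

Decimal is used instead of float so that small per-request amounts do not pick up binary rounding error when summed over many calls.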
