Economy

Qwen: Qwen2.5-VL 7B Instruct

Name: Qwen: Qwen2.5-VL 7B Instruct
Brand: qwen
Price: 0.20 USD per 1M tokens (input and output)
Rating: 3.4 (1 review)

Qwen2.5-VL 7B Instruct, from the Qwen Team, is a multimodal large language model built for visual understanding. It achieves state-of-the-art performance on visual benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA, and handles images of varying resolutions and aspect ratios. Beyond static images, it can understand videos longer than 20 minutes, supporting video-based question answering, dialogue, and content creation. Its reasoning and decision-making capabilities let it act as an agent that operates mobile devices or robots from visual input and text instructions. The model also reads text in images across many languages, including European languages, Japanese, Korean, Arabic, and Vietnamese. It has a 32K-token context window and a 4K-token maximum output, and is priced at $0.20 per 1M input tokens and $0.20 per 1M output tokens. It is available for free on Multi AI. Use of this model is subject to the Tongyi Qianwen LICENSE AGREEMENT.
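To make the pricing and token limits concrete, here is a minimal Python sketch that estimates the cost of a single request from the figures in the description. The function and constant names are illustrative, not part of any API. The sketch assumes "32K" and "4K" mean 32,768 and 4,096 tokens, and that input and output tokens share the context window; the listing does not confirm either point.

```python
# Figures from the listing; all names here are hypothetical.
PRICE_PER_M_INPUT_USD = 0.20    # USD per 1M input tokens
PRICE_PER_M_OUTPUT_USD = 0.20   # USD per 1M output tokens
CONTEXT_WINDOW_TOKENS = 32 * 1024  # assumption: "32K" = 32,768
MAX_OUTPUT_TOKENS = 4 * 1024       # assumption: "4K" = 4,096


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-token prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed 4K maximum")
    # Assumption: input and output together must fit in the context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request exceeds the listed 32K context window")
    input_cost = input_tokens * PRICE_PER_M_INPUT_USD
    output_cost = output_tokens * PRICE_PER_M_OUTPUT_USD
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # A request that fills the whole window: 28,672 in + 4,096 out.
    print(f"{estimate_cost_usd(28_672, 4_096):.6f} USD")
```

Under these assumptions, a request that fills the entire window costs well under one cent (about $0.0066), so per-request cost at this tier depends mostly on request volume, not on individual prompt size.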

Tags: Vision AI, Multimodal, Video Analysis, Agentic AI, Free Tier

Quality: 67%
Context Window: 33K
Speed: 75%