GLM-4.6V is a large multimodal model from z-ai built for visual understanding and long-context reasoning. It accepts images, complex documents, and mixed media as input, which suits it to analytical tasks over visually rich material. The context window is 131,072 tokens (131K) and the maximum output is 4,096 tokens (4K). The model reads complex page layouts and charts directly as visual inputs, and it supports native multimodal function calling, so what it perceives can feed directly into downstream tool execution. It also supports interleaved image-text generation and UI reconstruction workflows such as screenshot-to-HTML synthesis and iterative visual editing. Pricing is $0.30 per 1M input tokens and $0.90 per 1M output tokens, accessible via the STARTER tier.
✅ Best For
🚀 Capabilities
❌ Limitations
Specifications
| Specification | Value |
| --- | --- |
| Provider | z-ai |
| Context Window | 131,072 tokens |
| Max Output | 4,096 tokens |
| Minimum Plan | Balance |
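The limits above can be turned into a simple pre-flight check before sending a request. The sketch below is illustrative only: it assumes that input and output tokens share the 131,072-token context window (the listing does not say so explicitly), and the function names `max_input_tokens` and `fits_in_context` are hypothetical helpers, not part of any provider SDK. Token counts must come from whatever tokenizer you use; none is shown here.

```python
# Limits taken from the specification table.
CONTEXT_WINDOW = 131_072  # total tokens
MAX_OUTPUT = 4_096        # maximum generated tokens


def max_input_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return the input budget left after reserving room for the reply.

    Assumption: input and output tokens count against the same window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 0 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits_in_context(input_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Check whether a prompt of `input_tokens` leaves room for the reply."""
    return input_tokens <= max_input_tokens(reserved_output)
```

Reserving the full 4,096 output tokens is the conservative choice; if shorter replies are expected, a smaller `reserved_output` leaves more room for documents and images.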
Pricing
| Price | Rate |
| --- | --- |
| Input Price | $0.3000 / 1M tokens |
| Output Price | $0.9000 / 1M tokens |
💡 With a PRO subscription, the cost is reduced by 20%.
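To make the pricing concrete, here is a minimal cost estimator built from the listed rates. It is a sketch under stated assumptions: the 20% PRO reduction is assumed to apply equally to input and output charges, and `estimate_cost` is a hypothetical helper, not an official billing function. Actual invoices may differ, for example through rounding or how image inputs are counted as tokens.

```python
from decimal import Decimal

# Rates from the pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.30")
OUTPUT_PRICE_PER_M = Decimal("0.90")
# Assumption: the PRO discount applies uniformly to the whole charge.
PRO_DISCOUNT = Decimal("0.20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    ) / TOKENS_PER_M
    if pro:
        cost *= Decimal(1) - PRO_DISCOUNT
    return cost


if __name__ == "__main__":
    # 1M input tokens plus 1M output tokens: $0.30 + $0.90.
    print(estimate_cost(1_000_000, 1_000_000))
    # Same request with the assumed PRO discount applied.
    print(estimate_cost(1_000_000, 1_000_000, pro=True))
```

`Decimal` is used instead of floats so that small per-request amounts add up without binary rounding drift when many requests are summed.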