GLM-4.5-Air is the lightweight variant of Z.AI's latest flagship model family, purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size, making it efficient while retaining powerful capabilities. This model is designed to excel in scenarios requiring intelligent agents. It supports hybrid inference modes, offering a 'thinking mode' for advanced reasoning and tool use, and a 'non-thinking mode' for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. With a generous 131K token context window and 4K token max output, GLM-4.5-Air is ideal for chat and complex conversational flows. It supports functions, code, and streaming capabilities. Access this powerful model for free on Multi AI, with pricing at $0.00 per 1M tokens for both input and output. It's a cost-effective solution for developers and businesses looking to integrate advanced AI into their applications without incurring costs.
✅ Best For
🚀 Capabilities
❌ Limitations
Specifications
| Provider | z-ai |
| Context Window | 131,072 tokens |
| Max Output | 96,000 tokens |
| Minimum Plan | Economy |
Pricing
| Input Price | Free / 1M tokens |
| Output Price | Free / 1M tokens |
💡 With PRO subscription, cost is reduced by 20%