GLM-4.5-Air is the lightweight variant of Z.AI's latest flagship model family, purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size, making it efficient for various tasks. This model excels in scenarios requiring quick, responsive AI. It supports hybrid inference modes, offering a "thinking mode" for advanced reasoning and tool use, and a "non-thinking mode" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. With a context window of 131K tokens and a max output of 4K tokens, GLM-4.5-Air is priced at $0.05/0.22 per 1M tokens (input/output) and is available on the STARTER access tier. It supports functions, code, and streaming capabilities.
✅ Best For
🚀 Capabilities
❌ Limitations
Specifications
| Provider | z-ai |
| Context Window | 131,072 tokens |
| Max Output | 98,304 tokens |
| Minimum Plan | Balance |
Pricing
| Input Price | $0.1300 / 1M tokens |
| Output Price | $0.8500 / 1M tokens |
💡 With PRO subscription, cost is reduced by 20%