Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, engineered for ultra-low latency and cost efficiency. It delivers higher throughput and faster token generation than its predecessors, making it a strong choice for applications where speed is paramount. 'Thinking' (multi-pass reasoning) is disabled by default to maximize speed, but developers can enable it through the API's reasoning parameter, trading some speed and cost for deeper reasoning.

The model offers a 1,048,576-token context window and a 4,096-token maximum output, providing ample capacity for complex tasks. It supports vision, function calling, code, and streaming, making it versatile across use cases. Pricing is competitive at $0.10 per 1M input tokens and $0.40 per 1M output tokens, accessible via the STARTER tier on Multi AI. It excels in applications like chat, code generation, data analysis, and document processing.
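As a minimal sketch, thinking can be enabled per request by setting a thinking budget in the generation config. The field names below (`generationConfig`, `thinkingConfig`, `thinkingBudget`) follow Google's public Gemini REST API and are an assumption for how the Multi AI gateway exposes them:

```python
import json

def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a generateContent-style request body.

    thinking_budget > 0 opts in to multi-pass 'thinking'; 0 keeps the
    default speed-first behavior. Field names follow Google's public
    Gemini API and may differ behind the Multi AI gateway (assumption).
    """
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "maxOutputTokens": 4096,  # the model's documented output cap
        },
    }
    if thinking_budget > 0:
        body["generationConfig"]["thinkingConfig"] = {
            "thinkingBudget": thinking_budget
        }
    return body

# Speed-first request (thinking off, the default):
fast = build_request("Summarize this contract.")
# Same prompt with a 512-token thinking budget:
deep = build_request("Summarize this contract.", thinking_budget=512)
print(json.dumps(deep, indent=2))
```

Keeping the budget at 0 preserves Flash-Lite's low-latency default; raising it spends extra tokens on reasoning before the visible answer.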
Specifications
| Provider | Google |
| Context Window | 1,048,576 tokens |
| Max Output | 4,096 tokens |
| Minimum Plan | Balance |
Pricing
| Input Price | $0.10 / 1M tokens |
| Output Price | $0.40 / 1M tokens |
💡 With PRO subscription, cost is reduced by 20%
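The listed rates make cost estimation a one-line calculation. A small sketch, using only the prices and the 20% PRO discount stated above:

```python
# Rates from the pricing table: $0.10 per 1M input tokens,
# $0.40 per 1M output tokens.
INPUT_PER_M = 0.10
OUTPUT_PER_M = 0.40

def estimate_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> float:
    """Estimate request cost in USD; PRO subscribers get 20% off."""
    cost = (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M
    if pro:
        cost *= 0.80  # 20% PRO discount per the note above
    return round(cost, 6)

# A request that fills the 1,048,576-token context window and
# returns the 4,096-token maximum output:
print(estimate_cost(1_048_576, 4_096))            # standard rate
print(estimate_cost(1_048_576, 4_096, pro=True))  # with PRO discount
```

Even a maximum-size request costs only about a tenth of a cent, which is what makes the model attractive for high-volume document processing.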
Ready to try Google: Gemini 2.5 Flash Lite Preview 09-2025?
Get 1,000 tokens free on signup
Start for free