Inception: Mercury is a breakthrough large language model and the first to use a discrete diffusion approach to text generation. This architecture makes it exceptionally fast, running 5-10x faster than highly optimized models such as GPT-4.1 Nano and Claude 3.5 Haiku while maintaining comparable output quality. That speed makes Mercury well suited to latency-sensitive applications such as voice agents, dynamic search interfaces, and real-time chatbots.

With a 128K-token context window and a 4K-token maximum output, Mercury supports complex conversations and detailed responses. It offers function calling, code generation, and streaming, making it versatile across development needs. Pricing is competitive at $0.25/$1.00 per 1M tokens (input/output), available on the STARTER access tier.
Specifications

| Specification | Value |
| --- | --- |
| Provider | inception |
| Context Window | 128,000 tokens |
| Max Output | 4,096 tokens |
| Minimum Plan | Balance |
Pricing

| Token Type | Price |
| --- | --- |
| Input | $0.25 / 1M tokens |
| Output | $1.00 / 1M tokens |
💡 With a PRO subscription, cost is reduced by 20%.
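To make the pricing concrete, here is a minimal sketch of per-request cost estimation using the rates listed above ($0.25 per 1M input tokens, $1.00 per 1M output tokens) and the 20% PRO discount. The function name and structure are illustrative, not part of any official SDK:

```python
# Per-1M-token prices from the pricing table above (USD).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.00

def request_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> float:
    """Estimate the USD cost of one Mercury request.

    With a PRO subscription, the total is reduced by 20%.
    """
    cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    if pro:
        cost *= 0.80  # 20% PRO discount
    return cost

# Example: a maximal request using the full 128K context window
# and the full 4,096-token output budget.
print(round(request_cost(128_000, 4_096), 6))
print(round(request_cost(128_000, 4_096, pro=True), 6))
```

Even a request that fills the entire context window costs well under a cent at these rates, which is consistent with Mercury's positioning for high-volume, low-latency workloads.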