Molmo2-8B is an open vision-language model from the Allen Institute for AI (Ai2) and part of the Molmo2 family. It supports image, video, and multi-image understanding as well as grounding. The model is built on Qwen3-8B and uses SigLIP 2 as its vision backbone. Among open-weight, open-data models, it outperforms comparable models on short-video understanding, counting, and captioning, and remains competitive on long-video tasks. The context window is 36,864 tokens (36K), and the maximum output is also 36,864 tokens. Pricing is $0.20 per 1M input tokens and $0.20 per 1M output tokens. The model is available on the free access tier.
✅ Best For
🚀 Capabilities
Specifications
| Provider | allenai |
| Context Window | 36,864 tokens |
| Max Output | 36,864 tokens |
| Minimum Plan | Economy |
Pricing
| Input Price | $0.2000 / 1M tokens |
| Output Price | $0.2000 / 1M tokens |
💡 With a PRO subscription, the cost is reduced by 20%.
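
The prices above can be turned into a quick per-request estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost` and its parameters are hypothetical, and it assumes the 20% PRO reduction applies equally to input and output tokens, which the listing does not spell out.

```python
from decimal import Decimal

# Listed prices in USD per 1M tokens (taken from the pricing rows above).
INPUT_PRICE_PER_M = Decimal("0.20")
OUTPUT_PRICE_PER_M = Decimal("0.20")

# Assumption: the PRO reduction is a flat 20% on the total cost.
PRO_DISCOUNT = Decimal("0.20")

TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> Decimal:
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    ) / TOKENS_PER_M
    if pro:
        cost *= Decimal(1) - PRO_DISCOUNT
    return cost


if __name__ == "__main__":
    # Example: a request with 30,000 input tokens and 2,000 output tokens.
    print(estimate_cost(30_000, 2_000))
    print(estimate_cost(30_000, 2_000, pro=True))
```

`Decimal` is used instead of `float` so that small per-request amounts do not pick up binary rounding error when summed over many requests.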