InternVL3 78B is the 78B-parameter variant of InternVL3, an advanced multimodal large language model (MLLM) series developed by OpenGVLab. Compared with its predecessor, InternVL 2.5, it offers significantly stronger multimodal perception and reasoning, and it is designed for complex tasks that require deep understanding across multiple data types. For text performance, InternVL3 is compared against the Qwen2.5 Chat models, whose pre-trained base models are used to initialize InternVL3's language component. Thanks to Native Multimodal Pre-Training, the InternVL3 series achieves better overall text performance than the Qwen2.5 series. The model supports a 32K-token (32,768) context window and a maximum output of 4K (4,096) tokens. Capabilities include vision, code, and streaming. Pricing is $0.10 per 1M input tokens and $0.39 per 1M output tokens, and the model is available on the STARTER access tier. It is best suited for analysis and document tasks; it does not support image generation.
Specifications
| Attribute | Value |
|---|---|
| Provider | opengvlab |
| Context Window | 32,768 tokens |
| Max Output | 4,096 tokens |
| Minimum Plan | Balance |
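A rough sketch of how the two token limits interact when sizing a request, in Python. It assumes (this page does not say so) that prompt and generated tokens share the 32,768-token context window, as is common for chat models. `max_output_for_prompt` is a hypothetical helper, and real token counts must come from the tokenizer the provider actually uses.

```python
# Limits from the Specifications table.
CONTEXT_WINDOW = 32_768  # tokens
MAX_OUTPUT = 4_096       # tokens


def max_output_for_prompt(prompt_tokens: int) -> int:
    """Return the largest output budget for a prompt of the given size.

    Assumption: prompt and output tokens share the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError(
            f"a {prompt_tokens}-token prompt leaves no room for output "
            f"in a {CONTEXT_WINDOW}-token context window"
        )
    # The output is capped by both the max-output limit and the space left over.
    return min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)


print(max_output_for_prompt(10_000))  # 4096
print(max_output_for_prompt(30_000))  # 2768
```

Under this assumption, prompts of up to 28,672 tokens can still request the full 4,096-token output; longer prompts shrink the available output budget.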
Pricing
| Attribute | Value |
|---|---|
| Input Price | $0.1000 / 1M tokens |
| Output Price | $0.3900 / 1M tokens |
💡 With a PRO subscription, costs are reduced by 20%.
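The per-token prices translate into a simple cost formula. The Python sketch below estimates the cost of a workload. `estimate_cost` is a hypothetical helper, and applying the 20% PRO reduction to the combined input and output cost is an assumption about how the discount is calculated.

```python
# Prices from the Pricing table, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.39
# Assumption: the PRO reduction applies to the total (input + output) cost.
PRO_DISCOUNT = 0.20


def estimate_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> float:
    """Estimate the USD cost of processing the given token counts."""
    cost = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    if pro:
        cost *= 1 - PRO_DISCOUNT
    return cost


# 1M input tokens plus 1M output tokens.
print(f"${estimate_cost(1_000_000, 1_000_000):.4f}")            # $0.4900
print(f"${estimate_cost(1_000_000, 1_000_000, pro=True):.4f}")  # $0.3920
```

Because an output token costs 3.9 times as much as an input token, output length tends to dominate the bill for generation-heavy workloads.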