Baidu: ERNIE 4.5 VL 424B A47B

by baidu

Baidu ERNIE 4.5 VL 424B A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series. It has 424B total parameters, of which 47B are active per token, and is jointly trained on text and image data using a heterogeneous MoE architecture with modality-isolated routing. The design targets cross-modal reasoning, detailed image understanding, and long-context generation, with support for up to 131,000 tokens. It is fine-tuned with SFT, DPO, UPO, and RLVR, and it supports both “thinking” and non-thinking inference modes. The model is built for complex vision-language tasks in English and Chinese and can run under 4-bit or 8-bit quantization.

Here it is offered with a context window of 123K tokens and a max output of 4K tokens. Pricing is $0.42 per 1M input tokens and $1.25 per 1M output tokens, on the STARTER access tier. Key capabilities are vision and streaming, which suit analysis and document processing. The model does not support image generation.

Multimodal AI, Vision-Language, ERNIE 4.5, Baidu AI, Large Language Model
Quality: 70%
Context Window: 123K
Speed: 70%
Category: Standard
API access
Unified context
RAG + Knowledge Base
24/7 Support

Best For

Analysis
Documents

🚀 Capabilities

Long context
Vision
Streaming

Limitations

No image generation

Specifications

Provider: baidu
Context Window: 123,000 tokens
Max Output: 16,000 tokens
Minimum Plan: Balance
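
The context window and max output figures above can be turned into a quick prompt-size check. The sketch below is illustrative only: it assumes that prompt and generated tokens share the same context window, which this listing does not state, and the function names are hypothetical.

```python
# Figures taken from the Specifications list above.
CONTEXT_WINDOW = 123_000  # tokens
MAX_OUTPUT = 16_000       # tokens


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return the largest prompt that leaves room for `reserved_output` tokens.

    Assumption (not confirmed by the listing): input and output tokens
    count against the same context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Check whether a prompt of `prompt_tokens` fits under that assumption."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)
```

Under that assumption, reserving the full 16,000 output tokens leaves 107,000 tokens for the prompt, images included if they are billed as tokens.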

Pricing

Input Price: $0.4200 / 1M tokens
Output Price: $1.2500 / 1M tokens

💡 With PRO subscription, cost is reduced by 20%
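
As a rough illustration of the pricing above, the following sketch estimates the cost of a single request. It assumes the 20% PRO reduction applies equally to input and output prices, which the listing does not spell out, and `estimate_cost` is a hypothetical helper, not part of any provider API.

```python
from decimal import Decimal

# Prices from the Pricing section above, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.42")
OUTPUT_PRICE_PER_M = Decimal("1.25")
PRO_DISCOUNT = Decimal("0.20")  # assumed to apply to the whole bill
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    ) / TOKENS_PER_M
    if pro:
        cost *= 1 - PRO_DISCOUNT
    return cost


if __name__ == "__main__":
    # 100,000 input tokens and 4,000 output tokens.
    print(estimate_cost(100_000, 4_000))            # 0.047 USD at list price
    print(estimate_cost(100_000, 4_000, pro=True))  # 0.0376 USD with PRO
```

`Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding noise when summed over many requests.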