Balance

NVIDIA: Nemotron Nano 12B 2 VL

by nvidia

NVIDIA Nemotron Nano 12B 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It uses a hybrid Transformer-Mamba architecture that combines the accuracy of Transformer layers with the memory-efficient sequence modeling of Mamba layers, which gives it higher throughput and lower latency. The model accepts text and multi-image documents as input and produces natural-language output.

It was trained on high-quality synthetic datasets curated by NVIDIA, with an emphasis on optical character recognition (OCR), chart reasoning, and general multimodal comprehension. Nemotron Nano 2 VL achieves leading results on OCRBench v2 and averages about 74 across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME, outperforming earlier open vision-language (VL) baselines. Efficient Video Sampling (EVS) lets it handle long videos at a substantially lower inference cost.

Key specifications: a 131K-token context window and a 4K-token maximum output. Pricing is $0.20 per 1M input tokens and $0.60 per 1M output tokens. It supports vision and streaming, which makes it a good fit for analysis and document processing. The open weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, and deployment is supported across NeMo, NIM, and major inference runtimes. Access this STARTER tier model on Multi AI today.

Tags: multimodal, vision, document AI, video analysis, open source
Quality: 72%
Context Window: 131K
Speed: 70%
Category: Economy
API access
Unified context
RAG + Knowledge Base
24/7 Support

Best For

Analysis
Documents

🚀 Capabilities

Vision
Streaming

Limitations

No image generation

Specifications

Provider: nvidia
Context Window: 131,072 tokens
Max Output: 4,096 tokens
Minimum Plan: Balance
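A rough sketch of how the two limits above interact when sizing a request. It assumes the prompt and the generated tokens share the same 131,072-token window and that you already know the prompt's token count; the helper name `max_output_for` is hypothetical and not part of any Multi AI or NVIDIA API.

```python
# Hypothetical budgeting helper; limits taken from the spec sheet above.
CONTEXT_WINDOW = 131_072  # total tokens per request (assumed shared by prompt + output)
MAX_OUTPUT = 4_096        # maximum generated tokens per request


def max_output_for(prompt_tokens: int) -> int:
    """Largest output budget that still fits in the context window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens exceeds the {CONTEXT_WINDOW}-token window"
        )
    # The output is capped by whichever limit is tighter.
    return min(MAX_OUTPUT, remaining)


if __name__ == "__main__":
    print(max_output_for(10_000))   # 4096: capped by the max output limit
    print(max_output_for(129_000))  # 2072: capped by what is left of the window
```

For most prompts the 4,096-token output cap is the binding limit; only prompts within about 4K tokens of the full window are constrained by the context size instead.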

Pricing

Input Price: $0.2000 / 1M tokens
Output Price: $0.6000 / 1M tokens

💡 With a PRO subscription, costs are reduced by 20%.
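The per-request cost follows directly from the listed prices. This sketch assumes the 20% PRO reduction applies equally to input and output tokens; the function name `request_cost` is hypothetical, and actual billing (rounding, minimums) may differ.

```python
# Hypothetical cost estimator using the listed prices (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.60
PRO_DISCOUNT = 0.20  # assumed to apply to both input and output tokens


def request_cost(input_tokens: int, output_tokens: int, pro: bool = False) -> float:
    """Return the estimated USD cost of one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    cost = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    if pro:
        cost *= 1 - PRO_DISCOUNT
    return cost


if __name__ == "__main__":
    # A 100,000-token document prompt that uses the full 4,096-token output budget.
    print(f"{request_cost(100_000, 4_096):.6f}")            # 0.022458
    print(f"{request_cost(100_000, 4_096, pro=True):.6f}")  # 0.017966
```

Output tokens cost three times as much as input tokens, so long documents with short answers stay inexpensive even near the full context window.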

Ready to try NVIDIA: Nemotron Nano 12B 2 VL?

Get 1,000 free tokens on sign-up.
