NVIDIA Nemotron Nano 2 VL is an open 12-billion-parameter multimodal reasoning model built for video understanding and document intelligence. It uses a hybrid Transformer-Mamba architecture that pairs transformer attention with Mamba's memory-efficient sequence modeling, delivering higher throughput and lower latency than comparable transformer-only models. The model accepts text and multi-image documents and produces natural-language output. It was trained on high-quality, NVIDIA-curated synthetic datasets optimized for optical character recognition (OCR), chart reasoning, and general multimodal comprehension, and it achieves leading results on OCRBench v2 along with an average score of ≈74 across MMMU, MathVista, AI2D, OCRBench, OCR-Reasoning, ChartQA, DocVQA, and Video-MME, outperforming prior open VL baselines. Efficient Video Sampling (EVS) lets it process long-form video while keeping inference costs low.

The model is free to use, with a 128K-token context window and a 4K-token maximum output. Its weights, training data, and fine-tuning recipes are released under a permissive NVIDIA open license, and deployment is supported across NeMo, NIM, and major inference runtimes. Try it for analysis and document processing on Multi AI.
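Models served through NIM and similar runtimes are commonly reached via an OpenAI-compatible chat-completions API. The sketch below only constructs such a request payload pairing a document image with an OCR prompt; the model ID, the `build_ocr_request` helper, and the payload shape are illustrative assumptions, not Multi AI's documented API.

```python
# Sketch: build an OpenAI-style multimodal chat-completions payload for an
# OCR question over one image. Assumes a base64 data-URL image format, which
# is the common convention for OpenAI-compatible endpoints.
import base64
import json


def build_ocr_request(image_bytes: bytes, question: str,
                      model: str = "nvidia/nemotron-nano-2-vl") -> dict:
    """Pair one image with a text prompt in a chat-completions payload.

    The model ID above is a placeholder; use the identifier your
    provider lists for Nemotron Nano 2 VL.
    """
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": 4096,  # matches the model's advertised 4K output cap
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{image_b64}"
                        },
                    },
                ],
            }
        ],
    }


# Dummy bytes stand in for a real PNG; the payload would then be POSTed
# to the runtime's /v1/chat/completions endpoint.
payload = build_ocr_request(b"\x89PNG...", "Transcribe all text in this document.")
print(json.dumps(payload)[:60])
```

Sending the same `messages` list with several `image_url` entries is how multi-image documents would be submitted, within the 128K-token context budget.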
✅ Best For
🚀 Capabilities
❌ Limitations
Specifications
| Specification | Value |
| --- | --- |
| Provider | nvidia |
| Context Window | 128,000 tokens |
| Max Output | 128,000 tokens |
| Minimum Plan | Economy |
Pricing
| Pricing | Rate |
| --- | --- |
| Input Price | Free / 1M tokens |
| Output Price | Free / 1M tokens |
💡 With a PRO subscription, costs are reduced by 20%