NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 is a 49B-parameter, English-centric reasoning and chat model built on Meta's Llama-3.3-70B-Instruct, with a 128K (131,072-token) context window.

It is post-trained for agentic workflows, including Retrieval-Augmented Generation (RAG) and robust tool calling. Training combines Supervised Fine-Tuning (SFT) across math, code, science, and multi-turn chat with multiple Reinforcement Learning (RL) stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior.

On internal evaluations it reports MATH500 pass@1 = 97.4 and LiveCodeBench = 73.58, indicating strong reasoning and coding capability. The model targets practical inference efficiency, delivering high tokens/s with reduced VRAM, and supports single-GPU (H100/H200) deployment via Transformers or vLLM.

With its 131,072-token context window and 4,096-token maximum output, it is well suited to agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use are critical. Pricing is $0.10 input / $0.40 output per 1M tokens.
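For single-GPU serving with vLLM, a deployment along these lines is typical. This is a sketch only: the Hugging Face model ID and flag values are assumptions, not taken from this page, so verify them against the official model card before use.

```shell
# Sketch: assumed Hugging Face model ID; confirm on the model card.
pip install vllm

# Serve with the full 131,072-token context on one H100/H200 GPU.
vllm serve nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 \
  --max-model-len 131072 \
  --tensor-parallel-size 1
```

Once the server is up, it exposes an OpenAI-compatible endpoint, so existing chat-completions clients can point at it without code changes.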
Specifications

| Spec | Value |
| --- | --- |
| Provider | nvidia |
| Context Window | 131,072 tokens |
| Max Output | 4,096 tokens |
| Minimum Plan | Balance |
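The two token limits above interact: room for the completion must be reserved out of the shared context window. A minimal sketch of that budget arithmetic (the helper names are illustrative, not part of any API):

```python
# Hypothetical helpers for budgeting tokens against the limits
# listed in the specifications table.
CONTEXT_WINDOW = 131_072  # total tokens (prompt + completion)
MAX_OUTPUT = 4_096        # maximum completion tokens

def prompt_budget(max_output: int = MAX_OUTPUT) -> int:
    """Tokens left for the prompt after reserving completion space."""
    return CONTEXT_WINDOW - max_output

def fits(prompt_tokens: int, max_output: int = MAX_OUTPUT) -> bool:
    """True if prompt plus completion stay within the context window."""
    return prompt_tokens + max_output <= CONTEXT_WINDOW

print(prompt_budget())  # 126976 tokens available for the prompt
```

In practice a RAG pipeline would use this budget to decide how many retrieved chunks it can pack into the prompt.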
Pricing

| Rate | Price |
| --- | --- |
| Input | $0.10 / 1M tokens |
| Output | $0.40 / 1M tokens |
💡 With a PRO subscription, cost is reduced by 20%
Ready to try NVIDIA: Llama 3.3 Nemotron Super 49B V1.5?
Get 1,000 tokens free on signup
Start for free