NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 is a 49B-parameter, English-centric reasoning and chat model built on Meta's Llama-3.3-70B-Instruct, with a 128K (131,072-token) context window.

It is post-trained for agentic workflows, including Retrieval-Augmented Generation (RAG) and robust tool calling. Training combines Supervised Fine-Tuning (SFT) across math, code, science, and multi-turn chat with multiple Reinforcement Learning (RL) stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior.

On internal evaluations it reports MATH500 pass@1 = 97.4 and LiveCodeBench = 73.58, indicating strong reasoning and coding capability. The model targets practical inference efficiency, delivering high tokens/s with reduced VRAM, and supports single-GPU (H100/H200) deployment via Transformers or vLLM.

With its 131,072-token context window and 4,096-token maximum output, it is well suited to agents, assistants, and long-context retrieval systems where balanced accuracy-to-cost and reliable tool use are critical. Pricing is $0.10 input / $0.40 output per 1M tokens.
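For single-GPU serving with vLLM, a deployment along these lines is typical. This is a sketch only: the Hugging Face model ID and flag values are assumptions, not taken from this page, so verify them against the official model card before use.

```shell
# Sketch: assumed Hugging Face model ID; confirm on the model card.
pip install vllm

# Serve with the full 131,072-token context on one H100/H200 GPU.
vllm serve nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 \
  --max-model-len 131072 \
  --tensor-parallel-size 1
```

Once the server is up, it exposes an OpenAI-compatible endpoint, so existing chat-completions clients can point at it without code changes.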
Specifications

| Spec | Value |
| --- | --- |
| Provider | nvidia |
| Context Window | 131,072 tokens |
| Max Output | 4,096 tokens |
| Minimum Plan | Balance |
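The two token limits above interact: room for the completion must be reserved out of the shared context window. A minimal sketch of that budget arithmetic (the helper names are illustrative, not part of any API):

```python
# Hypothetical helpers for budgeting tokens against the limits
# listed in the specifications table.
CONTEXT_WINDOW = 131_072  # total tokens (prompt + completion)
MAX_OUTPUT = 4_096        # maximum completion tokens

def prompt_budget(max_output: int = MAX_OUTPUT) -> int:
    """Tokens left for the prompt after reserving completion space."""
    return CONTEXT_WINDOW - max_output

def fits(prompt_tokens: int, max_output: int = MAX_OUTPUT) -> bool:
    """True if prompt plus completion stay within the context window."""
    return prompt_tokens + max_output <= CONTEXT_WINDOW

print(prompt_budget())  # 126976 tokens available for the prompt
```

In practice a RAG pipeline would use this budget to decide how many retrieved chunks it can pack into the prompt.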
Pricing

| Rate | Price |
| --- | --- |
| Input | $0.10 / 1M tokens |
| Output | $0.40 / 1M tokens |
💡 With a PRO subscription, cost is reduced by 20%
Ready to try NVIDIA: Llama 3.3 Nemotron Super 49B V1.5?
Get 1,000 tokens free on signup
Start for free