Model profile · NVIDIA

Nemotron 3 Ultra

Name: Nemotron 3 Ultra
Author: NVIDIA

Current · Released Jun 4, 2026 · Open Weight · Reasoning · 1M context

Nemotron 3 Ultra scores 43.6 out of 100 and ranks #160 of 214. This profile shows 19 source-displayable benchmark rows; its strongest eligible category is Instruction Following at #5. Published weights can be self-hosted, but infrastructure cost varies and is not a comparable API token rate.

Data as of July 31, 2026 · How the score is built

Strongest published evidence

Instruction Following ranks #5 of 32 eligible models (87th percentile). Other ranked categories place at or below mid-field, so treat this as a category-specific strength.

Validate before choosing

Only 19 of 376 tracked benchmark slots have published rows, so most slots are empty. No comparable first-party API token rate is published.

Decision snapshot

Each value carries a field reference instead of floating alone. Markers compare this model with the current ranked and priced catalog; they are not absolute quality thresholds.

Capability

43.6/100

field median 57.3

#160 of 214 ranked models

Price

Self-hosted; infrastructure cost varies

input median $1

No comparable first-party hosted token rate

Speed

Not measured

field median 108 tok/s

Time to first token not measured

Context

1M tokens

field median 256,000

Reported for this model; direct source link not stored

Capability shape

Each axis shows percentile within that category’s eligible cohort. The comparison outline is the median of the six nearest public-score peers; a collapsed vertex means the category is not rank-eligible.

Agentic: 8th percentile
Coding: 25th percentile
Reasoning: Not eligible
Knowledge: 9th percentile
Math: Not eligible
Multilingual: 50th percentile
Multimodal: Not eligible
Instruction following: 87th percentile

The dashed outline is the median of the 6 nearest peers.

Legend: Top decile · Top quartile · Mid-field · Not eligible

How much of this is verified

Coverage is split by category so a strong number never hides a thin evidence base. Verified means the row is tied to a published source; provisional rows remain visible but separate.

Agentic: 5/5 verified
Coding: 5/5 verified
Reasoning: 2/2 verified
Knowledge: 5/5 verified
Math: Not measured
Multilingual: 1/1 verified
Multimodal: Not measured
Inst. Following: 1/1 verified

Legend: Verified source · Provisional · Not measured

Spec sheet

Each documented value carries its source. Missing fields stay visible as not sourced or not published, rather than disappearing from the page.

API model ID: Not published
Context window: 1M
Maximum output: Not sourced yet
Knowledge cutoff: Not sourced yet
Input modalities: Not sourced yet
Output modalities: Not sourced yet
Parameters: Not sourced yet

Availability: Not sourced yet
Cloud regions: Not tracked yet
Lifecycle: Current
API capabilities: Tool calling, structured outputs, and batch support are not tracked yet
Prompt caching: Not documented in the pricing record
Self-host: Open weights available; hardware estimate not sourced
Rate limits: Not tracked yet

Deployment options

Self-host and provider-specific paths stay separate from benchmark evidence so operating constraints are visible before a score becomes the whole decision.

Published weights are available, but BenchLM does not yet have a sourced parameter and VRAM profile for this exact model. Hardware cost estimates stay unavailable until that sizing record is complete.
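Until a sourced sizing record exists, a rough lower bound on weight memory can be worked out from a parameter count alone. The sketch below is an illustration under stated assumptions, not a BenchLM estimate: it takes the 550B total from the checkpoint name cited under "How to read this profile", assumes every parameter must be resident in memory, and ignores KV cache, activations, and runtime overhead. The FP8 and INT4 rows are hypothetical formats for comparison; this page only names a BF16 checkpoint.

```python
# Bytes per parameter for common weight formats (standard sizes).
BYTES_PER_PARAM = {"bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params: float, dtype: str = "bf16") -> float:
    """Return the memory needed to hold the weights alone, in decimal GB."""
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


# 550B total parameters, taken from the checkpoint name on this page.
TOTAL_PARAMS = 550e9

# Only bf16 corresponds to a checkpoint named on this page; the others are
# hypothetical quantized formats shown for scale.
for dtype in ("bf16", "fp8", "int4"):
    print(f"{dtype}: {weight_memory_gb(TOTAL_PARAMS, dtype):,.0f} GB")
```

Real deployments need headroom above these figures, which grows with context length and batch size.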

Estimate VRAM from known parameters

Category score record

Scores and ranks appear only where published evidence can be displayed. The table keeps the score, weight, cohort, and evidence state together.

Category scores, ranks, weighting, benchmark coverage, and evidence status
Category	Score	Rank	Percentile	Weight	Benchmarks	Evidence
Agentic	33.0	#118 of 128	8th	22%	5 benchmarks	Verified
Coding	43.4	#97 of 129	25th	20%	5 benchmarks	Verified
Reasoning	85.1	Not ranked	Not available	17%	2 benchmarks	Verified
Knowledge	53.1	#50 of 55	9th	12%	5 benchmarks	Verified
Math	Not measured	Not ranked	Not available	5%	0 benchmarks	Not measured
Multilingual	47.4	#7 of 13	50th	7%	1 benchmark	Verified
Multimodal	Not measured	Not ranked	Not available	12%	0 benchmarks	Not measured
Inst. Following	91.9	#5 of 32	87th	5%	1 benchmark	Verified
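The percentile column appears to follow from rank and cohort size. The sketch below uses the formula (cohort size − rank) / (cohort size − 1), which is inferred from the published values rather than documented by BenchLM; the function name is illustrative. It reproduces all five ranked rows in the table.

```python
def rank_percentile(rank: int, cohort_size: int) -> int:
    """Share of the cohort's rank range that sits below `rank`, as a rounded percent."""
    if cohort_size < 2:
        raise ValueError("need at least two models to form a percentile")
    return round(100 * (cohort_size - rank) / (cohort_size - 1))


# (rank, cohort size) pairs copied from the category table.
ranked = {
    "Agentic": (118, 128),
    "Coding": (97, 129),
    "Knowledge": (50, 55),
    "Multilingual": (7, 13),
    "Inst. Following": (5, 32),
}

for category, (rank, size) in ranked.items():
    print(f"{category}: {rank_percentile(rank, size)}th percentile")
```

Small cohorts make the percentile coarse: with 13 models, moving one rank shifts the result by about 8 points.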

Benchmark ledger

Rows are grouped by category. The marker compares each value with the best source-verified result in the catalog; provisional leaders do not set the reference.

Coding · 5 rows

Coding benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Best verified row	Gap	Weight	Evidence
SWE-bench Verified (Software Engineering Benchmark Verified)	71.9%	Claude Opus 5 · 96%	24.1 behind	Weighted 16%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
SciCode (Scientific Code Benchmark)	44.6%	Sakana Fugu · 60.1%	15.5 behind	Weighted 16%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
SWE Multilingual	67.7%	Claude Opus 5 · 89.5%	21.8 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
LiveCodeBench v6	89.0%	Sakana Fugu-Ultra · 93.2%	4.2 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
Terminal-Bench 2.0	56.4%	GPT-5.6 Sol · 91.9%	35.5 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
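The gap column is the best verified score minus this model's score, in percentage points. A minimal sketch using the Coding rows, with `Decimal` so the one-decimal values subtract exactly; the function and variable names are illustrative, not part of any BenchLM tooling.

```python
from decimal import Decimal

# (benchmark, this model's score, best verified score) from the Coding rows.
coding_rows = [
    ("SWE-bench Verified", "71.9", "96"),
    ("SciCode", "44.6", "60.1"),
    ("SWE Multilingual", "67.7", "89.5"),
    ("LiveCodeBench v6", "89.0", "93.2"),
    ("Terminal-Bench 2.0", "56.4", "91.9"),
]


def gap_behind(score: str, best: str) -> Decimal:
    """Percentage points separating a score from the best verified result."""
    return Decimal(best) - Decimal(score)


for name, score, best in coding_rows:
    print(f"{name}: {gap_behind(score, best)} behind")
```

A gap of zero corresponds to a row where this model itself holds the best verified result, as with PinchBench in the Agentic rows.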

Agentic · 5 rows

Agentic benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Best verified row	Gap	Weight	Evidence
Terminal-Bench 2.0	56.4%	GPT-5.6 Sol · 91.9%	35.5 behind	Weighted 38%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
BrowseComp	44.4%	GPT-5.6 Sol · 92.2%	47.8 behind	Weighted 28%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
PinchBench	90.0%	Nemotron 3 Ultra · 90.0%	Best verified	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
τ³-bench results (τ³-Bench Tool-Agent-User Evaluation)	70.9%	Mistral Medium 3.5 128B · 91.4%	20.5 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
HLE w/ tools (Humanity's Last Exam with tools)	37.4%	Claude Opus 5 · 64.7%	27.3 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card

Reasoning · 2 rows

Reasoning benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Best verified row	Gap	Weight	Evidence
LongBench v2	61.9%	Qwen3.5 397B · 63.2%	1.3 behind	Weighted 38%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
CritPt (Critical Physics Tasks)	3.1%	GLM-5.2 · 20.9%	17.8 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card

Knowledge · 5 rows

Knowledge benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Best verified row	Gap	Weight	Evidence
HLE (Humanity's Last Exam)	26.7%	Claude Opus 5 · 64.7%	38 behind	Weighted 45%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
MMLU-Pro (Massive Multitask Language Understanding Professional)	86.8%	Qwen3.7 Max · 89.6%	2.8 behind	Weighted 30%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
GPQA (Graduate-Level Google-Proof Q&A)	87%	Sakana Fugu-Ultra · 95.5%	8.5 behind	Weighted 7%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
GPQA-D (GPQA Diamond)	87.0%	Sakana Fugu-Ultra · 95.5%	8.5 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card
HLE w/o tools (Humanity's Last Exam without tools)	26.7%	Claude Mythos 5 · 59%	32.3 behind	Display only	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card

Multilingual · 1 row

Multilingual benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Best verified row	Gap	Weight	Evidence
MMLU-ProX	83%	Qwen3.7 Max · 87%	4 behind	Weighted 100%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card

Inst. Following · 1 row

Inst. Following benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Best verified row	Gap	Weight	Evidence
IFBench (Instruction Following Benchmark)	81.7%	MAI-Thinking-1 · 85%	3.3 behind	Weighted 65%	Provider exact · NVIDIA: Nemotron 3 Ultra Hugging Face model card

Lineage

The sequence follows explicit supersedes links. Scores and prices remain blank when the corresponding public row or first-party rate is unavailable.

Jun 4, 2026 · you are here

Nemotron 3 Ultra

Score 43.6 · Price not listed

Base entry

How to read this profile

The visual layer above carries the decisions. These notes preserve the model, ranking, coverage, and family context behind the numbers.

Nemotron 3 Ultra ranks #160 of 214 on the public leaderboard with a score of 43.56/100. It does not yet have enough sourced coverage for a verified position.

Nemotron 3 Ultra is an open-weight model with a 1M context window. It uses an explicit reasoning mode, which can improve complex problem solving while adding latency and token use.

Official exact-value snapshot from the nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 Hugging Face model card. BenchLM maps directly comparable values from NVIDIA's table and keeps provider-specific rows display-only where appropriate. NVIDIA's earlier announcement used approximate 500B/50B wording; the released checkpoint is 550B total / 55B active.

19 of 376 tracked benchmark slots currently have displayable evidence. Missing categories stay blank.
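The coverage figure can be checked against the per-category row counts shown under "How much of this is verified". A short sketch, with variable names chosen for illustration:

```python
# Displayable rows per category, as listed on this page.
rows_per_category = {
    "Agentic": 5,
    "Coding": 5,
    "Reasoning": 2,
    "Knowledge": 5,
    "Math": 0,
    "Multilingual": 1,
    "Multimodal": 0,
    "Inst. Following": 1,
}

TRACKED_SLOTS = 376  # total tracked benchmark slots stated on this page

displayable = sum(rows_per_category.values())
coverage = displayable / TRACKED_SLOTS

print(f"{displayable} of {TRACKED_SLOTS} slots = {coverage:.1%}")
```

The count is per row, so a benchmark listed under two categories (Terminal-Bench 2.0 appears under both Coding and Agentic) contributes once to each.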

Its strongest eligible category is Instruction Following at #5 of 32, while its lowest eligible position is Agentic at #118 of 128. Results vary widely by category, so check the one that matters for your workload.

Nemotron 3 Ultra release history

Nemotron 3 Ultra released

NVIDIA · Model release

Source confirmed · Jun 4, 2026

Frequently asked questions

How does Nemotron 3 Ultra perform overall in AI benchmarks?

Nemotron 3 Ultra ranks #160 out of 214 models on the public BenchLM leaderboard, with a score of 43.56/100. Its evidence status is Estimated, and this profile shows 19 source-displayable benchmark rows. The label describes evidence depth, not a provider quality claim; inspect category rows before choosing a workload.

Is Nemotron 3 Ultra good for knowledge and understanding?

Nemotron 3 Ultra ranks #50 out of 55 eligible models for knowledge and understanding, with a public category score of 53.1/100. Higher-ranked alternatives are available for workloads where this category decides the choice. Check the underlying rows before treating the aggregate as a workload guarantee.

Is Nemotron 3 Ultra good for coding and programming?

Nemotron 3 Ultra ranks #97 out of 129 eligible models for coding and programming, with a public category score of 43.4/100. Higher-ranked alternatives are available for workloads where this category decides the choice. Check the underlying rows before treating the aggregate as a workload guarantee.

Is Nemotron 3 Ultra good for reasoning and logic?

Nemotron 3 Ultra has source-displayable benchmark coverage for reasoning and logic, but the public category table does not assign it a rank there. The individual rows remain available for inspection. A missing category position means the evidence threshold was not met; it does not convert the model's unmeasured work into a zero.

Is Nemotron 3 Ultra good for agentic tool use and computer tasks?

Nemotron 3 Ultra ranks #118 out of 128 eligible models for agentic tool use and computer tasks, with a public category score of 33/100. Higher-ranked alternatives are available for workloads where this category decides the choice. Check the underlying rows before treating the aggregate as a workload guarantee.

Is Nemotron 3 Ultra good for instruction following?

Nemotron 3 Ultra ranks #5 out of 32 eligible models for instruction following, with a public category score of 91.9/100. That places it in the current top ten for this category. Check the underlying rows before treating the aggregate as a workload guarantee.

Is Nemotron 3 Ultra good for multilingual tasks?

Nemotron 3 Ultra ranks #7 out of 13 eligible models for multilingual tasks, with a public category score of 47.4/100. That is the 50th percentile of a small cohort, so the position is mid-field. Check the underlying rows before treating the aggregate as a workload guarantee.

Is Nemotron 3 Ultra open source?

Nemotron 3 Ultra is an open-weight model from NVIDIA. Its weights can be downloaded for local or hosted deployment, subject to the published license. Open weight does not automatically mean open source: training data and training code may remain private, and commercial restrictions can still apply.

Does Nemotron 3 Ultra have full benchmark coverage on BenchLM?

No. Nemotron 3 Ultra currently has 19 source-displayable rows across 376 tracked benchmark slots. The profile exposes published, non-generated evidence and leaves missing categories blank until an exact evaluation is available. Coverage describes how much was measured; it is not a penalty added to an individual benchmark result.

What is the context window size of Nemotron 3 Ultra?

Nemotron 3 Ultra has a reported context window of 1M in the exact-model catalog record. The value stays visible, but the profile marks its source link as unavailable instead of presenting it as directly documented. Maximum output length remains separate because providers often publish a different limit.

Last updated July 31, 2026. Runtime fields remain blank until a sourced snapshot exists.
