Model profile

Claude Mythos 5

Name: Claude Mythos 5
Author: Anthropic

Anthropic · Current · Released Jun 9, 2026

Data verified July 24, 2026

Overall Score

83.01 · Public #2 of 215 · Verified #1 of 101

Arena Elo

Not listed

Eligible category ranks

3 of 8

Price (1M tokens)

$10 in / $50 out

API pricing
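
As a rough illustration of what the listed rates mean per request, the sketch below converts token counts into a dollar estimate. The helper name `estimate_cost_usd` and the example workload are hypothetical; only the $10 and $50 per 1M token rates come from this profile, and the page does not say how reasoning tokens are billed.

```python
# Rates copied from the profile: USD per 1M tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost at the listed rates (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 10,000 output tokens.
example_cost = estimate_cost_usd(200_000, 10_000)
print(f"${example_cost:.2f}")
```

Because output tokens cost five times as much as input tokens at these rates, long generated answers dominate the bill faster than long prompts do.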

Speed

Not listed

Context

1M+

Evidence coverage

15 of 369 tracked benchmarks are published. 15 are verified and 0 provisional. 6 of 8 categories are measured.

Updated July 24, 2026

Published / tracked: 15 / 369
Verified: 15
Provisional: 0
Categories with evidence: 6 / 8

Agentic: 4 benchmarks (Verified)
Coding: 3 benchmarks (Verified)
Reasoning: 0 benchmarks (Not measured)
Knowledge: 3 benchmarks (Verified)
Math: 1 benchmark (Verified)
Multilingual: 1 benchmark (Verified)
Multimodal: 3 benchmarks (Verified)
Inst. Following: 0 benchmarks (Not measured)
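
The coverage figures can be cross-checked against the per-category counts. The following is a minimal sketch, assuming the counts listed above are complete; the variable names are illustrative and not part of BenchLM.

```python
# Benchmark counts per category, copied from the coverage list in this profile.
benchmarks_per_category = {
    "Agentic": 4,
    "Coding": 3,
    "Reasoning": 0,
    "Knowledge": 3,
    "Math": 1,
    "Multilingual": 1,
    "Multimodal": 3,
    "Inst. Following": 0,
}
TRACKED_BENCHMARKS = 369  # total benchmarks the profile says are tracked

published = sum(benchmarks_per_category.values())
measured = sum(1 for count in benchmarks_per_category.values() if count > 0)
coverage_pct = 100 * published / TRACKED_BENCHMARKS

print(published, measured, f"{coverage_pct:.1f}%")
```

The totals agree with the summary: 15 published rows across 6 of 8 categories, which is roughly 4% of the tracked benchmarks.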

Proprietary · Reasoning

Confidence: Medium

Restricted

Claude Mythos 5 ranks #2 out of 215 models on the public leaderboard with an overall score of 83.01/100. It also ranks #1 out of 101 on the verified leaderboard. On the public leaderboard only Claude Opus 5 (85.88) scores higher, with Claude Fable 5 (82.76) and GPT-5.6 Sol (81.46) close behind.

Claude Mythos 5 is a proprietary model with a 1M+ token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.

Claude Mythos 5 is restricted to Glasswing partners and trusted-access programs. Claude Fable 5 is the same underlying model made generally available with additional safeguards for cybersecurity, biology, chemistry, distillation, and frontier-AI-development requests.

This profile currently has 15 of 369 tracked benchmarks. BenchLM only exposes non-generated benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.

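Among its three ranked categories, its strongest is Coding (#1 of 130), which makes it particularly well-suited for software development and code generation tasks. Agentic is #7 of 129 (95th percentile). Multilingual is last at #13 of 13 with a category score of 0.0; the only multilingual row listed, SWE-bench Multilingual, is display only.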

Peer position

Exact provisional scores and ranks for the closest listed peers. A score can appear before a model clears the evidence threshold for a rank, so equal scores can have different rank states.

Range 76.6–85.88

Rank	Model	Author	Score
#1	Claude Opus 5	Anthropic	85.88
#2	Claude Mythos 5 (current model)	Anthropic	83.01
#3	Claude Fable 5	Anthropic	82.76
#4	GPT-5.6 Sol	OpenAI	81.46
#5	Kimi K3	Moonshot AI	79.98
#6	Claude Opus 4.8	Anthropic	77.44
#7	Muse Spark 1.1	Meta	76.6

Category percentile

Relative position among models eligible for each sourced category. A higher percentile means a stronger position within that category's ranked cohort; 100 is highest.

Coding: 100% · Eligible cohort rank #1 of 130 · Category score 79.8
Agentic: 95% · Eligible cohort rank #7 of 129 · Category score 66.4
Multilingual: 0% · Eligible cohort rank #13 of 13 · Category score 0.0
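
The page does not state how a cohort rank is converted to a percentile. One simple formula that reproduces all three displayed values maps rank 1 to 100 and the last rank to 0 linearly. The sketch below uses that formula as an assumption; `cohort_percentile` is a hypothetical helper, not BenchLM's implementation.

```python
def cohort_percentile(rank: int, cohort_size: int) -> int:
    """Linear rank-to-percentile mapping (assumed): rank 1 -> 100, last rank -> 0."""
    if cohort_size < 2:
        return 100  # a one-model cohort has nothing to compare against
    return round(100 * (cohort_size - rank) / (cohort_size - 1))


# (rank, cohort size) pairs copied from the category percentile rows.
rows = {"Coding": (1, 130), "Agentic": (7, 129), "Multilingual": (13, 13)}
percentiles = {name: cohort_percentile(rank, size) for name, (rank, size) in rows.items()}
print(percentiles)
```

Under this assumption the Agentic value is 122/128, or about 95.3, which rounds to the displayed 95. Other formulas could match the same rounded figures, so treat this as one consistent reading.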

Category evidence

Scores and ranks appear only where this model has published benchmark evidence. Categories without displayable source records remain not measured.

Category scores, ranks, weighting, benchmark coverage, and evidence status
Category	Score	Rank	Percentile	Weight	Benchmarks	Evidence
Agentic	66.4	#7 of 129	95th	22%	4 benchmarks	Verified
Coding	79.8	#1 of 130	100th	20%	3 benchmarks	Verified
Reasoning	Not measured	Not ranked	Not available	17%	0 benchmarks	Not measured
Knowledge	93.8	Not ranked	Not available	12%	3 benchmarks	Verified
Math	97.6	Not ranked	Not available	5%	1 benchmark	Verified
Multilingual	0.0	#13 of 13	0th	7%	1 benchmark	Verified
Multimodal	86.1	Not ranked	Not available	12%	3 benchmarks	Verified
Inst. Following	Not measured	Not ranked	Not available	5%	0 benchmarks	Not measured

Benchmark Details

Rows below have a displayable published verification record. Each source link and provenance note remains in the page HTML while its category is closed. Source-unverified manual rows and generated rows stay hidden.

Agentic · 4 benchmarks

Terminal-Bench 2.0 · Provider exact
88% · Weighted 38%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A reports Mythos 5 at 88.0% on Terminal-Bench 2.1. BenchLM maps this into the existing Terminal-Bench 2 slot until a separate 2.1 key exists.

OSWorld-Verified · Provider exact
85% · Weighted 34%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A reports Mythos 5 at 85.0% on OSWorld-Verified.

BrowseComp · Provider exact
88% · Weighted 28%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A and Section 8.14.2 report Mythos 5 at 88.0% on single-agent BrowseComp.

ExploitGym · Benchmark exact
17.5% · Display only
Source: ExploitGym paper
Provenance: ExploitGym reports Claude Mythos Preview with Claude Code produced 157 successful exploits out of 898 tasks. BenchLM stores this as a percentage of the full suite.
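
The ExploitGym figure follows directly from the counts in the provenance note, and the three weighted Agentic rows account for the whole category weight. A minimal sketch of both checks, with illustrative variable names:

```python
# Counts from the ExploitGym provenance note.
successes, tasks = 157, 898
exploit_rate_pct = round(100 * successes / tasks, 1)

# Within-category weights (percent) of the three weighted Agentic rows;
# ExploitGym is display only and carries no weight.
agentic_weights = {"Terminal-Bench 2.0": 38, "OSWorld-Verified": 34, "BrowseComp": 28}
total_weight = sum(agentic_weights.values())

print(exploit_rate_pct, total_weight)
```

This only verifies the displayed percentage and the weight total. It does not attempt to reproduce the Agentic category score of 66.4, whose scoring method is not described on this page.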

Coding · 3 benchmarks

SWE-bench Verified · Provider exact
Software Engineering Benchmark Verified
95.5% · Weighted 16%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Section 8.2 reports Mythos 5 at 95.5% on SWE-bench Verified, averaged over 5 trials.

SWE-bench Pro · Provider exact
80.3% · Weighted 10%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Section 8.2 reports Mythos 5 at 80.3% on SWE-bench Pro, averaged over 5 trials.

Terminal-Bench 2.0 · Provider exact
88.0% · Display only
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A reports Mythos 5 at 88.0% on Terminal-Bench 2.1. BenchLM stores this in the existing Terminal-Bench display key.

Knowledge · 3 benchmarks

HLE · Provider exact
Humanity's Last Exam
64.5% · Weighted 45%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A and Section 8.14.1 report Mythos 5 at 64.5% on Humanity's Last Exam with tools.

GPQA · Provider exact
Graduate-Level Google-Proof Q&A
94.1% · Weighted 7%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Section 8.8 reports Mythos 5 at 94.1% on GPQA Diamond; BenchLM stores the exact value on the weighted GPQA lane.

HLE w/o tools · Provider exact
Humanity's Last Exam without tools
59% · Display only
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A and Section 8.14.1 report Mythos 5 at 59.0% on Humanity's Last Exam without tools.

Math · 1 benchmark

USAMO 2026 · Provider exact
United States of America Mathematical Olympiad 2026
97.6% · Weighted 10%

Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Section 8.10 reports Mythos 5 at 97.6% on USAMO 2026 at medium, high, and xhigh reasoning effort, averaged over 10 attempts per problem.

Multilingual · 1 benchmark

SWE Multilingual · Provider exact
SWE-bench Multilingual
92.2% · Display only
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Section 8.2 reports Mythos 5 at 92.2% on SWE-bench Multilingual, averaged over 5 trials.

Multimodal · 3 benchmarks

CharXiv · Provider exact
CharXiv Reasoning
93.5% · Weighted 25%
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A reports Mythos 5 at 93.5% on CharXiv Reasoning with tools.

SWE-bench Multimodal · Provider exact
54.9% · Display only
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Section 8.2 reports Mythos 5 at 54.9% on SWE-bench Multimodal.

CharXiv w/o tools · Provider exact
CharXiv Reasoning without tools
88.9% · Display only
Source: Anthropic: Claude Fable 5 and Claude Mythos 5 system card
Provenance: Table 8.1.A reports Mythos 5 at 88.9% on CharXiv Reasoning without tools.

Frequently Asked Questions

How does Claude Mythos 5 perform overall in AI benchmarks?

Claude Mythos 5 currently ranks #2 out of 215 models on BenchLM's provisional leaderboard with an overall score of 83.01. It also ranks #1 out of 101 on the verified leaderboard. It was developed by Anthropic, and its published context window is 1M+ tokens.

Is Claude Mythos 5 good for knowledge and understanding?

Claude Mythos 5 has visible benchmark coverage in knowledge and understanding, but BenchLM does not currently assign it a global category rank there.

Is Claude Mythos 5 good for coding and programming?

Claude Mythos 5 ranks #1 out of 130 models in coding and programming benchmarks with a category score of 79.8. It holds the top position in this category.

Is Claude Mythos 5 good for mathematics?

Claude Mythos 5 has visible benchmark coverage in mathematics, but BenchLM does not currently assign it a global category rank there.

Is Claude Mythos 5 good for agentic tool use and computer tasks?

Claude Mythos 5 ranks #7 out of 129 models in agentic tool use and computer tasks benchmarks with an average score of 66.4. It is among the top performers in this category.

Is Claude Mythos 5 good for multimodal and grounded tasks?

Claude Mythos 5 has visible benchmark coverage in multimodal and grounded tasks, but BenchLM does not currently assign it a global category rank there.

Is Claude Mythos 5 good for multilingual tasks?

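Claude Mythos 5 ranks last, #13 out of 13 models, in multilingual benchmarks with a category score of 0.0. Its only published multilingual row, SWE-bench Multilingual at 92.2%, is marked display only. By category rank, there are stronger options in this category.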

Does Claude Mythos 5 have full benchmark coverage on BenchLM?

Not yet. Claude Mythos 5 currently has 15 published benchmark scores out of the 369 benchmarks BenchLM tracks. BenchLM only exposes non-generated public benchmark rows, so missing categories stay blank until a sourced evaluation is available.

What is the context window size of Claude Mythos 5?

Claude Mythos 5 has a published context window of 1M+ tokens, which determines how much text it can process in a single interaction.

Last updated: July 24, 2026 · Runtime metrics stay blank until BenchLM has a sourced snapshot.