Vals-hosted LiveCodeBench mirror (Vals LiveCodeBench mirror)

Name: Vals-hosted LiveCodeBench mirror
Creator: BenchLM

Vals AI implementation of LiveCodeBench with easy, medium, and hard task splits.

How BenchLM shows Vals LiveCodeBench mirror

BenchLM mirrors the public Vals AI LiveCodeBench leaderboard, captured from https://www.vals.ai/benchmarks/lcb?suggested=open-weights-table and last updated by Vals on June 17, 2026. The snapshot preserves overall scores, uncertainty, latency, cost-per-test metadata, and task-level scores where Vals publishes them.

Vals LiveCodeBench mirror is display only on BenchLM. Vals proprietary or Vals-hosted aggregate views are useful context, but BenchLM does not use them as weighted ranking inputs or as a replacement for benchmark-native source records.

122 Vals rows · 4 task views · public dataset · Tasks: Overall, Easy, Medium, Hard · Display only

Vals LiveCodeBench score on Vals LiveCodeBench mirror — June 17, 2026

BenchLM mirrors the published Vals LiveCodeBench score view for Vals LiveCodeBench mirror. Claude Fable 5 leads the public snapshot at 89.78%, followed by Gemini 3.1 Pro Preview (88.48%) and GPT-5.2 Codex (87.99%). BenchLM does not use these results to rank models overall.

Claude Fable 5 · Anthropic · anthropic/claude-fable-5 · 89.78%
Gemini 3.1 Pro Preview · Google · google/gemini-3.1-pro-preview · 88.48%
GPT-5.2 Codex · OpenAI · openai/gpt-5.2-codex · 87.99%

122 models · External benchmark mirrors · Current · Display only · Updated June 17, 2026

The published Vals LiveCodeBench mirror snapshot is tightly clustered at the top: Claude Fable 5 sits at 89.78%, while the third row is only 1.79 points behind. The broader top-10 spread is 3.17 points, so many of the published scores sit in a relatively narrow band.
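
The spread figures above follow directly from the published scores. A minimal sketch in Python, using the top ten overall scores from the score table on this page (the variable names are illustrative and not part of BenchLM or Vals):

```python
# Top ten overall scores (percent), copied from the score table on this page.
top10 = [89.78, 88.48, 87.99, 87.82, 87.60, 87.48, 87.31, 87.06, 86.77, 86.61]

# Gap between the leader and the third row.
top3_gap = round(top10[0] - top10[2], 2)

# Spread between the first and tenth rows.
top10_spread = round(top10[0] - top10[9], 2)

print(top3_gap, top10_spread)  # 1.79 3.17
```

Rounding to two decimals matches the precision of the published scores and avoids floating-point noise in the differences.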

122 models have been evaluated on Vals LiveCodeBench mirror. The benchmark falls in the External benchmark mirrors category. BenchLM tracks this category separately from its weighted global scoring system, so these results are best compared on the dedicated external benchmark mirror views. Vals LiveCodeBench mirror is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
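
To illustrate what "excluded from the scoring formula" can mean in practice, here is a hedged sketch of a weighted average that skips display-only benchmarks. BenchLM's actual formula is not given on this page; the `BenchmarkResult` type, the weights, and the two "hypothetical-benchmark" entries are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkResult:
    name: str
    score: float        # percent, 0-100
    weight: float       # hypothetical weight; not BenchLM's real value
    display_only: bool  # True for mirrors shown for reference only


def weighted_overall(results: List[BenchmarkResult]) -> Optional[float]:
    """Weighted mean over benchmarks that are not display-only."""
    scored = [r for r in results if not r.display_only]
    total_weight = sum(r.weight for r in scored)
    if total_weight == 0:
        return None  # nothing contributes to the ranking
    return sum(r.score * r.weight for r in scored) / total_weight


# Hypothetical inputs: only the mirror score (89.78) comes from this page.
results = [
    BenchmarkResult("hypothetical-benchmark-a", 80.0, 2.0, False),
    BenchmarkResult("hypothetical-benchmark-b", 60.0, 1.0, False),
    BenchmarkResult("Vals LiveCodeBench mirror", 89.78, 0.0, True),
]
overall = weighted_overall(results)
```

Because the mirror row is filtered out before weighting, changing its score leaves `overall` unchanged, which is the behavior the paragraph above describes.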

About Vals LiveCodeBench mirror

Year: 2026
Tasks: Coding problem difficulty splits
Format: Accuracy score
Difficulty: Contamination-resistant coding problems

BenchLM keeps this separate from its canonical LiveCodeBench rows because it is a Vals-hosted implementation snapshot.

BenchLM freshness & provenance

Version: Vals LiveCodeBench mirror 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Vals LiveCodeBench score table (122 models)

Rank | Model | Model ID | Provider | Score
1 | Claude Fable 5 | anthropic/claude-fable-5 | Anthropic | 89.78%
2 | Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | Google | 88.48%
3 | GPT-5.2 Codex | openai/gpt-5.2-codex | OpenAI | 87.99%
4 | Claude Opus 4.8 | anthropic/claude-opus-4-8 | Anthropic | 87.82%
5 | Gemini 3.5 Flash | google/gemini-3.5-flash | Google | 87.60%
6 | DeepSeek V4 Pro | deepseek/deepseek-v4-pro | DeepSeek | 87.48%
7 | GPT-5.3 Codex | openai/gpt-5.3-codex | OpenAI | 87.31%
8 | Qwen3.7 Max | alibaba/qwen3.7-max | Alibaba | 87.06%
9 | Kimi K2.6 | kimi/kimi-k2.6 | Moonshot AI | 86.77%
10 | GPT-5 Mini | openai/gpt-5-mini-2025-08-07 | OpenAI | 86.61%
11 | GPT-5.1 | openai/gpt-5.1-2025-11-13 | OpenAI | 86.49%
12 | Gemini 3 Pro Preview | google/gemini-3-pro-preview | Google | 86.41%
13 | Nemotron 3 Ultra 550b A55b | nvidia/nemotron-3-ultra-550b-a55b | Nvidia | 85.98%
14 | Qwen3.6 Plus | alibaba/qwen3.6-plus | Alibaba | 85.95%
15 | GPT-5 | openai/gpt-5-2025-08-07 | OpenAI | 85.91%
16 | Gemini 3 Flash Preview | google/gemini-3-flash-preview | Google | 85.59%
17 | GPT-5.1 Codex | openai/gpt-5.1-codex | OpenAI | 85.55%
18 | GPT-5.2 | openai/gpt-5.2-2025-12-11 | OpenAI | 85.36%
19 | Qwen3.5 Plus Thinking | alibaba/qwen3.5-plus-thinking | Alibaba | 85.33%
20 | GPT-5.5 | openai/gpt-5.5 | OpenAI | 85.30%
21 | Claude Opus 4.7 | anthropic/claude-opus-4-7 | Anthropic | 85.07%
22 | GPT-5 Codex | openai/gpt-5-codex | OpenAI | 84.72%
23 | Claude Opus 4.6 Thinking | anthropic/claude-opus-4-6-thinking | Anthropic | 84.68%
24 | Grok 4.3 | grok/grok-4.3 | xAI | 84.49%
25 | Grok 4.20 0309 Reasoning | grok/grok-4.20-0309-reasoning | xAI | 84.27%
26 | GPT-5.4 | openai/gpt-5.4-2026-03-05 | OpenAI | 84.14%
27 | GPT-5.4 Nano | openai/gpt-5.4-nano-2026-03-17 | OpenAI | 84.01%
28 | O3 | openai/o3-2025-04-16 | OpenAI | 83.91%
29 | Kimi K2.5 Thinking | kimi/kimi-k2.5-thinking | Moonshot AI | 83.87%
30 | Claude Opus 4.5 20251101 Thinking | anthropic/claude-opus-4-5-20251101-thinking | Anthropic | 83.67%
31 | GPT-5.1 Codex Max | openai/gpt-5.1-codex-max | OpenAI | 83.56%
32 | Qwen3.5 Flash | alibaba/qwen3.5-flash | Alibaba | 83.28%
33 | Grok 4 0709 | grok/grok-4-0709 | xAI | 83.25%
34 | GPT Oss 120b | fireworks/gpt-oss-120b | Fireworks AI | 83.23%
35 | GLM 4.7 | zai/glm-4.7 | Zhipu AI | 82.23%
36 | O4 Mini | openai/o4-mini-2025-04-16 | OpenAI | 82.21%
37 | MiniMax M3 | minimax/MiniMax-M3 | MiniMax | 82.15%
38 | Claude Sonnet 4.6 | anthropic/claude-sonnet-4-6 | Anthropic | 82.09%
39 | Kimi K2.7 Code | kimi/kimi-k2.7-code | Moonshot AI | 82.05%
40 | GLM 5 Thinking | zai/glm-5-thinking | Zhipu AI | 81.87%
41 | MiniMax M2.1 | minimax/MiniMax-M2.1 | MiniMax | 81.76%
42 | Mimo V2.5 | xiaomi/mimo-v2.5 | Xiaomi | 81.51%
43 | GPT-5.4 Mini | openai/gpt-5.4-mini-2026-03-17 | OpenAI | 81.47%
44 | GLM 5.1 | zai/glm-5.1 | Zhipu AI | 81.38%
45 | Mimo V2.5 Pro | xiaomi/mimo-v2.5-pro | Xiaomi | 81.35%
46 | GLM 4.6 | zai/glm-4.6 | Zhipu AI | 81.04%
47 | DeepSeek V3p2 Thinking | fireworks/deepseek-v3p2-thinking | Fireworks AI | 80.69%
48 | Grok 4.1 Fast Reasoning | grok/grok-4-1-fast-reasoning | xAI | 80.64%
49 | GPT Oss 20b | fireworks/gpt-oss-20b | Fireworks AI | 80.39%
50 | Gemini 3.1 Flash Lite Preview | google/gemini-3.1-flash-lite-preview | Google | 80.12%
51 | MiniMax M2.7 | minimax/MiniMax-M2.7 | MiniMax | 79.93%
52 | Command A Plus 05 2026 | cohere/command-a-plus-05-2026 | Cohere | 79.38%
53 | MiniMax M2.5 | minimax/MiniMax-M2.5 | MiniMax | 79.21%
54 | Gemini 2.5 Pro Preview 03 25 | google/gemini-2.5-pro-preview-03-25 | Google | 79.16%
55 | Grok 4 Fast Reasoning | grok/grok-4-fast-reasoning | xAI | 78.97%
56 | Qwen3 Max | alibaba/qwen3-max | Alibaba | 78.22%
57 | Grok 3 Mini Fast High Reasoning | grok/grok-3-mini-fast-high-reasoning | xAI | 76.22%
58 | Gemini 2.5 Flash Preview 09 2025 Thinking | google/gemini-2.5-flash-preview-09-2025-thinking | Google | 76.21%
59 | Gemini 2.5 Flash Preview 09 2025 | google/gemini-2.5-flash-preview-09-2025 | Google | 75.06%
60 | Claude Opus 4.5 | anthropic/claude-opus-4-5-20251101 | Anthropic | 75.03%
61 | Magistral Medium 2509 | mistralai/magistral-medium-2509 | Mistral AI | 74.86%
62 | Claude Sonnet 4.5 20250929 Thinking | anthropic/claude-sonnet-4-5-20250929-thinking | Anthropic | 73.00%
63 | Magistral Small 2509 | mistralai/magistral-small-2509 | Mistral AI | 72.13%
64 | O3 Mini | openai/o3-mini-2025-01-31 | OpenAI | 71.48%
65 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | google/gemini-2.5-flash-lite-preview-09-2025-thinking | Google | 71.39%
66 | Qwen3 235b A22b | fireworks/qwen3-235b-a22b | Fireworks AI | 70.62%
67 | Moonshotai Kimi K2 Instruct | together/moonshotai/Kimi-K2-Instruct | Together AI | 70.45%
68 | DeepSeek R1 | fireworks/deepseek-r1 | Fireworks AI | 70.22%
69 | GPT-5 Nano | openai/gpt-5-nano-2025-08-07 | OpenAI | 70.22%
70 | Claude Opus 4 20250514 Thinking | anthropic/claude-opus-4-20250514-thinking | Anthropic | 70.19%
71 | DeepSeek V3p2 | fireworks/deepseek-v3p2 | Fireworks AI | 69.86%
72 | GLM 5.2 | zai/glm-5.2 | Zhipu AI | 69.50%
73 | Gemini 2.5 Flash Lite Preview 09 2025 | google/gemini-2.5-flash-lite-preview-09-2025 | Google | 67.67%
74 | GLM 4.5 | zai/glm-4.5 | Zhipu AI | 67.45%
75 | Qwen3 Max Preview | alibaba/qwen3-max-preview | Alibaba | 66.91%
76 | Laguna Xs.2 | poolside/laguna-xs.2 | Poolside | 66.61%
77 | Claude Opus 4.1 20250805 Thinking | anthropic/claude-opus-4-1-20250805-thinking | Anthropic | 66.46%
78 | Grok 3 Mini Fast Low Reasoning | grok/grok-3-mini-fast-low-reasoning | xAI | 66.27%
79 | DeepSeek V3 0324 | fireworks/deepseek-v3-0324 | Fireworks AI | 65.48%
80 | Claude Opus 4.1 | anthropic/claude-opus-4-1-20250805 | Anthropic | 64.56%
81 | Kimi K2 Thinking | kimi/kimi-k2-thinking | Moonshot AI | 63.15%
82 | Claude Opus 4 | anthropic/claude-opus-4-20250514 | Anthropic | 62.63%
83 | Claude Sonnet 4 20250514 Thinking | anthropic/claude-sonnet-4-20250514-thinking | Anthropic | 62.39%
84 | Grok Code Fast 1 | grok/grok-code-fast-1 | xAI | 61.97%
85 | Laguna M.1 | poolside/laguna-m.1 | Poolside | 61.44%
86 | Claude 3.7 Sonnet 20250219 Thinking | anthropic/claude-3-7-sonnet-20250219-thinking | Anthropic | 60.44%
87 | Claude Sonnet 4 | anthropic/claude-sonnet-4-20250514 | Anthropic | 59.67%
88 | Langston Nim Nvidia Llama 3.3 Nemotron Super 49b V1 42e84561 Thinking | together/langston/nim/nvidia/llama-3.3-nemotron-super-49b-v1-42e84561-thinking | Together AI | 58.37%
89 | GPT-4.1 Mini | openai/gpt-4.1-mini-2025-04-14 | OpenAI | 58.16%
90 | Gemini 2.5 Flash Preview 04 17 | google/gemini-2.5-flash-preview-04-17 | Google | 56.94%
91 | Claude 3.7 Sonnet | anthropic/claude-3-7-sonnet-20250219 | Anthropic | 56.66%
92 | Mistral Large 2512 | mistralai/mistral-large-2512 | Mistral AI | 55.34%
93 | GPT-4.1 | openai/gpt-4.1-2025-04-14 | OpenAI | 54.67%
94 | Grok 3 | grok/grok-3 | xAI | 52.90%
95 | Devstral 2512 | mistralai/devstral-2512 | Mistral AI | 51.84%
96 | O1 | openai/o1-2024-12-17 | OpenAI | 50.26%
97 | Claude 3.5 Sonnet | anthropic/claude-3-5-sonnet-20241022 | Anthropic | 49.63%
98 | Llama4 Maverick Instruct Basic | fireworks/llama4-maverick-instruct-basic | Fireworks AI | 47.25%
99 | Gemini 2.5 Flash Preview 04 17 Thinking | google/gemini-2.5-flash-preview-04-17-thinking | Google | 46.87%
100 | Grok 4 Fast Non Reasoning | grok/grok-4-fast-non-reasoning | xAI | 46.09%
101 | Mistral Medium 2505 | mistralai/mistral-medium-2505 | Mistral AI | 44.84%
102 | Gemini 2.0 Flash 001 | google/gemini-2.0-flash-001 | Google | 43.61%
103 | GPT-4o | openai/gpt-4o-2024-11-20 | OpenAI | 43.44%
104 | Labs Devstral Small 2512 | mistralai/labs-devstral-small-2512 | Mistral AI | 43.18%
105 | GPT-4.1 Nano | openai/gpt-4.1-nano-2025-04-14 | OpenAI | 42.72%
106 | Grok 4.1 Fast Non Reasoning | grok/grok-4-1-fast-non-reasoning | xAI | 42.62%
107 | Claude 3.5 Haiku | anthropic/claude-3-5-haiku-20241022 | Anthropic | 41.92%
108 | Gemini 1.5 Pro 002 | google/gemini-1.5-pro-002 | Google | 41.72%
109 | Claude Haiku 4.5 20251001 Thinking | anthropic/claude-haiku-4-5-20251001-thinking | Anthropic | 41.17%
110 | Grok 2 1212 | grok/grok-2-1212 | xAI | 38.68%
111 | Meta Llama Llama 4 Scout 17B 16E Instruct | together/meta-llama/Llama-4-Scout-17B-16E-Instruct | Together AI | 38.54%
112 | Mistral Large 2411 | mistralai/mistral-large-2411 | Mistral AI | 37.09%
113 | Gemini 1.5 Flash 002 | google/gemini-1.5-flash-002 | Google | 36.91%
114 | Meta Llama Llama 3.3 70B Instruct Turbo | together/meta-llama/Llama-3.3-70B-Instruct-Turbo | Together AI | 36.34%
115 | Langston Nim Nvidia Llama 3.3 Nemotron Super 49b V1 42e84561 | together/langston/nim/nvidia/llama-3.3-nemotron-super-49b-v1-42e84561 | Together AI | 36.31%
116 | Command A 03 2025 | cohere/command-a-03-2025 | Cohere | 35.07%
117 | Mistral Small 2503 | mistralai/mistral-small-2503 | Mistral AI | 31.82%
118 | GPT-4o Mini | openai/gpt-4o-mini-2024-07-18 | OpenAI | 26.42%
119 | Jamba Large 1.6 | ai21labs/jamba-large-1.6 | AI21 Labs | 22.32%
120 | Command R Plus | cohere/command-r-plus | Cohere | 18.24%
121 | Mistral Small 2402 | mistralai/mistral-small-2402 | Mistral AI | 15.78%
122 | Jamba Mini 1.6 | ai21labs/jamba-mini-1.6 | AI21 Labs | 9.92%

FAQ

What does Vals LiveCodeBench mirror measure?

It measures accuracy on contamination-resistant coding problems, using the Vals AI implementation of LiveCodeBench with easy, medium, and hard task splits.

Which model leads the published Vals LiveCodeBench mirror snapshot?

Claude Fable 5 currently leads the published Vals LiveCodeBench mirror snapshot with a Vals LiveCodeBench score of 89.78%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on Vals LiveCodeBench mirror?

BenchLM's Vals LiveCodeBench mirror snapshot includes 122 AI models, based on the public leaderboard captured on June 17, 2026.

Last updated: June 17, 2026 · mirrored from the public benchmark leaderboard
