LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code (LiveCodeBench)

A continuously updated benchmark using fresh competitive programming problems from LeetCode, Codeforces, and AtCoder to provide contamination-free code generation evaluation.

Top models on LiveCodeBench — June 2, 2026

As of June 2, 2026, DeepSeek V4 Pro (Max) leads the LiveCodeBench leaderboard with 93.5%, followed by Qwen3.7 Max (91.6%) and DeepSeek V4 Flash (Max) (91.6%).

14 models · Coding · 23% of category score · Current · Updated June 2, 2026

According to BenchLM.ai, DeepSeek V4 Pro (Max) leads the LiveCodeBench benchmark with a score of 93.5%, followed by Qwen3.7 Max (91.6%) and DeepSeek V4 Flash (Max) (91.6%). The top three models are clustered within 1.9 percentage points, which suggests the benchmark is nearing saturation for frontier models.
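The 1.9-point figure is the gap between the first and third scores quoted above. A minimal Python sketch of that arithmetic, using only the three scores stated on this page (the variable and function names are illustrative, not part of any BenchLM.ai tooling):

```python
# Scores quoted on this page for the top three models (percent).
top_scores = {
    "DeepSeek V4 Pro (Max)": 93.5,
    "Qwen3.7 Max": 91.6,
    "DeepSeek V4 Flash (Max)": 91.6,
}

def score_spread(scores):
    """Return the gap, in percentage points, between the best and worst score."""
    values = list(scores.values())
    # Round to one decimal to match the precision of the published scores.
    return round(max(values) - min(values), 1)

spread = score_spread(top_scores)
print(f"Top-3 spread: {spread} points")
```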

14 models have been evaluated on LiveCodeBench. The benchmark falls in the Coding category. This category carries a 20% weight in BenchLM.ai's overall scoring system. Within that category, LiveCodeBench contributes 23% of the category score, so strong performance here directly affects a model's overall ranking.
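Taken together, the two weights imply how much of the overall score LiveCodeBench accounts for. The sketch below assumes the category weight and the within-category share combine by simple multiplication, which this page does not state explicitly; the function name is hypothetical.

```python
def effective_weight(category_weight, benchmark_share):
    """Share of the overall score attributable to one benchmark,
    ASSUMING the category weight and within-category share multiply."""
    return category_weight * benchmark_share

# Figures from this page: Coding carries 20% of the overall score,
# and LiveCodeBench contributes 23% of the Coding category score.
lcb_weight = effective_weight(0.20, 0.23)
print(f"LiveCodeBench share of overall score: {lcb_weight:.1%}")
```

Under that assumption, LiveCodeBench would account for about 4.6% of a model's overall score.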

About LiveCodeBench

Year: 2024

Tasks: Continuously updated

Format: Competitive programming

Difficulty: Competitive programming level

LiveCodeBench addresses data contamination concerns by continuously sourcing new problems from competitive programming platforms. It evaluates code generation, self-repair, code execution, and test output prediction.
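One common way to apply this idea is to score a model only on problems published after its training cutoff. The sketch below illustrates that filter in Python; the `Problem` record, the sample problems, and the cutoff date are all hypothetical and are not taken from LiveCodeBench data or code.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative only.
    title: str
    platform: str
    released: date

def uncontaminated(problems, training_cutoff):
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]

# Made-up sample problems for illustration.
problems = [
    Problem("two-pointer-warmup", "LeetCode", date(2025, 11, 3)),
    Problem("grid-paths", "AtCoder", date(2026, 2, 14)),
    Problem("tree-queries", "Codeforces", date(2026, 4, 20)),
]

fresh = uncontaminated(problems, training_cutoff=date(2026, 1, 1))
print([p.title for p in fresh])
```

Because the filter depends only on dates, the same problem pool can be re-sliced for each model as new problems arrive.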

BenchLM freshness & provenance

Version: Rolling 2026 set

Refresh cadence: Rolling

Staleness state: Current

Question availability: Delayed public release

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (14 models)

Rank  Score
1     93.5%
2     91.6%
3     91.6%
4     89.8%
5     89.6%
6     88.4%
7     85%
8     84.9%
9     83.9%
10    80.4%
12    56.8%
13    55.2%
14    37.6%

FAQ

What does LiveCodeBench measure?

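LiveCodeBench measures code generation on fresh competitive programming problems from LeetCode, Codeforces, and AtCoder. Because the problem set is continuously updated, the evaluation is designed to be free of training-data contamination. It also evaluates self-repair, code execution, and test output prediction.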

Which model scores highest on LiveCodeBench?

DeepSeek V4 Pro (Max) by DeepSeek currently leads with a score of 93.5% on LiveCodeBench.

How many models are evaluated on LiveCodeBench?

14 AI models have been evaluated on LiveCodeBench on BenchLM.

Last updated: June 2, 2026 · BenchLM version Rolling 2026 set
