
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code (LiveCodeBench)

A continuously updated benchmark using fresh competitive programming problems from LeetCode, Codeforces, and AtCoder to provide contamination-free code generation evaluation.

Top models on LiveCodeBench — April 21, 2026

As of April 21, 2026, Kimi 2.6 leads the LiveCodeBench leaderboard with 89.6%, followed by Kimi K2.5 (85%) and GLM-4.7 (84.9%).

5 models · Coding · 23% of category score · Current · Updated April 21, 2026

According to BenchLM.ai, Kimi 2.6 leads the LiveCodeBench benchmark with a score of 89.6%, followed by Kimi K2.5 (85%) and GLM-4.7 (84.9%). Scores on the leaderboard range from 37.6% to 89.6%, a spread of 52 percentage points, so the benchmark still differentiates model capabilities, although the top four models sit within about 9 points of each other.

5 models have been evaluated on LiveCodeBench. The benchmark falls in the Coding category. This category carries a 20% weight in BenchLM.ai's overall scoring system. Within that category, LiveCodeBench contributes 23% of the category score, so strong performance here directly affects a model's overall ranking.
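The two weights above can be combined into a single figure. This is a minimal sketch that assumes the category weight and the within-category share simply multiply. The BenchLM methodology page is the authority on how scores are actually combined, and the function name `effective_weight` is hypothetical.

```python
def effective_weight(category_weight: float, share_of_category: float) -> float:
    """Share of the overall score attributable to one benchmark.

    Assumption: the two weights compose multiplicatively.
    """
    return category_weight * share_of_category


# Figures quoted on this page: Coding is 20% of the overall score,
# and LiveCodeBench is 23% of the Coding category.
overall_share = effective_weight(0.20, 0.23)
print(f"{overall_share:.1%}")  # about 4.6% under the multiplicative assumption
```

Under that assumption, LiveCodeBench alone would account for a little under one twentieth of a model's overall score.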

About LiveCodeBench

Year: 2024

Tasks: Continuously updated

Format: Competitive programming

Difficulty: Competitive programming level

LiveCodeBench addresses data contamination concerns by continuously sourcing new problems from competitive programming platforms. It evaluates code generation, self-repair, code execution, and test output prediction.
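One common way to use continuously sourced problems against contamination is to score a model only on problems published after its training cutoff. A minimal sketch of that filter follows. The `Problem` record, its field names, the sample entries, and the cutoff date are all hypothetical and are not BenchLM's or LiveCodeBench's actual data or code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative only.
    source: str      # e.g. "LeetCode", "Codeforces", "AtCoder"
    title: str
    released: date   # date the problem was first published


def fresh_problems(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


# Made-up sample data for illustration.
pool = [
    Problem("LeetCode", "example-a", date(2025, 3, 1)),
    Problem("Codeforces", "example-b", date(2025, 11, 15)),
    Problem("AtCoder", "example-c", date(2026, 2, 2)),
]

eval_set = fresh_problems(pool, training_cutoff=date(2025, 6, 30))
print([p.title for p in eval_set])
```

The strict comparison is a conservative choice in this sketch: a problem released on the cutoff date itself is treated as potentially seen.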

BenchLM freshness & provenance

Version: Rolling 2026 set

Refresh cadence: Rolling

Staleness state: Current

Question availability: Delayed public release


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (5 models)

1. 89.6% (Kimi 2.6)
2. 85% (Kimi K2.5)
3. 84.9% (GLM-4.7)
4. 80.4%
5. 37.6%

FAQ

What does LiveCodeBench measure?

LiveCodeBench measures code generation on fresh competitive programming problems from LeetCode, Codeforces, and AtCoder. Because the problem set is continuously updated, the evaluation is designed to avoid contamination from training data.

Which model scores highest on LiveCodeBench?

Kimi 2.6 by Moonshot AI currently leads with a score of 89.6% on LiveCodeBench.

How many models are evaluated on LiveCodeBench?

BenchLM has evaluated 5 AI models on LiveCodeBench.

Last updated: April 21, 2026 · BenchLM version Rolling 2026 set
