A continuously updated benchmark using fresh competitive programming problems from LeetCode, Codeforces, and AtCoder to provide contamination-free code generation evaluation.
Year: 2024
Tasks: Continuously updated
Format: Competitive programming
Difficulty: Competitive programming level
LiveCodeBench addresses data contamination concerns by continuously sourcing new problems from competitive programming platforms. It evaluates code generation, self-repair, code execution, and test output prediction.
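The idea behind using fresh problems can be sketched as a date filter: a model is scored only on problems published after its training cutoff, so it cannot have seen them during training. The snippet below is a minimal illustration under stated assumptions, not LiveCodeBench's actual code. The `Problem` record, its field names, the `uncontaminated` helper, and the sample problems and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative, not LiveCodeBench's schema.
    title: str
    platform: str
    released: date


def uncontaminated(problems, training_cutoff):
    """Keep only problems released strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


# Made-up sample data for illustration only.
problems = [
    Problem("two-pointers-warmup", "LeetCode", date(2023, 5, 14)),
    Problem("grid-paths", "AtCoder", date(2024, 2, 3)),
    Problem("tree-queries", "Codeforces", date(2024, 6, 21)),
]

# Assume a model whose training data ends on 2023-12-31.
fresh = uncontaminated(problems, training_cutoff=date(2023, 12, 31))
print([p.title for p in fresh])
```

Because new problems keep arriving, the same filter can be rerun with a later cutoff for each newer model without rebuilding the benchmark.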
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
GPT-5.3 Codex by OpenAI currently leads with a score of 85 on LiveCodeBench.
88 AI models have been evaluated on LiveCodeBench on BenchLM.