A Moonshot AI internal coding-agent benchmark for realistic software-engineering tasks across mainstream programming languages and production technology stacks.
BenchLM mirrors the scores published for Kimi Code Bench v2. Kimi K2.7 Code leads the public snapshot at 62.0%. BenchLM does not use these results in its overall model rankings.
Year: 2026
Tasks: Realistic coding-agent tasks
Format: Coding-agent pass rate
Difficulty: Production software engineering
Moonshot describes Kimi Code Bench v2 as an in-house coding-agent benchmark covering backend services, infrastructure, performance engineering, systems programming, security, frontend development, and ML/data engineering. BenchLM stores the exact values reported by the provider as display-only launch evidence.
Version: Kimi Code Bench v2 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Kimi K2.7 Code by Moonshot AI currently leads with a score of 62.0% on Kimi Code Bench v2.
One AI model has been evaluated on Kimi Code Bench v2 on BenchLM.