Benchmark profile

MCPMark-Verified (MCP Mark Verified)

A human-verified edition of MCPMark for MCP tool use across Notion, GitHub, Filesystem, Postgres, and Playwright server environments.

Data verified July 27, 2026

Benchmark score on MCP Mark Verified — July 27, 2026

BenchLM mirrors the published score view for MCP Mark Verified. Kimi K2.7 Code leads the public snapshot at 81.1%. BenchLM does not use these results to rank models overall.

Rank 1 · Open weight

Kimi K2.7 Code

Moonshot AI

kimi-k2-7-code

81.1%

Overall 54.03 · Context 256K

1 model · Agentic · Current · Display only · Updated July 27, 2026

Benchmark score table (1 model)

Score

Kimi K2.7 Code · Moonshot AI · Open weight

81.1%

About MCP Mark Verified

Year

2026

Tasks

MCP tool-use tasks across five server environments

Format

Interactive MCP task completion

Difficulty

Advanced tool use

Moonshot reports MCPMark-Verified as a human-verified edition of MCPMark and says it will be open-sourced. BenchLM stores provider-reported exact values as display-only launch evidence until a stable public leaderboard is available.


BenchLM freshness & provenance

Version

MCP Mark Verified 2026

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
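The decision described above can be sketched in a few lines of Python. This is an illustrative sketch only, not BenchLM's actual policy: the `Freshness` class, the `has_stable_public_leaderboard` field, and the `classify` function are hypothetical names, and the rule order is an assumption. The one rule taken from this page is that provider-reported values stay display-only until a stable public leaderboard is available. The rule separating a strong differentiator from a benchmark to watch is assumed for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    STRONG_DIFFERENTIATOR = "strong differentiator"
    WATCH = "benchmark to watch"
    DISPLAY_ONLY = "display-only reference"


@dataclass(frozen=True)
class Freshness:
    # These four fields mirror the metadata labels shown on this page.
    version: str
    refresh_cadence: str
    staleness_state: str
    question_availability: str
    # Hypothetical field: this page does not expose it as metadata.
    has_stable_public_leaderboard: bool


def classify(meta: Freshness) -> Treatment:
    # Rule stated on this page: provider-reported values are display-only
    # until a stable public leaderboard is available.
    if not meta.has_stable_public_leaderboard:
        return Treatment.DISPLAY_ONLY
    # Assumed rule (not from the source): a current benchmark counts as a
    # strong differentiator, anything else is merely one to watch.
    if meta.staleness_state == "Current":
        return Treatment.STRONG_DIFFERENTIATOR
    return Treatment.WATCH


mcp_mark_verified = Freshness(
    version="MCP Mark Verified 2026",
    refresh_cadence="Quarterly",
    staleness_state="Current",
    question_availability="Public benchmark set",
    has_stable_public_leaderboard=False,
)

print(classify(mcp_mark_verified).value)  # prints "display-only reference"
```

Under these assumptions, MCP Mark Verified is classified as display-only even though its staleness state is Current, which matches the badges shown on this page. For the real rules, the BenchLM methodology page is the authority.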

FAQ

What does MCP Mark Verified measure?

MCP Mark Verified measures MCP tool use across Notion, GitHub, Filesystem, Postgres, and Playwright server environments. It is a human-verified edition of MCPMark.

Which model scores highest on MCP Mark Verified?

Kimi K2.7 Code by Moonshot AI currently leads with a score of 81.1% on MCP Mark Verified.

How many models are evaluated on MCP Mark Verified?

1 AI model has been evaluated on MCP Mark Verified on BenchLM.

Last updated: July 27, 2026 · BenchLM version MCP Mark Verified 2026
