MCPMark-Verified (MCP Mark Verified)

A human-verified edition of MCPMark for MCP tool use across Notion, GitHub, Filesystem, Postgres, and Playwright server environments.

Benchmark score on MCP Mark Verified — June 12, 2026

BenchLM mirrors the published score view for MCP Mark Verified. Kimi K2.7 Code leads the public snapshot at 81.1%. BenchLM does not use these results to rank models overall.

1 model · Agentic · Current · Display only · Updated June 12, 2026

About MCP Mark Verified

Year: 2026
Tasks: MCP tool-use tasks across five server environments
Format: Interactive MCP task completion
Difficulty: Advanced tool use

Moonshot reports MCPMark-Verified as a human-verified edition of MCPMark and says it will be open-sourced. BenchLM stores provider-reported exact values as display-only launch evidence until a stable public leaderboard is available.

BenchLM freshness & provenance

Version: MCP Mark Verified 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (1 model)

Rank 1: Kimi K2.7 Code (Moonshot AI), 81.1%

FAQ

What does MCP Mark Verified measure?

MCP Mark Verified measures MCP tool use across Notion, GitHub, Filesystem, Postgres, and Playwright server environments. It is a human-verified edition of MCPMark.

Which model scores highest on MCP Mark Verified?

Kimi K2.7 Code by Moonshot AI currently leads with a score of 81.1% on MCP Mark Verified.

How many models are evaluated on MCP Mark Verified?

One AI model has been evaluated on MCP Mark Verified on BenchLM.

Last updated: June 12, 2026 · BenchLM version MCP Mark Verified 2026
