A benchmark that evaluates AI agents on multi-environment web challenges, testing navigation and task completion across diverse live web environments.
BenchLM is tracking MEWC in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.
These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.
BenchLM mirrors the published tracked score view for MEWC. MiniMax M2.5 leads the public snapshot at 74.4%. BenchLM does not use these results to rank models overall.
Year: 2026
Tasks: Web-agent tasks
Format: Browser task completion
Difficulty: Open-web agent workflows
MEWC is useful as an agentic browsing benchmark because it focuses on open-web interaction and multi-environment task execution rather than single-site scripted browsing.
Version: MEWC 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
MiniMax M2.5 currently leads the published MEWC snapshot with a tracked score of 74.4%. BenchLM shows this benchmark for display only and does not use it in overall rankings.
1 AI model is included in BenchLM's mirrored MEWC snapshot, based on the public leaderboard captured on April 8, 2026.