Vals CorpFin v2 (CorpFin v2)

A private Vals AI benchmark that tests how well models understand long-context credit agreements.

How BenchLM shows CorpFin v2

BenchLM mirrors the public Vals AI CorpFin v2 leaderboard captured from https://www.vals.ai/benchmarks/corp_fin_v2 and updated by Vals on May 16, 2026. The snapshot preserves overall scores, uncertainty, latency, cost-per-test metadata, and task-level scores where Vals publishes them.
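As a rough illustration of what one mirrored row might carry, here is a minimal sketch. The class name, field names, and units (seconds, USD) are assumptions made for this example; they are not BenchLM's or Vals's actual schema. Optional fields stay `None` when Vals does not publish them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SnapshotRow:
    """Hypothetical record for one mirrored leaderboard row (illustrative only)."""
    model: str                                  # display name, e.g. "Grok 4.3"
    model_id: str                               # provider/model identifier
    overall_pct: float                          # overall score in percent
    uncertainty_pct: Optional[float] = None     # assumed unit: percentage points
    latency_s: Optional[float] = None           # assumed unit: seconds
    cost_per_test_usd: Optional[float] = None   # assumed unit: US dollars
    # Task-level scores keyed by task view name, present only where published.
    task_scores_pct: Dict[str, float] = field(default_factory=dict)


# Only the overall score is taken from this page; other fields are left unset.
row = SnapshotRow(model="Grok 4.3", model_id="grok/grok-4.3", overall_pct=68.53)
```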

CorpFin v2 is display only on BenchLM. Vals proprietary or Vals-hosted aggregate views are useful context, but BenchLM does not use them as weighted ranking inputs or as a replacement for benchmark-native source records.

105 Vals rows · 4 task views · private dataset · Tasks: Overall, Exact Pages, Max Fitting Context, Shared Max Context · Display only

CorpFin v2 scores as of May 16, 2026

BenchLM mirrors the published CorpFin v2 score view. Grok 4.3 leads the public snapshot at 68.53%, followed by GPT-5.5 (68.42%) and Kimi K2.5 Thinking (68.26%). BenchLM does not use these results to rank models overall.

105 models · External benchmark mirrors · Current · Display only · Updated May 16, 2026

The published CorpFin v2 snapshot is tightly clustered at the top: Grok 4.3 sits at 68.53%, while the third row (68.26%) is only 0.27 points behind. The top-10 spread is 2.45 points (68.53% down to 66.08%), so the leading scores sit in a narrow band.
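The gap and spread figures can be checked directly against the top-10 scores in the score table. The sketch below is only an illustration; the helper name `point_gap` is made up for this example.

```python
# Top-10 overall scores (percent), copied from the published snapshot table.
top10 = [68.53, 68.42, 68.26, 68.03, 67.02, 66.90, 66.74, 66.47, 66.43, 66.08]


def point_gap(scores, i, j):
    """Return the gap in percentage points between rows i and j (0-based)."""
    # Round to two decimals to avoid floating-point noise in the subtraction.
    return round(scores[i] - scores[j], 2)


top3_gap = point_gap(top10, 0, 2)      # leader vs. third row
top10_spread = point_gap(top10, 0, 9)  # leader vs. tenth row

print(f"Top-3 gap: {top3_gap:.2f} points")
print(f"Top-10 spread: {top10_spread:.2f} points")
```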

105 models have been evaluated on CorpFin v2. The benchmark falls in the External benchmark mirrors category, which BenchLM tracks separately from its weighted global scoring system, so these results are best compared on the dedicated external benchmark mirror views. CorpFin v2 is displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.

About CorpFin v2

Year: 2026
Tasks: Credit-agreement understanding tasks
Format: Accuracy score
Difficulty: Professional finance document reasoning

The Vals CorpFin v2 page reports overall, exact-page, max-fitting-context, and shared-max-context task views. BenchLM keeps it display only.

BenchLM freshness & provenance

Version: CorpFin v2 2026
Refresh cadence: Quarterly
Staleness state: Current

Question availability: Private benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

CorpFin v2 score table (105 models)

Rank | Model | Model ID | Score
1 | Grok 4.3 | grok/grok-4.3 | 68.53%
2 | GPT-5.5 | openai/gpt-5.5 | 68.42%
3 | Kimi K2.5 Thinking | kimi/kimi-k2.5-thinking | 68.26%
4 | Qwen3 Max | alibaba/qwen3-max-2026-01-23 | 68.03%
5 | Claude Opus 4.6 Thinking | anthropic/claude-opus-4-6-thinking | 67.02%
6 | Grok 4 Fast Reasoning | grok/grok-4-fast-reasoning | 66.90%
7 | Kimi K2.6 Thinking | kimi/kimi-k2.6-thinking | 66.74%
8 | Qwen3.6 Max Preview | alibaba/qwen3.6-max-preview | 66.47%
9 | Gemini 3 Flash Preview | google/gemini-3-flash-preview | 66.43%
10 | Claude Opus 4.7 | anthropic/claude-opus-4-7 | 66.08%
11 | Grok 4 0709 | grok/grok-4-0709 | 66.05%
12 | Grok 4.1 Fast Reasoning | grok/grok-4-1-fast-reasoning | 65.97%
13 | GPT-5.2 | openai/gpt-5.2-2025-12-11 | 65.89%
14 | Claude Sonnet 4.6 | anthropic/claude-sonnet-4-6 | 65.31%
15 | Qwen3.5 Plus Thinking | alibaba/qwen3.5-plus-thinking | 65.31%
16 | GPT-5.4 | openai/gpt-5.4-2026-03-05 | 65.27%
17 | Muse Spark | meta/muse_spark | 65.11%
18 | Claude Opus 4.5 20251101 Thinking | anthropic/claude-opus-4-5-20251101-thinking | 65.07%
19 | Gemini 3.5 Flash | google/gemini-3.5-flash | 64.69%
20 | Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | 64.49%
21 | GLM 5.1 Thinking | zai/glm-5.1-thinking | 64.45%
22 | GPT-5.1 | openai/gpt-5.1-2025-11-13 | 63.83%
23 | Grok 4.20 0309 Reasoning | grok/grok-4.20-0309-reasoning | 63.67%
24 | Gemini 3 Pro Preview | google/gemini-3-pro-preview | 63.67%
25 | Qwen3.5 Flash | alibaba/qwen3.5-flash | 63.56%
26 | GPT-4.1 | openai/gpt-4.1-2025-04-14 | 63.05%
27 | GLM 5 Thinking | zai/glm-5-thinking | 62.90%
28 | Qwen3.6 27b | alibaba/qwen3.6-27b | 62.31%
29 | Claude Sonnet 4.5 20250929 Thinking | anthropic/claude-sonnet-4-5-20250929-thinking | 61.97%
30 | Qwen3.6 Plus | alibaba/qwen3.6-plus | 61.93%
31 | DeepSeek V4 Pro | deepseek/deepseek-v4-pro | 61.38%
32 | Claude Opus 4.5 | anthropic/claude-opus-4-5-20251101 | 61.30%
33 | Claude Sonnet 4 20250514 Thinking | anthropic/claude-sonnet-4-20250514-thinking | 61.23%
34 | GPT-5.4 Nano | openai/gpt-5.4-nano-2026-03-17 | 61.19%
35 | MiniMax M2.7 | minimax/MiniMax-M2.7 | 61.19%
36 | Grok 3 Mini Fast High Reasoning | grok/grok-3-mini-fast-high-reasoning | 61.11%
37 | GPT-5 | openai/gpt-5-2025-08-07 | 61.07%
38 | Mistral Large 2512 | mistralai/mistral-large-2512 | 61.03%
39 | GLM 4.5 | zai/glm-4.5 | 60.96%
40 | GPT-5.4 Mini | openai/gpt-5.4-mini-2026-03-17 | 60.92%
41 | Claude Sonnet 4.5 | anthropic/claude-sonnet-4-5-20250929 | 60.80%
42 | Gemini 2.5 Pro | google/gemini-2.5-pro | 60.80%
43 | Claude Haiku 4.5 20251001 Thinking | anthropic/claude-haiku-4-5-20251001-thinking | 60.61%
44 | Kimi K2 Thinking | kimi/kimi-k2-thinking | 60.57%
45 | Claude 3.7 Sonnet 20250219 Thinking | anthropic/claude-3-7-sonnet-20250219-thinking | 60.41%
46 | Claude Haiku 4.5 | anthropic/claude-haiku-4-5-20251001 | 60.30%
47 | GPT-5 Mini | openai/gpt-5-mini-2025-08-07 | 60.18%
48 | Gemini 2.5 Pro Exp 03 25 | google/gemini-2.5-pro-exp-03-25 | 59.83%
49 | Gemini 2.5 Flash Preview 09 2025 Thinking | google/gemini-2.5-flash-preview-09-2025-thinking | 59.75%
50 | O3 | openai/o3-2025-04-16 | 59.71%
51 | Grok 3 | grok/grok-3 | 59.71%
52 | MiniMax M2.5 Lightning | minimax/MiniMax-M2.5-Lightning | 59.60%
53 | Grok 3 Mini Fast Low Reasoning | grok/grok-3-mini-fast-low-reasoning | 59.48%
54 | Gemini 3.1 Flash Lite Preview | google/gemini-3.1-flash-lite-preview | 59.36%
55 | O4 Mini | openai/o4-mini-2025-04-16 | 58.97%
56 | Gemini 2.5 Flash Preview 09 2025 | google/gemini-2.5-flash-preview-09-2025 | 58.97%
57 | MiniMax M2.1 | minimax/MiniMax-M2.1 | 58.90%
58 | Mistral Medium 3.5 | mistralai/mistral-medium-3.5 | 58.78%
59 | Grok 4 Fast Non Reasoning | grok/grok-4-fast-non-reasoning | 58.39%
60 | GPT Oss 120b | fireworks/gpt-oss-120b | 58.24%
61 | GPT-4.1 Mini | openai/gpt-4.1-mini-2025-04-14 | 57.93%
62 | Gemini 2.5 Flash Lite Preview 09 2025 Thinking | google/gemini-2.5-flash-lite-preview-09-2025-thinking | 57.58%
63 | GLM 4.6 | zai/glm-4.6 | 56.84%
64 | Gemini 2.5 Flash Lite Preview 09 2025 | google/gemini-2.5-flash-lite-preview-09-2025 | 56.29%
65 | Qwen3 Max | alibaba/qwen3-max | 55.94%
66 | DeepSeek V3 0324 | fireworks/deepseek-v3-0324 | 54.74%
67 | Claude Sonnet 4 | anthropic/claude-sonnet-4-20250514 | 54.70%
68 | Trinity Large Thinking | arcee-ai/trinity-large-thinking | 54.66%
69 | Gemini 2.5 Flash Preview 04 17 | google/gemini-2.5-flash-preview-04-17 | 54.16%
70 | DeepSeek R1 | fireworks/deepseek-r1 | 54.12%
71 | Claude 3.5 Sonnet | anthropic/claude-3-5-sonnet-20241022 | 53.61%
72 | GPT Oss 20b | fireworks/gpt-oss-20b | 53.15%
73 | Qwen3 Max Preview | alibaba/qwen3-max-preview | 52.95%
74 | Grok 4.1 Fast Non Reasoning | grok/grok-4-1-fast-non-reasoning | 52.49%
75 | DeepSeek V3 | fireworks/deepseek-v3 | 52.49%
76 | DeepSeek V3p1 | fireworks/deepseek-v3p1 | 51.48%
77 | Grok 2 1212 | grok/grok-2-1212 | 51.13%
78 | DeepSeek V3p2 Thinking | fireworks/deepseek-v3p2-thinking | 50.97%
79 | Claude 3.5 Haiku | anthropic/claude-3-5-haiku-20241022 | 50.82%
80 | Mistral Medium 2505 | mistralai/mistral-medium-2505 | 50.74%
81 | Moonshotai Kimi K2 Instruct | together/moonshotai/Kimi-K2-Instruct | 50.39%
82 | Llama4 Maverick Instruct Basic | fireworks/llama4-maverick-instruct-basic | 49.73%
83 | DeepSeek V3p2 | fireworks/deepseek-v3p2 | 47.94%
84 | Magistral Medium 2509 | mistralai/magistral-medium-2509 | 47.40%
85 | Meta Llama Llama 4 Scout 17B 16E Instruct | together/meta-llama/Llama-4-Scout-17B-16E-Instruct | 46.78%
86 | GLM 4.7 | zai/glm-4.7 | 46.39%
87 | Command A 03 2025 | cohere/command-a-03-2025 | 45.96%
88 | GPT-4o | openai/gpt-4o-2024-11-20 | 45.92%
89 | GPT-4o Mini | openai/gpt-4o-mini-2024-07-18 | 45.45%
90 | O3 Mini | openai/o3-mini-2025-01-31 | 45.30%
91 | Mistral Small 2503 | mistralai/mistral-small-2503 | 44.17%
92 | Magistral Small 2509 | mistralai/magistral-small-2509 | 44.02%
93 | Gemini 2.0 Pro Exp 02 05 | google/gemini-2.0-pro-exp-02-05 | 43.44%
94 | GPT-4.1 Nano | openai/gpt-4.1-nano-2025-04-14 | 42.08%
95 | Jamba Large 1.6 | ai21labs/jamba-large-1.6 | 41.53%
96 | Gemini 1.5 Pro 002 | google/gemini-1.5-pro-002 | 40.52%
97 | GPT-4o | openai/gpt-4o-2024-08-06 | 39.43%
98 | Jamba 1.5 Large | ai21labs/jamba-1.5-large | 39.43%
99 | Meta Llama Meta Llama 3.1 70B Instruct Turbo | together/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 38.85%
100 | Gemini 1.5 Flash 002 | google/gemini-1.5-flash-002 | 38.19%
101 | Jamba Mini 1.6 | ai21labs/jamba-mini-1.6 | 38.03%
102 | Meta Llama Meta Llama 3.1 8B Instruct Turbo | together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 37.80%
103 | Jamba 1.5 Mini | ai21labs/jamba-1.5-mini | 33.88%
104 | Gemini 2.0 Flash 001 | google/gemini-2.0-flash-001 | 33.72%
105 | Gemini 1.5 Flash 001 | google/gemini-1.5-flash-001 | 28.63%

FAQ

What does CorpFin v2 measure?

CorpFin v2 is a private Vals AI benchmark that measures how well models understand long-context credit agreements.

Which model leads the published CorpFin v2 snapshot?

Grok 4.3 currently leads the published CorpFin v2 snapshot with a score of 68.53%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on CorpFin v2?

105 AI models are included in BenchLM's mirrored CorpFin v2 snapshot, based on the public leaderboard captured on May 16, 2026.

Last updated: May 16, 2026 · mirrored from the public benchmark leaderboard
