A benchmark for grounded reasoning over office-style documents, spreadsheets, charts, and business artifacts.
According to BenchLM.ai, GPT-5.4 Pro is listed first on the OfficeQA Pro benchmark with a score of 96, with GPT-5.2 Pro and GPT-5.4 also at 96. The top three models are tied at the reported precision, suggesting this benchmark is nearing saturation for frontier models.
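The gap between the top three scores can be checked directly from the numbers reported above. The following is a minimal Python sketch; the helper name `top_spread` is illustrative and is not part of any BenchLM.ai tooling.

```python
# Scores as reported above for the three top-listed models.
scores = {"GPT-5.4 Pro": 96, "GPT-5.2 Pro": 96, "GPT-5.4": 96}


def top_spread(scores: dict[str, float], k: int = 3) -> float:
    """Return the gap between the best and the k-th best score."""
    top = sorted(scores.values(), reverse=True)[:k]
    return top[0] - top[-1]


print(top_spread(scores))  # 0
```

A spread of zero among the leaders means the benchmark no longer separates them at this precision, which is what the saturation remark refers to.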
121 models have been evaluated on OfficeQA Pro. The benchmark falls in the multimodalGrounded category, which carries a 15% weight in BenchLM.ai's overall scoring system, so results here feed directly into a model's overall ranking.
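To make the 15% weight concrete, here is a rough Python sketch. It assumes the overall score is a linear weighted sum of category scores on a 0 to 100 scale; that aggregation rule is an assumption, since BenchLM.ai's actual formula is not given here. The function name `category_contribution` is likewise hypothetical.

```python
# Weight of the multimodalGrounded category, as stated above.
MULTIMODAL_GROUNDED_WEIGHT = 0.15


def category_contribution(category_score: float,
                          weight: float = MULTIMODAL_GROUNDED_WEIGHT) -> float:
    """Points a category adds to the overall score, ASSUMING a linear weighted sum."""
    return weight * category_score


# Under this assumption, a perfect category score of 100 adds at most 15 points.
print(round(category_contribution(100.0), 2))
```

Note that the weight applies to the category as a whole, so a single benchmark such as OfficeQA Pro accounts for only part of those points.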
Year: 2026
Tasks: Document and spreadsheet tasks
Format: Grounded QA over office artifacts
Difficulty: Enterprise grounded reasoning
OfficeQA Pro is useful when choosing models for enterprise copilots because it measures whether they can reason correctly over real office content rather than generic chat prompts.
GPT-5.4 Pro by OpenAI currently leads with a score of 96 on OfficeQA Pro.