Best AI model rankings
Browse BenchLM ranking surfaces by benchmark category, workflow, provider, license, and value.
Core leaderboards
Best LLMs for Coding
Top AI models ranked by coding benchmark performance including HumanEval, SWE-bench Verified, SWE-bench Pro, SWE-Rebench, LiveCodeBench, FLTEval, and ProgramBench.
Best LLMs for Math
Top AI models ranked by mathematics benchmark performance including AIME, HMMT, BRUMO, and MATH-500.
Best LLMs for Knowledge
Top AI models ranked by knowledge benchmarks including MMLU, GPQA, SuperGPQA, MMLU-Pro, HLE, FrontierScience, and SimpleQA.
Best LLMs for Reasoning
Top AI models ranked by reasoning benchmark performance including ARC-AGI-2, MuSR, LongBench v2, MRCRv2, BBH, and LisanBench.
Best Agentic AI Models
Top AI models ranked by agentic benchmark performance including Terminal-Bench 2.0, BrowseComp, and OSWorld-Verified.
Best Multimodal & Grounded AI Models
Top AI models ranked by multimodal and grounded benchmark performance including MMMU-Pro and OfficeQA Pro.
Best LLMs for Instruction Following
Top AI models ranked by instruction-following benchmark performance including IFEval.
Best Multilingual LLMs
Top AI models ranked by multilingual benchmark performance including MGSM and MMLU-ProX.
Use cases
Best Long Context AI Models
AI models ranked on sourced long-context and memory benchmarks including LongBench v2, MRCRv2, AI-Needle, Graphwalks, and MMLongBench-Doc.
Best Tool Use & Function Calling Models
AI models ranked on sourced tool-use benchmarks including BFCL v4, MCP Atlas, Toolathlon, Tau Bench, and related tool-calling evaluations.
Best AI Models for Web Research
AI models ranked on sourced browsing and research benchmarks including BrowseComp, WebArena, and WebVoyager.
Best Computer Use AI Models
AI models ranked on sourced computer-use and GUI benchmarks including OSWorld-Verified, ScreenSpot Pro, WebArena, WebVoyager, and Vision2Web.
Best Document AI Models
AI models ranked on sourced document AI and OCR benchmarks including OfficeQA Pro, OmniDocBench 1.5, CC-OCR, and MMLongBench-Doc.
Best Image Understanding Models
AI models ranked on sourced image-understanding benchmarks including MMMU-Pro, RealWorldQA, AI2D, CountBench, RefCOCO, and related grounding evaluations.
Best Frontend & App Dev Models
AI models ranked on sourced frontend and app-development benchmarks including React Native Evals, Design2Code, Vision2Web, and related web-task evaluations.
Best Factuality AI Models
AI models ranked on sourced factuality and hallucination-adjacent benchmarks including SimpleQA, HLE without tools, and Facts-VLM.
Model groups
Best Open Source LLMs
Top open-weight AI models you can download and run locally, ranked by benchmark performance.
Best Proprietary LLMs
Top proprietary/closed-source AI models ranked by benchmark performance.
Best Reasoning AI Models
Top AI models with dedicated reasoning capabilities, ranked by benchmark performance.
Best OpenAI Models
All OpenAI models ranked by benchmark performance, including GPT-5, GPT-4o, o1, o3, and more.
Best Anthropic Models
All Anthropic Claude models ranked by benchmark performance.
Best Google AI Models
All Google Gemini and Gemma models ranked by benchmark performance.
Best Meta AI Models
All Meta Llama models ranked by benchmark performance.
Best DeepSeek Models
All DeepSeek models ranked by benchmark performance.
Best AI Models Overall
The top AI models ranked by overall benchmark performance across all categories.
Best Large Context Window LLMs
AI models with the largest context windows (200K+ tokens), ranked by benchmark performance.
Best Chinese AI Models
Top AI models from Chinese labs, including DeepSeek, Alibaba Qwen, Zhipu GLM, Moonshot Kimi, and more, ranked by benchmark performance.
European AI Models
European AI models from Mistral, H Company, LightOn, and Aleph Alpha. Ranked models appear first, followed by tracked models with sparse benchmark data.
Best Non-Reasoning LLMs
Top standard AI models (no chain-of-thought reasoning) ranked by benchmark performance. These models are typically faster and cheaper than reasoning models.
Best Mistral Models
All Mistral AI models ranked by benchmark performance, including Mistral Large, Mixtral, and more.
Best xAI Grok Models
All xAI Grok models ranked by benchmark performance.
Best Alibaba Qwen Models
All Alibaba Qwen models ranked by benchmark performance.
Value rankings
Best Value LLM for Coding
Top AI models ranked by coding benchmark performance per dollar. Find the most cost-effective LLM for coding tasks including SWE-bench, LiveCodeBench, and more.
Best Value Agentic AI Model
Top AI models ranked by agentic benchmark performance per dollar. Find the most cost-effective model for AI agents, tool use, and multi-step workflows.
Best Value LLM for Reasoning
Top AI models ranked by reasoning benchmark performance per dollar. Cost-adjusted rankings using ARC-AGI-2, LongBench v2, MRCRv2, and MuSR.
Best Value LLM for Knowledge
Top AI models ranked by knowledge benchmark performance per dollar. Cost-adjusted rankings using GPQA, MMLU-Pro, HLE, and more.
Best Value LLM for Math
Top AI models ranked by math benchmark performance per dollar. Cost-adjusted rankings using AIME 2025, BRUMO, and MATH-500.
Best Value Multimodal AI Model
Top AI models ranked by multimodal benchmark performance per dollar. Cost-adjusted rankings using MMMU-Pro and OfficeQA Pro.
Best Value LLM Overall
Top AI models ranked by overall benchmark performance per dollar. Find the most cost-effective all-around LLM across all categories.
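The value rankings above describe ordering models by benchmark performance per dollar, but this page does not state the exact formula. As a rough illustration only, the sketch below assumes that value is a benchmark score divided by a blended price in USD per million tokens. The `ModelEntry` type, the field names, the model names, and all numbers are hypothetical and are not BenchLM data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str            # hypothetical model identifier
    score: float         # benchmark score on an assumed 0-100 scale
    usd_per_mtok: float  # assumed blended price, USD per million tokens


def value_score(entry: ModelEntry) -> float:
    """Benchmark points per dollar, under the assumed definition score / price."""
    if entry.usd_per_mtok <= 0:
        raise ValueError("price must be positive")
    return entry.score / entry.usd_per_mtok


def rank_by_value(entries):
    """Return entries sorted from best to worst value."""
    return sorted(entries, key=value_score, reverse=True)


# Made-up entries, for illustration only.
MODELS = [
    ModelEntry("model-a", 80.0, 10.0),
    ModelEntry("model-b", 60.0, 2.0),
    ModelEntry("model-c", 70.0, 5.0),
]

for entry in rank_by_value(MODELS):
    print(f"{entry.name}: {value_score(entry):.1f} points per dollar")
```

Under this assumed definition, the model with the highest raw score (model-a) ranks last on value, which is why a value ranking can differ sharply from the corresponding performance leaderboard.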