Best Open Source LLMs in 2026

Open-weight models have closed much of the gap with proprietary ones. The best open models now score within 5-10 points of the top closed APIs on most benchmarks. DeepSeek, Meta Llama, Alibaba Qwen, Zhipu GLM, and Mistral all ship strong open options, and some of these are reasoning models that match proprietary performance on math and coding. The main trade-offs are context window size (30 of the 55 models ranked here top out at 128K or less, vs 1M+ for top proprietary models) and agentic performance, where proprietary models still hold a wider lead. Self-hosting also shifts the infrastructure burden to you, so factor in serving costs.

Unless noted otherwise, ranking surfaces on this page use BenchLM's provisional leaderboard lane rather than the stricter sourced-only verified leaderboard.

Bottom line: Open-weight models are within 5-10 points of the best proprietary APIs. DeepSeek V4 Pro (Max) leads with a provisional score of 87, with Kimi K2.6 (84) and DeepSeek V4 Pro (High) (83) close behind.

According to BenchLM.ai, DeepSeek V4 Pro (Max) leads this ranking with a score of 87, followed by Kimi K2.6 (84) and DeepSeek V4 Pro (High) (83). The 3-point gap between first and second place is wider than the 1-point gap between second and third, which suggests a genuine performance difference at the top.

All models in this ranking are open-weight, meaning they can be self-hosted for maximum control and cost efficiency.

This ranking is based on BenchLM.ai's provisional overall score, a weighted score computed with BenchLM.ai's scoring formula.
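To make "weighted score" concrete, here is a minimal sketch of a weighted average over benchmark categories. The category names, weights, and per-category scores below are placeholders chosen for illustration; the actual categories and weights in BenchLM.ai's formula are not given on this page.

```python
def weighted_overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-category scores, normalized by the total weight."""
    total_weight = sum(weights[category] for category in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[category] * weights[category] for category in scores)
    return weighted_sum / total_weight


# Hypothetical weights and scores, NOT BenchLM.ai's real formula or data.
example_weights = {"coding": 0.3, "math": 0.2, "reasoning": 0.3, "agentic": 0.2}
example_scores = {"coding": 90.0, "math": 85.0, "reasoning": 88.0, "agentic": 80.0}

overall = weighted_overall(example_scores, example_weights)
print(round(overall, 1))
```

Normalizing by the total weight means the weights do not have to sum to 1, so a category can be dropped or added without rescaling the others.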

1. DeepSeek V4 Pro (Max) · DeepSeek · Open · 1M · 87 prov. overall
2. Kimi K2.6 · Moonshot AI · Open · 256K · 84 prov. overall
3. DeepSeek V4 Pro (High) · DeepSeek · Open · 1M · 83 prov. overall

What changed

DeepSeek V4 Pro (Max): leads all open-weight models with the highest provisional overall score (87).

DeepSeek R1: competitive on reasoning and math benchmarks.

Llama 4 Maverick: Meta's strongest entry, good on coding and reasoning.

How to choose

Best open-weight model overall?

DeepSeek V4 Pro (Max): highest open-weight score (87 provisional overall)

Self-hosting on a budget?

DeepSeek R1 — strong scores at $0 price

Need a large ecosystem?

Llama 4 Maverick — Meta ecosystem support

Coding-focused open model?

DeepSeek Coder 2.0: the coding-specialized open-weight model in this ranking (#25)

Full Rankings (55 models)

Rank. Model · Creator · License · Context window

1. DeepSeek V4 Pro (Max) · DeepSeek · Open Weight · 1M
2. Kimi K2.6 · Moonshot AI · Open Weight · 256K
3. DeepSeek V4 Pro (High) · DeepSeek · Open Weight · 1M
4. GLM-5.1 · Z.AI · Open Weight · 203K
5. GLM-5 (Reasoning) · Z.AI · Open Weight · 200K
6. Qwen3.5 397B (Reasoning) · Alibaba · Open Weight · 128K
7. MiniMax M3 · MiniMax · Open Weight · 1M
8. DeepSeek V4 Flash (Max) · DeepSeek · Open Weight · 1M
9. Qwen3.6-27B · Alibaba · Open Weight · 262K
10. DeepSeek V4 Flash (High) · DeepSeek · Open Weight · 1M
11. DeepSeek V4 Pro · DeepSeek · Open Weight · 1M
12. GLM-4.7 · Z.AI · Open Weight · 200K
13. GLM-5 · Z.AI · Open Weight · 200K
14. Qwen3.6-35B-A3B · Alibaba · Open Weight · 262K
15. Kimi K2.5 · Moonshot AI · Open Weight · 256K
16. Qwen3.5-122B-A10B · Alibaba · Open Weight · 262K
17. Qwen3.5 397B · Alibaba · Open Weight · 128K
18. Qwen3.5-27B · Alibaba · Open Weight · 262K
19. DeepSeek V3.2 (Thinking) · DeepSeek · Open Weight · 128K
20. MiMo-V2-Flash · Xiaomi · Open Weight · 256K
21. DeepSeek V4 Flash · DeepSeek · Open Weight · 1M
22. DeepSeek V3.2 · DeepSeek · Open Weight · 128K
23. Qwen3.5-35B-A3B · Alibaba · Open Weight · 262K
24. MiniMax M2.7 · MiniMax · Open Weight · 200K
25. DeepSeek Coder 2.0 · DeepSeek · Open Weight · 128K
26. DeepSeek LLM 2.0 · DeepSeek · Open Weight · 128K
27. Qwen2.5-1M · Alibaba · Open Weight · 1M
28. DeepSeekMath V2 · DeepSeek · Open Weight · 128K
29. Qwen2.5-72B · Alibaba · Open Weight · 128K
30. Qwen3 235B 2507 (Reasoning) · Alibaba · Open Weight · 128K
31. Nemotron 3 Super 100B · NVIDIA · Open Weight · 1M
32. Llama 3.1 405B · Meta · Open Weight · 128K
33. Sarvam 105B · Sarvam · Open Weight · 128K
34. DeepSeek V3 · DeepSeek · Open Weight · 128K
35. GPT-OSS 120B · OpenAI · Open Weight · 128K
36. MiniCPM5-1B · OpenBMB · Open Weight · 131K
37. DeepSeek-R1 · DeepSeek · Open Weight · 128K
38. DBRX Instruct · Databricks · Open Weight · 32K
39. Qwen3 235B 2507 · Alibaba · Open Weight · 128K
40. DeepSeek V3.1 (Reasoning) · DeepSeek · Open Weight · 128K
41. Phi-4 · Microsoft · Open Weight · 16K
42. Llama 3 70B · Meta · Open Weight · 128K
43. DeepSeek V3.1 · DeepSeek · Open Weight · 128K
44. Nemotron 3 Nano 30B · NVIDIA · Open Weight · 32K
45. Mistral 8x7B · Mistral · Open Weight · 32K
46. Llama 4 Scout · Meta · Open Weight · 10M
47. Mixtral 8x22B Instruct v0.1 · Mistral · Open Weight · 64K
48. Nemotron Ultra 253B · NVIDIA · Open Weight · 32K
49. Nemotron-4 15B · NVIDIA · Open Weight · 32K
50. Gemma 3 27B · Google · Open Weight · 32K
51. GPT-OSS 20B · OpenAI · Open Weight · 128K
52. Llama 4 Maverick · Meta · Open Weight · 1M
53. Llama 4 Behemoth · Meta · Open Weight · 32K
54. Mistral 7B v0.3 · Mistral · Open Weight · 32K
55. Mistral 8x7B v0.2 · Mistral · Open Weight · 32K
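If context window is your deciding factor, the labels in the ranking (16K, 262K, 1M, and so on) can be converted to numbers and used as a filter. The sketch below is one way to do that, using a handful of rows copied from the ranking above. The helper names are illustrative, and treating K as 1,000 and M as 1,000,000 tokens is an assumption; providers may mean slightly different exact limits.

```python
import re

# Assumed multipliers: the page does not say whether K means 1,000 or 1,024.
_UNITS = {"K": 1_000, "M": 1_000_000}


def parse_context(label: str) -> int:
    """Convert a label such as '262K' or '1M' to an approximate token count."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KM])", label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized context label: {label!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


# (rank, model, context label) for a few rows of the ranking.
ROWS = [
    (1, "DeepSeek V4 Pro (Max)", "1M"),
    (2, "Kimi K2.6", "256K"),
    (4, "GLM-5.1", "203K"),
    (6, "Qwen3.5 397B (Reasoning)", "128K"),
    (41, "Phi-4", "16K"),
    (46, "Llama 4 Scout", "10M"),
]


def with_min_context(rows, min_tokens: int):
    """Keep rows whose context window is at least min_tokens, in rank order."""
    kept = [row for row in rows if parse_context(row[2]) >= min_tokens]
    return sorted(kept, key=lambda row: row[0])


long_context = with_min_context(ROWS, 200_000)
print([name for _, name, _ in long_context])
```

Sorting by rank after filtering keeps the highest-ranked model that meets the context requirement at the front of the result.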

These rankings update weekly


Key Takeaways

The top model, and therefore the best open-weight model, is DeepSeek V4 Pro (Max) by DeepSeek, at position #1 with a provisional score of 87.

55 models are included in this ranking.

Score in Context

What these scores mean

Open-weight models are ranked by the same overall BenchLM score as proprietary ones. The gap has closed significantly — the best open models score within 5-10 points of the top closed APIs.

Known limitations

Open-weight models typically have smaller context windows (128K vs 1M+), which matters for long-document and agentic tasks. Self-hosting costs (GPU, inference optimization) are not reflected in benchmark scores.
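As a rough starting point for the GPU side of self-hosting, the memory needed just to hold a model's weights is its parameter count times the bytes per parameter. The sketch below applies that rule of thumb; the parameter counts are read off the model names in the ranking, the 80 GB per-GPU figure is an assumed accelerator size, and KV cache, activations, and serving overhead are deliberately left out, so real requirements will be higher.

```python
import math

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, precision: str = "fp16") -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus(params_billions: float, precision: str, gpu_memory_gb: float) -> int:
    """Fewest GPUs whose combined memory holds the weights.

    Ignores KV cache, activations, and framework overhead.
    """
    return math.ceil(weight_memory_gb(params_billions, precision) / gpu_memory_gb)


# Parameter counts (in billions) taken from the model names in the ranking.
MODELS = {"Llama 3.1 405B": 405, "Qwen2.5-72B": 72, "Gemma 3 27B": 27}
GPU_MEMORY_GB = 80  # assumed accelerator size, for illustration only

for name, params in MODELS.items():
    for precision in ("fp16", "int4"):
        size = weight_memory_gb(params, precision)
        gpus = min_gpus(params, precision, GPU_MEMORY_GB)
        print(f"{name} @ {precision}: {size:.1f} GB weights, at least {gpus} GPU(s)")
```

Quantizing to a lower precision shrinks the weight footprint proportionally, which is why the same model can need far fewer GPUs at int4 than at fp16, usually at some cost in quality.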


Last updated: June 2, 2026
