BenchLM is tracking Trinity-Large-Thinking by Arcee AI, but this profile is currently excluded from the public leaderboard because it still lacks enough verified benchmark coverage to be ranked reliably. Only verified public benchmark rows appear below.
Trinity-Large-Thinking is an open-weight model with a 512K-token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.
This profile currently has verified scores for 9 of the 83 benchmarks BenchLM tracks. BenchLM only exposes verified benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.
Provider: Arcee AI
Source Type: Open Weight
Reasoning: Reasoning model
Context Window: 512K tokens
Model Status: Tracked
Overall Score: Unranked
Pricing: $0.25 / $0.90 per 1M tokens (input / output)
Runtime: N/A (latency unavailable)
Verified public benchmark rows:
SWE-bench Verified* 2026 · Quarterly refresh · updated April 1, 2026
AIME25 (Arcee) 2026 · Quarterly refresh · updated April 1, 2026
Tau2-Airline 2026 · Quarterly refresh · updated April 1, 2026
τ²-Bench 2026 · Quarterly refresh · updated April 1, 2026
PinchBench 2026 · Quarterly refresh · updated April 1, 2026
Trinity-Large-Thinking has 9 verified benchmark scores on BenchLM, but it does not yet have enough coverage to receive a global overall rank.
Trinity-Large-Thinking has visible benchmark coverage in knowledge and understanding, coding and programming, mathematics, agentic tool use and computer tasks, and instruction following, but BenchLM does not currently assign it a global category rank in any of these categories.
Yes. Trinity-Large-Thinking is an open-weight model created by Arcee AI, meaning it can be downloaded and run locally or fine-tuned for specific use cases.
Not yet. Trinity-Large-Thinking currently has 9 verified benchmark scores out of the 83 benchmarks BenchLM tracks. BenchLM only exposes verified public benchmark rows, so missing categories stay blank until a sourced evaluation is available.
Trinity-Large-Thinking has a context window of 512K tokens, which determines how much text it can process in a single interaction.