Benchmark analysis of Step 3.5 Flash by StepFun across 32 sourced tests on BenchLM.
According to BenchLM.ai, Step 3.5 Flash ranks #40 out of 123 models with an overall score of 66/100. While not a frontier model, it offers specific advantages depending on the use case.
Step 3.5 Flash is an open-weight model with a 256K-token context window. It answers queries without explicit chain-of-thought reasoning, which generally means faster responses and lower token usage than reasoning models.
Its strongest category is Multilingual (#28), while its weakest is Multimodal & Grounded (#64). Its other six category ranks fall between #33 and #45, so it is a consistent upper-mid-table performer on text tasks and noticeably weaker on multimodal work.
Creator: StepFun
Source Type: Open Weight
Reasoning: Non-Reasoning
Context Window: 256K
Overall Score: 66/100
Arena Elo: 1266
Step 3.5 Flash ranks #40 out of 123 models with an overall score of 66. It is created by StepFun and features a 256K context window.
Step 3.5 Flash ranks #43 out of 123 models in knowledge and understanding benchmarks with an average score of 60.8. There are stronger options in this category.
Step 3.5 Flash ranks #39 out of 123 models in coding and programming benchmarks with an average score of 47.1. There are stronger options in this category.
Step 3.5 Flash ranks #37 out of 123 models in mathematics benchmarks with an average score of 84.5. There are stronger options in this category.
Step 3.5 Flash ranks #39 out of 123 models in reasoning and logic benchmarks with an average score of 78.3. There are stronger options in this category.
Step 3.5 Flash ranks #45 out of 123 models in agentic tool-use and computer-task benchmarks with an average score of 60.2. There are stronger options in this category.
Step 3.5 Flash ranks #64 out of 123 models in multimodal and grounded benchmarks with an average score of 66.7. There are stronger options in this category.
Step 3.5 Flash ranks #33 out of 123 models in instruction following benchmarks with an average score of 87. There are stronger options in this category.
Step 3.5 Flash ranks #28 out of 123 models in multilingual benchmarks with an average score of 82.8. There are stronger options in this category.
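The category figures above are easier to compare side by side. The following is a minimal sketch, not BenchLM's own method: it simply copies the ranks and average scores quoted above into a dictionary, sorts them, and converts each rank into the share of the 123 listed models that it outranks. The short category labels and the helper names `share_outranked` and `best_and_worst` are illustrative assumptions.

```python
# Ranks and average scores as quoted in the text above (field of 123 models).
# The short category labels are illustrative, not BenchLM's official names.
TOTAL_MODELS = 123

CATEGORIES = {
    "Knowledge": (43, 60.8),
    "Coding": (39, 47.1),
    "Math": (37, 84.5),
    "Reasoning": (39, 78.3),
    "Agentic": (45, 60.2),
    "Multimodal & Grounded": (64, 66.7),
    "Instruction Following": (33, 87.0),
    "Multilingual": (28, 82.8),
}


def share_outranked(rank, total=TOTAL_MODELS):
    """Percentage of the field that sits below the given rank."""
    return 100.0 * (total - rank) / total


def best_and_worst(categories):
    """Return the category names with the best (lowest) and worst (highest) rank."""
    best = min(categories, key=lambda name: categories[name][0])
    worst = max(categories, key=lambda name: categories[name][0])
    return best, worst


if __name__ == "__main__":
    # Print categories from best rank to worst rank.
    for name, (rank, score) in sorted(CATEGORIES.items(), key=lambda kv: kv[1][0]):
        pct = share_outranked(rank)
        print(f"{name:<24} #{rank:<3} avg {score:5.1f}  outranks {pct:4.1f}% of models")
    print(best_and_worst(CATEGORIES))
```

Note that rank and average score do not move together across categories: Coding has the lowest average score (47.1) but a mid-table rank (#39), because scores are only comparable within a category, not between them.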
Step 3.5 Flash is an open-weight model created by StepFun, meaning its weights can be downloaded and run locally or fine-tuned for specific use cases.
Step 3.5 Flash has a context window of 256K, which determines how much text it can process in a single interaction.