
There’s a free 198B AI model sitting in Hermes right now

Published: 02/06/2026

Format: Insight

Use Step 3.7 Flash as the fast executor in the loop: cheap where speed is good enough, with premium models reserved for planning, judgement and failure recovery.

StepFun just dropped Step 3.7 Flash onto the Nous Portal. 30 days free. No credit card. No API fees.

This is the kind of thing that gets buried in a Tuesday afternoon X post and then quietly becomes one of the more useful additions to your AI stack. So here’s the full picture.

What Step 3.7 Flash actually is

Step 3.7 Flash is a 198 billion parameter sparse Mixture of Experts model. The “sparse” part matters. It activates only around 11 billion parameters per token, which is why it runs so fast while still having access to a 198B brain.
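To put the sparsity in numbers, here is a quick back-of-envelope calculation. It uses only the two figures quoted above; the function name is made up for illustration, and "around 11 billion" is treated as exactly 11 billion, so read the result as approximate.

```python
# Back-of-envelope: how much of the model is active for each token?
TOTAL_PARAMS = 198e9   # total parameters, as quoted above
ACTIVE_PARAMS = 11e9   # "around 11 billion" per token; treated as exact here


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used for a single token."""
    return active / total


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {fraction:.1%}")           # Active per token: 5.6%
print(f"Roughly 1 in {round(1 / fraction)} parameters")  # Roughly 1 in 18 parameters
```

So each token touches roughly one parameter in eighteen, which is where the speed comes from.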

The headline numbers: 256K context window. Up to 400 tokens per second. Native image and video understanding built in from the ground up, not bolted on. Apache 2.0 license, meaning the weights are open and you can do what you want with them.
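To get a feel for what 400 tokens per second means in an agent loop, here is a rough sketch. The figure is a quoted peak ("up to"), so the results are best-case lower bounds on generation time only. The hop count and tokens-per-hop values in the usage lines are invented for illustration, and network latency, prompt processing and tool time are ignored.

```python
# Best-case generation time at the quoted peak throughput.
PEAK_TOKENS_PER_SECOND = 400  # "up to 400 tokens per second", so a ceiling


def min_generation_seconds(output_tokens: int,
                           tokens_per_second: float = PEAK_TOKENS_PER_SECOND) -> float:
    """Lower bound on the time needed to generate `output_tokens`."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / tokens_per_second


def min_pipeline_seconds(hops: int, tokens_per_hop: int) -> float:
    """Lower bound for a sequential loop of `hops` steps of equal size."""
    return hops * min_generation_seconds(tokens_per_hop)


# Hypothetical workload: a 2,000-token answer, and a 20-hop agent run
# that generates 500 tokens per hop.
print(min_generation_seconds(2000))   # 5.0
print(min_pipeline_seconds(20, 500))  # 25.0
```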

It currently sits first on ClawEval 1.1, which is the benchmark designed specifically for agentic use. Not a general chat benchmark. An agent benchmark. That distinction matters.

This is not a model built for polished conversation.

It is a model built to do things.

When to use it

Step 3.7 Flash earns its place in two situations.

The first is agentic workflows. Search, scrape, summarise, plan, call a tool, repeat. Flash-tier speed means it handles lots of small hops without making the whole pipeline feel sluggish. At roughly a ninth of the cost of Claude Opus on SWE-Bench-style tasks, it makes sense as your default executor, with a heavier model on standby for the hard calls.

The second is long document work. 256K context is genuinely large. If you are parsing long reports, full codebases, or multi-document research, this fits.
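Before throwing a pile of documents at it, a rough fit check helps. The sketch below is an estimate, not a tokenizer: it assumes about 4 characters per token (a common rule of thumb for English text, not a property of this model), reads 256K as 256,000 tokens, and reserves a hypothetical 8,000 tokens for the model's reply.

```python
import math

CONTEXT_WINDOW_TOKENS = 256_000  # assumption: "256K" read as 256,000 tokens
CHARS_PER_TOKEN = 4              # assumption: rough heuristic, not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(texts: list[str], reserved_for_output: int = 8_000) -> bool:
    """True if all texts plus a reply budget fit in the estimated window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS


# Hypothetical inputs: a 400,000-character report fits, a 1,000,000-character dump does not.
print(fits_in_context(["x" * 400_000]))    # True
print(fits_in_context(["x" * 1_000_000]))  # False
```

For anything close to the limit, count with the model's actual tokenizer rather than trusting the heuristic.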

Where it is less suited: tasks where prose quality is the whole point. It is competitive with flash-tier models on coding quality, but if you need something that writes like a careful editor rather than a fast executor, reach for something else.

The pattern that makes sense: run Step 3.7 Flash as the main agent loop and escalate to a premium model for planning, judgement and failure recovery.

Cheap where fast is good enough. Expensive only where it earns it.
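Here is a minimal sketch of that fast-default, premium-fallback pattern. Nothing in it is a Hermes or Nous Portal API: `fast_model`, `premium_model` and `is_acceptable` are hypothetical callables you would supply yourself. The cost helper assumes the "roughly a ninth" ratio applies to your workload, which it may not.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    output: str
    ok: bool
    handled_by: str  # "fast" or "premium"


def run_step(task: str,
             fast_model: Callable[[str], str],
             premium_model: Callable[[str], str],
             is_acceptable: Callable[[str], bool],
             max_fast_attempts: int = 2) -> StepResult:
    """Try the fast model first; escalate only if it errors or fails the check."""
    for _ in range(max_fast_attempts):
        try:
            output = fast_model(task)
        except Exception:
            continue  # treat an error as a failed attempt and retry
        if is_acceptable(output):
            return StepResult(output, True, "fast")
    # Failure recovery: hand the task to the heavier model.
    output = premium_model(task)
    return StepResult(output, is_acceptable(output), "premium")


def blended_relative_cost(escalation_rate: float,
                          fast_cost_ratio: float = 1 / 9) -> float:
    """Cost per step relative to running everything on the premium model (= 1.0).

    Assumes one fast attempt per step, and that escalated steps pay for
    both the fast attempt and the premium call.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return fast_cost_ratio + escalation_rate


# Hypothetical stand-ins for real model calls.
result = run_step("summarise page",
                  fast_model=lambda t: f"fast:{t}",
                  premium_model=lambda t: f"premium:{t}",
                  is_acceptable=lambda out: len(out) > 0)
print(result.handled_by)  # fast
```

Under those assumptions, the blend stays cheaper than an all-premium setup as long as fewer than about eight in nine steps escalate.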

How to get it running in Hermes

The Nous Portal is giving you 30 days of access at no cost. Here is how to connect it.

Step 1. Create a free Nous Portal account

Go to portal.nousresearch.com and sign up. No card required.

Step 2. Update Hermes

hermes update

Pull the latest version before setting up the connection.

Step 3. Connect to the Nous Portal

hermes setup --portal

This handles the login and configuration in one command.

Step 4. Select the model

hermes model

Choose stepfun/step-3.7-flash:free from the list.

That is it. You are now running a 198B model in Hermes, free for the next 30 days, with no usage charges on top.

Why this is worth paying attention to

Most of the conversation about frontier AI is about closed models behind APIs. A model this capable, with open weights, available at this cost, is not the norm.

For anyone running Hermes for research pipelines, agentic SEO workflows, or multi-tool orchestration, the case for testing it right now is straightforward. The cost of trying it is zero. The 30-day window is generous. And the reasoning-effort controls (low, medium and high) let you tune latency versus depth per step.

That fits exactly the architecture where you want a fast default and a heavier fallback.
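One way to use the three effort levels is a simple per-step lookup. The sketch below is illustrative only: the step names and the mapping are invented, and how the chosen level is actually passed to the model depends on your Hermes or API setup, which is not shown here.

```python
from enum import Enum


class Effort(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical mapping: quick mechanical hops get low effort,
# steps that need more thought get more.
EFFORT_BY_STEP = {
    "search": Effort.LOW,
    "scrape": Effort.LOW,
    "summarise": Effort.MEDIUM,
    "plan": Effort.HIGH,
}


def effort_for(step_kind: str, default: Effort = Effort.MEDIUM) -> Effort:
    """Pick a reasoning-effort level for a step, falling back to a default."""
    return EFFORT_BY_STEP.get(step_kind, default)


print(effort_for("scrape").value)     # low
print(effort_for("plan").value)       # high
print(effort_for("tool_call").value)  # medium (not in the table, so the default)
```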

The tools that matter are not always the ones that get the most coverage. Sometimes they are the ones sitting in a terminal, free, waiting for you to run three commands.

Go get it.