Stop chasing the model. Build the harness.

Published: 19/06/2026

Format: Insight

Every business asks the same AI question first.

“Which model should we use?”

Fair enough. Also a slightly lazy question now.

A year or two ago, model choice felt like the whole game. Pick the wrong one and the work fell apart. Pick the right one and you suddenly had a clever assistant sitting inside the business. Some of that is still true. Some models are better at code. Some are better at writing. Some are cheaper. Some are less annoying to work with.

But the gap is not where most businesses think it is.

The useful AI signal this week was not “new model does a magic thing on a benchmark”. We have had plenty of that. The useful signal was infrastructure. The boring stuff. Which usually means it matters.

Look at what actually shipped.

Hermes put out its v0.16 release, and it is basically a control-plane release. Desktop app. Remote gateway. Web admin. Multi-profile sessions. Better model picking. Leaner skills. More visible system checks. Less faff around actually operating agents day to day.

Vercel launched eve, an open-source framework where an agent is just a directory of files. Instructions. Tools. Skills. Subagents. Channels. Schedules. A sandbox. Evals. In other words, agents are being treated less like mystical chat windows and more like applications you can build, run, inspect, and deploy.

OpenAI is pushing workspace agents inside ChatGPT for teams, with shared context, permissions, approvals, Slack, schedules, and long-running jobs.

Then there is the line from Matt Pocock in his agentic engineering interview. People are obsessed with the model, he says, but they should care more about the harness. Prompts, skills, environment, tests, codebase structure, documentation, feedback loops. The stuff around the model.

Different products. Same direction.

Here is the part that matters. The harness is the bit you actually control.

You do not control whether OpenAI, Anthropic, Google, Zhipu, Mistral, or whoever else wins next month’s leaderboard fight. You rent that intelligence. Everyone else can rent it too, usually within days, sometimes hours.

What you can control is the machine you build around it.

The harness is the difference between “we use AI” and “we have an AI operating system for this part of the business”.

It is the difference between a marketer asking ChatGPT for ten LinkedIn ideas, and a weekly content loop that reads customer conversations, checks live search demand, reviews competitors, drafts posts in the founder’s actual voice, routes them for approval, publishes them, measures the response, and improves next week’s brief.

It is the difference between a salesperson asking AI to write a follow-up, and a lead-response system that watches the CRM, pulls the account history, checks offer fit, drafts a reply, flags risk, and never sends anything sensitive without a human signing it off.

It is the difference between an agency producing another pile of AI-assisted assets, and a commercial system that runs every week and gets sharper as it goes.

This is where a lot of businesses are going to get caught out.

They will think they are behind because they have not adopted the newest model. In reality they are behind because their work is not organised enough for any model to help properly.

No clean source of truth. No sensible workflows. No feedback loop. No approval rules. No data layer. No useful documentation. No idea what should happen when the agent is wrong, late, overconfident, or expensive.

So they buy the subscription, run a few experiments, get a burst of novelty, then quietly drift back to the old way of working.

That is not because AI failed. It is because there was no harness.

The agency world is especially exposed here.

The old model sold outputs. Websites, campaigns, social posts, ads, emails, decks, reports. AI makes output cheaper. It does not automatically make outcomes better.

If your offer is “we can make more stuff faster”, you are standing in a very crowded room. It is loud in there. Cheap, too.

The better offer is the operating layer. Build the loop that decides what should be made. Build the research system. Build the campaign machine. Build the AI visibility monitor. Build the reporting cadence. Build the human approval points. Build the client-specific memory. Build the measurement that shows whether any of it actually worked.

That is a different kind of agency. Less output shop, more commercial infrastructure partner. It is also harder. Good. The hard bit is where the value is.

A useful harness has some obvious parts.

A clear trigger, so you know what starts the work. Good context, so the agent knows enough before it acts. Real tools, so it can safely read or write to the systems that matter. A stop condition, so it knows when the job is done. An eval, so you know whether the output is good enough. Human approval where the stakes are high. Memory that improves the next run without turning into a junk drawer. And cost visibility, because token burn is not a strategy.
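To make that list concrete, here is a minimal sketch in Python of how those parts might fit together. Everything in it is an assumption for illustration: the `Harness` class, the `draft`, `evaluate` and `high_stakes` callables, the pass score, and the budget figure are hypothetical names and values, not any vendor's API. The model call is a stand-in function you would swap for a real one.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunResult:
    output: str
    status: str   # "approved", "needs_human", "rejected" or "over_budget"
    cost: float   # total spend for this run, in whatever unit draft() reports
    attempts: int


@dataclass
class Harness:
    # Hypothetical hooks: each would wrap a real model call, eval, or rule.
    draft: Callable[[str, list[str]], tuple[str, float]]  # (prompt, memory) -> (output, cost)
    evaluate: Callable[[str], float]                      # output -> score between 0 and 1
    high_stakes: Callable[[str], bool]                    # output -> needs human sign-off?
    pass_score: float = 0.8    # eval threshold (assumed value)
    max_attempts: int = 3      # stop condition: give up after this many drafts
    budget: float = 1.00       # cost cap per run (assumed value)
    memory_limit: int = 20     # keep memory small so it is not a junk drawer
    memory: list[str] = field(default_factory=list)

    def run(self, trigger: str, context: str) -> RunResult:
        spent = 0.0
        output = ""
        for attempt in range(1, self.max_attempts + 1):
            # Context: the trigger plus whatever the agent needs to know first.
            output, cost = self.draft(f"{trigger}\n{context}", self.memory)
            spent += cost
            if spent > self.budget:
                return RunResult(output, "over_budget", spent, attempt)
            # Eval: only a good-enough draft is allowed to leave the loop.
            if self.evaluate(output) >= self.pass_score:
                # Human approval: high-stakes output is held, not sent.
                status = "needs_human" if self.high_stakes(output) else "approved"
                # Memory: record what worked, trimmed to the most recent entries.
                self.memory.append(f"passed on attempt {attempt}: {trigger}")
                self.memory = self.memory[-self.memory_limit:]
                return RunResult(output, status, spent, attempt)
        return RunResult(output, "rejected", spent, self.max_attempts)


def fake_draft(prompt: str, memory: list[str]) -> tuple[str, float]:
    # Stand-in for a real model call; returns a draft and a made-up cost.
    return f"Draft reply for: {prompt.splitlines()[0]}", 0.02


harness = Harness(
    draft=fake_draft,
    evaluate=lambda text: 0.9,
    high_stakes=lambda text: "refund" in text.lower(),
)
result = harness.run(trigger="New lead: refund query", context="Account history goes here")
print(result.status, result.attempts)  # prints: needs_human 1
```

Notice how little of that code is about the model. Swap `fake_draft` for any provider and the trigger, the eval, the approval rule, the memory, and the cost cap all stay where they are.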

That list is not glamorous. It will not get the same clicks as a model demo spitting out a 3D game in thirty seconds. But it is the list that decides whether AI becomes part of the business or stays as a clever tab in someone’s browser.

Marketing is the clean example.

Most AI marketing use is still stuck at draft generation. Write a post. Rewrite this email. Give me twenty headlines. Fine. Useful enough.

But the real value is upstream and downstream of the draft.

Upstream: what is worth saying? What are customers asking? Which competitors are showing up in AI answers? Which sources are being cited? What changed in the market this week? Which offer needs proof? What does the founder actually believe?

Downstream: did it reach the right people? Did it create sales conversations? Did AI search cite it? Did it improve the next campaign? Did the team learn anything?

That is not a prompt problem. It is an operating-system problem. And the same pattern applies to sales, support, hiring, client reporting, product research, and internal ops. The work that repeats is where the harness belongs.

There is a decent test for whether you have one.

If your AI workflow depends on one person remembering to open a chat window, paste a load of context, explain the task again, copy the result into another tool, check it by hand, and then maybe save it somewhere, you do not have a system.

You have a person doing admin with a very clever autocomplete.

That is not an insult. It is where most teams start. We all start there. But it is not where the advantage compounds.

The compounding starts when the workflow can run again. When the context gets better. When the judgement gets captured. When the boring steps disappear. When the human is pulled in for taste, risk, and decisions rather than copy-paste glue work.

This is why I keep coming back to the “post-agency” idea.

AI does not kill agencies because clients can suddenly generate more content. That is the shallow version. AI kills the weak agency model because clients will eventually realise that buying disconnected outputs makes less sense when the production cost collapses.

The replacement is not “AI content at scale”. God help us. The replacement is a marketing and commercial operating layer. A system that helps the business sense, decide, create, ship, measure, and learn faster than it could before.

The model matters. Of course it does. But the model is not the moat. The harness is.

So if you are a business owner, the question to put to your team is not “are we using AI?”

Ask this instead. Where does AI already sit inside a repeatable loop that makes the business better every week?

If the answer is nowhere, you have just found the work.

That is the kind of system we build at Cleo, because a model gets rented and a harness compounds.

Jason Sibley is the founder of Cleo, a post-agency marketing and AI company. JasonVsTheNoise is where he writes about what is actually happening with AI, marketing, and how businesses should be thinking about both.