Your AI agents do not need more freedom. They need rules.

Most business owners are still asking the wrong AI question.

They ask: “Which model should we use?”

Fine question. Not the important one.

The better question is: “How does the work move through the system, and who checks it before it touches the business?”

That sounds less exciting than model benchmarks, which is probably why it gets less attention. It is also where most of the value is.


A recent paper called Diagon looked at what happens when AI agents operate inside a market. Not one assistant waiting for a prompt. A small economy of agents posting jobs, bidding for work, choosing contractors, executing tasks, judging output, paying, and updating reputation.

That is the bit people keep missing. Once agents start doing real work, the product is no longer just the model. The product is the operating layer around the model.

And the operating layer is messy.

In Diagon, market exchange beat self-sufficient agents. Agents that could hire other agents got more done than agents working alone. That makes sense. Specialisation usually wins.

But the market was fragile. By round 24, 39% of transactions ended in disputes. Posters could spot obviously excellent work and obvious rubbish, but struggled with the grey middle. That is exactly where most client work lives, by the way. Not obviously brilliant. Not obviously broken. Just ambiguous enough to start an argument.

Even weirder, some of the fixes that sound sensible did not work. Identity transparency made cross-family trade collapse. “Honest” instructions increased disputes. Stronger selection pressure degraded several market metrics.

Lovely. Even the robots hate procurement.


I should be fair about what Diagon is. It is a simulation. Twenty-five agents, controlled tasks, a limited time horizon. It does not prove how every real agent system will behave. It is useful, not gospel.

But it points at something real. Agent work has institutional problems. Payment. Trust. Evaluation. Reputation. Incentives. Lock-in. The boring stuff that decides whether a system survives contact with clients.

This is where I think the post-agency model gets interesting.

A normal agency sells output. Campaigns, copy, pages, reports, decks, posts. All useful, sometimes. But output is getting cheaper because AI is eating the mechanical parts.

The next version does not win by producing five times more stuff. That just gives the client five times more to approve, ignore, or distrust.

The next version wins by building the machine that moves work properly.


For a marketing workflow, that might look like this.

A strategy agent turns the brief into clear jobs. A research agent gathers evidence and cites the source pack. A positioning agent writes the angle. A production agent drafts the assets. A QA agent checks claims, tone, links, compliance, and client rules. A publishing agent prepares distribution. A measurement agent watches what happened and feeds the next cycle.

That is not magic. It is operations.

The important part is not that each step has an AI label slapped on it. The important part is the contract between steps. What counts as done? What evidence is required? Which claims need human approval? What happens when the QA agent rejects the draft? Which model handles which task? Where does the system store what it learned?
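To make "the contract between steps" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Handoff` structure, its field names, and the `qa_check` and `route` helpers are hypothetical, not taken from Diagon or from any particular agent framework. It only shows that "what counts as done" can be written down as checkable rules.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """What one stage passes to the next (hypothetical structure)."""
    stage: str
    content: str
    sources: list[str] = field(default_factory=list)
    claims_needing_approval: list[str] = field(default_factory=list)
    approved_by_human: bool = False


def qa_check(handoff: Handoff, require_sources: bool = True) -> list[str]:
    """Return rejection reasons; an empty list means the handoff counts as done."""
    problems = []
    if not handoff.content.strip():
        problems.append("empty content")
    if require_sources and not handoff.sources:
        problems.append("no source evidence attached")
    if handoff.claims_needing_approval and not handoff.approved_by_human:
        problems.append("claims awaiting human approval")
    return problems


def route(handoff: Handoff) -> str:
    """Advance the work, or send it back with the reasons attached."""
    problems = qa_check(handoff)
    if problems:
        return "rejected: " + "; ".join(problems)
    return "advance"
```

A draft with an unapproved claim gets sent back with a reason rather than silently passed along. The point of the sketch is that the rejection carries a receipt, not that these three checks are the right ones for every workflow.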

This is also why the recent tooling shift matters. OpenAI’s Agents SDK now talks openly about applications owning orchestration, tool execution, approvals, and state. Their sandbox work is about agents inspecting files, running commands, editing code, and working inside controlled environments. Redis is pushing caching layers because agent systems get expensive and slow when every repeated question hits the frontier model from scratch.
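The caching idea is simple enough to sketch. The example below is an in-memory stand-in, not Redis and not any vendor's API: the `ResponseCache` class and the `call_model` parameter are hypothetical names, and a plain dictionary plays the role a real cache service would. It assumes that repeated questions differing only in case or spacing can safely share an answer, which will not hold for every workflow.

```python
import hashlib


class ResponseCache:
    """In-memory stand-in for a caching layer in front of a model call."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalise case and whitespace so trivially different prompts share a key.
        normalised = " ".join(prompt.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt, call_model):
        """Return a cached answer if one exists; otherwise call the model once and store it."""
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = call_model(prompt)
        self._store[key] = answer
        return answer
```

Here `call_model` is whatever function actually hits the model. The hit and miss counters matter as much as the cache itself, because they tell you how much repeated work the system was paying for.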

All of that points in the same direction. Agents are becoming infrastructure.

But infrastructure without judgement is a liability.


A business owner does not need twelve autonomous agents pinging each other while everyone pretends that a Slack notification counts as governance. They need a controlled workflow with receipts.

The practical version is simple.

Pick one repeated commercial workflow. Do not start with “the whole company” unless you enjoy expensive chaos. Break it into task stages with clear handoffs. Decide which stages need source evidence. Add independent QA before anything client-facing or public. Track cost per completed workflow, not token spend in isolation. Keep humans in the loop where brand, legal, money, or client trust is on the line. Store what worked so the next cycle improves.
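"Cost per completed workflow" is worth pinning down, because it is easy to measure the wrong thing. A rough sketch, assuming each run is logged as a dictionary with a `cost` and a `completed` flag (both hypothetical field names): the spend on failed or abandoned runs stays in the numerator, and only finished workflows count in the denominator.

```python
def cost_per_completed_workflow(runs):
    """Total spend across all runs divided by the number that actually finished.

    runs: list of dicts with 'cost' (float) and 'completed' (bool).
    Returns None when nothing completed, since the ratio is undefined.
    """
    total_cost = sum(r["cost"] for r in runs)
    completed = sum(1 for r in runs if r["completed"])
    if completed == 0:
        return None
    return total_cost / completed
```

Under this definition a system that is cheap per token but fails half the time looks exactly as expensive as it really is.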

That is the real AI marketing operating system.

Not a prompt library. Not a content firehose. Not a dashboard full of impressive-looking agent names.

A working system where jobs are posted, routed, executed, checked, learned from, and priced sensibly.


This is also why I think agencies have a choice now.

They can use AI to make the old output model faster and cheaper until the margin disappears.

Or they can build operating layers that clients actually depend on.

The first option is a race to the bottom with nicer fonts.

The second is harder. It needs taste, process, evidence, technical judgement, and a willingness to say, “No, that should not be automated yet.”

That is where the value is.

AI agents are going to do more work. That part feels obvious now. The question is whether the work becomes a trusted business system or just a faster way to produce unowned mess.

The winners will not be the people with the most agents.

They will be the people with the best rules.


Jason Sibley is the founder of Cleo, a post-agency marketing and AI company. JasonVsTheNoise is where he writes about what is actually happening with AI, marketing, and how businesses should be thinking about both.