Insight / signal

AI agents are leaving the chat window. Businesses need to catch up.

The useful AI story this week is not another model leaderboard.

It is not a new prompt format. It is not some shiny “ten tools you need before breakfast” nonsense.

It is more boring than that, which usually means it matters.

AI agents are starting to get a proper shape.

Vercel launched eve this week, an open-source framework where an agent is basically a directory. Not a mystical digital employee. A folder with files in it.

There is an instructions file for what the agent is meant to do. A tools folder for what it can actually use. Skills for reusable procedures. Channels for where it lives. Schedules for when it runs. Subagents for delegation. Sandboxing so generated code does not trash the main application. Plus evals, tracing, approvals and durable execution.

Strip out the launch language and the point is simple. Agents are starting to look less like chatbots and more like software projects.

That is the bit business owners should pay attention to.


For the last couple of years, most AI adoption has happened in the chat window. Ask ChatGPT for a post. Ask Claude to rewrite a page. Ask Gemini to summarise a document. Useful enough, but still basically conversation.

The human carries the operating model in their head. What is the job? The human knows. Which source should be trusted? The human decides. When does it run? When the human remembers. Who checks it? Usually the same poor sod who asked for it in the first place.

That is fine for lightweight work. It falls apart the moment you want AI to run a repeatable business process.

And that is exactly where the market is moving.

The signals from the last couple of days all point the same way. Hermes has async subagents now, so a lead agent can push work into the background instead of freezing the main session. The Hermes agent OS discussion is all about scheduled jobs, persistent memory, QA judges, Kanban and model choice. The LangChain crowd is talking about stacked loops: an agent loop, a verification loop, an event-triggered loop, an improvement loop. Eric Siu frames the business version as revenue loops, with triggers, context, action, eval gates and stop conditions. Marketing School is warning that overloaded agents degrade and token costs bite.
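
To make that shared shape concrete, here is a minimal Python sketch of one such loop, using stale sales deals as the example: a trigger, some context, an action, an eval gate and stop conditions. Every name in it (`run_loop`, `draft_nudge`, `passes_gate`, the deal fields, the cost figures) is a hypothetical stand-in, not the API of Hermes, LangChain or any other tool mentioned above, and the model call is replaced by a plain function.

```python
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    drafts: list = field(default_factory=list)    # outputs that passed the eval gate
    rejected: list = field(default_factory=list)  # outputs the gate blocked
    stopped_because: str = ""


def draft_nudge(deal):
    # Hypothetical stand-in for a model call: builds a follow-up from CRM context.
    return f"Hi {deal['contact']}, following up on {deal['name']}."


def passes_gate(text, deal):
    # Eval gate: a cheap, explicit check before anything moves on.
    return deal["contact"] in text and len(text) <= 200


def run_loop(deals, stale_after_days=14, max_actions=3, budget=1.0, cost_per_action=0.25):
    result = LoopResult()
    spent = 0.0
    for deal in deals:
        # Stop conditions: checked before every action, not after the damage.
        if len(result.drafts) + len(result.rejected) >= max_actions:
            result.stopped_because = "max_actions"
            return result
        if spent + cost_per_action > budget:
            result.stopped_because = "budget"
            return result
        # Trigger: only stale deals start any work.
        if deal["days_since_contact"] < stale_after_days:
            continue
        text = draft_nudge(deal)        # context + action
        spent += cost_per_action
        if passes_gate(text, deal):     # eval gate
            result.drafts.append(text)  # queued for a human, not sent
        else:
            result.rejected.append(text)
    result.stopped_because = "no_more_work"
    return result


# Made-up example data, for illustration only.
deals = [
    {"name": "Acme retainer", "contact": "Priya", "days_since_contact": 30},
    {"name": "Site rebuild", "contact": "Tom", "days_since_contact": 3},
    {"name": "SEO sprint", "contact": "Lena", "days_since_contact": 21},
]
result = run_loop(deals)
print(result.drafts)
print(result.stopped_because)
```

The design choice worth noticing is that nothing is sent. The loop ends with drafts waiting for a person, and it always records why it stopped.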

Different language. Same direction.

The unit of value is moving from the AI output to the workflow around it.


That matters for agencies, consultants, and any business trying to use AI without turning the place into a haunted spreadsheet.

“We use AI to make content” is already weak. Everyone can do that. Most of it is beige, and a lot of it is actively harmful, because it adds to the pile of stuff without improving the thinking behind it.

“We build AI agents” is not much better. It sounds clever, but it can mean almost anything. A Slack bot. A prompt chain. A browser script held together with string. A demo that works beautifully right up until it meets a real client login, a weird data export, or a half-broken website.

The stronger offer is more specific. We take one repeated commercial workflow and make it faster, safer, more visible, and easier to run.

That might be an SEO and AEO sprint. Crawl the site, check the search data, inspect competitor pages, draft the weekly actions, produce a client-ready report, and flag what needs a human decision.

It might be sales follow-up. Watch for stale deals, read the CRM history, draft a sensible nudge, check tone and context, then wait for approval before anything goes out.

It might be content repurposing. Take a podcast, webinar, sales call or research note, pull the useful ideas out, build a blog draft, produce short posts, link the sources, and log what was used.

It might be client reporting. Check the numbers, explain what changed, spot the oddities, draft the report, and keep observed evidence separate from recommendations.

That is not AI magic. That is operations. And it is worth far more.


The businesses that get this right will not just buy a pile of subscriptions and hope something happens. They will design narrow work loops. They will decide what the agent can read. What it can write. What it can spend. What it can publish. What needs approval. What gets logged. What stops the run. What gets measured afterwards.
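
Those decisions can be written down as a small, checkable policy rather than left in someone's head. The sketch below is one way to do that in Python, assuming a made-up `Policy` class and made-up action names (`read`, `write`, `publish`, `send_email`); it is an illustration of the idea, not how any particular agent framework implements permissions.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    can_read: set
    can_write: set
    needs_approval: set   # actions a human must sign off
    spend_limit: float    # hard ceiling for the whole run
    log: list = field(default_factory=list)
    spent: float = 0.0

    def check(self, action, target, cost=0.0):
        """Return 'allow', 'needs_approval', 'deny' or 'stop', and log the decision."""
        if self.spent + cost > self.spend_limit:
            decision = "stop"               # what stops the run
        elif action in self.needs_approval:
            decision = "needs_approval"     # e.g. publishing
        elif action == "read" and target in self.can_read:
            decision = "allow"
        elif action == "write" and target in self.can_write:
            decision = "allow"
        else:
            decision = "deny"               # anything not listed is refused
        if decision == "allow":
            self.spent += cost
        self.log.append((action, target, cost, decision))  # what gets logged
        return decision


# Hypothetical policy for a reporting agent.
policy = Policy(
    can_read={"crm", "analytics"},
    can_write={"drafts"},
    needs_approval={"publish", "send_email"},
    spend_limit=2.0,
)
print(policy.check("read", "crm"))
print(policy.check("write", "crm"))
print(policy.check("publish", "blog"))
print(policy.check("write", "drafts", cost=5.0))
```

The default is refusal: an action is allowed only if it is listed, and every decision lands in the log whether it was allowed or not. That log is what gets measured afterwards.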

That sounds dull if you are chasing demos. It sounds obvious if you have ever had to run client work, fix a broken automation, or explain to a business owner why the robot confidently did the wrong thing at 2am.

The eve release is interesting because it makes the structure visible. An agent is not just a personality and a model. It is instructions, tools, skills, channels, schedules, subagents, sandboxing, observability and approvals.

In other words, the boring bits are becoming first-class. Good. The boring bits are where trust comes from.

This is the part most AI advice still misses. People talk about prompt libraries as if the prompt is the asset. Sometimes it is. More often, the real asset is the workflow you wrap around it. The trigger. The data source. The decision rule. The eval gate. The human approval. The audit trail. The cost control. The feedback loop.

Without those, you do not have an AI system. You have an enthusiastic intern with no manager and a corporate card. Useful, occasionally. Dangerous, eventually.


For a post-agency business, this is the whole point in plain English.

The future agency is not an output shop with AI bolted on. It is an operating layer that turns messy repeated work into controlled systems. Research into briefs. Briefs into campaigns. Campaigns into assets. Assets into distribution. Distribution into learning. Learning back into the next run.

Humans still matter in that model. Maybe more than before. Taste matters. Commercial judgement matters. Knowing when not to automate matters. Knowing when a result is technically correct but strategically stupid matters.

But the workbench changes. A good AI system should not just produce another draft. It should help the business run a loop better than it did yesterday.

That is the line I would use with a client now. Do not start with “where can we use AI?” Start with “what work keeps repeating, who owns it, what makes it good, and where does it break?” Then build the smallest supervised workflow around that.

One job. One owner. Clear inputs. Clear approvals. Clear logs. Clear stop conditions. Measure it. Improve it. Only then make it more autonomous.

Less sexy than “fully autonomous AI employees”. Also far less likely to set fire to the furniture.


The agent story is maturing. Slowly. Messily. With too much hype around it, as usual. But underneath the noise, the direction is useful.

Agents are leaving the chat window.

The businesses that win will not be the ones with the most AI tools. They will be the ones that turn repeated work into visible, supervised, improving workflows.

That is where the next agency model gets interesting. Not more content. Better operating loops. That is the kind of system we build at Cleo, because a campaign expires and a system compounds.


Jason Sibley is the founder of Cleo, a post-agency marketing and AI company. JasonVsTheNoise is where he writes about what is actually happening with AI, marketing, and how businesses should be thinking about both.