Stop asking AI to run the whole process
A lot of people are building AI agents the way a tired manager briefs a new hire on their worst day.
Here are fifteen tools. Here is a giant instruction document. Follow the process. Don’t make mistakes. Use judgement. Ask if you’re unsure. Good luck.
Then everyone acts surprised when the agent skips a step, takes a strange route, burns tokens looping for ten minutes, or quietly decides the failure did not matter.
That is not only a model problem. It is a design problem.
Google’s new ADK 2.0 post draws a useful line in the sand. Moving agents from prototype to production creates a different class of problems: loops, hallucinated business logic, messy error handling, prompt injection, context bloat, and execution nobody can predict. Their answer is not “write a better prompt”. It is workflows. Graph-based routing. Human approval. Retry policies. Durable pauses and resumes. Telemetry.
In other words, software.
That sounds obvious, but the agent market has spent a long time pretending otherwise. Demo culture rewards a particular fantasy: give the model broad access and let it work things out. If it can browse, call tools, write code, send messages, update systems, and explain itself afterwards, surely that is progress.
Sometimes it is. Often it is just expensive improvisation.
Google uses a refund process as its example, which is perfect because it is boring. Boring is where reality lives. A refund might need to check purchase history, read policy, decide eligibility, move money, email the customer, and close the ticket. Some of those steps are predictable. Some need judgement. One of them moves real money.
Checking purchase history does not need creativity. Closing a ticket does not need a model to have a little think. Issuing a refund should not be left to a probabilistic loop because the prompt said “be careful”.
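To make the split concrete, here is a minimal Python sketch of that refund flow. It assumes a 30-day refund window and uses a keyword stub where a real system would call a model. `Order`, `classify_complaint`, `is_eligible`, `handle_refund` and the `approve` callback are hypothetical names for illustration, not Google’s ADK API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Order:
    order_id: str
    amount: float
    purchased_on: date
    refunded: bool = False


def classify_complaint(text: str) -> str:
    """Judgement step. A real system would call a model here (hypothetical);
    this keyword stub only stands in for it."""
    return "refund_request" if "refund" in text.lower() else "other"


def is_eligible(order: Order, today: date, window_days: int = 30) -> bool:
    """Deterministic step: a plain rule, no model involved.
    The 30-day window is an assumed example policy."""
    return not order.refunded and (today - order.purchased_on).days <= window_days


def handle_refund(order: Order, complaint: str, today: date,
                  approve: Callable[[Order], bool]) -> str:
    # The order of steps is fixed in code; the model never decides it.
    if classify_complaint(complaint) != "refund_request":
        return "routed_to_support"
    if not is_eligible(order, today):
        return "declined_by_policy"
    # Approval step: money only moves after a human says yes.
    if not approve(order):
        return "held_for_review"
    order.refunded = True  # stand-in for the real payment call
    return "refunded"
```

Swapping the stub for a real model call changes one function. The eligibility rule and the approval gate stay in code, where a prompt cannot talk its way around them.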
The model can help interpret the complaint, classify the issue, draft the email, or compare a messy case against the policy. But the process itself should not live in the model’s head.
That is the rule I would give a business owner today. If step B always follows step A, do not pay a language model to decide that. Put it in the workflow. Save the model for the parts where the business actually needs judgement.
This matters because the word “agent” is starting to cover too many things. A useful agent is rarely a free-roaming digital worker. Most of the time it is a controlled loop around a specific job.
Lead comes in. Enrich it. Check fit. Draft a reply. Ask for approval if it is above a threshold. Send. Log. Follow up. Escalate if nobody responds.
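That lead loop can be sketched as one small Python function that picks the next step. The 5,000 approval threshold and the 48-hour escalation window are invented for illustration, and it assumes enrichment and the fit check have already run and set `fit`. `Lead` and `next_action` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Both values are invented for illustration.
APPROVAL_THRESHOLD = 5_000            # deal value above which a human signs off
ESCALATE_AFTER = timedelta(hours=48)  # how long to wait for a reply


@dataclass
class Lead:
    email: str
    deal_value: float
    fit: bool                         # assumed set by earlier enrichment and fit-check steps
    draft: Optional[str] = None
    approved: bool = False
    sent_at: Optional[datetime] = None
    replied: bool = False


def next_action(lead: Lead, now: datetime) -> str:
    """Pick the next step for one lead. The routing itself is plain code."""
    if not lead.fit:
        return "log_and_close"
    if lead.draft is None:
        return "draft_reply"          # the one step where a model earns its keep
    if lead.sent_at is None:
        if lead.deal_value > APPROVAL_THRESHOLD and not lead.approved:
            return "request_approval"
        return "send"
    if lead.replied:
        return "log_and_close"
    if now - lead.sent_at >= ESCALATE_AFTER:
        return "escalate"             # nobody responded in time
    return "wait_for_follow_up"
```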
Support ticket arrives. Classify it. Pull the policy. Draft the answer. Ask a human before refunds, cancellations, legal language, or anything reputationally sensitive. Log the source. Close only when the action is actually done.
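For the support version, the interesting part is the routing after the model has classified and drafted. A minimal sketch, assuming the sensitive categories are plain string labels: `SENSITIVE`, `Ticket` and `route` are hypothetical, and the audit log is just a list of dicts.

```python
from dataclasses import dataclass

# Hypothetical category labels that always need a human before anything happens.
SENSITIVE = {"refund", "cancellation", "legal", "reputational"}


@dataclass
class Ticket:
    ticket_id: str
    category: str        # produced by the classification (judgement) step
    draft: str           # produced by the drafting (judgement) step
    policy_source: str   # the policy document the draft relied on
    action_done: bool = False
    status: str = "open"


def route(ticket: Ticket, human_approved: bool, audit_log: list[dict]) -> str:
    """Deterministic routing around the model's outputs."""
    # Log the source every time, so a human can check what the draft was based on.
    audit_log.append({"ticket": ticket.ticket_id,
                      "category": ticket.category,
                      "source": ticket.policy_source})
    if ticket.category in SENSITIVE and not human_approved:
        ticket.status = "awaiting_human"
    elif not ticket.action_done:
        # Approved or non-sensitive, but the actual action has not happened yet.
        ticket.status = "pending_action"
    else:
        ticket.status = "closed"   # close only when the action is done
    return ticket.status
```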
None of that needs mystical autonomy. It needs a decent operating loop.
This is also why the latest model launches do not move the argument as much as vendors would like. Anthropic’s Claude Sonnet 5 is pitched as a more agentic model: better tool use, better coding, longer follow-through, lower cost than the bigger models. I want cheaper execution layers. Everyone building real systems does. But a cheaper, more capable model does not remove the need for process. It increases the volume of things the system can attempt. Which means the rails matter more, not less. A model that can get further on its own can also get further in the wrong direction when the system around it is vague.
This is where a lot of businesses are going to get caught. They will commission an “AI employee” without doing the tedious work first. No clean workflow. No source of truth. No permission boundary. No approval rule. No error log. No evaluation set. No owner. Then they will either over-trust it and get burned, or under-trust it and shrink it down to a chat widget nobody uses. Both are a waste.
The better way starts before the agent exists.
Watch the human doing the job. Not the polished SOP version. The real one. The tabs. The shortcuts. The checks. The exceptions. The bits they know but never wrote down. The moments where they stop and ask someone else.
Then split the job into four buckets.
Deterministic steps are for normal software. Fetch this record. Check this field. Apply this rule. Move to this status. Notify this system.
Judgement steps are good places for AI. Interpret a messy message. Classify intent. Summarise evidence. Draft a response. Spot a mismatch.
Approval steps are where a human stays in the loop, at least until the system has earned more trust. Money moving. Legal exposure. Customer promises. Deleting data. Publishing under the brand.
Measurement steps are the ones people forget because they are not flashy. What happened? What did the agent use? Where did it fail? What did the human change? Did it save time? Did quality improve? Did revenue move? Did risk go up?
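The four buckets can be made explicit in code by tagging every step with its kind. This is a rough sketch under a few assumptions: steps share one state dict, the judgement step is a stub rather than a model call, and `Kind`, `Step`, `run_workflow` and `make_steps` are hypothetical names rather than any framework’s API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Kind(Enum):
    DETERMINISTIC = "deterministic"  # normal software
    JUDGEMENT = "judgement"          # where a model call would go
    APPROVAL = "approval"            # a human decides
    MEASUREMENT = "measurement"      # record what happened


@dataclass
class Step:
    name: str
    kind: Kind
    run: Callable[[dict], None]  # reads and updates shared workflow state


def run_workflow(steps: list[Step], state: dict) -> dict:
    state.setdefault("log", [])
    for step in steps:
        step.run(state)
        state["log"].append((step.name, step.kind.value))
        # Nothing after an approval step runs unless a human approved it.
        if step.kind is Kind.APPROVAL and not state.get("approved"):
            state["halted_at"] = step.name
            break
    return state


def make_steps(human_says_yes: bool) -> list[Step]:
    # Illustrative steps; every name here is hypothetical.
    return [
        Step("fetch_record", Kind.DETERMINISTIC, lambda s: s.update(record={"id": 7})),
        Step("classify_intent", Kind.JUDGEMENT, lambda s: s.update(intent="refund")),
        Step("human_approval", Kind.APPROVAL, lambda s: s.update(approved=human_says_yes)),
        Step("measure", Kind.MEASUREMENT, lambda s: s.update(steps_logged=len(s["log"]))),
    ]
```

Tagging steps this way also makes the measurement question easier to answer later: the log says which kind of step ran, in what order, and where a human stopped the run.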
That is the work. It is not as sexy as “your new AI workforce”. It is also far less likely to make a mess of your business.
The commercial frame I keep coming back to is simple: sell the job disappearing, not another SaaS seat. A business owner does not want an agent because it has a nice dashboard. They want fewer missed leads, faster quotes, cleaner support routing, less admin, better follow-up, fewer dropped balls. But if you are going to sell a job disappearing, you need to know exactly which job you are removing. You cannot skip the workflow mapping and call it innovation. That is just cosplay with API bills.
This is where the agency model changes too.
The old agency sold output: pages, campaigns, ads, content, reports. The AI-slop agency sells more output, faster. More posts. More variants. More fake personalisation. More beige mush for the internet to ignore. The useful post-agency model sells operating capacity. It builds the loop that creates the work, checks the work, routes it, publishes it, learns from it, and keeps a record of what happened.
For marketing, that might be a campaign operating system. Sources in, positioning draft out, claim checks, channel variants, approval, scheduling, performance review, lessons back into the next brief. For sales, lead triage and follow-up. For websites, agent-safe CMS operations: draft on staging, snapshot first, approval, publish, log, roll back if needed.
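The website loop is small enough to show in a toy class. This is an in-memory stand-in, assuming pages are a simple name-to-HTML map; `SiteStore` and its methods are hypothetical and do not correspond to any particular CMS.

```python
import copy


class SiteStore:
    """Toy in-memory stand-in for a CMS; not any real CMS's API."""

    def __init__(self, live: dict[str, str]):
        self.live = live
        self.staging: dict[str, str] = {}
        self.snapshots: list[dict[str, str]] = []
        self.log: list[str] = []

    def stage(self, page: str, html: str) -> None:
        # The agent only ever writes drafts here, never to live.
        self.staging[page] = html
        self.log.append(f"staged {page}")

    def publish(self, page: str, approved: bool) -> bool:
        if not approved or page not in self.staging:
            self.log.append(f"blocked {page}")
            return False
        self.snapshots.append(copy.deepcopy(self.live))  # snapshot first
        self.live[page] = self.staging.pop(page)
        self.log.append(f"published {page}")
        return True

    def rollback(self) -> None:
        # Restore the most recent snapshot, if there is one.
        if self.snapshots:
            self.live = self.snapshots.pop()
            self.log.append("rolled back")
```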
The shape is the same every time. Trigger. Context. Tools. Rules. AI judgement. Human approval. Action. Log. Review.
If those words sound boring, good. Boring words are usually where production lives.
The thing that makes a system sellable is not autonomy. It is reliability. Anyone can demo an agent completing one happy-path task. The real test is uglier. Give it last month’s messy cases. Borderline enquiries. Duplicate records. Missing fields. Weird customer wording. Out-of-date product data. Policy exceptions. Then count where it got things right, where it escalated, and where it confidently wandered off. That evidence is the product. Not magic. Work removed. Not another dashboard. A process that runs better than it did before.
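Counting those outcomes does not need anything clever. A minimal sketch, assuming each past case is stored as an input plus the answer a human would have given, and that the agent returns the string `"escalate"` when it hands off; `score` and `toy_agent` are hypothetical.

```python
from collections import Counter
from typing import Callable


def score(cases: list[dict], agent: Callable[[dict], str]) -> Counter:
    """Replay past cases and sort each outcome into one of three buckets."""
    tally = Counter()
    for case in cases:
        answer = agent(case["input"])
        if answer == "escalate":
            tally["escalated"] += 1        # handed to a human
        elif answer == case["expected"]:
            tally["right"] += 1
        else:
            tally["wandered_off"] += 1     # confident and wrong
    return tally


def toy_agent(msg: dict) -> str:
    # Hypothetical agent: escalates when a required field is missing,
    # otherwise guesses from keywords (and gets tripped up by odd wording).
    if not msg.get("order_id"):
        return "escalate"
    return "refund" if "refund" in msg.get("text", "").lower() else "other"
```

A message like “not a refund, just a question” is exactly the kind of weird wording that makes a keyword agent wander off, which is why the eval set should come from real cases rather than tidy examples.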
I think this is the bit many businesses will miss over the next year. They will chase the model name. They will ask whether Sonnet 5 beats GPT-whatever on some benchmark. They will buy tools because the demo looked clever. Meanwhile the useful companies will be doing duller things. Writing down the real workflow. Cleaning the source data. Deciding approval rules. Building eval sets. Putting logs somewhere humans can read. Routing simple steps through code. Using AI for judgement, not for everything. Measuring the result.
That is not a lack of ambition. It is how you get agents out of the toy box and into the business without pretending probability is process.
The next phase of AI agents will not be won by the company with the loosest autonomy story. It will be won by the company that knows exactly where autonomy belongs, and where it absolutely does not.
Jason Sibley is the founder of Cleo, a post-agency marketing and AI company. JasonVsTheNoise is where he writes about what is actually happening with AI, marketing, and how businesses should be thinking about both.