Insight / signal

AI video is becoming code, and agencies should be nervous

The interesting AI video story is not that avatars look better. It is that video is starting to behave like software.

The obvious AI video story is that avatars are getting better.

Fine. They are.

Faces look less dead. Movement is less cursed. The lighting is better. The demos are shinier. We can all clap politely and pretend we have not seen 400 versions of the same talking-head miracle by now.

But that is not the interesting bit.

The interesting bit is that video is starting to behave like software.

This week I was looking at HeyGen’s Cinematic Avatar and Hyperframes stack. The surface story is simple enough. Generate cinematic avatar clips, then use Hyperframes to turn HTML, CSS, media and animations into deterministic MP4 video.

The phrase that matters is on the Hyperframes repo: “Write HTML. Render video. Built for agents.”

That sentence should make agencies pay attention.

Because if video can be defined in HTML, generated by a coding agent, previewed locally and rendered to MP4, then part of video production moves out of the editing suite and into the operating layer.

Not all of it. Calm down.

Taste still matters. Script quality still matters. Brand still matters. Source material still matters. Someone still needs to know whether the thing is good, whether the pacing works, whether the claim is supportable, whether the client sounds like themselves or like a LinkedIn thought-leader hostage note.

But a lot of the repeatable work starts to look very different.


Think about what most client video work actually contains.

Intro frames. Lower-thirds. Captions. Logo lockups. Product screenshots. Feature callouts. Charts. Testimonial cards. Before-and-after overlays. Social crops. End cards. Slightly different versions for LinkedIn, YouTube Shorts, sales pages, webinars, onboarding emails and internal comms.

A lot of that is not genius editing. It is production plumbing.

Historically, that plumbing needed a person inside Premiere, After Effects, CapCut or whatever else. Change the title. Move the caption. Adjust the logo. Export the square version. Export the vertical version. Fix the typo. Re-export. Client has another change. Re-export again. Everyone pretends this is strategy.

Video-as-code attacks that bit.

With Hyperframes, the composition can be a template. The overlays are HTML elements. The animation can be GSAP or CSS. The timing lives in data attributes. The render is deterministic, meaning the same input should produce the same output. The repo is open source under Apache 2.0. It runs locally with Node and FFmpeg. The docs are explicit that agents can drive the loop.
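To make "the composition is a template" concrete, here is a minimal sketch in Python. It does not call Hyperframes at all. It only shows the shape of the idea: a base clip plus overlays as plain HTML elements, timing held in data attributes, copy swapped in from data. The attribute names (`data-start`, `data-duration`), the `Overlay` structure and the file name are illustrative assumptions, not the documented Hyperframes schema, so check the repo's docs before pointing a renderer at anything like this.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass(frozen=True)
class Overlay:
    """One timed HTML element in the composition (hypothetical structure)."""
    css_class: str   # e.g. "lower-third", "caption", "end-card"
    text: str        # copy to display; escaped before insertion
    start: float     # seconds from the start of the video
    duration: float  # seconds on screen


def render_composition(base_clip: str, overlays: List[Overlay],
                       width: int, height: int) -> str:
    """Build an HTML composition string: a base video plus timed overlays.

    Timing lives in data attributes. The attribute names are assumptions
    for illustration, not the Hyperframes schema.
    """
    parts = [
        f'<div class="composition" style="width:{width}px;height:{height}px">',
        f'  <video src="{escape(base_clip, quote=True)}" data-start="0"></video>',
    ]
    for o in overlays:
        parts.append(
            f'  <div class="{escape(o.css_class, quote=True)}" '
            f'data-start="{o.start:g}" data-duration="{o.duration:g}">'
            f'{escape(o.text)}</div>'
        )
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    html = render_composition(
        "avatar_clip.mp4",  # hypothetical base clip, e.g. a generated avatar MP4
        [
            Overlay("lower-third", "Jane Doe, Head of Product", 1.0, 4.0),
            Overlay("end-card", "Book a demo", 12.5, 3.0),
        ],
        width=1080,
        height=1920,
    )
    print(html)
```

The render step is left to Hyperframes' own tooling. The point is that the thing an agent edits is a text file with named parts, not a timeline.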

That matters because agents are much better at editing structured files than dragging things around a timeline.

They can generate variants. They can swap copy. They can add a callout at the right timestamp. They can wire captions. They can insert a product screenshot. They can render three aspect ratios. They can make 20 sales-team versions from one approved source script. They can do the dull repetition without sighing loudly into Slack.
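The "20 sales-team versions in three aspect ratios" line is really just a cross product. A minimal sketch, assuming a hypothetical `RenderJob` record and output-path convention: one approved script fans out into one job per rep per frame size, and only the personalised end card and dimensions change.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

# Common video aspect ratios as (width, height) in pixels.
ASPECT_RATIOS = {
    "16x9": (1920, 1080),  # YouTube, webinars, sales pages
    "1x1": (1080, 1080),   # square feed posts
    "9x16": (1080, 1920),  # Shorts, Reels, Stories
}


@dataclass(frozen=True)
class RenderJob:
    """One video to render (hypothetical structure)."""
    rep_name: str
    ratio: str
    width: int
    height: int
    end_card: str
    output_path: str


def expand_variants(approved_script_id: str, reps: List[str]) -> List[RenderJob]:
    """Turn one approved script into one job per (sales rep, aspect ratio) pair.

    The script itself is never rewritten: only the end card and frame size
    change, so every variant inherits the same approved claims.
    """
    jobs = []
    for rep, (ratio, (w, h)) in product(reps, ASPECT_RATIOS.items()):
        slug = rep.lower().replace(" ", "-")
        jobs.append(RenderJob(
            rep_name=rep,
            ratio=ratio,
            width=w,
            height=h,
            end_card=f"Talk to {rep}",
            output_path=f"out/{approved_script_id}/{slug}-{ratio}.mp4",
        ))
    return jobs


if __name__ == "__main__":
    reps = [f"Rep {i}" for i in range(1, 21)]  # 20 placeholder sales reps
    jobs = expand_variants("q3-launch", reps)  # hypothetical script ID
    print(len(jobs))             # 60
    print(jobs[0].output_path)   # out/q3-launch/rep-1-16x9.mp4
```

Keeping the variable parts this narrow is the design choice that matters: the dull repetition scales, but nothing in it can drift away from what was approved.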

The Cinematic Avatar side adds another primitive: prompt-driven avatar or scene generation through an API. You can use it to create a short presenter or cinematic clip, then drop that MP4 into a Hyperframes composition and add branded overlays, captions and explainers on top.

So the stack starts to look like this. Use AI to generate or capture the base clip. Use an agent to write the composition. Use a brand template instead of starting from scratch. Render the MP4 as an output of the system. Then review it like a deliverable, not like magic.


That is the agency implication.

The lazy take is: great, now we can make more video content.

That is how you end up with a landfill of synthetic clips nobody asked for.

The better take is that video becomes a repeatable asset system.

For a SaaS company, that could mean every product release produces a short feature explainer, a sales enablement clip, a support walkthrough and three social versions from the same source notes.

For a property business, every listing could generate a branded walkthrough with consistent captions, area highlights, pricing callouts and an outro from the listing agent.

For a sports organisation, match clips, sponsor messages and academy updates could flow through the same brand frame instead of being rebuilt manually every time.

For an enterprise team, a long internal update could become manager briefings, training snippets and onboarding clips without turning the comms team into a video factory.

That is where the margin is.

Not in claiming AI has replaced editors. It has not. Good editors still have taste, rhythm and judgement. The point is that agencies should stop using skilled humans for the mechanical middle of the job.

A human should decide the story. A system should assemble the boring parts. A human should approve the final version.

That is the shape I keep coming back to with practical AI. The value is not fully autonomous content. Usually that just means nobody is accountable for the rubbish. The value is a controlled workflow where the machine does the repeatable work and the human keeps hold of judgement.


This is also why “AI video” is too weak as an offer.

Clients do not need another vendor saying they can make AI videos. They need someone to ask harder questions.

What videos do you repeat every month? Which ones are too expensive to make regularly? Where does your sales team keep asking for the same explainer? Which onboarding questions come up again and again? Which webinars, demos, podcasts or training sessions are being wasted because nobody has the time to turn them into usable clips? What claims need approval before they go out? What brand rules must never drift?

That is a systems conversation, not a content gimmick.

And this is where post-agency marketing gets interesting.

The old agency model sells outputs. Video, blog, campaign, landing page, deck. Useful sometimes, but slow and expensive when every asset starts from zero.

The better model sells an operating layer. Research in, approved messages in, source material in, reusable templates in, checked assets out.

That is what AI changes.

Not “everyone can make infinite content”. That is a threat as much as an opportunity. Infinite content mostly means infinite noise.

The real commercial shift is that previously expensive formats can become part of a repeatable workflow. Video can sit alongside blog posts, email sequences, social clips, sales docs and support assets as one output from the same system.

That is a very different agency proposition.


It is also uncomfortable, because it removes a familiar excuse.

“Video takes ages” has been true for a long time. It will still be true for high-end creative, campaign films, documentaries, proper brand work and anything where craft is the point.

But for the vast middle of business video (explainers, updates, walkthroughs, recaps, release notes, internal briefings, social clips), the excuse is starting to weaken.

If the input is structured, the brand system is defined, the template exists, the claims are checked and the approval loop is clear, then a lot of video should not take weeks. It should be generated, reviewed, fixed and shipped.
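Those conditions are concrete enough to write down as a gate. A minimal sketch, with hypothetical field names and placeholder values: a brief cannot be generated until the source input, template, claim checks and named approver exist, and a render cannot ship until that approver has signed it off.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class VideoBrief:
    """What a templated video needs before generation (hypothetical fields)."""
    source_notes: str = ""                 # structured input: release notes, transcript, script
    brand_template: Optional[str] = None   # ID or path of an approved composition template
    claims: List[str] = field(default_factory=list)        # claims the video will make
    checked_claims: Set[str] = field(default_factory=set)  # claims someone has verified
    approver: Optional[str] = None         # named human who signs off the final render


def missing_preconditions(brief: VideoBrief) -> List[str]:
    """Return the reasons this brief is not ready; an empty list means ready."""
    problems = []
    if not brief.source_notes.strip():
        problems.append("no structured source input")
    if brief.brand_template is None:
        problems.append("no brand template")
    unchecked = [c for c in brief.claims if c not in brief.checked_claims]
    if unchecked:
        problems.append("unchecked claims: " + ", ".join(unchecked))
    if brief.approver is None:
        problems.append("no named approver")
    return problems


def can_ship(brief: VideoBrief, approved_by: Optional[str]) -> bool:
    """Ship only when every precondition holds and the named approver signed off."""
    return not missing_preconditions(brief) and approved_by == brief.approver


if __name__ == "__main__":
    brief = VideoBrief(
        source_notes="Release 2.3 notes",             # placeholder input
        brand_template="templates/feature-explainer", # hypothetical template ID
        claims=["Exports are faster than in 2.2"],    # placeholder claim
        approver="Head of Marketing",
    )
    print(missing_preconditions(brief))  # ['unchecked claims: Exports are faster than in 2.2']
    brief.checked_claims.add("Exports are faster than in 2.2")
    print(can_ship(brief, approved_by="Head of Marketing"))  # True
```

The machine can assemble the render as often as it likes, but `can_ship` stays `False` until a named person has said yes.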

That is where agencies need to decide what business they are in.

If the business is selling manual production hours, this gets threatening quickly. If the business is building commercial operating systems, this is useful.

Because the client still needs the hard parts. Deciding what matters. Turning messy source material into a clear story. Keeping the claims honest. Matching the brand voice. Building the template. Setting approvals. Measuring whether the content did anything.

AI does not remove those jobs. It makes the absence of those jobs more obvious.

That is probably the best test for any AI content system. Does it make the business sharper, or just louder?


Video-as-code can make a business louder very quickly. That is the danger.

But used properly, it can also make useful communication cheaper and more consistent. The product update that never gets a video can get one. The sales explainer that only exists in someone’s head can become a clip. The training session that disappears after the call can become a small library. The founder’s thinking can travel further without turning them into a full-time content machine.

That is the version worth building.

Not synthetic faces for the sake of it. Not avatar spam. Not “look, no humans” theatre.

A proper content operating system where video is one output, generated from real source material, wrapped in a defined brand system, checked by a human and shipped while it is still useful.

That is the quiet shift.

AI video is becoming code. And once something becomes code, it becomes repeatable. Once it becomes repeatable, agencies cannot hide behind the same old production excuses.

Good.

The market does not need more agencies protecting bloated process. It needs teams that can build the system, keep the taste, and ship useful work without making every asset feel like a bespoke opera.

That is where the work is heading.