Thursday, 10 July 2025

Video replicants and the drive for ethical LLMs

IBC


Video generators such as Veo 3 can now convincingly simulate human emotions, interactions and voice, but the speed of development leaves production companies crying out for ethical LLMs.

It is an irony of the AI revolution that a photoreal intergalactic space battle - among the hardest and most expensive things to shoot in real life - is among the easiest to reproduce in Generative AI, but simulating a basic dialogue scene over several minutes between two people? Forget it. For now.

“This is a turbulent era,” says digital artist László Gaál. “One of the worst aspects of AI today is that the pace of technology change is so fast no one can adopt it. Everything that I tell a client on day one will not be true in three months’ time.”

The Hungarian gave up his career as a colourist to concentrate full time on producing purely AI-generated content. What he has produced, including a mock Volvo commercial and a news report from a fake car show, has made him an in-demand talent with brands including L’Oréal, which he guides in AI experiments.

“I really enjoy experimenting with different kinds of workflows, for example, using traditional analogue photos and turning them into something else,” he says.

He claims to be the first to transfer an AI video back to film stock just to see what it would be like to bring together the two ends of the filmmaking spectrum: analogue celluloid film and AI generated footage.

He was one of the first to try Veo 3, churning out the photoreal 70’ AI-generated auto show video less than 24 hours after Google’s release.

“GenAI is extremely good at delivering almost everything in a very short amount of time but there are technical limitations,” he says. “10-20 per cent of the time there are inaccuracies, especially with product shapes, logos and texts.”

For example, he says, a completely different workflow is required if you just want a random car driving through your scene than if you want a specific brand’s car driving in that scene.

The most interesting development for Gaál is Veo 3’s audio generation. “Previously, Gen-AI video of people with dialogue didn’t just look fake,” he says, “it looked scary to have incredibly realistic people with very strange mouth movements. The progress on lip sync in Veo 3 is huge. It’s the first model that generates realistic talking. Equally as important is the ability to generate any kind of sound effect. For example, if you want to create a nice exterior scene of the British countryside, Veo 3 will generate audio to match.”

Other AI creators agree. Hashem Al-Ghaili tested whether Veo 3 could handle macro videography and found that not only did it generate astonishingly detailed ‘footage’ of insects, it did so with sound effects for each shot. “It’s going to change documentary filmmaking forever,” he said.

Nonetheless, you can’t supply your own audio to a Veo 3 generated animation. “In a text prompt you can describe what your character should say and then Veo 3 will make it say that but [the software] won’t allow you to upload an image, for example of yourself, and make it talk.”

Another limit is that each generation is only eight seconds long. “That’s why so many videos you see made with Veo 3 are of random characters saying one sentence and then we never see them again. Currently, there’s no solution for one character to speak for longer than eight seconds. You cannot reference previous generations.”

A new feature in Veo called ‘Ingredients’ goes some way to addressing this. It allows users to upload specific elements (such as photos of a character, object or vehicle) which can be maintained across successive generations. The feature improves customisation while maintaining visual consistency, but there is - as yet - no audio capability to accompany it.

“There’s a lot of interest from creative agencies but it’s mostly stemming from their lack of knowledge. Currently half of the job is figuring out the workflows so you can make something happen that no one else has thought [of],” he says.

He thinks agencies are trying to apply their existing conception of ad production to AI without understanding what AI can actually achieve.

“They’ve seen a five second example of an AI video on Instagram and say they want the same, but with our product in 10 different locations and with a consistent character throughout. I am getting some briefs like that where they just want to force someone to do their ideas their way. I don't see any genuine interest in what is possible.”

He says other companies are experimenting with full-time ‘AI Engineers’ who are creating tools and workflows: “Those are closer to what I believe.”

Human in the loop

The world’s biggest VFX companies are adding AI into their pipelines. Cinesite has launched an internal technology exploration unit called TechX; Dneg’s AI subsidiary, Brahma, recently acquired Metaphysic; and Canada’s RodeoFX is exploring AI principally in its advertising division, where projects tend to be smaller and AI tools can make artists more agile.

“We have a group using AI to go faster, to explore more things we couldn’t do before and to run more tests with clients,” explained Marie Amiot, VP of Advertising Services and Original Content at RodeoFX, speaking at the Annecy International Animation Film Festival. “Our artists who use AI are happy with the result, but it is challenging and also raises ethical and environmental questions.”

LA-based AI production company Asteria is also using AI to speed render times and make more room for creative experimentation. “We’re not generating in the style of Studio Ghibli or anyone else,” insisted Senior Director Arvid Tappert, speaking at Annecy, in reference to the viral craze that saw users of OpenAI’s GPT-4o reproduce characters in the revered Japanese animation studio’s style. “That’s not what we’re about. We want to make non-derivative work from our own ideas and we want to control the AI.”

The dilemma for independent video production and postproduction companies is that the pressure to produce more content at far lower cost may be too hard to resist if an AI tool can get to the same result more efficiently. That risks redundancies as well as copyright infringement.

“We are doing a lot of premium productions that are expensive to produce and that limits how many original projects you can make,” says Barbara Stephen, President of Australia’s Flying Bark Productions. “If we can use emerging tech to help talent tell more and different stories it is an opportunity for us. That said, our business is based on the value of IP and copyright. We have no tolerance for infringement of rights of artists.”

Rob Hoffman, Head of Industry Strategy at computer maker Lenovo, laid out the macro pressures impacting film and TV producers: “You are all being asked to deliver a larger volume of higher fidelity content than ever before against timelines and budgets that aren’t scaling with what you’re being asked to do. There is a fundamental gap between your client’s expectations and your ability to be able to deliver. Pipelines and production tools are just inherently complex and they’re getting more complex. Throw in concerns about technology taking over the creative process or taking away the artist itself and this has rippling effects across the industry.”

The burden of trying to figure out all of these challenges shouldn't fall on the shoulders of individual creators or artisan studios, he said.

“It's the responsibility of those that are creating the tools and technology being used for film, television and game development to be stewards of the industry. We all need to be doing a hell of a lot more than we are today.”

Emily Hsu, Senior Producer at Epic Games sympathised: “Any decent producer wants to empower creators with the best tools to remove as many roadblocks as possible but if you can’t hire more headcount then the only lever you can pull is more tech and better workflows.”

She believes the market will decide. “Audiences should be given more credit for distinguishing AI slop from quality creator driven content. Slop means it is made without intention or skill and without the eyes of a creator.”

Not everyone is so optimistic that humans will remain ‘in the loop’.

Tim Miller, founder of Blur Studio said at Annecy, “I don’t feel it’s safe to say that humans have to be in the process. There’s a lot of slop, true, but I’ve also seen AI tools do things that I’d not thought possible. If any of us think it won’t continue or accelerate then they are running towards a cliff with hands over their eyes.”

He added, “AI doesn't care whether artists were involved or not. If a group of artists has plagiarised you then the industry could shame them into not doing it again, but AI doesn't give a shit. It will use whatever it needs to accomplish the task. That's scary but also the interesting part.”

Ethical image generators

To move forward, the creative industries would like to use AI models that have been trained either on internal (bespoke/owned) content or on licensed data, rather than on data scraped from the web.

Nicolas Dufresne, an independent director and developer, said: “Animators want regulation. Those afraid of AI need regulation. Everyone wants regulation except the main developers. We need to teach how AI works and show what the consequences are of bad AI, but the problem is the opacity of the main AI developers.”

Google, OpenAI, Meta, Runway (co-developer of the original Stable Diffusion) and Midjourney are accused of a lack of transparency over the data on which their image generators were trained. Many are the subject of ongoing lawsuits.

Alternatives are emerging. Asteria has launched Marey, an AI imaging tool built with Moonvalley which, according to Tappert, is “100 per cent” clean: “Everything is fully licensed and paid for. It offers customisation options like fine-grained camera and motion controls. OpenAI and Google say it can’t be done and that they need to scrape [the internet], but now we have something positive to show that can use your own material and everyone gets paid, which is the way it should be.”

Spanish stock library Freepik has partnered with Fal.ai to launch Flite, another open-source image model trained on licensed data. It is built on a dataset of 80 million images, far smaller than the billion-plus image datasets behind mainstream models. Nonetheless, Freepik claims it is the largest publicly available text-to-image model “trained entirely on legally sound content.”

There is evidence that the AI giants are mining specialised data sets rather than trawling the internet, even paying companies for those assets.

“Instead of seeking volume, the new trend is for these companies to ask for specially curated data, which should at least be better for the environment because less processing power is needed,” notes Dufresne.

Unleash the genAI

In January 2023 Gaál set up an Instagram account under a pseudonym showcasing AI-generated images he had made of cars, but posted them as if they had been shot for real on film with a stills camera. He says he stopped posting after people began to think the AI was in fact real.

“My idea was to follow the AI journey with AI by creating an account of AI-generated photographs made by an invented photographer, Rick Deckard. After a few months no one could really tell which one was AI generated and which one wasn’t. The AI became so real that people mistook them for the real thing.”

The nod to Blade Runner hints at the idea that one day machines might not even know they are machines. “How will it feel for a genAI machine to create the images and videos we prompt?” says Gaál. “Will it fear being turned off?”

Al-Ghaili is similarly awed, even if his tests are tongue in cheek. “Imagine if AI characters became aware they were living in a simulation,” he posed in a Veo 3 generated video.

In a world where AI-generated characters are being introduced to video games, this is not as far-fetched as it sounds.
