IBC
Video generators such as Veo 3 can now convincingly simulate
human emotion, interaction and voice, but the speed of development leaves
production companies crying out for ethical image generators.
It is an irony of the AI revolution that a photoreal
intergalactic space battle - among the hardest and most expensive things to
shoot in real life - is among the easiest to reproduce in Generative AI, but
simulating a basic dialogue scene over several minutes between two people?
Forget it. For now.
“This is a turbulent era,” says digital artist László Gaál.
“One of the worst aspects of AI today is that the pace of technological change is
so fast that no one can adopt it. Everything I tell a client on day one will
not be true in three months’ time.”
The Hungarian gave up his career as a colourist to
concentrate full time on producing purely AI generated content. What he has
produced, including a mock Volvo commercial and a news report from
a fake car show, has made him a talent in demand by brands including
L’Oréal to guide them in AI experiments.
“I really enjoy experimenting with different kinds of
workflows, for example, using traditional analogue photos and turning them into
something else,” he says.
He claims to be the first to transfer an AI video back to
film stock just to see what it would be like to bring together the two ends of
the filmmaking spectrum: analogue celluloid film and AI generated footage.
He was one of the first to try Veo 3, churning out the
photoreal 70’ AI-generated auto show video less than 24 hours after Google’s
release.
“GenAI is extremely good at delivering almost everything in
a very short amount of time but there are technical limitations,” he says.
“10-20 per cent of the time there are inaccuracies, especially with product
shapes, logos and texts.”
For example, he says, a completely different workflow is
required if you just want a random car driving through your scene than if you
want a specific brand’s car driving in that scene.
The most interesting development for Gaál is Veo 3’s audio
generation. “Previously, Gen-AI video of people with dialogue didn’t just
look fake,” he says, “it looked scary to have incredibly realistic people with
very strange mouth movements. The progress on lip sync in Veo 3 is huge. It’s
the first model that generates realistic talking. Equally as important is the
ability to generate any kind of sound effect. For example, if you want to
create a nice exterior scene of the British countryside, Veo 3 will generate
audio to match.”
Other AI creators agree. Hashem Al-Ghaili tested whether Veo
3 could handle macro videography and found that not only did it
generate astonishingly detailed ‘footage’ of insects, it did so with sound
effects for each shot. “It’s going to change documentary filmmaking forever,”
he said.
Nonetheless, you can’t supply your own audio to a Veo 3
generated animation. “In a text prompt you can describe what your character
should say and then Veo 3 will make it say that but [the software] won’t allow
you to upload an image, for example of yourself, and make it talk.”
Another limit is that each generation is only eight seconds
long. “That's why so many videos you see made with Veo 3 are of random
characters saying one sentence and then we never see them again. Currently,
there's no solution for one character to speak longer than 8 seconds. You
cannot reference previous generations.”
A new feature in Veo called ‘Ingredients’ goes some way to
addressing this. It allows users to upload specific elements (such as photos of
a character, object or vehicle) which can be maintained across successive
generations. The feature improves customisation while maintaining visual
consistency, but there is, as yet, no audio capability to accompany
it.
“There's a lot of interest from creative agencies but it’s
mostly stemming from their lack of knowledge. Currently half of the job is
figuring out the workflows so you can make something happen that no one
else has thought [of],” he says.
He thinks agencies are trying to apply their existing
conception of ad production to AI without understanding what AI can actually
achieve.
“They’ve seen a five second example of an AI video on
Instagram and say they want the same, but with our product in 10 different
locations and with a consistent character throughout. I am getting some briefs
like that where they just want to force someone to do their ideas their way. I
don't see any genuine interest in what is possible.”
He says other companies are experimenting with full-time ‘AI
Engineers’ who create tools and workflows: “Those are closer to what I
believe.”
Human in the loop
The world’s biggest VFX companies are adding AI to their
pipelines. Cinesite has launched an internal technology exploration unit
called TechX; DNEG’s AI subsidiary, Brahma, recently acquired
Metaphysic; and Canada’s RodeoFX is exploring AI principally in its advertising
division, where projects tend to be smaller and AI tools can make artists more
agile.
“We have a group using AI to go faster, to explore more
things we couldn’t do before and to make more tests with clients,” explained
Marie Amiot, VP of Advertising Services and Original Content at RodeoFX, speaking at the
Annecy International Animation Film Festival. “Our artists who use AI are happy
with the result but it is challenging and also raises ethical and environmental
questions.”
LA-based AI production company Asteria is also using AI to
speed render times and make more room for creative experimentation. “We’re
not generating in the style of Studio Ghibli or anyone else”, insisted Senior
Director, Arvid Tappert, speaking at Annecy in reference to the viral craze
which saw users of OpenAI’s GPT-4o reproduce characters in the revered
Japanese animator’s style. “That’s not what we're about. We want to make
non-derivative work from our own ideas and we want to control the AI.”
The dilemma for independent video production and
postproduction companies is that the pressure to produce more content at far
lower cost may be too hard to resist if an AI tool can get to the same result
more efficiently. That risks redundancies as well as copyright infringement.
“We are doing a lot of premium productions that are
expensive to produce and that limits how many original projects you can make,”
says Barbara Stephen, President of Australia’s Flying Bark
Productions. “If we can use emerging tech to help talent tell more and
different stories it is an opportunity for us. That said, our business is based
on the value of IP and copyright. We have no tolerance for infringement of
rights of artists.”
Rob Hoffman, Head of Industry Strategy at computer maker
Lenovo, laid out the macro pressures on film and TV producers: “You are
all being asked to deliver a larger volume of higher fidelity content than ever
before against timelines and budgets that aren't scaling with what you're being
asked to do. There is a fundamental gap between your client's expectations and
your ability to be able to deliver. Pipelines and production tools are just
inherently complex and they're getting more complex. Throw in concerns about
technology taking over the creative process or taking away the artist itself
and this has rippling effects across the industry.”
The burden of trying to figure out all of these challenges
shouldn't fall on the shoulders of individual creators or artisan studios, he
said.
“It's the responsibility of those that are creating the
tools and technology being used for film, television and game development to be
stewards of the industry. We all need to be doing a hell of a lot more than we
are today.”
Emily Hsu, Senior Producer at Epic Games sympathised: “Any
decent producer wants to empower creators with the best tools to remove as many
roadblocks as possible but if you can’t hire more headcount then the only lever
you can pull is more tech and better workflows.”
She believes the market will decide. “Audiences should be
given more credit for distinguishing AI slop from quality creator driven
content. Slop means it is made without intention or skill and without the eyes
of a creator.”
Not everyone is so optimistic that humans will remain ‘in
the loop’.
Tim Miller, founder of Blur Studio said at Annecy, “I don’t
feel it’s safe to say that humans have to be in the process. There’s a lot of
slop, true, but I’ve also seen AI tools do things that I’d not thought
possible. If any of us think it won’t continue or accelerate then they are
running towards a cliff with hands over their eyes.”
He added, “AI doesn't care whether artists were involved or
not. If a group of artists has plagiarised you then the industry could shame
them into not doing it again, but AI doesn't give a shit. It will use whatever
it needs to accomplish the task. That's scary but also the interesting part.”
Ethical image generators
To move forward, the creative industries want to use
AI models that have been trained either on internal (bespoke or owned) content or
on licensed data, rather than on data scraped from the web.
Nicolas Dufresne, independent director and developer, said:
“Animators want regulation. Those afraid of AI need regulation. Everyone wants
regulation except the main developers. We need to teach how AI works and
show what the consequences are of bad AI but the problem is the opacity of the
main AI developers.”
Google, OpenAI, Meta, Runway and
Midjourney have been accused of a lack of transparency about the data on which their
image generators were trained. Several are the subject of ongoing lawsuits.
Alternatives are emerging. Asteria has launched an AI
imaging tool called Marey, built with Moonvalley, which Tappert says is “100 per cent”
licensed: “Everything is fully licensed and paid for. It offers
customisation options like fine-grained camera and motion controls. OpenAI and
Google say it can’t be done and that they need to scrape [the internet], but now
we have something positive to show that can use your own material and everyone
gets paid, which is the way it should be.”
Spanish stock library Freepik has partnered with
Fal.ai to launch F-Lite, another open-source image model trained on licensed
data. It is built on a dataset of 80 million images, far smaller
than the billion-plus images typically used to train large image models. Nonetheless,
Freepik claims it is the largest publicly available text-to-image model “trained
entirely on legally sound content.”
There is evidence that the AI giants are mining specialised
data sets rather than trawling the internet, even paying companies for those
assets.
“Instead of seeking volume, the new trend is for these companies to
ask for specially curated data, which should at least be better for the
environment because it requires less processing power,” notes Dufresne.
Unleash the genAI
In January 2023 Gaál set up an Instagram account under a
pseudonym showcasing AI generated images he had made of cars, posting them as
if they had been shot for real on film with a stills camera. He says he stopped posting after
people began to think the AI images were in fact real.
“My idea was to follow the AI journey with AI by creating an
account of AI generated photographs, made by an invented photographer, Rick
Deckard. After a few months no one could really tell which was AI generated
and which wasn’t. The AI became so real that people mistook the images for the
real thing.”
The nod to Blade Runner hints at the idea
that one day machines might not even know they are machines. “How will it feel
for a genAI machine to create the images and videos we prompt?” says Gaál.
“Will it fear being turned off?”
Al-Ghaili is similarly awed, even if his tests are tongue in
cheek. “Imagine if AI characters became aware they were living in a
simulation,” he posed in a Veo 3 generated video.
In a world where AI-generated characters are being introduced to video games, this is not as far-fetched as it sounds.