NAB
OpenAI seems to delight in pulling rabbits from a hat and was more than aware of what its latest research project would do when it alerted the internet. Everyone’s gone wild for Sora, a new diffusion model in testing that can generate one-minute video clips from a single text prompt. To prove what it can do, OpenAI posted some videos online generated by Sora “without modification.” One clip highlighted a photorealistic woman walking down a rainy Tokyo street.
“Every single one
of [them] is AI-generated, and if this doesn’t concern you at least a little
bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee.
“This is simultaneously really impressive and really frightening at the same
time,” he added on his YouTube channel.
A blog post on the
website of nonlinear editing software Lightworks declared, “Sora’s
almost magical powers represents yet another seismic shift in the possibilities
of content creation.”
“It’s incredible and scary,” says Erik Naso of Newsshooter.
“Sora is a glimpse
into a future where the lines between creation, imagination, and AI blur into
something truly extraordinary,” gushed Conor Jewiss at Stuff.
Benj Edwards
of Ars Technica thinks OpenAI is on track to deliver a “cultural
singularity” — the moment when truth and fiction in media become
indistinguishable.
“Technology like
Sora pulls the rug out from under that kind of media frame of reference. Very
soon, every photorealistic video you see online could be 100 percent false in
every way. Moreover, every historical video you see could also be false.”
What has excited the AI and artistic community so much is the cinematic photorealism of the videos produced by OpenAI’s model, which seems “to understand how things like reflections, and textures, and materials, and physics, all interact with each other over time,” said Brownlee.
In its research paper, OpenAI states the model deeply understands language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions.
Sora can also
create multiple shots within a single generated video that accurately persist
characters and visual style.
OpenAI further
states it is teaching the AI to understand and simulate the physical world in
motion, with the goal of training models that help people solve problems that
require real-world interaction.
Two videos in particular grabbed attention. “This is one of the most convincing AI-generated videos I’ve ever seen,” says Brownlee of a video made with this text prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”
“This looks like it could be an actual film trailer,” says Theoretically Media’s Tim Simmonds. “I mean there’s nothing really in here to majorly indicate that this is AI-generated.”
The other, featuring an aerial flyover, was generated from the prompt: “Historical footage of California during the gold rush.”
“The drone footage
of an old California mining town looks really, really pretty great,” Simmonds
says. “And even as the camera makes this turn here, the buildings stay intact,
they don’t start to shift and warp and morph into weird things.”
Brownlee thinks it demonstrates “all sorts of implications for the drone pilot that no longer needs to be hired, and all the photographers and videographers whose footage no longer needs to be licensed to show up in the ad that’s being made.”
“It’s also very capable of historical themed footage,” he adds. “This is supposed to be California during the gold rush. It’s AI generated but it could totally pass for the opening scene in an old western.”
Which raises the inevitable question: how long until every single shot in an entire ad is generated with AI? Or an entire YouTube video, or an entire movie?
Simmonds still thinks we are some way off from that “because [Sora] still has flaws and there’s no sound [no audio/dialogue sync] and there’s a long way to go with the prompt engineering to iron these things out,” he says.
Naso agrees that
Sora “could change the game for stock footage,” adding that the next stage for
AI prompt filmmaking is dialogue-based scenes. “So far, these examples are more
like b-roll.”
Nonetheless, even at the current pace of AI development, it seems OpenAI has caught everyone napping.
Rachel Tobac, a
member of the technical advisory council of the Cybersecurity and
Infrastructure Security Agency (CISA), posted on X (formerly known as
Twitter) that “we need to discuss the risks” of the AI model.
“My biggest concern
is how this content could be used to trick, manipulate, phish, and confuse the
general public,” she said.
OpenAI also says it is aware of defamation and misinformation problems arising from this technology and plans to apply the same content filters to Sora as the company does to DALL-E 3, which prevent “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others,” as Aminu Abdullahi reports at TechRepublic.
Others flagged
concerns about copyright and privacy, with Ed Newton-Rex, CEO of non-profit AI
certification company Fairly Trained, maintaining: “You simply cannot argue
that these models don’t or won’t compete with the content they’re trained on,
and the human creators behind that content.”
Anticipating these concerns, OpenAI plans to watermark content created with Sora with C2PA metadata. However, OpenAI doesn’t currently have anything in place to prevent users of its image generator, DALL-E 3, from removing that metadata.
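The fragility of file-level provenance is easy to demonstrate. Real C2PA manifests are embedded in dedicated JUMBF/XMP structures rather than plain text chunks, but the failure mode is the same for any embedded metadata: whatever a container format stores alongside the pixels can simply be re-encoded away. Here is a minimal, illustrative Python sketch (standard library only, with a hypothetical "provenance" text chunk standing in for a real manifest) that builds a tiny PNG carrying a metadata chunk and then strips it while leaving the image data untouched:

```python
import struct
import zlib

def chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, type, data, CRC over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

PNG_SIG = b"\x89PNG\r\n\x1a\n"

# A minimal 1x1 grayscale PNG carrying a provenance-style tEXt chunk.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, 8-bit grayscale
idat = zlib.compress(b"\x00\x00")                    # filter byte + one pixel
tagged = (PNG_SIG
          + chunk(b"IHDR", ihdr)
          + chunk(b"tEXt", b"provenance\x00c2pa-manifest-goes-here")
          + chunk(b"IDAT", idat)
          + chunk(b"IEND", b""))

def strip_text_chunks(png: bytes) -> bytes:
    """Re-emit the PNG, dropping ancillary text chunks (tEXt/iTXt/zTXt)."""
    out, pos = bytearray(png[:8]), 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        end = pos + 12 + length  # 4 length + 4 type + data + 4 CRC
        if ctype not in (b"tEXt", b"iTXt", b"zTXt"):
            out += png[pos:end]
        pos = end
    return bytes(out)

stripped = strip_text_chunks(tagged)
print(b"c2pa-manifest-goes-here" in tagged)    # True
print(b"c2pa-manifest-goes-here" in stripped)  # False
```

The stripped file is still a valid PNG with identical pixels, which is why detection efforts tend to focus on signing and verifying provenance at the point of creation rather than trusting that metadata survives redistribution.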
OpenAI said it is
engaging with artists, policymakers and others to ensure safety before
releasing the new tool to the public. However, its get-out clause is that
despite extensive research and testing, “we cannot predict all of the
beneficial ways people will use our technology, nor all the ways people will
abuse it.”
The
Microsoft-backed company is valued at $80 billion after a recent
injection of VC funds. “It will become impossible for human beings to detect AI-generated content,” Gartner analyst Arun Chandrasekaran
warned TechRepublic. “VCs are making investments in startups building
deepfake detection tools, however, there is a need for public-private
partnerships to identify, often at the point of creation, machine-generated
content.”
Sora joins a chorus of other text-to-video generators such as Runway and Fliki, Meta’s Make-A-Video generator, and the yet-to-be-released Google Lumiere.
Question: Has Apple taken its eye off the ball? Answer: Maybe not. Its researchers have just published a paper about Keyframer, a design tool for animating static images with natural language.
As Emilia
David at The Verge points out, Keyframer is one of several generative
AI innovations that Apple has announced in recent months. In December, the
company introduced Human Gaussian Splats (HUGS), which can create
animation-ready human avatars from video clips. Apple also released MGIE,
an AI model that can edit images using text-based descriptions.