Friday 23 February 2024

OpenAI’s Sora: It’s the Beginning or the End of Video and Either Way It’s a Big Deal

NAB 

OpenAI seems to delight in pulling rabbits from a hat, and it was well aware of the reaction its latest research project would provoke when it announced it online. Everyone’s gone wild for Sora, a new diffusion model now in testing that can generate one-minute video clips from a single text prompt. To prove what it can do, OpenAI posted some videos online generated by Sora “without modification.” One clip showed a photorealistic woman walking down a rainy Tokyo street.

“Every single one of [them] is AI-generated, and if this doesn’t concern you at least a little bit, nothing will,” tweeted YouTube tech journalist Marques Brownlee. “This is simultaneously really impressive and really frightening at the same time,” he added on his YouTube channel.

A blog post on the website of nonlinear editing software Lightworks declared, “Sora’s almost magical powers represents yet another seismic shift in the possibilities of content creation.”

“It’s incredible and scary,” says Erik Naso of Newsshooter.

“Sora is a glimpse into a future where the lines between creation, imagination, and AI blur into something truly extraordinary,” gushed Conor Jewiss at Stuff.

Benj Edwards of Ars Technica thinks OpenAI is on track to deliver a “cultural singularity” — the moment when truth and fiction in media become indistinguishable.

“Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false.”

What has excited the AI and artistic communities so much is the cinematic photorealism of the videos produced by OpenAI’s algorithm, which seems “to understand how things like reflections, and textures, and materials, and physics, all interact with each other over time,” said Brownlee.

In its research paper, OpenAI states that the model deeply understands language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions.

Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.

OpenAI further states it is teaching the AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.

Two videos in particular grabbed attention. “This is one of the most convincing AI generated videos I’ve ever seen,” says Brownlee of a video made with this text prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.”

“This looks like it could be an actual film trailer,” says Theoretically Media’s Tim Simmonds. “I mean that there’s nothing really in here to majorly indicate that this is AI generated.”

The other, featuring an aerial flyover, was spun up from the prompt: “Historical footage of California during the gold rush.”

“The drone footage of an old California mining town looks really, really pretty great,” Simmonds says. “And even as the camera makes this turn here, the buildings stay intact, they don’t start to shift and warp and morph into weird things.”

Brownlee thinks it demonstrates “all sorts of implications for the drone pilot that no longer needs to be hired, and all the photographers and videographers whose footage no longer needs to be licensed to show up in the ad that’s being made.”

“It’s also very capable of historical themed footage,” he adds. “This is supposed to be California during the gold rush. It’s AI generated but it could totally pass for the opening scene in an old western.”

Which raises the inevitable question: How long until an entire ad, with every single shot, is completely generated with AI? Or an entire YouTube video, or an entire movie?

Simmonds thinks we are still some way off from that “because [Sora] still has flaws and there’s no sound [no audio/dialogue sync] and there’s a long way to go with the prompt engineering to iron these things out,” he says.

Naso agrees that Sora “could change the game for stock footage,” adding that the next stage for AI prompt filmmaking is dialogue-based scenes. “So far, these examples are more like b-roll.”

Nonetheless, even given the rapid pace of AI development, OpenAI seems to have caught everyone napping.

Rachel Tobac, a member of the technical advisory council of the Cybersecurity and Infrastructure Security Agency (CISA), posted on X (formerly known as Twitter) that “we need to discuss the risks” of the AI model.

“My biggest concern is how this content could be used to trick, manipulate, phish, and confuse the general public,” she said.

OpenAI also says it is aware of defamation or misinformation problems arising from this technology and plans to apply the same content filters to Sora as the company does to DALL-E 3 that prevent “extreme violence, sexual content, hateful imagery, celebrity likeness, or the IP of others,” as Aminu Abdullahi reports at TechRepublic.

Others flagged concerns about copyright and privacy, with Ed Newton-Rex, CEO of non-profit AI certification company Fairly Trained, maintaining: “You simply cannot argue that these models don’t or won’t compete with the content they’re trained on, and the human creators behind that content.”

Anticipating these concerns, OpenAI plans to watermark content created with Sora with C2PA metadata. However, OpenAI doesn’t currently have anything in place to prevent users of its image generator, DALL-E 3, from removing metadata.

OpenAI said it is engaging with artists, policymakers and others to ensure safety before releasing the new tool to the public. However, its get-out clause is that despite extensive research and testing, “we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it.”

The Microsoft-backed company is valued at $80 billion after a recent injection of VC funds. “It will become impossible for humans to detect AI-generated content by human beings,” Gartner analyst Arun Chandrasekaran warned TechRepublic. “VCs are making investments in startups building deepfake detection tools, however, there is a need for public-private partnerships to identify, often at the point of creation, machine-generated content.”

Sora joins a chorus of other text-to-video generators such as Runway and Fliki, Meta’s Make-A-Video generator, and the yet-to-be-released Google Lumiere.

Question: Has Apple taken its eye off the ball? Answer: Maybe not. Its researchers have just published a paper about Keyframer, a design tool for animating static images with natural language.

As Emilia David at The Verge points out, Keyframer is one of several generative AI innovations that Apple has announced in recent months. In December, the company introduced Human Gaussian Splats (HUGS), which can create animation-ready human avatars from video clips. Apple also released MGIE, an AI model that can edit images using text-based descriptions.

