Wednesday 21 February 2024

AI Video Generators: Is This the Year We’ll See Feature Films?

NAB

AI video generation has gone from uncanny valley to near-realistic in just a few years, and the latest wave of tools, led by Google Lumiere, brings the prospect of coherent AI-generated features a step closer.


Google Lumiere has yet to be released, but the company has shared video clips it says were created by the new technology. Reviewers have gone wild.

“An AI video generator that looks to be one of the most advanced text-to-video models yet,” says Matt Growcoot at PetaPixel. “Pretty amazing,” assesses Sabrina Ortiz at ZDNet. “Revolutionary,” judges Jace Dela Cruz at TechTimes. “Can render cute animals in implausible situations,” writes Benj Edwards at Ars Technica.

YouTuber Matt Wolfe predicts that 30- to 60-minute, completely AI-generated films “that are coherent and enjoyable” are coming in the next few months.

Calling the news a “bombshell,” Tim Simmons, founder of the AI commentary channel Theoretically Media, notes that Lumiere isn’t perfect but is a startling advance nonetheless, catapulting Google into the front ranks of AI video generators.

“I don’t think I’ve seen before this [idea] you can give the model a reference image and then it will generate videos in the style of that reference image,” Wolfe says.

That’s because Google has taken a different approach to its model.

As The Verge explains, Lumiere uses a new diffusion model called Space-Time U-Net, or STUNet, that figures out where things are in a video (space) and how they simultaneously move and change (time).

“Other models stitch videos together from generated key frames where the movement already happened, while STUNet lets Lumiere focus on the movement itself based on where the generated content should be at a given time in the video,” says The Verge reporter Emilia David.

Lumiere starts with creating a base frame from the prompt. Then, it uses the STUNet framework to begin approximating where objects within that frame will move to create more frames that flow into each other, creating the appearance of seamless motion.

“By handling both the placement of objects and their movement simultaneously,” Google claims Lumiere “can create consistent, smooth and realistic movement across a full video clip,” reports Ryan Morrison at Tom’s Guide.

Or as Simmons puts it, “Basically, it all comes down to this space time unit which allows for the video to be created all at once, as opposed to other models which begin with an input frame, an output frame and then generates key frames between those. [With Lumiere] the video is generated all at once.”
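To make the contrast concrete, here is a toy Python sketch (illustrative only, not Lumiere’s or any other model’s actual code) of the keyframe pipeline Simmons describes: generate two distant keyframes, then fill the gap by interpolating between them. An all-at-once model like Lumiere instead denoises every frame of the clip jointly.

```python
# Toy illustration of the keyframe pipeline: two "generated" keyframes
# plus simple linear interpolation to fill in the frames between them.
# (Hypothetical sketch; real models use learned interpolation, not lerp.)

def lerp_frames(key_start, key_end, n_frames):
    """Linearly blend two keyframes into a sequence of n_frames frames."""
    frames = []
    for i in range(n_frames):
        w = i / (n_frames - 1)          # blend weight runs 0.0 .. 1.0
        frames.append([(1 - w) * a + w * b
                       for a, b in zip(key_start, key_end)])
    return frames

# Two tiny 1x4 "frames" stand in for generated keyframe images.
start = [0.0, 0.0, 0.0, 0.0]
end = [1.0, 1.0, 1.0, 1.0]
clip = lerp_frames(start, end, 5)
print(len(clip))      # 5 frames in the finished clip
print(clip[2][0])     # 0.5 -- the middle frame is a 50/50 blend
```

The point of the sketch: in this pipeline the in-between motion is derived after the fact from two endpoints, which is where inconsistency can creep in; generating the whole temporal block at once avoids that hand-off.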

Beyond text-to-video generation, Lumiere also allows image-to-video generation; stylized generation, which lets users make videos in a specific style; cinemagraphs, which animate only a portion of a video; and inpainting, which masks out an area of the video to change its color or pattern.

It can generate 80 frames at 16fps — or five seconds of video — putting it on par with and even ahead of its competitors. But Google’s research paper describes “a new inflation scheme which includes learning to downsample the video in both space and time,” which Google says can pave the way to longer (suggesting even “full-length”) clips.
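What “downsampling the video in both space and time” means can be shown with a minimal Python sketch (an assumption for illustration — STUNet learns these operations; here plain 2×2×2 block averaging stands in for them): a clip of shape (T, H, W) shrinks to (T/2, H/2, W/2), so the network can reason over a longer clip at a coarser resolution.

```python
# Minimal sketch of spatio-temporal downsampling: average every
# 2x2x2 block (2 frames x 2 rows x 2 columns) into a single value.
# A real model learns this; hard-coded averaging just shows the shapes.

def downsample_2x(video):
    """video: nested list indexed [t][y][x]; returns a half-size clip."""
    t_len, h, w = len(video), len(video[0]), len(video[0][0])
    out = []
    for t in range(0, t_len, 2):
        frame = []
        for y in range(0, h, 2):
            row = []
            for x in range(0, w, 2):
                block = [video[t + dt][y + dy][x + dx]
                         for dt in (0, 1) for dy in (0, 1) for dx in (0, 1)]
                row.append(sum(block) / 8)   # mean of the 8 pixels
            frame.append(row)
        out.append(frame)
    return out

# A tiny 4-frame, 4x4 clip where every pixel equals its frame index.
clip = [[[float(t)] * 4 for _ in range(4)] for t in range(4)]
small = downsample_2x(clip)
print(len(small), len(small[0]), len(small[0][0]))  # 2 2 2
print(small[0][0][0])  # 0.5 -- average of frame-0 and frame-1 pixels
```

Halving the clip in time as well as in space is what lets a single pass consider motion across many frames at once, which is the property the paper ties to the prospect of longer clips.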

CineD carries an overview of AI video generators to highlight their current capabilities and limitations. This excludes Lumiere but includes leaders like Runway’s Gen-2, Pika 1.0 and Stability AI’s Stable Video Diffusion (SVD).

After testing, Mascha Deikova concludes that AI video generators haven’t yet reached the point where they can take over our jobs as cinematographers or 2D/3D animators.

“The frame-to-frame consistency is not there, the results often have a lot of weird artifacts, and the motion of the characters does not feel even remotely realistic,” she says.

Deikova also finds the overall process still requires “way too much effort” to get a decent generated video that’s close to your initial vision. “It seems easier to take a camera and get the shot that you want ‘the ordinary way,’” she says.

“At the same time, it is not like AI is going to invent its own ideas or carefully work on framing that is the best one for the story. Nor is that something non-filmmakers will be constantly aware of while generating videos.”

There are also legal limitations that users of AI video generators should be aware of so that they don’t fall foul of the law. For example, as Deikova notes, SVD doesn’t allow its models to be used for commercial purposes, and the same restriction applies to the free tiers of Runway and Pika. With a paid subscription, however, Pika will remove its watermark and grant commercial rights.

Lumiere is not available for independent testing, and there is no word as to Google’s timeline for potential deployment. This is perhaps because, as Google’s Lumiere paper noted, “there’s a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases to ensure a safe and fair use.”

In its roundup, CineD warns that nobody knows what data most AI video models were trained on. Most likely, it suggests, the training set consists of anything that could be found online, including images and works by artists who neither gave permission nor received any attribution.

One company forging a different tack is Adobe, whose Content Credentials-equipped image generator Firefly is trained on sources that Adobe either owns or that artists have given the company permission to use. Adobe is also developing an AI video generator but has yet to release it.

As of today (and we really mean right now), text- or image-to-video generators could become a quick solution for ideation (storyboarding), previsualization and animation.

But applying visual storytelling tools and “crafting beautiful evolving cinematography” is, right now, a job for human minds.
