Wednesday, 15 June 2022

What Could DALL·E Mean for the Future of Creativity?

NAB

A new AI is so good at producing images that it could quickly find a home in the production process, without necessarily booting humans out of a job.


DALL-E 2 is a new neural network algorithm from research lab OpenAI. It hasn’t been released to the public, but a small and growing number of people – one thousand a week – have been given private beta access and are raving about it.

“It’s clear that DALL-E – while not without shortcomings – is leaps and bounds ahead of existing image generation technology,” said Aaron Hertzmann at The Conversation.

“It is the most advanced image generation tool I’ve seen to date,” says Casey Newton at The Verge. “DALL-E feels like a breakthrough in the history of consumer tech.”

Visual artist Alan Resnick, another beta tester, tweeted: “Every image in this thread was entirely created by the AI called DALL·E 2 from @OpenAI from simple text prompts. I’ve been using it for about a day and I feel truly insane.” pic.twitter.com/b7uYyOA33D

By all accounts, using DALL-E 2 is child’s play. You simply type a short phrase into a text box, and it pings back six images in less than a minute.

But instead of being culled from the web, the program creates six brand-new images, each of which reflects some version of the entered phrase. For example, when Hertzmann gave DALL-E 2 the text prompt “cats in devo hats,” it produced 10 images that came in different styles.
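
There is no public API yet, so any programmatic access is speculative, but the workflow is simple enough to sketch in code. In the hypothetical Python below, generate_images and its placeholder return values are invented purely for illustration; only the 1024×1024 output size reflects what DALL-E 2 actually produces.

```python
# Hypothetical sketch: DALL-E 2 has no public API as of this writing, so
# generate_images() is an invented stand-in that returns blank placeholders.
from PIL import Image

def generate_images(prompt: str, n: int = 6) -> list:
    """Stand-in for the private beta service: a real backend would return
    n brand-new images synthesized from the text prompt."""
    print(f"Generating {n} images for prompt: {prompt!r}")
    return [Image.new("RGB", (1024, 1024)) for _ in range(n)]

# The entire user experience described above, in two lines:
for i, image in enumerate(generate_images("cats in devo hats")):
    image.save(f"devo_cat_{i}.png")
```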

As the name suggests, this is the second iteration of the system, advanced to generate more realistic and accurate images at 4x greater resolution.

DALL·E 2 can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles; make realistic edits to existing images from a natural language caption; and add and remove elements while taking shadows, reflections, and textures into account.

“It’s staggering that an algorithm can do this,” reflects Hertzmann. “Not all of the images will look pleasing to the eye, nor do they necessarily reflect what you had in mind. But, even with the need to sift through many outputs or try different text prompts, there’s no other existing way to pump out so many great results so quickly – not even by hiring an artist. And, sometimes, the unexpected results are the best.”

How? As explained on the OpenAI website (https://openai.com/dall-e-2/), DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called “diffusion” (explained in technical detail here: https://ai.googleblog.com/2021/07/high-fidelity-image-generation-using.html), which starts with a pattern of random dots and gradually alters that pattern towards an image as it recognizes specific aspects of that image.
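
To make that “random dots” description concrete, here is a heavily simplified sketch of a reverse-diffusion sampling loop in Python. The denoiser argument stands in for a trained neural network, and the update rule is an illustrative toy, not the actual math from the Google post or OpenAI’s implementation.

```python
import numpy as np

def sample_image(denoiser, prompt_embedding, steps=1000, shape=(64, 64, 3)):
    """Toy reverse-diffusion loop: start from random noise and repeatedly
    subtract the noise a trained model predicts, guided by the prompt."""
    x = np.random.randn(*shape)              # the initial "pattern of random dots"
    for t in reversed(range(1, steps + 1)):  # walk the noise schedule backwards
        predicted_noise = denoiser(x, t, prompt_embedding)
        x = x - predicted_noise / steps      # nudge the dots towards an image
        if t > 1:
            x = x + 0.01 * np.random.randn(*shape)  # re-inject a little noise
    return x

# With a trained model in place of this dummy lambda, each call would
# yield a brand-new image for the prompt:
image = sample_image(lambda x, t, emb: 0.1 * x, prompt_embedding=None)
```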

Although there is a debate to be had about whether what the AI produces is art, that almost seems beside the point when DALL-E 2 seems to automate so much of the creative process itself.

It can already create realistic images in seconds – and this is a tool that will find a ready use in production. You could imagine it being used to rapidly put together storyboards, or to sell a pitch by quickly visualizing characters or locations and just as quickly iterating on them.

In “DALL-E, the Metaverse, and Zero Marginal Content” at Stratechery, Ben Thompson suggests how DALL-E could be used to create extremely cheap environments and objects in the metaverse.

It’s the potential of such a tool to help a creative artist brainstorm and evolve ideas that is exciting. “When I have something very specific I want to make, DALL-E 2 often can’t do it,” says Hertzmann. “The results would require a lot of difficult manual editing afterward. It’s when my goals are vague that the process is most delightful, offering up surprises that lead to new ideas that themselves lead to more ideas and so on.”

The term for this is prompting.

“I would argue that the art, in using a system like DALL-E 2, comes not just from the final text prompt, but in the entire creative process that led to that prompt,” says Hertzmann. “Different artists will follow different processes and end up with different results that reflect their own approaches, skills and obsessions.”

Some artists, like Ryan Murdoch, have advocated for prompt-based image-making to be recognized as art. 

Johnny Johnson, who teaches immersive production at the UK’s National Film and TV School’s (NFTS) StoryFutures Academy, thinks future versions of AI tech like DALL-E 2 will be capable of making entire feature films, with AI-generated scripts and AI-generated audio performances alongside the images.

“DALL-E 2 will change the industry, from production design and concept art right across the board,” he tells NAB Amplify. “New jobs will be created, such as Prompt Engineer, who writes the prompt into the AI to generate very specific outputs.”

Naturally, there are alarm bells. NoFilmSchool is headlining its article “Will Filmmakers Be Needed in the Future?”

“If DALL-E 2’s technology is truly as groundbreaking and revolutionary as advertised, either as it is now or in a future version, who’s to say that clients are going to need the help of filmmakers or video professionals in the future at all?”

It continues, “The same could potentially be even more true for graphic designers, 3D animators, and digital artists of any ilk.”

But as Newton observes, DALL-E is hardly sentient. “It seems wrong to describe any of this as ‘creative’ — what we’re looking at here are nothing more than probabilistic guesses,” he writes, “even if they have the same [emotional] effect that looking at something truly creative would.”

In that sense, AI can also help maintain the creative spark that comes with happy accidents, the vague-goal surprises Hertzmann describes above.

No deepfakes here

Perhaps stung by accusations of bias in its language model GPT-2, OpenAI (which was founded in 2015 by investors including Elon Musk) is at pains to “develop and deploy AI responsibly.”

Part of this effort is in opening up DALL·E to select users in order to stress-test its limitations and capabilities, and in limiting the AI’s ability to generate violent, hate, or adult images.

It explains, “By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.”

Type in the keyword “shooting,” for example, and it will be blocked, Newton finds. “You’re also not allowed to use it to create images intended to deceive — no deepfakes allowed. And while there’s no prohibition against trying to make images based on public figures, you can’t upload photos of people without their permission, and the technology seems to slightly blur most faces to make it clear that the images have been manipulated.”
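
OpenAI hasn’t published how this filtering works. A naive illustration of the keyword-blocking behavior Newton describes might look like the Python below; the blocklist contents and function name are invented, and the real moderation system is certainly more sophisticated than string matching.

```python
# Naive illustration only: OpenAI's real moderation pipeline is undisclosed
# and far more sophisticated than a keyword blocklist.
BLOCKED_TERMS = {"shooting"}  # Newton reports this keyword is blocked

def is_prompt_allowed(prompt: str) -> bool:
    """Reject any prompt containing a blocked term."""
    words = prompt.lower().split()
    return not any(term in words for term in BLOCKED_TERMS)

print(is_prompt_allowed("cats in devo hats"))  # True
print(is_prompt_allowed("a shooting star"))    # False: blunt filters catch benign prompts too
```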

OpenAI hasn’t yet made any decisions about whether and how DALL-E might someday become available more generally. But it’s not the only text-to-image system advancing this field.

Google has a similar project called Imagen, while an unrelated open-source text-to-image engine called DALL-E mini has gone viral on Hugging Face. It is not to be confused with the original and is no relation; its creators might expect a cease-and-desist letter in the post, since not only does it trade on a similar name, but the engine doesn’t appear to be anywhere near as good as OpenAI’s.

Asked to generate images of actor Channing Tatum, the AI came back with a set of images that Francis Bacon would be proud of: https://hyperallergic.com/740141/an-ai-image-generator-is-going-viral-with-horrific-results/

Nonetheless, this technology is coming and will be in use in production faster than you think.