A new AI is so good at producing images that it could quickly find a home in the production process, without necessarily booting humans out of a job.
DALL-E 2 is a new neural network algorithm from research lab
OpenAI. It hasn’t been released to the public, but a small and growing number of
people – one thousand a week – have been given private beta access and are
raving about it.
“It’s clear that DALL-E – while not without shortcomings –
is leaps and bounds ahead of existing image generation technology,” said Aaron
Hertzmann at The Conversation.
“It is the most advanced image generation tool I’ve
seen to date,” says Casey Newton at The Verge. “DALL-E feels like a breakthrough in the history of consumer tech.”
Visual artist Alan Resnick, another beta tester, tweeted: “Every image in this thread was entirely created by the AI called DALL·E 2 from @OpenAI from simple text prompts. I’ve been using it for about a day and I feel truly insane.” pic.twitter.com/b7uYyOA33D
By all accounts, using DALL-E 2 is child’s play. You simply type a short phrase into a text box, and it pings back six images in less than a minute.
But instead of being culled from the web, the images the program returns are brand new, each reflecting some version of the entered phrase. For example, when Hertzmann gave DALL-E 2 the text prompt “cats in Devo hats,” it produced 10 images that came in different styles.
As the name suggests, this is the second iteration of the system, advanced to generate more realistic and accurate images at 4x greater resolution.
DALL·E 2 can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles; make realistic edits to existing images from a natural language caption; and add and remove elements while taking shadows, reflections, and textures into account.
“It’s staggering that an algorithm can do this,” reflects
Hertzmann. “Not all of the images will look pleasing to the eye, nor do they
necessarily reflect what you had in mind. But, even with the need to sift
through many outputs or try different text prompts, there’s no other existing
way to pump out so many great results so quickly – not even by hiring an
artist. And, sometimes, the unexpected results are the best.”
How? As explained on OpenAI’s website (https://openai.com/dall-e-2/), DALL·E 2 has learned the relationship between images and the text used to describe them. It uses a process called “diffusion” (explained in technical detail here: https://ai.googleblog.com/2021/07/high-fidelity-image-generation-using.html), which starts with a pattern of random dots and gradually alters that pattern toward an image as it recognizes specific aspects of that image.
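To make that idea concrete, here is a minimal Python sketch of a diffusion-style generation loop, written for this article rather than taken from OpenAI: the denoise_step function is an invented stand-in for the trained neural network, and a real system would use a far more elaborate noise schedule plus conditioning on the text prompt.

import numpy as np

def denoise_step(noisy_image, step, total_steps):
    # Invented stand-in for the trained network: a real model predicts the
    # noise in the image, conditioned on the text prompt, and removes a
    # portion of it. Here we simply blend toward a fixed target so the
    # shape of the loop is visible.
    target = np.zeros_like(noisy_image)    # placeholder for the model's guess
    blend = 1.0 / (total_steps - step)     # later steps commit more strongly
    return (1.0 - blend) * noisy_image + blend * target

rng = np.random.default_rng(seed=0)
image = rng.normal(size=(64, 64, 3))       # start from "a pattern of random dots"
steps = 50
for t in range(steps):
    image = denoise_step(image, t, steps)  # gradually alter the pattern toward an image

In a real system, each pass through that loop strips away a little more noise, so the random dots converge on a coherent picture that matches the prompt.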
Although there is a debate to be had about whether what the AI produces is art, that almost seems beside the point given how much of the creative process DALL-E 2 automates.
It can already create realistic images in seconds – and this is a tool that will find a ready use in production. You could imagine it being used to rapidly put together storyboards, or to generate imagery to sell a pitch, where you can quickly visualize characters or locations and just as quickly iterate on them.
In “DALL-E, the Metaverse, and Zero Marginal Content” at Stratechery, Ben Thompson suggests how DALL-E could be used to create extremely cheap environments and objects in the metaverse.
It’s the potential of such a tool to help a creative artist brainstorm and evolve ideas that is exciting. “When I have something very specific I want to make, DALL-E 2 often can’t do it,” says Hertzmann. “The results would require a lot of difficult manual editing afterward. It’s when my goals are vague that the process is most delightful, offering up surprises that lead to new ideas that themselves lead to more ideas and so on.”
The term for this is prompting.
“I would argue that the art, in using a system like DALL-E
2, comes not just from the final text prompt, but in the entire creative
process that led to that prompt,” says Hertzmann. “Different artists will
follow different processes and end up with different results that reflect their
own approaches, skills and obsessions.”
Some artists, like Ryan Murdoch, have advocated for
prompt-based image-making to be recognized as art.
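To get a feel for what that craft involves, here is a tiny, purely illustrative Python sketch of how a prompt might be iterated toward a more specific result. The prompt strings and the commented-out dalle.generate call are invented for illustration; the beta is used through a web interface, not code.

# Illustrative only: iterating a text prompt toward a specific output.
prompts = [
    "cats in Devo hats",                                   # first idea (example from the article)
    "photograph of cats in Devo hats, studio lighting",    # pin down the medium
    "oil painting of three cats in red Devo energy dome hats",  # add style and detail
]
for prompt in prompts:
    print(f"Requesting images for: {prompt!r}")
    # images = dalle.generate(prompt)  # hypothetical call; no public API exists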
Johnny Johnson, who teaches immersive production at the StoryFutures Academy at the UK’s National Film and Television School (NFTS), thinks future versions of AI tech like DALL-E 2 will be capable of making entire feature films, with AI-generated scripts and AI-generated audio performances alongside the images.
“DALL-E 2 will change the industry from production design and concept art right across the board,” he tells NAB Amplify. “New jobs will be created, such as Prompt Engineer, who writes the prompts that steer the AI toward very specific outputs.”
Naturally, there are alarm bells. NoFilmSchool headlines its article “Will Filmmakers Be Needed in the Future?”
“If DALL-E 2’s technology is truly as groundbreaking and
revolutionary as advertised, either as it is now or in a future version, who’s
to say that clients are going to need the help of filmmakers or video
professionals in the future at all?”
It continues, “The same could potentially be even more true
for graphic designers, 3D animators, and digital artists of any ilk.”
But as Newton observes, DALL-E is hardly sentient. “It seems wrong to describe any of this as ‘creative,’” he writes. “What we’re looking at here are nothing more than probabilistic guesses, even if they have the same [emotional] effect that looking at something truly creative would.”
In that sense, AI can also help maintain the creative spark that comes with happy accidents.
No deepfakes here
Perhaps stung by accusations of bias in its language model GPT-2, OpenAI (which was founded in 2015 by investors including Elon Musk) is at pains to “develop and deploy AI responsibly.”
Part of this effort lies in opening up DALL·E to select users in order to stress-test its limitations and capabilities, and in limiting the AI’s ability to generate violent, hateful, or adult images.
It explains, “By removing the most explicit content from the
training data, we minimized DALL·E 2’s exposure to these concepts. We also used
advanced techniques to prevent photorealistic generations of real individuals’
faces, including those of public figures.”
For example, Newton found that typing in the keyword “shooting” gets blocked. “You’re also not allowed to use it to create images
intended to deceive — no deepfakes allowed. And while there’s no prohibition
against trying to make images based on public figures, you can’t upload photos
of people without their permission, and the technology seems to slightly blur
most faces to make it clear that the images have been manipulated.”
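As a toy illustration of the kind of keyword blocking Newton describes, here is a minimal Python sketch. The blocklist approach and the BLOCKED_TERMS set are invented assumptions; OpenAI’s actual moderation pipeline is not public and is certainly more sophisticated.

# Naive word-level prompt filter, for illustration only.
BLOCKED_TERMS = {"shooting"}  # invented blocklist; "shooting" is the article's example

def is_prompt_allowed(prompt: str) -> bool:
    # Reject the prompt if any blocked term appears as a word.
    words = prompt.lower().split()
    return not any(term in words for term in BLOCKED_TERMS)

print(is_prompt_allowed("cats in Devo hats"))  # True
print(is_prompt_allowed("a shooting star"))    # False: naive filters catch innocent prompts too

As the second example shows, simple blocklists produce false positives, which is one reason real moderation systems go well beyond keyword matching.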
OpenAI hasn’t yet made any decisions about whether and how
DALL-E might someday become available more generally. But it’s not the only
text-to-image system advancing this field.
Google has a similar project called Imagen, while DALL-E mini, an unaffiliated open-source text-to-image engine hosted on Hugging Face, has gone viral. It is not to be confused with the original and has no relation to OpenAI. Its makers might expect a cease-and-desist letter in the post, since not only does it use a similar name, but the engine doesn’t appear to be anywhere near as good as OpenAI’s.
Asked to generate images of actor Channing Tatum, it came back with a set of images that Francis Bacon would be proud of: https://hyperallergic.com/740141/an-ai-image-generator-is-going-viral-with-horrific-results/
Nonetheless, this technology is coming and will be in use in
production faster than you think.