Breakthroughs in text-to-image and language modeling technology such as DALL-E 2 have astonished us this year. OpenAI lead researcher Mark Chen speaks to The Atlantic’s Ross Andersen at the Progress Summit 2022 and says that while AI democratizes art for everyone, artists still produce the best final products.
Chen describes the process of
training the tool on several hundred million images, a combination of licensed
and publicly available media, which — importantly — have text (metadata)
descriptions so that the AI associates word prompts with the images.
DALL-E 2 knows what individual
objects are “and is able to combine things in ways that it hasn’t seen in the
training set before,” says Chen. “That’s part of the magic of AI, that you can
kind of generalize beyond what you trained it on.”
There’s an art to training neural networks, too, he implies. “You want to make them big enough so they basically have enough base intelligence to be able to compose all of these elements together.”
If there’s an art to scaling these big models, there’s also an art to writing prompts. Prompts have evolved from single-sentence descriptions: creators now attach concepts such as the mood they want, or very specific details and textures, and prompts can run for several paragraphs.
“I think it’s really about personalization… all these adjectives that you’re adding [into a prompt] help you personalize the output to what you want. It makes sense that prompts have grown in length and in specificity. It’s a tool to help people create the content that they want for themselves.”
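To make that concrete, here is a minimal sketch of what such a long, layered prompt might look like in practice. It assumes the OpenAI Python library of that era (0.x) and its Image.create endpoint; the prompt text itself is invented for illustration.

```python
# A minimal sketch: sending a long, layered prompt to an image-generation
# API. Assumes the 0.x openai library and an API key in the environment;
# the prompt text is invented for illustration.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Prompts have grown from one-line descriptions into detailed briefs that
# layer subject, mood, texture, and style on top of one another.
prompt = (
    "A lighthouse on a rocky coast at dusk, painted in thick impasto oils. "
    "Mood: melancholy but hopeful. Warm lamplight against a cold blue-grey "
    "sea, heavy cloud texture, visible brushstrokes, wide cinematic "
    "composition."
)

response = openai.Image.create(prompt=prompt, n=1, size="1024x1024")
print(response["data"][0]["url"])  # URL of the generated image
```

Each added clause narrows the output toward one person’s intent, which is exactly the personalization Chen describes.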
Addressing the controversial issue of whether artists should be recognized, or paid, when their work is used to inspire an AI artwork, Chen defends OpenAI’s approach, saying the organization works closely with the art community.
“Our goal isn’t to stiff artists or
anything like that. Throughout the whole release process we want to be very
conscientious and work with the artists and have them provide feedback.”
However, Chen also suggests that
artists who use generative tools will still be able to rise above the crowd and
make money because their innate talent means that they are better at using
them. DALL-E 2, in other words, is — like a paintbrush or a video camera — a
tool.
“With DALL-E we found that artists
are better at using these tools than the general population. We’ve seen some of
the best artwork coming out of these systems basically produced by artists,”
Chen says.
“With AI you always worry about job loss and displacement, and we don’t want to ignore these possibilities, but we do think it’s a tool,” he continues.
“You know, there are smartphone cameras, but [they] really [haven’t] replaced photographers. [Instead, they allow] people to make the images they want.”
Chen then turns to GPT-3, OpenAI’s language model, which turns text prompts into whole written articles, or scripts, or poems.
One idea would be to combine GPT-3
with DALL-E 2 “so maybe you have a conversational kind of interface for
generating images,” says Chen.
Artist Don Allen Stevenson joins the presentation at the 16-minute mark and runs through some of the ways AI tools can be used, essentially to boost the ideation process. He says entire departments in animation can benefit from AI in tasks such as creating background characters, composing scenes, concept art, environment design, and reference modelling. Outpainting, a technique used in DALL-E 2, can extend and scale an image automatically in ways the artist may not have imagined.
He also explains how GPT-3 can be used to generate better prompts, as sketched below. There are examples, too, of how these techniques can rapidly produce the virtual worlds that will populate the metaverse.
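One way to realize that GPT-3-into-DALL-E chaining is sketched below. This is not the speakers’ actual pipeline, just an illustration assuming the 0.x openai library, its Completion and Image endpoints, and the text-davinci-003 model of that era.

```python
# A sketch of chaining a language model to an image model: GPT-3 expands a
# rough idea into a detailed prompt, which DALL-E then renders. Model names
# and API shapes assume the 0.x openai library; the rough idea is invented.
import os

import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

rough_idea = "a background character for an animated fantasy market scene"

# Step 1: ask GPT-3 to turn the rough idea into a detailed image prompt.
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt=(
        "Rewrite this rough idea as a single detailed image-generation "
        f"prompt, specifying mood, lighting, and texture:\n{rough_idea}"
    ),
    max_tokens=120,
    temperature=0.7,
)
detailed_prompt = completion.choices[0].text.strip()

# Step 2: feed the expanded prompt to the image model.
image = openai.Image.create(prompt=detailed_prompt, n=1, size="512x512")
print(detailed_prompt)
print(image["data"][0]["url"])
```

Looping these two steps with a user’s follow-up corrections is one plausible shape for the conversational image-generation interface Chen imagines.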