Friday, 6 January 2023

Next-Gen (Generated) Creativity: The AI Imagery and Text Tool Combo

NAB

Breakthroughs in text-to-image and language modeling technology such as DALL-E 2 astonished us this past year. OpenAI lead researcher Mark Chen, speaking with The Atlantic’s Ross Andersen at the Progress Summit 2022, says that while AI democratizes art for all, artists still produce the best final product.


Chen describes the process of training the tool on several hundred million images, a combination of licensed and publicly available media, which — importantly — have text (metadata) descriptions so that the AI associates word prompts with the images.
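
Chen doesn't walk through implementation details, but the pairing he describes, images aligned with their text descriptions, is the core of contrastive pre-training. A minimal sketch of that idea in PyTorch (the encoder stubs, feature sizes, and temperature value below are illustrative assumptions, not OpenAI's actual training code):

    import torch
    import torch.nn.functional as F

    # Illustrative stand-ins: in a real system these are large image/text encoders.
    image_encoder = torch.nn.Linear(2048, 512)   # e.g. features from a vision backbone
    text_encoder = torch.nn.Linear(768, 512)     # e.g. features from a text transformer

    def contrastive_loss(image_feats, text_feats, temperature=0.07):
        """CLIP-style loss: matched image/caption pairs score high, mismatches low."""
        img = F.normalize(image_encoder(image_feats), dim=-1)
        txt = F.normalize(text_encoder(text_feats), dim=-1)
        logits = img @ txt.t() / temperature      # pairwise similarity matrix
        targets = torch.arange(len(logits))       # i-th image matches i-th caption
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    # A toy batch of 8 image/caption feature pairs drawn at random.
    loss = contrastive_loss(torch.randn(8, 2048), torch.randn(8, 768))
    print(loss.item())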

DALL-E 2 knows what individual objects are “and is able to combine things in ways that it hasn’t seen in the training set before,” says Chen. “That’s part of the magic of AI, that you can kind of generalize beyond what you trained it on.”

There’s an art to training neural networks, too, he implies. “You want to make them big enough so they basically have enough base intelligence to be able to compose all of these elements together.”

If there’s an art to scaling these big models, there’s also an art to writing prompts. Evolving from single-sentence descriptions, creators are now attaching concepts like the mood they want or very specific details or textures. Prompts can now run for several paragraphs.

“I think it’s really about personalization… all these adjectives that you’re adding [into a prompt] helps you personalize the output to what you want. It makes sense that prompts have grown in length and in specificity. It’s a tool to help people create the content that they want for themselves.”
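
To make that concrete, here is one way such a layered prompt might be assembled and sent to the image endpoint. This is a sketch using the openai Python library as it existed in early 2023; the prompt wording and parameter values are illustrative, not anything Chen prescribes:

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # Layering subject, mood, specific detail and texture, as Chen describes.
    prompt = ", ".join([
        "a lighthouse on a rocky coast at dusk",               # subject
        "melancholy, quiet, end-of-season mood",               # mood
        "weathered white paint, salt-stained brass railings",  # specific detail
        "soft film grain, muted teal and amber palette",       # texture / finish
    ])

    response = openai.Image.create(prompt=prompt, n=1, size="1024x1024")
    print(response["data"][0]["url"])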

Addressing the controversy over whether artists should be recognized, or paid, when their work is used to inspire an AI artwork, Chen defends OpenAI’s approach, saying the organization works closely with the art community.

“Our goal isn’t to stiff artists or anything like that. Throughout the whole release process we want to be very conscientious and work with the artists and have them provide feedback.”

However, Chen also suggests that artists who use generative tools will still be able to rise above the crowd and make money because their innate talent means that they are better at using them. DALL-E 2, in other words, is — like a paintbrush or a video camera — a tool.

“With DALL-E we found that artists are better at using these tools than the general population. We’ve seen some of the best artwork coming out of these systems basically produced by artists,” Chen says.

“With AI you always worry about job loss and displacement and we don’t want to ignore these possibilities but we do think it’s a tool,” he continues.

“You know, there are smartphone cameras but it really hasn’t replaced photographers. [Instead] it allows people to make the images they want.”

Chen then turns to GPT-3, OpenAI’s language model, which turns text prompts into whole written articles, scripts, or poems.

One idea would be to combine GPT-3 with DALL-E 2 “so maybe you have a conversational kind of interface for generating images,” says Chen.
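
Chen leaves the idea there, but a crude version of that interface can be wired up by chaining the two models. The sketch below again assumes the openai library of the time; the meta-prompt wording and the text-davinci-003 model name are my own choices, not Chen's:

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    def describe_then_draw(user_request):
        # Step 1: ask GPT-3 to expand a casual request into a detailed image prompt.
        completion = openai.Completion.create(
            model="text-davinci-003",
            prompt=("Rewrite the following request as a single, richly detailed "
                    "prompt for an image generator:\n" + user_request),
            max_tokens=120,
            temperature=0.7,
        )
        image_prompt = completion["choices"][0]["text"].strip()

        # Step 2: hand the enriched prompt to DALL-E 2.
        image = openai.Image.create(prompt=image_prompt, n=1, size="1024x1024")
        return image_prompt, image["data"][0]["url"]

    prompt, url = describe_then_draw("a cozy reading nook in a treehouse")
    print(prompt)
    print(url)

The appeal of the design is that the language model does the personalization work Chen describes above, turning a casual request into the long, specific prompt that the image model rewards.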

Artist Don Allen Stevenson joins the presentation at the 16-minute mark and runs through ways AI tools can boost the ideation process. He says entire departments in animation can benefit from AI for tasks such as creating background characters, composing scenes, concept art, environment design, and reference modelling. Outpainting, a DALL-E 2 feature, can automatically extend and scale an image in ways the artist may not have imagined.
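
The public DALL-E 2 API exposed outpainting through its image-edits endpoint, which fills in transparent pixels. The sketch below assumes that workflow; canvas sizes and file names are chosen for illustration:

    import openai
    from PIL import Image

    openai.api_key = "YOUR_API_KEY"  # placeholder

    # Place the original image on a larger transparent canvas; the API fills
    # the transparent areas, which is what extends ("outpaints") the scene.
    source = Image.open("scene.png").convert("RGBA")        # e.g. 512x512
    canvas = Image.new("RGBA", (1024, 1024), (0, 0, 0, 0))  # transparent border
    canvas.paste(source, (256, 256))
    canvas.save("scene_padded.png")

    result = openai.Image.create_edit(
        image=open("scene_padded.png", "rb"),
        prompt="the same scene, extended naturally beyond its original frame",
        n=1,
        size="1024x1024",
    )
    print(result["data"][0]["url"])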

He also explains how GPT-3 can be used to generate better prompts. There are examples, too, of how rapidly these techniques can produce the virtual worlds that will populate the metaverse.

 

