Wednesday 28 September 2022

Advanced Multimodal AI is Busting Out of the Lab and Into Your Life

NAB

Apple announced the iPhone in 2007. Now, we can no longer fathom a world without a smartphone in our pockets. The same happened with social media. Facebook and TikTok govern our virtual relationships and how we are informed about news. We’re on the verge of a third technology revolution, which will blend with and be fueled by the ubiquity of devices and algorithms.


AI has “world-shaping potential,” Alberto Romero, who runs the newsletter The Algorithmic Bridge, writes in an excerpt posted on Medium.

It’s not just any AI that will impact us in ways we can only imagine: it is the large AI models and the integration of those technologies into the Internet of Things.

In The Algorithmic Bridge, Romero charts the rise of various AI tools, concentrating in particular on the tremendous gains made in the field of large language models.

From 2012 to 2022, the AI field evolved at an unprecedented rate.

Today, generative large language models, together with multimodal and art models, dominate the landscape. Tech giants, ambitious startups, and non-profit organizations aim to leverage their potential, either for private benefit or to democratize their promise.

These include OpenAI’s release of GPT-3 — arguably the best-known AI model of the decade — and Google’s own LaMDA, the AI that former Google engineer Blake Lemoine claimed earlier this year to be sentient.

Even this has been superseded at Google by PaLM, published in April. PaLM currently holds the title of the largest dense language model and has the highest performance across benchmarks. Romero believes it’s state-of-the-art in language AI.

However, the next major advance is already in training. This phase is focused on building AI tools that mimic our other senses, notably hearing and sight — but also human creativity.

OpenAI’s DALL-E 2 is the best-known AI art model (a class also described as diffusion-based generative visual models). Others include Microsoft’s NUWA, Meta’s Make-A-Scene, Google’s Imagen, Midjourney, and Stable Diffusion.

“These models, some behind paid memberships and others free-to-use, are redefining the creative process and our understanding of what it means to be an artist,” Romero says.

But that’s no longer news. Projecting this evolution forward, Romero expects these AI models, which combine language, multimodal, and art-based capabilities, to become our next virtual assistants.

Advanced AI is going to be a “truly conversational Siri or Alexa,” your next search engine will be an “intuitive and more natural Google Search or Bing,” and your next artistic tool will be a “more versatile and creative Photoshop.”

The large-scale AI models are emerging from the lab to find a home in consumer products.

“This shift from research to production will entail a third technological revolution this century,” Romero maintains. “It will complete a trinity formed by smartphones, social media, and large AI models, an interdependent mix of technologies that will have lasting effects on society and its individuals.”

How is it all going to redefine our relationship with technology and with one another?

We’ll find out sooner rather than later.

 
