Passive, language-based text-to-video models are so last year. From the enterprise to the home to the creative suite, the future is multi-modal AI capable of massive three-dimensional world building and physical interaction.
“What if AI could step out of the screen and start working for us in real life?” posed Lyu Jae-cheol, CEO of tech giant LG, at the Consumer Electronics Show in Las Vegas, where it was clear that ‘physical AI’ is already here.
Multiple exhibits and executive speeches at the massive tech jamboree showed that the AI industry is turning its attention from large language models to ‘world models’, which are not only key to making robots everyday companions but also to the next phase of filmmaking and game development.
“[Humans] are incredible animals with profound capabilities that use our own spatial intelligence to connect perception with action,” said Amit Jain, CEO of Luma AI. “What excites me is that there's now a new wave of GenAI technology that we can finally give machines something closer to the human level spatially.”
AI pioneer and World Labs founder Fei-Fei Li said, “We're moving from systems that understand words and images passively to systems that not only understand but can help us to interact with the world.”
AI and entertainment
When it comes to AI and entertainment, the overall theme painted at CES was of a hybrid future where content creators work with AI. Panelists on the session ‘AI & Cinematic Creativity: Approaching Its Citizen Kane Moment’ agreed that not only had AI tools evolved in sophistication and speed but that bespoke pipelines were also necessary to enable artists to get the best out of the technology.
Katya Alexander, head of production at Showrunner AI said, “I wish we could stop talking about the tools. Just judge the work. Is it good or not?”
Jason Zada, founder of the AI-native entertainment studio Secret Level, said, “AI needs middleware to be viable for Hollywood feature film production.” Criticising AI slop, Zada added, “It’s easier than ever to get 90% of the way there [so] when I see work like that, I wonder why they didn’t bother to take it the extra 10% that would have made it great.”
World Labs and Luma AI showed how this capability was already being built into the latest versions of their software.
“Today, video and image AI models are used to generate pixels to produce pretty pictures,” said Luma’s Jain. “What is needed are more intelligent models that combine audio, video, language, image together. At Luma, we are training these systems that simulate physics causality to finally render out the results in audio, video, image, text, whatever is appropriate for the information that you're trying to work with.”
Luma’s latest model Ray3 is claimed to be the world's first capable of generating video in 4K and HDR. It is also the world's first reasoning video model – meaning it is able to think and decide whether what it's about to generate is good.
Instead of asking users to come up with better prompts to get their desired result, Luma has added a tool called Ray3 Modify, which gives users an enhanced ability to edit AI-generated video.
“It can take any footage from real cameras or from AI generation and change it as little or as much as you want to realise the creative goals. It's a powerful system for customers in games, entertainment and advertising which allows us to enable a new era of hybrid human-AI productions. In practice it means filmmakers and creators can now create entire cinematic universes without elaborate [real] sets and then edit and modify anything to get to the result they want.”
He said Luma is building multi-modal agentic models which will provide the ability to analyse multiple frames of long-form video, edit any element, and maintain the fidelity of a character's scenes and story – something which has not yet been possible.
“You're seeing human and AI collaborate in designing characters, environment shots, and the whole world. We have been using it heavily internally and we couldn't be more excited. Soon, individual creators or small teams will have the power of doing what entire Hollywood studios do.”
Fei-Fei Li demonstrated a photoreal Hobbit world created by World Labs’ multi-modal AI tool Marble. “Traditionally, building 3D scenes for video games requires laser scanners or calibrated cameras or hand-built models using pretty sophisticated and complicated software. World Labs is creating a new generation of models that use advanced GenAI technology to learn not just flat pixel structures but 3D and 4D structures of the world. It draws this directly from data to provide rich, consistent, permanent, navigable 3D worlds.”
Remarkably, she said this could even be achieved by inputting a handful of 2D images. “Marble can reconstruct an environment and create cohesive, three-dimensional worlds that flow together and scale into something much larger. This is much closer to how humans piece together a place from a few glances.”
She warned, “A lot of workflows and things that were difficult to do will go through a revolution because of the incredible technology. Creators can now shape what's in their mind's eye in realtime, experimenting with the space, the light, and movement as if they are sketching inside a living world.”
Intelligent Transformation
The Consumer Technology Association (CTA) which runs CES termed this mega trend ‘Intelligent Transformation’. “Artificial intelligence is becoming foundational across devices, platforms, and services, enabling smarter systems and more personalised consumer experiences,” said CTA Executive Chair and CEO Gary Shapiro.
The goal of world models is for AI to understand the physics of the real world including things like gravity, causality, motion and object permanence that humans take for granted – or as Nvidia CEO Jensen Huang put it, “the complete unknowns...of the common sense of the physical world.”
Nvidia is training its physical AI systems on synthetic data, for example for automated driving within digital twins of cities.
“The ChatGPT moment for physical AI is nearly here,” Huang said during his keynote. This sentiment was echoed by Greg Brockman, co-founder and president of OpenAI, who explained on stage, “We're moving from asking a simple question in a text box and getting a very contained answer to agentic AI models that people can use in very personalised ways. In the enterprise we’re already starting to bring in models like Codex to be able to transform software engineering. This year we're going to see enterprise agents really take off.”
The CTA sees this trend expressed not just in software but across infrastructure. Autonomous cars and robots are the obvious manifestation of this shift. At CES, many were presented as commercially ready rather than as experimental prototypes.
“AI-driven autonomy is now moving into real-world environments, supported by advances in sensing, compute and scenario simulation,” noted Brian Comiskey of the CTA.
LG unveiled CLOiD, described as an “agent appliance” for the home. It features two arms and five-fingered hands and is capable of learning the home environment and continuously optimising it.
Italian developer Generative Bionics debuted GENE.01, a humanoid robot scheduled for sale later this year. “Our goal is to design a humanoid that is as efficient as it is intelligent, inspired by humans, where intelligence lives not only in the brain but also in the body,” explained CEO Daniele Pucci during the keynote by chip maker AMD (an investor in his company). The platform is designed to achieve human-level intelligence, with safe, physical, human-like interaction engineered into real products. Pucci explained how development was inspired by biomechanics.
“Human movements rely on fast reflexes and our nervous system basically exploits our biomechanics. These are the same principles we built into our robots. Since we also learn through touch, our robots need the sense of touch.”
China’s Hesai Technology announced it will integrate its lidar solutions into the AI-driven motion-capture systems of South Korea’s MOVIN. This would, they claim, create the world's first motion-AI built on lidar, without the need for a studio, markers or any cameras, instead relying on laser pulses and an AI engine to deliver “highly reliable” full-body tracking.
“Motion capture is no longer just a tool for animation or VFX,” said Byeoli Choi, CEO of MOVIN. “With lidar as the foundation, we see motion data becoming core infrastructure for physical AI — from live performance to humanoid robotics.”
Regular consumer devices also reflect this transition. Laptops, televisions and wearables are being positioned less as standalone products and more as intelligent platforms that adapt to users over time.
Samsung’s mammoth 130-inch Micro RGB display, launched at the show, comes with a Vision AI Companion (VAC) intended to elevate the viewing experience with watch guides and mood-related (ambient room) features.
Rival Google used CES to announce that TVs equipped with its Google TV software will now come with Gemini AI embedded. Principally this means voice control to find programming or adjust settings on the TV, as well as making photos and videos stored in Google's photo database editable via voice prompts from the TV.
“When we talk about CES five to 20 years [from] now, we’re going to see an even larger range of humanoid robots,” said Shapiro. “AI is the future of creativity.”
Smart glasses get wearable
Smart glasses have euphemistically been called wearable for years, but the latest range demonstrates that comfort does not need to be ceded for functionality. The hardware is getting smaller and lighter, driving 247.5% worldwide growth in 2025, according to IDC, as expanding product availability, use cases and demographics shift the market from gamers to a more mainstream audience.
For instance, the Full-Feature Colorful AR+AI Glasses from China’s Mojie weigh only 38g, which for a brief while made them the world's lightest AR glasses. Even Realities’ G2 smart glasses are lighter still, at 36g. There are no cameras or external speakers; instead, gesture control of the display is via a smart ring worn on the finger. The Hong Kong-based outfit claims its interface can deliver real-time translation as well as a teleprompt in front of your eyes, a use case that CEOs and other public speakers find beneficial.
Both of those were honoured with a CES 2026 Innovation Award, along with an optical technology 12 years in development at Belgian firm Swave. Its Holographic eXtended Reality (HXR) “sculpts and steers light to form high-resolution 3D images” and is claimed to enable the world’s first true holographic display. HXR also produces “the world’s smallest pixel”, with a pixel pitch of less than 300nm, intended to power spatial computing for smart glasses and other XR wearables.
Gamers aren’t being left out. The ROG Xreal R1 AR glasses, from ASUS and Xreal, tout the ‘world’s first’ micro-OLED display with a 240Hz refresh rate and immersive audio by Bose. The glasses also come with a Control Dock for connecting to PCs and consoles and can be switched into opaque mode for the wearer to enjoy a 171-inch virtual screen.
Samsung's latest Odyssey gaming monitor has a 32-inch 6K screen with glasses-free 3D.
Along with the high resolution, it offers a 165Hz refresh rate (which can be boosted to 330Hz). Another Samsung model, the 27-inch Odyssey G6, is the world's first 1,040Hz monitor "to help players track targets and see fine details during high-speed movement," Samsung claims.
Samsung is developing a pair of smart glasses using Google’s Android XR system, which is considered the best bet to rival market leader Meta. It is due for launch this year but did not debut at CES.
Muscle memory
Speaking of Meta, it has been combining its Meta Ray-Ban Displays with an EMG wristband that translates signals created by your muscles into commands for the glasses — an advancement that has major benefits for people with disabilities.
At CES, it removed the glasses altogether and demonstrated how the neural band could be used to control in-car entertainment. The demo, created with Garmin, also showed how familiar swipe and pinch gestures could control car functions like windows.
An intriguing EMG Handwriting tool, in pilot, allows users to compose messages to be sent on WhatsApp or Messenger by using their fingers to form words on any surface. Meta Platforms explained that while wearing the Neural Band, users could have these “movements transcribed into digital messages”. Does this spell the end of the laptop and Microsoft Word?
Tariff impact
The CTA warned that the impact of economic uncertainty as a result of US tariffs “is becoming more visible as companies move through pre-tariff inventories and face tougher cost decisions heading into 2026.”
Nonetheless the immediate forecast is for steady growth with the industry projected to make $565 billion in revenue this year.
The CTA noted that the cost of tariffs is falling unevenly across the industry, with smaller companies more likely to face margin pressure or supply chain disruptions.
Environmental impact
AI was everywhere, in stark contrast to conversations about, or technologies for, reducing its all-consuming impact on the earth.
“I would love to have a GPU running in the background for every single person in the world because I think it can deliver value for them, but that's billions of GPUs,” said OpenAI’s Brockman. “No one has a plan to build that kind of scale …”
Energy was presented by the CTA as a unifying challenge. Electrification, smart grids and AI-enabled monitoring are central to managing rising power demand from digital infrastructure and AI workloads, it said. Alongside established renewable technologies, the CTA pointed to hydrogen, next-generation nuclear and early-stage fusion concepts as areas of active exploration at CES 2026.