Wednesday, 29 January 2025

Live volumetric streaming: “Volumetric video is better than conventional video”

IBC 

Volumetric video holds the potential to re-invent the live-streaming experience, with UK tech outfit Condense emerging as a pioneer.


Channel 4’s launch of an app for the Apple Vision Pro is the latest indication that broadcasters see a future in streaming augmented and virtual reality experiences. However, like many such apps, C4’s Taskmaster hardly scratches the surface of spatial computing’s potential. The viewer may be in a virtual world, but the show itself is the same passive 2D broadcast experience.

As displays such as the Vision Pro and Meta’s pending Orion AR glasses dial up the comfort and fidelity of content in virtual worlds, content experiences will become three-dimensional, and if they are live, the audience will be able to interact with the show’s talent.

The key to this is to generate and transmit not just three-dimensional video but video that is volumetric. Calling it holographic conjures up Princess Leia from 1977’s Star Wars, but volumetric is more or less the same thing, and the race is on to crack the code to deliver such content at scale.

“Shared live experiences in social 3D spaces is the future,” says Nick Fellingham, CEO and co-founder at Bristol-based tech developer Condense. “We want to take reality and condense it down and put it in front of you. That’s going to be really compelling.”

Founded in 2019, Condense was part of a consortium, including BT Sport, that won the IBC 2022 Innovation Award in the Content Everywhere category. It raised $4.5 million (£3.7m) in seed funding led by Deeptech Labs and has a share of a £1.2m government-backed fund to refine and develop volumetric live-streaming technology. Earlier this year BBC Ventures bought a £500,000 stake in the company, the BBC investment arm’s first investment since it was set up two years ago.

“By partnering, we can rapidly explore new ways to engage younger audiences who don't regularly come to the BBC,” explained Jeremy Walker, Head of Ventures, BBC. “We can pioneer and shape the next evolution of content creation, and by investing in this Bristol-based startup, the BBC is backing British creativity, innovation, and technology.”

The Condense team have found a way of streaming live volumetric content for playback within interactive 3D engines like Unity or Unreal for viewing on AR/VR/XR headsets.

“Volumetric content means video where the user can choose the viewpoint; they can move around and watch from any angle,” Fellingham says. “We’re really focused on handing the control of where you view the content from over to the user. We’re the only ones streaming live events in this way.”

There are around 600 million active users of virtual environments, who primarily spend time inside applications like Fortnite, Roblox, Rec Room and VRChat. As more and more people gravitate towards interactive 3D environments, the market for content is building.

“The problem is that it's difficult for producers to get high quality video content into those places,” says Fellingham. “The data needs optimising.”

Condense has built an end-to-end system, from capture to transmission, that it claims outperforms rival solutions thanks to proprietary compression algorithms.

“We believe neural representations are a step-change in the fields of computer vision and graphics. Unlike traditional methods that rely on explicit geometric models and high-complexity data formats, neural representations use implicit neural networks to encode 3D models and 2D videos,” says the company’s chief scientific officer and co-founder Ollie Feroze.

“This new approach offers significant advantages, including more efficient data representations, enhanced visual quality, superior scalability, and reduced computational complexity.” 

The Condense system is portable and doesn’t require a dedicated studio. Its rig uses just ten cameras, while other systems require dozens or even hundreds.

The stream is compressed in real time at a variable bitrate according to the capability of the device and the end user’s bandwidth. A bespoke plugin that sits inside the interactive 3D engines plays the content back. The architecture is a hybrid of cloud and local compute power.
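Condense’s actual rate-adaptation logic is not public; a minimal sketch of the idea, picking the highest rung of an illustrative bitrate ladder that both the device and the measured bandwidth can sustain, might look like this:

```python
# Hypothetical bitrate ladder and selection logic; the tiers, rates and
# headroom factor are illustrative, not Condense's real configuration.
LADDER = [
    {"kbps": 2000, "min_tier": 0},   # low: standalone mobile headsets
    {"kbps": 8000, "min_tier": 1},   # medium: high-end standalone
    {"kbps": 20000, "min_tier": 2},  # high: PC-tethered headsets
]

def pick_bitrate(bandwidth_kbps: float, device_tier: int,
                 headroom: float = 0.8) -> int:
    """Return the highest rung the device can decode that fits within
    a safety fraction (headroom) of the measured bandwidth."""
    usable = bandwidth_kbps * headroom
    candidates = [r["kbps"] for r in LADDER
                  if r["min_tier"] <= device_tier and r["kbps"] <= usable]
    return max(candidates) if candidates else LADDER[0]["kbps"]
```

Real systems re-evaluate this choice continuously as bandwidth estimates change, which is what makes the stream “variable” bitrate.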

“The combination of these things means that content creators can now actually start producing content,” says Fellingham.

The focus is on live streaming for sports and music, with an eye also on corporate conferences, education and more.

“Live is where we think there's a lot of value,” he says. “Multiplayer games are real-time applications and so real-time content fits into them well. It also makes the production process simpler. If you can get volumetric content out in real time, you don't have to wait for it to process overnight.”

Fellingham claims, “Our algorithms are extremely fast and among the most optimised in the world, which is how we can do this in real time.”

An algorithm called Volumetric Fusion stitches - or fuses - the video from the multiple cameras. “It also fuses data over time,” he says. “We build up a probabilistic model of the surface of an object and then we morph it over time by adding more data from the cameras.”
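Condense’s probabilistic surface model is proprietary, but fusing observations over time is a classic idea in volumetric reconstruction (e.g. TSDF fusion): each new depth sample updates a per-voxel weighted estimate and its confidence. A minimal sketch of that update step, under those assumptions:

```python
# Weighted running-average fusion in the spirit of classic TSDF-style
# volumetric reconstruction; a simplification of the probabilistic
# model described in the article, not Condense's actual algorithm.
def fuse(value: float, weight: float,
         new_value: float, new_weight: float = 1.0):
    """Merge a new observation into a per-voxel estimate, returning
    the updated estimate and its accumulated confidence weight."""
    total = weight + new_weight
    fused = (value * weight + new_value * new_weight) / total
    return fused, total
```

Repeating this per voxel, frame after frame, is what lets noisy single-camera samples converge on a stable surface estimate “over time”.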

Its rig is fitted with ten time-of-flight cameras, which provide a depth value for each pixel, from which the 3D structure of the scene can be estimated. Each camera has a 4K RGB sensor, but Fellingham takes issue with the concept of measuring output by resolution.
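Lifting a per-pixel depth value into 3D is standard pinhole-camera back-projection; the intrinsics below (focal lengths and principal point) are illustrative values, not those of Condense’s cameras:

```python
# Standard pinhole back-projection: map a pixel (u, v) with a measured
# time-of-flight depth into a 3D point in the camera's coordinate frame.
# fx, fy: focal lengths in pixels; cx, cy: principal point.
def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float):
    """Return the 3D point (x, y, z) in metres, camera space."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

Doing this for every depth pixel of every camera yields the raw point data the fusion stage then merges.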

“When you’re talking about 4K, 8K or 16K, you’re talking about a pixel count in a [conventional rectangular 2D] frame, but volumetric pixels sit in 3D space. You can move closer to them and the resolution will decrease, or you can move further away from them and the resolution will increase, because the viewer has perspective.

“It also depends on surface area. We can measure how many pixels are mapped on average from the cameras to a centimetre of surface area on the model. But there aren’t many good standard metrics commonly used across all providers.”
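The texel-density metric Fellingham describes can be expressed very simply; the input counts below are illustrative, not figures from Condense’s pipeline:

```python
# Sketch of the density metric described above: average camera pixels
# mapped per square centimetre of the reconstructed model's surface.
def pixels_per_cm2(mapped_pixel_count: int, surface_area_m2: float) -> float:
    """Average pixels mapped onto each cm^2 of surface."""
    surface_area_cm2 = surface_area_m2 * 10_000  # 1 m^2 = 10,000 cm^2
    return mapped_pixel_count / surface_area_cm2
```

Unlike a 2D pixel count, this number is independent of where the viewer stands, which is why it makes a better like-for-like measure for volumetric capture.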

Condense also uses a metric called ‘structural similarity’ to measure how close its volumetric rendering is to the original camera viewpoint.

“We can add extra cameras into the rig and then measure how good a job we're doing of estimating novel viewpoints but this hasn't yet caught on across the industry.”
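Structural similarity (SSIM) is a standard image-quality metric; a held-out camera’s real frame is compared against the volumetric render from that same viewpoint. Production code computes SSIM over local windows, but a global version shows the formula:

```python
# Minimal global SSIM (Wang et al.) between two flattened grayscale
# images of equal length; illustrative only - real evaluation uses
# windowed SSIM over the rendered vs. held-out camera frames.
def ssim(img_a, img_b, data_range: float = 255.0) -> float:
    n = len(img_a)
    mu_a = sum(img_a) / n
    mu_b = sum(img_b) / n
    var_a = sum((p - mu_a) ** 2 for p in img_a) / n
    var_b = sum((p - mu_b) ** 2 for p in img_b) / n
    cov = sum((a - mu_a) * (b - mu_b) for a, b in zip(img_a, img_b)) / n
    c1 = (0.01 * data_range) ** 2  # stabilising constants
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
```

A score of 1.0 means the render matches the held-out camera exactly; the further below 1.0, the worse the novel-view estimate.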

Where there’s occlusion, as happens when there is more than one artist, the system applies predictive techniques to fill in the blanks using data from previous frames.
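The article doesn’t detail the predictive technique; the simplest form of temporal hole-filling is to carry forward the last frame’s data wherever the current frame has no observation. A sketch under that assumption, with `None` marking occluded samples in a flat buffer:

```python
# Hypothetical temporal hole-filling: reuse the previous frame's value
# wherever the current frame is occluded (None). Condense's predictive
# method is likely more sophisticated than this carry-forward scheme.
def fill_occlusions(current, previous):
    """Replace occluded (None) samples with last-frame values."""
    return [c if c is not None else p for c, p in zip(current, previous)]
```

More advanced variants would extrapolate motion rather than simply repeating the previous frame, but the principle of borrowing from temporal history is the same.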

The rig circling a performer measures three metres by three metres by two metres: enough volume to capture details of a performer’s nose and eyes in a live stream. The size of the volume, and the capability of the system as a whole, will increase along with compute power.

“The size of the volume is tied to GPU memory and compute and to bandwidth as well as how well we can optimise our algorithms,” he says.

A screen in front of the performer shows them the inside of the virtual venue to encourage interaction with the virtual audience.

“We often find that when a performer comes in they aren’t really prepared for what to expect. The performer is peering out into a sea of avatars. But within about five minutes they are focused in on the virtual crowd and interacting with them. As soon as they make a ‘shout out’ to the crowd and the crowd does what they say it totally changes how it feels in that space.  There's something special because it is your avatar and the performer both sharing the same 3D space.

“This is critical and what's been missing from so many of these virtual events that you see inside online gaming platforms to date. I would call them ‘showings’ rather than concerts. They are replays that feel clinical. If you don't feel like you're sharing the moment then you don't feel like you're connecting and that's what's different about live. You feel like you're part of the moment.”


BBC partnership

The partnership with the BBC has been running for ten months, with an initial focus on music and generating content for social consumption.

Condense has built a platform for avatar-based music consumption, white-labelled for BBC Radio 1 as The New Music Portal. A Condense array is set up at Maida Vale Studios, where artists including Gardna, Charlotte Plank, Sam Tompkins and Confidence Man have already performed live-streamed gigs.

“The ambition is to be running these events much more regularly,” says Fellingham. “I think we've got a bi-weekly cadence now.”

The company has also signed a deal with a major record label with an announcement pending.

Condense previously worked with BT Sport to explore how combat sports could be captured and live streamed volumetrically. One use case was to stream a boxing match from Wembley for users to view via AR glasses in their living room.

“We learned that there are some operational complexities in doing this kind of capture. You need to work with the existing camera crew to set this kind of tech up in a stadium and with a crowd.

“I am still convinced that boxing and UFC are prime candidates for upgrading the content experience with this volumetric video.”

Having built partnerships with content companies to prove the value of volumetric video, the next step is to scale the business and the technology.

“It’s difficult with a nascent tech since you’re having to educate the market as you build. Getting solid content partnerships is a first step to that scale. We believe that volumetric video is better than conventional video. Users feel closer to the content. You feel like you’re more involved. It gives you a sense of presence. People talk a lot about VR, but you really feel presence when you’re consuming volumetric video. It doesn’t matter if it’s inside a game or inside a headset, it makes you feel closer.”

He adds, “It is an inevitability that this kind of video will become more and more prevalent.”

Fellingham studied physics and taught himself AI. In 2012 he joined startup SecondSync, which was acquired by Twitter in 2014. SecondSync CEO Andy Littledale and CTO Dan Fairs joined Fellingham in setting up Condense, as director of operations and CTO respectively. Feroze, the fourth co-founder, has expertise in applied machine learning and AI, and a PhD in computer vision.


 

 
