Volumetric video holds the potential to re-invent the live-streaming experience, with UK tech outfit Condense emerging as a pioneer.
Channel 4’s launch of an app for the Apple Vision Pro is the latest indication that broadcasters see a future in streaming augmented and virtual reality experiences. However, like many such apps, C4’s Taskmaster app hardly scratches the surface of spatial computing’s potential. The viewer may be in a virtual world, but the show itself is the same passive 2D broadcast experience.
As displays such as the Vision Pro and Meta’s pending Orion AR glasses dial up the comfort and fidelity of content in virtual worlds, experiences will become three-dimensional and, if they are live, the audience will be able to interact with the show’s talent.
The key to this is to generate and transmit not just three-dimensional video but video that is volumetric. Calling it holographic conjures up Princess Leia from 1977’s Star Wars, but volumetric is more or less the same thing, and the race is on to crack the code to deliver such content at scale.
“Shared live experiences in social 3D spaces is the future,” says Nick Fellingham, CEO and co-founder of Bristol-based tech developer Condense. “We want to take reality and condense it down and put it in front of you. That’s going to be really compelling.”
Founded in 2019, Condense was part of a consortium, including BT Sport, which won the IBC 2022 Innovation Award in the Content Everywhere category. It raised $4.5 million (£3.7m) in seed funding led by Deeptech Labs and has a share of a £1.2m government-backed fund to refine and develop volumetric live-streaming technology. Earlier this year BBC Ventures bought a £500,000 stake in the company, the BBC investment arm’s first investment since it was set up two years ago.
“By partnering, we can rapidly explore new ways to engage
younger audiences who don't regularly come to the BBC,” explained Jeremy
Walker, Head of Ventures, BBC. “We can pioneer and shape the next
evolution of content creation, and by investing in this Bristol-based startup,
the BBC is backing British creativity, innovation, and technology.”
The Condense team have found a way of streaming live volumetric content for playback within interactive 3D engines like Unity or Unreal, viewable on AR/VR/XR headsets.
“Volumetric content means video where the user can choose
the viewpoint, they can move around and watch from any angle,” Fellingham says.
“We're really focused on handing the control of where you view the content from
over to the user. We're the only ones streaming live events in this way.”
There are around 600 million active users of virtual environments, who primarily spend their time inside applications like Fortnite, Roblox, Rec Room and VRChat. As more and more people gravitate towards interactive 3D environments, the market for content is building.
“The problem is that it's difficult for producers to get high
quality video content into those places,” says Fellingham. “The data needs
optimising.”
Condense has built an end-to-end system, from capture to transmission, which it says trumps rival solutions thanks to proprietary compression algorithms.
“We believe neural representations are a step-change in the
fields of computer vision and graphics. Unlike traditional methods that rely on
explicit geometric models and high-complexity data formats, neural
representations use implicit neural networks to encode 3D models and 2D videos,”
says the company’s chief scientific officer and co-founder Ollie Feroze.
“This new approach offers significant advantages, including
more efficient data representations, enhanced visual quality, superior
scalability, and reduced computational complexity.”
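Condense has not published its architecture, but the idea of an implicit representation can be sketched in a few lines: instead of storing meshes or voxels explicitly, a small neural network maps a space-time coordinate to scene properties, and the “video” is effectively the network’s weights. A toy PyTorch illustration, with all sizes and names hypothetical:

```python
import torch
import torch.nn as nn

class ImplicitScene(nn.Module):
    """Toy implicit representation: maps a 3D point (plus time, for video)
    to an occupancy value and an RGB colour. Purely illustrative; Condense
    has not published its actual architecture."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden),   # input: (x, y, z, t)
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 4),   # output: (occupancy, r, g, b)
        )

    def forward(self, xyzt: torch.Tensor) -> torch.Tensor:
        out = self.net(xyzt)
        occupancy = torch.sigmoid(out[..., :1])
        rgb = torch.sigmoid(out[..., 1:])
        return torch.cat([occupancy, rgb], dim=-1)

# Querying the model at arbitrary points is what makes the format compact:
# the scene is encoded in the weights, not in a grid of voxels.
model = ImplicitScene()
points = torch.rand(1024, 4)  # random (x, y, z, t) samples
print(model(points).shape)    # torch.Size([1024, 4])
```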
The Condense system is portable and doesn’t require a
dedicated studio. Its rig uses just ten cameras while other systems require
dozens or even hundreds.
The stream is compressed in real time at a variable bitrate according to the capability of the device and the end user’s bandwidth. A bespoke plugin that sits inside the interactive 3D engines plays the content back. The architecture is a hybrid of cloud and local compute power.
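Condense has not disclosed how its adaptive streaming works; as a rough illustration of the principle, a variable-bitrate pipeline picks the richest stream tier that fits inside the client’s measured bandwidth and decode capability. A minimal sketch, with purely illustrative numbers:

```python
# Hypothetical sketch of per-client variable-bitrate selection. The real
# Condense pipeline is proprietary; this only illustrates the idea of
# matching stream quality to device capability and measured bandwidth.

LADDER = [  # (bitrate in Mbit/s, quality tier) - illustrative values only
    (2.0, "low"),
    (8.0, "medium"),
    (20.0, "high"),
]

def pick_tier(bandwidth_mbps: float, device_max_mbps: float,
              headroom: float = 0.8) -> str:
    """Choose the highest tier that fits within a safety margin of the
    client's measured bandwidth and the device's decode capability."""
    budget = min(bandwidth_mbps, device_max_mbps) * headroom
    chosen = LADDER[0][1]
    for bitrate, tier in LADDER:
        if bitrate <= budget:
            chosen = tier
    return chosen

print(pick_tier(bandwidth_mbps=15.0, device_max_mbps=30.0))  # "medium"
```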
“The combination of these things means that content creators can now actually start producing content,” says Fellingham.
Condense’s focus is on live streaming for sports and music, and it also has an eye on corporate conferences, education and more.
“Live is where we think there's a lot of value,” he says. “Multiplayer
games are real-time applications and so real-time content fits into them well. It
also makes the production process simpler. If you can get volumetric content
out in real time, you don't have to wait for it to process overnight.”
Fellingham claims, “Our algorithms are extremely fast and among the most optimised in the world, which is how we can do this in real time.”
An algorithm called Volumetric Fusion stitches - or fuses - the video from the multiple cameras. “It also fuses data over time,” he says. “We build up a probabilistic model of the surface of an object and then we morph it over time by adding more data from the cameras.”
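Volumetric Fusion itself is proprietary, but a common way to build a probabilistic surface model refined over time is a confidence-weighted running average over a voxel grid, as in classic TSDF fusion. A simplified numpy sketch of that general technique:

```python
import numpy as np

# Simplified, TSDF-style temporal fusion over a voxel grid. This is a
# standard technique shown to illustrate a surface model refined by new
# observations; Condense's Volumetric Fusion algorithm is unpublished.

GRID = 64
tsdf = np.zeros((GRID, GRID, GRID), dtype=np.float32)    # signed distance estimate
weight = np.zeros((GRID, GRID, GRID), dtype=np.float32)  # confidence per voxel

def integrate(observation: np.ndarray, obs_weight: np.ndarray) -> None:
    """Fuse one frame's signed-distance observations into the running model.
    Each voxel is a confidence-weighted average over time, so noisy
    single-frame measurements are smoothed and the surface 'morphs' as
    new camera data arrives."""
    global tsdf, weight
    new_weight = weight + obs_weight
    tsdf = np.where(
        new_weight > 0,
        (tsdf * weight + observation * obs_weight) / np.maximum(new_weight, 1e-6),
        tsdf,
    )
    weight = new_weight

# One synthetic frame: signed distances to a sphere centred in the grid.
idx = np.indices((GRID, GRID, GRID)).astype(np.float32)
dist = np.linalg.norm(idx - GRID / 2, axis=0) - GRID / 4
integrate(np.clip(dist, -1, 1), np.ones_like(dist))
```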
The rig is fitted with ten time-of-flight cameras, which provide a depth value for each pixel, from which the 3D structure of the scene can be estimated. Each camera has a 4K RGB sensor, but Fellingham takes issue with the concept of measuring output by resolution.
“When you're talking about 4K, 8K or 16K you're talking
about a pixel count in a [conventional rectangular 2D] frame but volumetric pixels
sit in 3D space. You can move closer to them and the resolution will decrease
or you can move further away from them and the resolution will increase because
the viewer has perspective.
“It also depends on surface area. We can measure how many
pixels are mapped on average from the cameras to a centimetre of surface area
on the model. But there aren’t many good standard metrics commonly used across
all providers.”
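That pixels-per-surface-area idea can be approximated with a pinhole camera model: a pixel at depth z covers roughly (z/fx) by (z/fy) metres of fronto-parallel surface, so coverage density falls off with depth squared. A hypothetical sketch:

```python
import numpy as np

def pixels_per_cm2(depth_m: np.ndarray, fx: float, fy: float) -> float:
    """Rough estimate of how many camera pixels land on each square
    centimetre of observed surface, using a pinhole model. Illustrative
    only: as the article notes, no standard metric exists across providers."""
    valid = depth_m > 0
    # Area in m^2 that each valid pixel's footprint covers on the surface.
    footprint_m2 = (depth_m[valid] / fx) * (depth_m[valid] / fy)
    total_cm2 = footprint_m2.sum() * 1e4          # m^2 -> cm^2
    return valid.sum() / total_cm2 if total_cm2 > 0 else 0.0

# Example: a flat surface 1.5 m from a 4K sensor with fx = fy = 2000 px.
depth = np.full((2160, 3840), 1.5, dtype=np.float32)
print(round(pixels_per_cm2(depth, fx=2000.0, fy=2000.0), 1))  # ~178 px/cm^2
```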
Condense also uses a metric called ‘structural similarity’ to measure how close its volumetric rendering is to the original camera viewpoint.
“We can add extra cameras into the rig and then measure how
good a job we're doing of estimating novel viewpoints but this hasn't yet caught
on across the industry.”
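Structural similarity (SSIM) is a standard image-comparison measure, and the held-out-camera evaluation he describes might look like the following, where the renderer call is a hypothetical placeholder:

```python
import numpy as np
from skimage.metrics import structural_similarity

def novel_view_score(rendered: np.ndarray, captured: np.ndarray) -> float:
    """Compare the volumetric model, rendered from a held-out camera's
    pose, against the real frame from that camera. Higher SSIM means the
    reconstruction predicts unseen viewpoints better."""
    return structural_similarity(rendered, captured, channel_axis=-1,
                                 data_range=255)

# Hypothetical usage: `render_from_pose` and the rig API are placeholders;
# the extra camera is excluded from fusion and used only for scoring.
# rendered = render_from_pose(model, holdout_camera.pose)
# print(novel_view_score(rendered, holdout_camera.frame))
```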
Where there’s occlusion, as will happen when there is more
than one artist, the system applies predictive techniques to fill in the blanks
using data from previous frames.
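The predictive technique is unpublished; the most naive version of “fill from previous frames” simply carries forward the last valid measurement wherever the current frame has a hole. A toy stand-in:

```python
import numpy as np

def fill_occlusions(current: np.ndarray, previous: np.ndarray) -> np.ndarray:
    """Where the current depth frame has holes (zeros, e.g. one performer
    blocking another), reuse the last known value from the previous frame.
    A deliberately naive stand-in for Condense's unpublished predictive
    approach, which the article says draws on data from earlier frames."""
    holes = current == 0
    filled = current.copy()
    filled[holes] = previous[holes]
    return filled

prev = np.full((4, 4), 2.0)
curr = np.full((4, 4), 2.1)
curr[1:3, 1:3] = 0.0   # occluded block in the current frame
print(fill_occlusions(curr, prev))
```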
The rig circling a performer measures three metres by three metres by two metres, enough to capture details of a performer’s nose and eyes in a live stream. The size of the volume and the overall capability of the system will increase along with compute power.
“The size of the volume is tied to GPU memory and compute
and to bandwidth as well as how well we can optimise our algorithms,” he says.
A screen in front of the performer shows them the inside of
the virtual venue to encourage interaction with the virtual audience.
“We often find that when a performer comes in they aren’t really prepared for what to expect. The performer is peering out into a sea of avatars. But within about five minutes they are focused in on the virtual crowd and interacting with them. As soon as they make a ‘shout out’ to the crowd and the crowd does what they say, it totally changes how it feels in that space. There’s something special about your avatar and the performer both sharing the same 3D space.
“This is critical and what's been missing from so many of
these virtual events that you see inside online gaming platforms to date. I would
call them ‘showings’ rather than concerts. They are replays that feel clinical.
If you don't feel like you're sharing the moment then you don't feel like
you're connecting and that's what's different about live. You feel like you're
part of the moment.”
BBC partnership
The partnership with the BBC has been running for ten months, with an initial focus on music and generating content for social consumption.
Condense has built a platform for avatar-based music consumption, white-labelled for BBC Radio 1 as The New Music Portal. A Condense array is set up at Maida Vale Studios, where artists including Gardna, Charlotte Plank, Sam Tompkins and Confidence Man have already performed live-streamed gigs.
“The ambition is to be running these events much more
regularly,” says Fellingham. “I think we've got a bi-weekly cadence now.”
The company has also signed a deal with a major record label
with an announcement pending.
Condense previously worked with BT Sport to explore how
combat sports could be captured and live streamed volumetrically. One use case
was to stream a boxing match from Wembley for users to view via AR glasses in
their living room.
“We learned that there are some operational complexities in
doing this kind of capture. You need to work with the existing camera crew to
set this kind of tech up in a stadium and with a crowd.
“I am still convinced that boxing and UFC are prime
candidates for upgrading the content experience with this volumetric video.”
Having built partnerships with content companies to prove the value of volumetric video, the next step is to scale the business and the technology.
“It’s difficult with a nascent tech since you’re having to educate the market as you build. Getting solid content partnerships is a first step to that scale. We believe that volumetric video is better than conventional video. Users feel closer to the content. You feel like you’re more involved. It gives you a sense of presence. People talk a lot about VR, but you really feel presence when you’re consuming volumetric video. It doesn’t matter if it’s inside a game or inside a headset, it makes you feel closer.”
He adds, “It is inevitable that this kind of video will become more and more prevalent.”
Fellingham studied physics and taught himself AI. In 2012 he joined startup SecondSync, which was acquired by Twitter in 2014. SecondSync CEO Andy Littledale and CTO Dan Fairs joined Fellingham in setting up Condense, as director of operations and CTO respectively. Feroze, the fourth co-founder, has a PhD in computer vision and expertise in applied machine learning and AI.