Tuesday 8 November 2016

An Object Lesson in Personalized Streaming Video Experiences

Streaming Media 

What custom content does each viewer want to see? As broadcast and broadband converge, object-based media is showing the way to the future, and the BBC is taking the lead.

One of the powerful arguments for delivering object-based, as opposed to linear, media is the potential to have content adapt to the environment in which it is being shown. This has been standard practice on the web for years, but it is now being cautiously applied by broadcasters and other video publishers using standard internet languages to create and deliver new forms of interactive and personalised experiences as broadcast and broadband converge.
“The internet works by chopping things up, sending them over a network, and reassembling them based on audience preference or device context,” explains Jon Page, R&D head of operations at the BBC. “Object-based broadcasting (OBB) is the idea of making media work like the internet.”
Live broadcast content already comprises separate clean feeds of video, audio, and graphics before they are “baked in” to the MPEG/H.264/H.265 signal on transmission. OBB simply extracts the raw elements and delivers all the relevant assets separately, along with instructions about how to render and publish them in the context of the viewer’s physical surroundings, device capability, and personal needs.
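As a rough illustration of that idea, and not any actual BBC format, an object-based delivery might pair the raw assets with a small manifest of rendering rules that the client evaluates against its own context. The field names below are hypothetical:

```typescript
// Hypothetical manifest: assets are delivered separately, each with rules
// describing when a receiving device should render it.
interface ManifestObject {
  id: string;
  kind: "video" | "audio" | "graphic";
  url: string;
  minScreenWidthPx?: number;   // e.g. hide dense graphics on small screens
  accessibilityRole?: "audio-description" | "signing";
}

interface ViewerContext {
  screenWidthPx: number;
  wantsAudioDescription: boolean;
  wantsSigning: boolean;
}

// The receiving device, not the broadcaster, decides which objects to render.
function selectObjects(objects: ManifestObject[], ctx: ViewerContext): ManifestObject[] {
  return objects.filter(o => {
    if (o.minScreenWidthPx !== undefined && ctx.screenWidthPx < o.minScreenWidthPx) return false;
    if (o.accessibilityRole === "audio-description" && !ctx.wantsAudioDescription) return false;
    if (o.accessibilityRole === "signing" && !ctx.wantsSigning) return false;
    return true;
  });
}
```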
The nearest parallel to what an object-based approach might mean for broadcasting can be found in video games. “In a video game, all the assets are object-based, and the decision about which assets to render for the viewer’s action or device occurs some 16 milliseconds before it appears,” says BBC research engineer Matthew Shotton. “The real-time nature of gaming at the point of consumption expresses what we are trying to achieve with OBB.”
MIT devotes a study group to object-based media, and its head and principal research scientist, V. Michael Bove, agrees that video games are an inherently object-based representation. “Provided the rendering capacity of the receiving device is known, this is proof that object-based media can be transmitted,” he says. The catch is that this only works if the video is originated as objects in the first place.
The BBC’s R&D division is the acknowledged leader in OBB. Rather than keeping its research locked away in the lab, the corporation is keen for others to explore and expand on it.
“We want to build a community of practice, and the more people who engage in the research, the faster we can get some interesting experiences to be delivered,” says BBC research scientist Phil Stenton. “We are now engaged with web standards bodies to deliver OBB at scale.”

Back to Basics: What Is an Object?


In the BBC’s schema, an object is “some kind of media bound with some kind of metadata.” Object-based media can include a frame of video, a line from a script, or spoken dialogue. It can also be an infographic, a sound, a camera angle, or a look-up table used in grading (which can be changed to reflect the content or to assist viewers with visual impairments). When built around story arcs, a “theme” can be conceived of as an object. Each object is automatically assigned an identifier and a time stamp as soon as it is captured or created.
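A minimal sketch of that “media plus metadata” pairing might look like the type below. The field names are illustrative assumptions, not the BBC’s actual schema:

```typescript
// Hypothetical shape of a media object: an identifier and timestamp are
// attached automatically at capture, and arbitrary metadata travels with it.
interface MediaObject {
  id: string;                         // assigned when the object is captured or created
  capturedAt: string;                 // ISO 8601 timestamp
  kind: "video-frame" | "audio" | "graphic" | "script-line" | "theme";
  uri: string;                        // where the raw essence lives
  metadata: Record<string, unknown>;  // e.g. camera angle, grading LUT, story arc
}

const example: MediaObject = {
  id: "example-object-0001",          // placeholder identifier
  capturedAt: "2016-11-08T10:15:00Z",
  kind: "graphic",
  uri: "https://example.org/assets/lower-third.svg",
  metadata: { role: "lower-third", language: "en" },
};
```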
Since making its first public demonstration of OBB during the 2014 Commonwealth Games, the BBC has conducted numerous spinoff experiments. These range from online video instructions showing kids how to build a 3D chicken out of cardboard to work with BBC News Labs demonstrating how journalists can use “linked data” to build stories. It has created customised weather forecasts, a radio documentary constructed according to the listener’s time requirements, and most recently a cooking programme, CAKE, which was the first project produced and delivered entirely using an object-based approach.
All these explorations are a means to an end. “They illustrate how we build an object-based experience and help us understand if it is technically feasible for distribution and delivery for ‘in the moment’ contextual rendering,” says Stenton. “The next step is to extract common tools and make them open for others to use.”
In particular, the BBC is wrestling with discerning which objects are domain-specific and which can be used across applications, how those common objects can be related to one another, and what standards are needed to make OBB scalable.
Most websites are able to accommodate the wide variety of devices used to view them, adapting layouts, font sizes, and levels of UI complexity accordingly. The BBC also expects a sizeable portion of both craft and consumer applications of the future to be based on HTML, CSS, and JavaScript. However, the tremendous flexibility afforded by that web tech is also a disadvantage.
“Repeatability and consistency of approach among production teams is extremely difficult to maintain,” says BBC research engineer Max Leonard, “especially when combined with the sheer volume of possible avenues one can take when creating new object-based media compositions.”

Object-Based Compositions

The BBC’s OBB experiments have relied on HTML/CSS/JS but have taken different approaches to accessing, describing, and combining the media, making the content from one experience fundamentally incompatible with another.
“The only way we can practice an object-based approach to broadcasting in a sustainable and scalable way at the same level of quality expected of us in our linear programming is to create some sort of standard mechanism to describe these object-based compositions, including the sequences of media and the rendering pipelines that end up processing these sequences on the client devices,” says Leonard. “The crux of the problem, as with any standard, is finding the sweet spot between being well-defined enough to be useful, but free enough to allow for creative innovation.”
BBC R&D has a number of building blocks for this language. These include the Optic Framework (Object-based Production Tools In the Cloud), which will appear to the end user as web apps in a browser, while the video processing and data are kept server-side in the BBC’s Cosmos cloud.
The Optic Framework aims to deliver reusable data models to represent this metadata, so that different production tools can use the same underlying models but present differing views and interfaces on them based on the current needs of the end user.
Optic uses the JT-NM data model as its core, and each individual component within it uses NMOS standards to allow for the development of tools within an open and interoperable framework.
Inspired by the Web Audio API, the BBC has built an experimental HTML5/WebGL media processing and sequencing library for creating interactive and responsive videos on the web. VideoContext uses a graph-based rendering pipeline, with video sources, effects, and processing represented as software objects that can be connected, disconnected, created, and removed in real time during playback.
The core of the video processing in VideoContext is implemented as WebGL shaders written in GLSL. A range of common effects such as cross-fade, chroma keying, scale, flip, and crop is built into the library. “There’s a straightforward JSON [JavaScript Object Notation] representation for effects that can be used to add your own custom ones,” explains Shotton. “It also provides a simple mechanism for mapping GLSL uniforms onto JavaScript object properties so they can be manipulated in real time in your JavaScript code.”
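A minimal sketch of the graph-based approach, following the usage pattern in the library’s documentation (method names may differ between versions, and the clip URLs are placeholders):

```typescript
// Two clips cross-faded on a canvas via a VideoContext processing graph.
declare const VideoContext: any; // assume the library is loaded via a <script> tag

const canvas = document.getElementById("canvas") as HTMLCanvasElement;
const ctx = new VideoContext(canvas);

// Source nodes: each clip is an object in the processing graph.
const clipA = ctx.video("clip-a.mp4");
const clipB = ctx.video("clip-b.mp4");
clipA.start(0); clipA.stop(4);
clipB.start(2); clipB.stop(8);

// A built-in cross-fade effect node sits between the sources and the destination.
const crossFade = ctx.transition(VideoContext.DEFINITIONS.CROSSFADE);
clipA.connect(crossFade);
clipB.connect(crossFade);
crossFade.connect(ctx.destination);

// Animate the "mix" uniform from clip A to clip B between t=2s and t=4s.
crossFade.transition(2, 4, 0.0, 1.0, "mix");

ctx.play();
```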
The library, which is available as open source, works on newer builds of Chrome and Firefox on the desktop and, with some issues, on Safari. “Due to several factors, the library isn’t fully functional on any mobile platform,” says Shotton. “This is in part due to the requirement for a human interaction to happen with a video element before it can be controlled programmatically.” The BBC is using the library internally to develop a streamable description for media composition with the working title of UMCP (Universal Media Composition Protocol).
It has taken a cue from Operational Transformation, a technique for supporting multiple users collaboratively editing a single document, which powers Google Docs and Etherpad. “With a bit of domain-specific adaptation, this can be put to work in the arena of media production,” explains Leonard.
The kernel of the idea is that the exact same session description metadata is sent to every device, regardless of its capabilities, which can, in turn, render the experience in a way that suits it: either live, as the director makes the cuts, or at an arbitrary time later on.
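Purely as an illustration of that kernel, and not the UMCP format itself, the director’s actions could be expressed as a small ordered log of operations on a shared composition; every device receives the same log and replays it, live or later. The operation names below are invented for the sketch:

```typescript
// Hypothetical operation log in the spirit of Operational Transformation:
// every device receives the same ordered edits and replays them locally.
type CompositionOp =
  | { seq: number; op: "addClip"; clipId: string; contentId: string; startAt: number }
  | { seq: number; op: "cutTo"; clipId: string; at: number }
  | { seq: number; op: "removeClip"; clipId: string };

interface CompositionState {
  clips: Map<string, { contentId: string; startAt: number }>;
  cuts: { at: number; clipId: string }[]; // which clip is live from which time
}

function replay(ops: CompositionOp[]): CompositionState {
  const state: CompositionState = { clips: new Map(), cuts: [] };
  for (const o of [...ops].sort((a, b) => a.seq - b.seq)) {
    if (o.op === "addClip") state.clips.set(o.clipId, { contentId: o.contentId, startAt: o.startAt });
    else if (o.op === "cutTo") state.cuts.push({ at: o.at, clipId: o.clipId });
    else state.clips.delete(o.clipId);
  }
  return state;
}
```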
“It is the NMOS content model which allows us to easily refer to media by a single identifier, irrespective of its actual resolution, bitrate, or encoding scheme,” explains Leonard.
“One of the substantial benefits of working this way would be to allow us to author experiences once, for all devices, and deliver the composition session data to all platforms, allowing the devices themselves to choose which raw assets they need to create the experience for themselves,” he says. Examples include a low bitrate version for mobile, a high-resolution version for desktop, and 360° for VR headsets.
In theory, this would allow the production team to serve potentially hundreds of different types of devices, regardless of connection or hardware capability, without having to do the laborious work of rendering a separate version for each one.
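One way to picture that, using hypothetical identifiers and renditions rather than the NMOS model itself: the composition refers only to a content ID, and each device maps that ID to the rendition it can best handle.

```typescript
// Hypothetical rendition table: the composition references "content-123" only;
// the device picks the concrete asset to fetch.
const renditions: Record<string, { profile: string; url: string; bitrateKbps: number }[]> = {
  "content-123": [
    { profile: "mobile",  url: "https://example.org/content-123/low.mp4",  bitrateKbps: 800 },
    { profile: "desktop", url: "https://example.org/content-123/high.mp4", bitrateKbps: 8000 },
    { profile: "vr-360",  url: "https://example.org/content-123/360.mp4",  bitrateKbps: 20000 },
  ],
};

function pickRendition(contentId: string, profile: string) {
  const options = renditions[contentId] ?? [];
  return options.find(r => r.profile === profile) ?? options[0];
}

// A phone and a VR headset resolve the same identifier to different assets.
pickRendition("content-123", "mobile");
pickRendition("content-123", "vr-360");
```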
The hardware for an object-based production is being adapted for IP by the BBC under a project called IP Studio. From a production point of view, equipment from a camera to a vision mixer or archive can be treated as an object. “IP Studio orchestrates the network so that real-time collections of objects work as a media production environment,” says Page. So, in the BBC’s schema, Optic will output UMCP, and that sits on top of IP Studio.

OBB Goes Commercial


As a publicly funded body, the BBC is driven to unearth new ways of making media accessible to its licence fee-paying viewers. Larger onscreen graphics, or sign language presenters in place of regular presenters, are two examples of OBB intended to improve accessibility for people with impairments.
The BBC is also part of the European Commission-funded 2-Immerse project with Cisco, BT, German broadcast research institute IRT, ChyronHego, and others. It is developing prototype multiscreen experiences that merge broadcast and broadband content with the benefits of social media. To deliver the prototypes, 2-Immerse is building a platform based on the European middleware standard HbbTV 2.0.
OBB is likely to be commercialised first, though, in second-screen experiences. “The process of streaming what’s on the living room TV is broken,” argues Daragh Ward, CTO of Axonista. “Audiences expect to interact with it.”
The Dublin-based developer offers a content management system and a series of software templates that it says make it easier for producers to deploy OBB workflows instead of building them from scratch. Initially, this is based around extracting graphics from the live signal.
Axonista’s solution has been built into apps for the shopping channel QVC, where the “buy now” TV button becomes a touchscreen option on a smartphone, and The QYOU, an online curator of video clips that uses the technology to add interactivity to data about the content it publishes.
The idea could attract producers of other genres. Producers of live music shows might want to overlay interactive information about performances on the second screen. Sports fans might want to select different leaderboards or heat maps, or track positions over the live pictures. BT Sport has trialled this at motorcycle event MotoGP and plans further trials next year.
Another idea is to make the scrolling ticker of news or finance channels interactive. “Instead of waiting for a headline to scroll around and read it again, you can click and jump straight to it,” says Ward. Since news is essentially a playlist of items, video content could also be rendered on-demand by way of the news menu.
This type of application still leaves the lion’s share of content “baked in,” but it’s a taste of OBB’s potential. “All TV will be like this in future,” says Ward. “As TV sets gain gesture capability and force feedback control, it allows new types of interactivity to be brought into the living room.”
The audio element of OBB is more advanced. Here, each sound is treated as an object that can be added, removed, or pushed to the foreground or background, whether for interactivity, to manage bandwidth or processing capacity, or for playback on lower-fidelity devices.
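The browser’s Web Audio API already treats sound this way. A rough sketch, with the asset URLs and gain levels chosen purely for illustration:

```typescript
// Each sound is its own node with its own gain, so a client can rebalance,
// drop, or replace objects without touching the rest of the mix.
const audioCtx = new AudioContext();

async function addObject(url: string, gain: number): Promise<GainNode> {
  const buffer = await fetch(url)
    .then(r => r.arrayBuffer())
    .then(data => audioCtx.decodeAudioData(data));
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  const gainNode = audioCtx.createGain();
  gainNode.gain.value = gain;
  source.connect(gainNode).connect(audioCtx.destination);
  source.start();
  return gainNode;
}

async function buildMix() {
  // Commentary pushed to the fore, crowd atmosphere pulled back; a
  // low-bandwidth device could simply skip the crowd object entirely.
  await addObject("commentary-en.ogg", 1.0);
  await addObject("crowd-atmos.ogg", 0.3);
}
buildMix();
```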
Dolby’s Atmos object-based audio (a version of its cinema system) is likely to be introduced to consumers as part of a pay TV operator’s 4K/UHD package. Both BT Sport and Sky, the broadcasters battling it out with 4K live services in the U.K., have commissioned their mobile facility providers to build in Atmos recording gear. Sources at these OB providers suggest that a switch-on could happen by this time next year.
Initially, a Dolby Atmos production would allow additional user-selectable commentary from a neutral or team/fan perspective, different languages, and a referee’s mic. It would also add a more “at the stadium” feel to live events with atmospheres from the PA system and crowd.
BT’s research teams are also exploring the notion of a responsive TV UI for red button interaction on the big screen, targeting 2020 for launch.
“Today we tend to send out something optimised for quite a small screen size, and if you have a larger screen it is then scaled up,” Brendan Hole, TV and content architect at BT, told the IBC conference.
“We are asking what happens if the broadcast stream is broken into objects so that the preferences of the user can be taken into account. You can add or remove stats in a sports broadcast, for example, or have viewer selection of specific feeds. It could automatically take account of the size and type of screen, or of the fact that I have a device in my hand, so elements like stats could be delivered to mobile instead of on the main screen.”
Others investigating OBB include Eko Studio, formerly known as Interlude’s Treehouse. It offers an online editing suite that lets users transform linear videos into interactive videos so that the viewer can choose the direction of the video.
New York-based creative developer Brian Chirls has developed Seriously.js, an open source JavaScript library for complex video effects and compositing in a web browser. Unlike traditional desktop tools, Seriously.js aims to render video in real time, combining the interactivity of the web with the aesthetic power of cinema. Though Seriously.js currently requires authors to write code, it is targeted at artists with beginner-level JavaScript skills, so the main limitation is creative ability and knowledge of video rather than coding ability.
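A short sketch following the pattern in the library’s documentation (the element selectors are placeholders, and available effect names vary by build):

```typescript
// Seriously.js wires sources, effects, and targets into a small processing graph.
declare const Seriously: any; // assume the library is loaded via a <script> tag

const seriously = new Seriously();
const source = seriously.source("#source-video");   // a <video> element
const effect = seriously.effect("vignette");        // one of the built-in effects
const target = seriously.target("#output-canvas");  // a <canvas> element

effect.source = source;   // the video feeds the effect
target.source = effect;   // the effect feeds the canvas
seriously.go();           // start rendering in real time
```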
MIT laid the groundwork for object-based media a decade ago. It has since moved on to holographic video and display, although some of the same principles apply.
“We are exploring holographic video as a medium for interactive telepresence,” says Bove. “Holosuite is an object-based system where we used a range-finding camera, like the Microsoft Kinect, as a webcam to figure out which pixels represent a person and which the room, with the ability to live-stream people separately from the backgrounds and with full motion parallax and stereoscopic rendering.”
For content creators, object-based techniques offer new creative editorial opportunities. The advantage of shooting in an object-based way is that media becomes easily reusable, and it can be remixed to tell new stories or to build future responsive experiences that don’t require any re-engineering.
“Either we need to produce multiple different versions of the same content which is highly expensive or we capture an object once and work out how to render it,” says Page. “Ultimately, we need to change the production methodology. OBB as an ecosystem has barely begun.”
