IBC
https://www.ibc.org/features/putting-a-price-on-disaster/7996.article
Downtime is something no broadcaster wants, and the ability to switch rapidly to a back-up system is imperative. Given recent high-profile outages, is it time to rethink disaster recovery strategies?
The fire alarm that shut down systems at Red Bee
Media last September was a disaster that highlighted the benefits of a robust
recovery policy.
While BBC channels were switched from White City to
Salford (Red Bee’s other UK transmission centre) with minimal disruption, Channel
4 and its sister channels continued to experience difficulties after the event.
Observers in the industry, while not wishing to
comment directly on that particular incident, spoke of the necessity for a
disaster recovery strategy that enables broadcasters to maintain all services
despite outages.
Channel 4 was unable to do so. Instead, it will have suffered lost advertising revenue and reputational damage, although facilities provider Red Bee Media has also been the subject of viewers’ displeasure.
Viewers have taken to social media to complain, with many of the complaints coming from visually or hearing impaired viewers questioning why audio description and subtitles were seemingly not prioritised as some channels returned to air without them.
“Some broadcasters are better prepared than
others,” says Michael Rebel, Director, Solution Architecture, Imagine
Communications. “When a channel is forced off air, there are two consequences
for the broadcaster. First, there is the direct loss of having no income
because you can’t transmit any commercials. The second is harder to value, but
potentially more devastating: the risk that viewers may go and discover other
channels and other content.
“The adequacy of a disaster recovery plan is
essentially a business decision: How long do you dare risk the loss of income
and brand image?”
Why have DR anyway?
There are two main reasons why you need a disaster
recovery plan. First, a technical failure, which might be anything from a power
outage to failure of a key piece of equipment or a natural disaster.
The second reason concerns restrictions on staff working. That might be because of a fire in the building, disruption to travel and traffic around the site, or the need to protect the team by minimising its exposure to viral transmission. This has been thrown into sharp relief by Covid-19.
“Broadcasters’ DR plans are more advanced than they
were, but maybe not yet fully adequate,” says CiarĂ¡n Doran, Director of
Marketing, Rohde & Schwarz. “Most DR systems require access to a back-up
site, but Covid showed us that we also need to be prepared for situations where
you cannot physically access your systems.”
Media organisations understandably make business decisions that balance the downtime and losses a disaster may cause against the expense of running the most secure recovery models.
“Generally, these top-level disaster recovery plans
are adequate but they’re not perfect,” says Rick Young, SVP, Head of Global
Products, LTN Global. “With natural disasters growing increasingly common and
catastrophic, on-premise models will be severely tested.”
DR plans range from a complete replication of studio and equipment in another location to cloud-based instances ready to spin up. The adequacy of each approach varies widely depending on how it is measured, whether that’s cost effectiveness, robustness or ease of activation.
Large broadcasters typically have backup options in place to deal with each type of disaster. They may have a primary and secondary output for distributing signals, a temporary backup infrastructure for use only during the hurricane season (in the US), or a hybrid (physical and virtual) DR system to deal with a physical location becoming inaccessible.
“Disaster recovery plans using hardware-intensive
systems are, by and large, a luxury that only large broadcasters with deep
pockets can afford,” says Srinivasan KA, Co-founder, Amagi. “They run their
backup as a primary option once a month or once a quarter to test the system.
But this is an expensive endeavour. Most broadcasters only use DR for their
very profitable channels. With natural disasters becoming a more frequent
occurrence, and with the unpredictability of system failures, there is
definitely a need for broadcasters to rethink their DR strategies.”
Rethinking strategies
For on-premises playout and content management,
there are several scenarios with different factors to consider.
“For high-value channels — in which failure can
cause significant loss of revenue — broadcasters should consider a traditional
hot/hot system with two servers at separate facilities running parallel with
duplicate content and playlists,” says Young. “This is the safest option, with
virtually imperceptible switchover if the first server fails, but it’s also the
most expensive option, requiring two sets of hardware, bandwidth and separate
facilities.”
A slightly cheaper option, according to Young, is
running a ‘warm’ standby server ready to go. It may not have all the content or
run completely in sync, and someone will have to manually trigger the switch,
but it’s less expensive. Broadcasters will traditionally measure downtime in
minutes in this model.
“For lower-value channels, the cheapest option is
to identify an existing server that someone could switch to provide content and
playout, but this takes a lot of time, measuring in hours instead of minutes.”
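To make those trade-offs concrete: the hot/hot model relies on the backup chain detecting a primary failure within seconds and repointing the output automatically. Below is a minimal, purely illustrative Python sketch of that detection loop; the health-check URL and the switch_output() hook are hypothetical placeholders rather than any vendor’s interface.

```python
# Minimal sketch of a hot/hot failover monitor; illustrative only, not a vendor system.
# The health-check URL and switch_output() hook are hypothetical placeholders.
import time
import urllib.request

PRIMARY_HEALTH_URL = "http://primary-playout.example/health"  # placeholder endpoint
MISSED_BEATS_LIMIT = 3          # consecutive failed checks before switching
CHECK_INTERVAL_SECONDS = 1.0    # hot/hot aims for switchover within seconds or frames


def primary_is_healthy(timeout: float = 0.5) -> bool:
    """Return True if the primary playout chain answers its health check."""
    try:
        with urllib.request.urlopen(PRIMARY_HEALTH_URL, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


def switch_output(target: str) -> None:
    """Placeholder for the router/automation call that repoints transmission."""
    print(f"Switching transmission output to: {target}")


def monitor() -> None:
    missed = 0
    while True:
        if primary_is_healthy():
            missed = 0
        else:
            missed += 1
            if missed >= MISSED_BEATS_LIMIT:
                # Hot/hot: the backup is already running in sync, so the switch is immediate.
                switch_output("backup-playout")
                break
        time.sleep(CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    monitor()
```

In a warm or cold model the same check would more likely raise an alert for an operator, which is where recovery times stretch from seconds to minutes or hours.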
If a DR centre is going to be of any value, it must
be at some distance from the primary playout. The implied criticism of the Red
Bee Media/C4 scenario is that this was not the case.
“There is no point having a disaster recovery
centre that might be evacuated by the same ruptured gas main, for example,”
says Rebel. “Given that geographic diversity, you then have to design systems
that will maintain the two sites in synchronisation, for content and for
playlists. So, you must choose an automation and channel platform that provides
mirroring intrinsically within the system.”
Automated failover is an option, but sometimes a broadcaster will need a more manual approach in the event of a disaster. Doran says there are many examples “where automated systems take the wrong decision”, so it is a mental challenge to hand such a critical decision – whether one playout chain stays on air or switches to another – entirely over to automation.
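One way to reconcile automation with that caution is to keep a human-in-the-loop confirmation for the most valuable channels while letting lower-tier channels fail over on their own. The fragment below is a hypothetical sketch of such a policy; the channel identifiers and confirmation step are illustrative only.

```python
# Hypothetical failover policy with a human-in-the-loop gate for critical channels.
PREMIUM_CHANNELS = {"main-channel-hd"}  # placeholder channel IDs


def operator_confirms(channel: str) -> bool:
    """Ask a human before a premium channel is repointed to its backup chain."""
    answer = input(f"Primary playout for {channel} looks down. Switch to backup? [y/N] ")
    return answer.strip().lower() == "y"


def handle_failure(channel: str) -> str:
    if channel in PREMIUM_CHANNELS and not operator_confirms(channel):
        return "stay-on-primary"   # operator suspects a false alarm
    return "switch-to-backup"      # lower-tier channels fail over automatically
```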
Cloud control
The dependence on communication and the need for
geographic diversity make the cloud the logical choice for DR playout. Indeed,
backing up playout in the cloud is seen as the optimum strategy by tech vendors
talking to IBC365. Not just for VOD channels either.
“Linear playout now should be in the cloud, it’s
literally the best place for it,” says Adam Leah, Creative Director, nxtedition.
“Not just for the elasticity and scalability but also robustness. By providing
a distributed process in the cloud you can mitigate the risk. The tricky part
comes around live broadcasts, as this is where we tend to find latency,
increased cost and security issues.”
A cloud DR strategy has many advantages. Among
them, it allows for control of playout from literally anywhere with an internet
connection.
“Should the primary centre need to be evacuated,
then a channel controller can simply pick up a laptop and work from wherever
they can get online,” says Rebel. “There is no reason why you cannot control a
premium channel from a nearby Starbucks if that is the quickest way to get back
on air.”
There are also bold claims for the cloud’s cost-effectiveness, which moves broadcasters beyond the all-or-nothing back-up systems of old.
“DR systems are now much more affordable, software
defined and cloud connectable and it is no longer cost prohibitive to have DR
in place for many more channels than just premium channels,” says Doran. “It is
also possible, right now, to set up DR facilities that are remotely or cloud
operated – for all or only parts of the workflow.”
This flexibility is inherent in the building blocks
of cloud services. For example, playout software can remain dormant in the
cloud until the moment you need it. “Should the worst happen, all you need is
the time to spool it up and you can be on air virtually immediately,” says
Rebel. “With a cloud model where you pay only for the processing you need when
you need it, this is very much a cost-effective solution.”
Srinivasan agrees: “Cloud can be your insurance
where you only pay a fraction of the cost of the primary infrastructure for DR
and run it only when it’s needed.”
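As a rough illustration of that pay-for-what-you-use model, a DR playout server can sit stopped in the cloud and be started only when disaster strikes. The sketch below assumes AWS and its boto3 SDK simply because AWS is the provider named elsewhere in this piece; the region and instance ID are placeholders.

```python
# Illustrative sketch of waking a dormant cloud playout instance on demand.
# Assumes AWS/boto3 for the example; the region and instance ID are placeholders.
import boto3

REGION = "eu-west-1"                          # placeholder region
PLAYOUT_INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical stopped playout server


def activate_dr_playout() -> None:
    ec2 = boto3.client("ec2", region_name=REGION)
    # Start the stopped instance: compute is only billed while it is running.
    ec2.start_instances(InstanceIds=[PLAYOUT_INSTANCE_ID])
    # Wait until the instance reports as running before handing over to the channel team.
    ec2.get_waiter("instance_running").wait(InstanceIds=[PLAYOUT_INSTANCE_ID])
    print("DR playout instance is up; switch the channel output when ready.")


if __name__ == "__main__":
    activate_dr_playout()
```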
Cloud also gives broadcasters the option to pick the kind of DR plan that best suits their needs, whether they are a billion-dollar TV network, a midsize broadcaster or a small, niche channel.
“Larger networks can opt for a 24/7 disaster
recovery option known as a ‘Hot DR’,” Srinivasan explains. “But then, there are
also ‘Warm DR’ and ‘Cold DR’ options. With Warm DR, broadcasters can have
content prepped and ready to go from the cloud, but not start the playout on
the channel until disaster strikes. With Cold DR, there is no playout that is
run from the cloud. Instead, evergreen content is stored in the cloud with a
playlist. In the event of a disaster, this evergreen content starts playing out
until the playout problem is resolved – an ideal solution for niche content
owners.”
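Those Hot, Warm and Cold tiers amount to a per-channel policy decision, which could be captured in something as simple as the illustrative configuration below; the channel names, fields and playlist location are hypothetical, not any vendor’s schema.

```python
# Illustrative encoding of Hot/Warm/Cold DR tiers; hypothetical names and fields.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DrTier(Enum):
    HOT = "hot"    # backup playout runs 24/7 in parallel with the primary
    WARM = "warm"  # content and playlists staged in the cloud, playout started on demand
    COLD = "cold"  # only evergreen content plus a playlist stored, played out as a stopgap


@dataclass
class ChannelDrPlan:
    channel: str
    tier: DrTier
    evergreen_playlist: Optional[str] = None  # used by the COLD tier only


DR_PLANS = [
    ChannelDrPlan("flagship-hd", DrTier.HOT),
    ChannelDrPlan("movies-plus", DrTier.WARM),
    ChannelDrPlan("niche-docs", DrTier.COLD,
                  evergreen_playlist="s3://dr-bucket/evergreen-playlist.json"),
]
```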
Technically, this is all standard stuff. Channel playout engines are now fully implemented software platforms that use microservices and modular architecture, so they can run equally well in the cloud or in the machine room.
“That also gives the reassurance that, should you
need to go to DR, the playout operations and user interfaces will look and
operate exactly the same, with no risk through unfamiliarity,” Rebel says. “And
security is excellent: AWS is used by the US Intelligence Community. If five nines [99.999%] is the gold standard for conventional playout, the SLAs from the major cloud providers offer nine nines.”
Accordingly, broadcasters are now seeing
cloud-based disaster recovery as the route to wider implementation of a cloud
strategy.
“Broadcasters who are still thinking about transitioning their workflows to the cloud can dip their toes in by transitioning their DR to the cloud first,” says Srinivasan. “It is a low-cost and low-risk way of experimenting with the cloud, before deep diving into end-to-end cloud-based media management.”
By virtue of being more distributed, cloud and IP-based systems can offer more stability, but it’s not absolute. Last December, an issue with Google’s servers brought down many of the company’s cloud services, leaving millions of users without access to their data. In June, a problem at cloud computing provider Fastly exposed the fragility of the internet when prominent sites including HBO Max, Hulu, Vimeo, Amazon, Twitter and Spotify were disrupted.
Would cloud have helped?
So, would a cloud DR have prevented the downtime
experienced by Channel 4?
“Nothing is absolute, but a well-architected
cloud-based model with redundant routes and automatic failover would likely
prevent a similar scenario,” says Young.
“If you are replicating your playout system to one
in the cloud, that should certainly reduce the outage time,” Doran says. “Just
as with any crisis management situation where you work out everything that
could go wrong and then do everything in your power to make sure it doesn’t,
the best DR plan is to ensure your primary master control playout system is
bang up to date and doesn’t malfunction.”
Nxtedition’s Leah says: “Risk can never be
mitigated completely but it can be managed.” He calls for a new approach across
the board. “We need to change minds. The old way of doing things is not the
optimal solution available. By using microservices both on premise and in the
cloud there are some remarkably efficient ways of replicating and distributing
the content to playout. The industry needs to experience what that is like to
fully embrace it.”
As the pandemic has shown, the ability to operate
remotely is increasingly vital. As Doran outlines: “Covid highlighted the need
to move to IP and software-defined infrastructures so that broadcasters can
take advantage of the flexibility they offer. The more flexibility you can
build into your system the more it can withstand a variety of disaster
scenarios.”
The argument is that by moving workflows to the
cloud, broadcasters can have a distributed infrastructure, with nothing on
premise, while accessing everything remotely.
“If you do not need to cram staff together, then
why would you?” insists Rebel. “Ultimately, the goal will be to decentralise
all playout, allowing operations from any remote location to suit the channel
and its staff, reducing the environmental impact of people travelling to work.
“Any DR centre for a major channel –
terrestrial or cloud – should be capable of taking over as close to instantly
as the business demands. That might be a few frames, or it might be a few
minutes. Anything longer means that you are not really recovering from the
disaster.”
Perhaps the ideal disaster-proofing strategy would
be to opt for a hybrid approach. Broadcasters using traditional infrastructure
could mix on-premises and cloud-based DR mechanisms.
“Those who are already operating on the cloud could
choose to invest in different regions of the same cloud service provider or
choose multiple cloud service providers such as AWS and Google Cloud for
running their DR systems,” suggests Srinivasan. “In the eventuality of one
service provider facing an outage, the other would automatically fill the gap.
This distributed infrastructure could prove to be the best strategy to mitigate
the impact of a disaster.”
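A minimal sketch of that multi-provider idea is shown below: poll a playout endpoint in each cloud and route traffic to the first healthy one. The endpoint URLs are placeholders, and a production system would more likely rely on DNS failover or an orchestration layer than on a polling script.

```python
# Illustrative multi-cloud failover check; the endpoint URLs are placeholders.
import urllib.request
from typing import Optional

ENDPOINTS = [
    "https://playout-provider-a.example.com/health",  # primary cloud provider (placeholder)
    "https://playout-provider-b.example.com/health",  # secondary cloud provider (placeholder)
]


def first_healthy_endpoint(timeout: float = 1.0) -> Optional[str]:
    """Return the first playout endpoint that answers its health check, if any."""
    for url in ENDPOINTS:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except OSError:
            continue
    return None  # both providers down: fall back to local or manual recovery


if __name__ == "__main__":
    active = first_healthy_endpoint()
    print(f"Routing traffic via: {active or 'no healthy provider found'}")
```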
Ofcom’s ire
UK communications regulator Ofcom issued a statement specifically about the provision of access services, criticising Channel 4 for “not having a strong backup plan in place” and telling the broadcaster that it shouldn’t have taken several weeks to fix the problem.
It said: “After a long outage, subtitles have now been restored on many Channel 4 programmes. However, signing and audio description are still not available on the broadcaster’s channels.
“We remain deeply concerned about the scale of the technical failures experienced by Channel 4 and the length of time taken to fix them. These problems have caused deep upset and frustration among people who are deaf, hard of hearing, blind or partially sighted.
“Channel 4 did not have strong backup measures in place, and it should not have taken several weeks to provide a clear, public plan and timeline for fixing the problem.
“We expect Channel 4 to meet – or exceed – the timings it has set for restoring all its subtitling and other access services.
“When this is done, Ofcom will review the equipment and facilities that Channel 4 had – and now has – in place, so that lessons can be learned.
“We will consider what action might be required to make sure broadcasters do not find themselves in this situation again, and that subtitles, signing and audio description remain reliable even when problems occur with the infrastructure used to provide them.”