IBC
The Coalition for Content Provenance and Authenticity (C2PA) is an organisation developing technical methods to document the origin and history of digital-media files, both real and fake.
In April 2022 a video styled as a BBC news report claimed that Ukraine was behind a missile attack on a Donbas station that killed 57 people. The video opened with a BBC logo and had the broadcaster’s watermark in the corner. It was a fake, as a BBC Verify journalist pointed out on X,
but it was also a wake-up call to the broadcaster to do something about rising deepfake
disinformation.
“Everyone was horrified to see the fake video but the only
thing we could do was tweet denials,” says Laura Ellis, Head of Technology
Forecasting, BBC. “For some it was the ‘Aha!’ moment when they fully realised
we needed to do more.”
Fortunately, the Corporation was already pioneering efforts
to go beyond flagging deepfakes after the event and to show audiences the
source of video it publishes up front.
“The work of BBC Verify is key in terms of fact checking and
signalling to the audience if we’ve not been able to check it, but we wanted to
raise the bar by turning the question on its head. We want to positively assert
media provenance by showing audiences how this media came to us and how it was
made.”
“Most people said that it was ambitious, that it was almost
undoable. But we are very stubborn and thought that this is something R&D
should be looking at.”
The idea of media provenance or data integrity has been
gaining ground in the tech community as a way of combatting the rush of
AI-generated fakes. It is news media which is particularly vulnerable to this
sort of attack (truth being the first casualty of war and also of political
elections). So, in a bid to take the initiative, news organisations including the BBC, Canada’s CBC and the New York Times joined forces to ensure their own integrity as trustworthy news sources did not fall victim.
“What we’re seeing is a really fundamental shift in the
media ecosystem that we need to act on,” says Judy Parnall, BBC Head of
Standards and Industry. “I kind of wish the elections [including UK and the US]
were not this year. We’ll be in a better position in 2025 when efforts set in
train a few years ago really come to fruition.”
Project Origin
Project
Origin was formed by the consortium in 2018 to secure trust in news through
technology. They were later joined by Microsoft. As the project progressed,
they found a fellow traveller in Adobe, which had established its own similar Content Authenticity Initiative (CAI). In 2020, they combined efforts into the Coalition for Content
Provenance and Authenticity (C2PA) to work on a
set of open standards which would allow content to contain provenance details.
“We were looking at a similar problem [to CAI] so we agreed
to work towards one technical standard,” Parnall explains. “C2PA is the
underlying technical standard and CAI and Project Origin the two user
communities feeding into it. We are absolutely hand in glove. At Project Origin
we are better placed to bring in larger news and tech organisations and Adobe
is bringing in more individual users. C2PA pulls them all together.”
A first technical standard for attaching cryptographically
secure metadata to image and video files was released in 2021. It is free and open source, released under the Linux Foundation.
Images that have been authenticated by the C2PA system can
include a “cr”
icon in the corner; users can click on it to see whatever information is
available for that image: when and how the image was created, who first published it, what tools were used to manipulate it, how it was altered, and so on.
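For developers, a minimal sketch of how such an inspection might be scripted is shown below in Python. It assumes the CAI’s open-source c2patool command-line utility is installed and on the PATH, and that it prints the manifest store as JSON (behaviour may vary by version); the file name is a placeholder.

# A minimal sketch: read an asset's Content Credentials by shelling out to the
# CAI's open-source c2patool utility. Assumes the tool is installed and prints
# the manifest store as JSON; "example.jpg" is a placeholder file name.
import json
import subprocess

def read_content_credentials(path: str) -> dict:
    """Return the manifest store reported by c2patool for the given asset."""
    result = subprocess.run(
        ["c2patool", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

if __name__ == "__main__":
    manifest = read_content_credentials("example.jpg")
    # The manifest describes who signed the asset, how it was created and
    # what edits have been recorded, where those assertions are present.
    print(json.dumps(manifest, indent=2))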
A number of vendors have begun incorporating the standard into their products, with more announcements pending. The most recent signatories include Sony, joining Nikon, Canon and Leica in developing cameras capable of capturing C2PA data at acquisition. In its press conference, Sony likened
content credentials to a “birth certificate for an image.”
Most significantly, at the start of the year OpenAI said it would implement C2PA digital credentials for images generated by
DALL-E 3, the latest version of its AI-powered image generator. It said this was to prevent the use of its
Gen-AI products for misinformation ahead of the US Presidential Election in
November.
In parallel OpenAI said it was experimenting with
a “provenance classifier” for detecting images generated by DALL-E.
“Our internal testing has shown promising early results,
even where images have been subject to common types of modifications. We plan
to soon make it available to our first group of testers – including
journalists, platforms, and researchers – for feedback.”
How Content Credentials are implemented
The CAI’s work
is focused on three main areas: capture, edit, and publish, explains Santiago
Lyon, Head of CAI Advocacy and Education, Adobe. "With capture, we work
with camera and smartphone manufacturers to integrate provenance technology
into their hardware devices at production, allowing us to empirically establish
a file’s provenance from the moment a photo is taken, or a video or audio file
is recorded.”
The next area
concerns editing. Here, provenance technology is integrated into multiple tools, including Adobe’s own creative applications such as Photoshop, allowing any editing changes made to a file to be captured and securely stored, creating a secure ‘edit history’ of the file in question.
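As a purely hypothetical illustration of the idea, the sketch below models an edit history as structured data that grows with each change; the field names are assumptions for illustration and are not the C2PA manifest schema.

# Hypothetical, simplified model of an edit history attached to a file.
# Field names and values are illustrative assumptions, not the C2PA schema.
edit_history = {
    "asset": "example.jpg",                  # placeholder file name
    "claim_generator": "ExampleEditor/1.0",  # tool that recorded the claim
    "actions": [
        {"action": "created", "when": "2024-02-01T09:12:00Z", "device": "ExampleCam X1"},
        {"action": "cropped", "when": "2024-02-01T10:03:00Z", "tool": "ExampleEditor"},
    ],
}

# Each subsequent edit appends another action, so the full chain of changes
# travels with the file once the record is embedded and cryptographically signed.
edit_history["actions"].append(
    {"action": "resized", "when": "2024-02-01T11:30:00Z", "tool": "ExampleEditor"}
)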
“When digital
files are published, metadata can sometimes be removed, so we are also actively
working with news publishers, social media platforms and others to retain and
display this underlying provenance information through a universal icon
displayed next to each published asset.”
This icon and
the underlying provenance information are the Content Credentials, which Lyon
says are the equivalent of a digital ‘nutrition label’ on food.
“The consumer
can then inspect the Content Credential published alongside each digital file
and better understand where it came from and what changes have been made to it.
Over time, our hope is that consumers will naturally expect to see Content
Credentials displayed alongside online images, videos, audio recordings and
other file types, to discern what is trustworthy.”
Lyon adds,
“Ahead of multiple elections happening across the world this year, we cannot
let misinformation erode trust, endanger creative and digital economies, and
even threaten democracy itself.”
Building a standard
Standards work most effectively when adopted by a broad user group and C2PA is “hoping for a critical mass” to achieve its goal. One aim is to get the standard ratified by international standards bodies.
“We’re talking to everyone,” says Parnall. “A number of
groups have come together and many are not yet announced or are working out how
to integrate it into their system. The bigger the organisation the more
complicated this is.”
News workflows are particularly complex and also time
sensitive. It is important that adding C2PA signals into the chain doesn’t add
a processing delay.
Innovation incubator BBC
News Labs is testing the integration of C2PA signals on the BBC website and trialling the same with an unnamed social platform. This work is now being picked up and developed as a pilot by Origin partner Media City Bergen which, with the IPTC, joined the Origin consortium as
members last year. One aim is to prove that a third party can come in and work
with signals off-the-shelf.
“Our in-house research team has found evidence that adding
provenance to images increases trust in content amongst those who don't
typically consume our content,” Ellis says. “We also found evidence that
provenance evens out trust across a range of images we use (editorial, stock
and user generated content).”
The idea is to build a range of options through which
organisations can employ provenance signals, from signing directly at the point content is published to using functionality offered by manufacturers.
Ellis’s group is also exploring the idea of “service
centres” to which publishers could send their images for validation and
certification; the images would be returned with cryptographically hashed
metadata validating their authenticity.
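As a rough, conceptual sketch of that flow (not the BBC’s system, and with hypothetical names throughout), a service centre could hash the incoming image, sign the hash and return the signed metadata alongside the file:

# Conceptual sketch of a "service centre": hash the image bytes, sign the hash
# and return signed provenance metadata. Names are hypothetical; this is not
# the BBC's or C2PA's implementation. Requires the 'cryptography' package.
import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def certify(image_bytes: bytes, publisher: str, signing_key: Ed25519PrivateKey) -> dict:
    """Return signed metadata binding the publisher to this exact file content."""
    digest = hashlib.sha256(image_bytes).hexdigest()     # fingerprint of the bytes
    signature = signing_key.sign(bytes.fromhex(digest))  # signature over the hash
    return {"publisher": publisher, "sha256": digest, "signature": signature.hex()}

if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()   # the service centre's signing key
    record = certify(b"raw image bytes", "Example News", key)
    print(json.dumps(record, indent=2))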
The underlying technology
The technology is a lighter-touch system than blockchain. It is being designed to work whether a user is connected to the internet or not, and C2PA members want a solution that has a lighter carbon footprint than blockchain.
“We need to make the tools as low friction as possible and
to automate the process so the journalist has the minimal amount to do,” says
Ellis.
Another consideration is allowing the journalist to redact information from the signals, for instance to protect the identity of a
source or a vulnerable person being interviewed.
C2PA is based on cryptographically hashed metadata, a
technique that forms a small and unique representation of the underlying data.
If the data changes in any way, even by a single digital bit, the hash will no
longer match the data.
“Protecting the integrity of the hash through a
cryptographic signature is an effective way of keeping the integrity of the
whole data,” Ellis explains, “by proving the signature was a witness to the
hash and checking the data still generates the same protected hash value.”
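A minimal Python illustration of that principle (not the C2PA implementation itself, and with deliberately simplified key handling) shows how flipping a single bit changes the hash, and how a signature over the hash exposes the change:

# Minimal illustration of hash-plus-signature integrity checking. Not the
# actual C2PA implementation; key handling is deliberately simplified.
# Requires the 'cryptography' package.
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

data = b"asset bytes or metadata"                  # placeholder content
tampered = bytes([data[0] ^ 0x01]) + data[1:]      # flip a single bit

# Even a one-bit change produces a completely different hash value.
print(hashlib.sha256(data).hexdigest())
print(hashlib.sha256(tampered).hexdigest())

# Signing the hash lets anyone with the public key check integrity and origin.
key = Ed25519PrivateKey.generate()
signature = key.sign(hashlib.sha256(data).digest())
public_key = key.public_key()

public_key.verify(signature, hashlib.sha256(data).digest())  # passes silently
try:
    public_key.verify(signature, hashlib.sha256(tampered).digest())
except InvalidSignature:
    print("tampering detected: data no longer matches the signed hash")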
Fox’s Verify system
Fox has gone a different route and developed an in-house
system called Verify, an open-source protocol just launched in beta and, like C2PA, designed to establish the history and origin of registered media. It is built on a blockchain
developed by researchers at Polygon Labs.
Fox Corp launched a closed beta of Verify on August 23,
coinciding with the first Fox News GOP debate. To date, 89,000 pieces of
content, spanning text and images, have been signed to Verify, from Fox News,
Fox Business, Fox Sports, and Fox TV affiliates.
“With this technology, readers will know for sure that an
article or image that purportedly comes from a publisher in fact originated at
the source,” Fox
explained.
Additionally, Verify establishes a bridge between media
companies and AI platforms. Fox says Verify creates new commercial opportunities
for content owners by utilising smart contracts to set programmatic conditions
for access to content.
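Conceptually, a protocol of this kind binds a content hash to the publisher that registered it. The sketch below is an illustrative, in-memory stand-in for such a registry; the real Verify protocol records these bindings on the Polygon blockchain and exposes them through its own tooling, which is not used here.

# Illustrative, in-memory stand-in for a content registry of the kind Verify
# describes: it maps a content hash to the publisher that registered it.
# The real protocol stores these bindings on a blockchain; no Fox APIs are used.
import hashlib

registry: dict[str, str] = {}

def register(content: bytes, publisher: str) -> str:
    """Record the publisher against the content's hash and return the hash."""
    digest = hashlib.sha256(content).hexdigest()
    registry[digest] = publisher
    return digest

def check_origin(content: bytes):
    """Return the registered publisher for this exact content, if any."""
    return registry.get(hashlib.sha256(content).hexdigest())

article = b"Full text of a published article"
register(article, "Example Publisher")
print(check_origin(article))                          # "Example Publisher"
print(check_origin(b"Altered text of the article"))   # None: not the registered original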
Social media question mark
None of these efforts will have much impact if social media
platforms don’t get onboard. Viewers
will only see Content Credentials if they’re using a platform or
application that can read and display the data.
Meta is reportedly engaged with this issue, down to the practicalities of the additional
compute requirements needed for content watermarking. X boss Elon Musk has
voiced his support for AI regulation.
“The preference is for social media platforms to take
credentialled content and continue to display those credentials. It’s a bit
chicken and egg. They want to know there’s enough movement in the standard
before they go,” says Parnall. “They are very aware of [C2PA] and the work we
have been doing.”
Social media are the “key problem space,” says Ellis, who declined to comment further.
Chain of trust
The idea of content credentials and the work of the C2PA in
particular is gathering momentum. “It is remarkable watching the community come together around C2PA, which is very much a dominant force. It’s really the only standard in town.”
Nonetheless, rollout might be glacial relative to the eye-watering pace of generative AI and the saturation of media with deepfakes.
“This is a gradual rollout that needs to be introduced in
every part of the ecosystem and therefore requires a lot of collaboration,”
Parnall says. “Having the ability to drill down and nest all this material
together so people can access all the information about the way the news they
are consuming is produced is important but it is not a quick fix.”
Companies including Nvidia, Publicis Groupe, AFP, Reuters
and AP, Intel, Ateme and Truepic are filling in the gaps at different points in
content production. AWS is another member.
Adobe, for example, generates the relevant metadata for
every image that’s created with its image-generating tool, Firefly. Microsoft
does the same with its Bing Image Creator.
“It will take a while for all of these components to knit
together properly but why C2PA is so utterly essential is that if you’ve got
open standards and a body of people that want this to work then you have a
chance of making it happen.”
Can it be hacked?
There’s a misunderstanding that C2PA is easy to hack or
remove but that is not the point, says Ellis. “The point is that as a
trustworthy broadcaster we want to put these signals in to show that our
content is trustworthy and to give users the ability to interrogate the various
elements.”
The likelihood is that the integrity of C2PA will be
strengthened by combining it with other provenance marking technology such as
watermarking and fingerprinting.
Meanwhile, Microsoft is keeping an eye on developments in quantum computing which threaten to take computer-powered cyberattacks into
another realm of sophistication.
“We are working with the tech giants so when a quantum break
happens we are as prepared as we can be,” Ellis says. “A quantum future is
built into our thinking.”
The reliability of C2PA certification is vital. If somebody
spoofs the C2PA it is “instantly a disaster for us,” she says. “We need our output to have integrity so we’re putting a lot of effort into making it as secure as possible, and also into making sure people understand what it is.
“It is not a watermark,” she continues. “It is a way of
communicating with the audience that this media is from [the BBC] and that this
is how we made it. Similarly, if you don’t see those signals or those signals
are broken that is the time to be alert.”
Communicating the message
Getting that message across to the public demands a huge
programme of media literacy. “It is a mammoth task,” says Ellis. “We hope to
use [BBC] airwaves and websites to explain, but the issue is of enormous interest to everybody, including in the regulatory sector, at Ofcom, the Government and the House of Lords.”
There are plans to bring on partners to help educate the
media business, the wider public, schools and universities. Systems integrators
could advise companies on how to adopt the standard.
“It’s like running a startup and what we’re trying to do
next is scale up,” she says.
The principle is being enshrined in legislation like the
pending EU AI Act. The Biden administration issued an executive
order on AI that requires content authentication and labelling of synthetic
content.
“Technology moves faster than legislation. Let’s be
realistic, it might not be the most appropriate standard to use in five years’
time, but the principles of media provenance should be universal and perpetual.”