Tuesday, 5 March 2024

Tag, Search, Serve: What You Need to Know About Analytical AI

NAB


Generative AI can be used to create audio, stills, and video, but something often overlooked is how useful Analytical AI can be. In the context of video analysis, it involves facial or location recognition, logo detection, sentiment analysis, and speech-to-text, to name a few. Analytical tools are the focus of a Michael Kammes podcast, “AI Tools For Post Production You Haven’t Heard Of.”

“Welcome to the forefront of post-production evolution,” he says.

Kammes invites post-production chiefs to take a look at a number of analytical tools. These include StoryToolkitAI, an editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models. It began as a GitHub project by developer Octimot, runs on OpenAI’s Whisper and Python, and can be used on Blackmagic Design’s DaVinci Resolve among other professional editing systems.

“StoryToolKitAI transforms how you interact with your own local media. Sure, it handles the tasks we’ve come to expect from AI tools that work with media like speech-to-text transcription. But it can understand and execute tasks that it was never explicitly trained for,” he says.

He describes it as a “conversational partner. You can use it to ask detailed questions about your indexed content, just like you would talk with ChatGPT.”

Kammes likes that StoryToolkitAI runs locally, so users get privacy, and that the application itself is open source. He believes the app’s architecture is a blueprint for how things should be done in the future.

“That is, media processing should be done by an AI model of your choosing and can process media independently of your creative software. Or better yet, tie this into a video editing software’s plug-in structure, and then you have a complete media analysis tool that’s local, and using the AI model that you choose.”

While many analytical AI indexing solutions search your content based on literal keywords, others perform a semantic search by using a search engine that understands words from the searcher’s intent and their search context. This type of search is intended to improve the quality of search results.
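To make the distinction concrete, here is a toy Python sketch contrasting the two approaches. The three-number “embeddings” are hand-made illustrative values, not the output of any real model; a production semantic search would use vectors from an embedding model, but the cosine-similarity ranking step works the same way.

```python
import math

def keyword_search(query, docs):
    """Literal keyword search: only exact substring matches count."""
    return [d for d in docs if query.lower() in d.lower()]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings; dimensions loosely mean (water, nature, man-made).
EMBEDDINGS = {
    "river flowing through a valley": (0.9, 0.9, 0.0),
    "kitchen faucet left running":    (0.8, 0.1, 0.9),
    "city traffic at night":          (0.0, 0.1, 0.9),
}

def semantic_search(query_vec, top_k=2):
    """Rank documents by similarity of meaning rather than shared words."""
    ranked = sorted(EMBEDDINGS,
                    key=lambda d: cosine(query_vec, EMBEDDINGS[d]),
                    reverse=True)
    return ranked[:top_k]
```

A keyword search for “waterfall” matches none of the clips, while a semantic query vector leaning toward “running water” surfaces both the river and the faucet, which is exactly the intent-over-literal-words behavior described above.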

This is what Twelve Labs seems to have cracked. Its tech can be used for tasks like ad insertion or even content moderation, says Kammes. “Like figuring out which videos feature running water, or depict natural scenes like rivers and waterfalls, or manmade objects like faucets and showers,” he explains.

“In order to do this, you would need to be able to understand video the way a human understands video and what we mean by that is understanding the relationship between those audio and video components and how it evolves over time because context matters the most.”

Cloud storage developer Wasabi Technologies recently acquired Curio AI, a technology developed by GrayMeta that uses AI and ML to automatically generate a searchable index of unstructured data. GrayMeta President and CEO Aaron Edell and his AI team are also joining Wasabi.

According to Kammes, speaking ahead of the acquisition announcement, “Curio isn’t just a tagging tool. It’s a pioneering approach to using AI for indexing and tagging your content using their localized models. Traditionally, analytical AI generated metadata can drown you in data and options and choices, overloading and overwhelming you. GrayMeta simplifies the search process right in your web browser.”

Wasabi is planning to give its users exclusive access to Curio. It will allow them to easily search their huge archives of unstructured data, something that was not possible before, the company said.

“Imagine walking into Widener Library at Harvard with 11 million volumes, and there’s no card catalog,” David Friend, CEO of Wasabi, told Joseph Kovar at CRN. “That’s what we have right now with unstructured data in the cloud. Our acquisition of this machine learning technology is really going to be the most important development since the introduction of object storage itself.”

He added, “Today unstructured data is still in the dark ages. I believe that what we’re doing here with Curio AI to automatically create an index of every face, every logo, every object, every sound, every word, will really revolutionize the utility of object storage for the storage of unstructured data.”

Wasabi plans to fully integrate Curio into its cloud storage, and not offer it as a standalone technology for other storage clouds.

“It’s going to be one integrated product, and it’s going to be sold by the terabyte just like our regular storage, but at a slightly higher price. And for that, you will get unlimited use of the AI,” Friend detailed.

Curio will automatically scan anything that’s put into Wasabi’s storage and produce an index which can then be accessed using the Curio user interface and one of several media asset management systems including Iconik, Strawberry and Avid. The company expects to go to market with the product later this year “with channel partners who sell into the media and entertainment industry.”

Wasabi even thinks its combination of object storage and Curio is a step ahead of Amazon, Google, and Microsoft in terms of functionality.

“The hyperscalers can’t do what we’re doing with Curio. I mean, they have a toolkit, and you can assemble something like this if you have the time and money. But there’s nothing equivalent to this that anybody else is offering as far as I know.”

Next, Kammes addresses CodeProject.AI Server, which handles both analytical and generative AI. He describes it as “Batman’s utility belt,” where each gadget and tool on the belt represents a different analytical or generative AI function designed for specific tasks.

“And just like Batman has a tool for just about any challenge, Code Project AI Server offers a variety of AI tools that can be selectively deployed and integrated into your systems, all without the hassle of cloud dependencies.”

This includes object and face detection, scene recognition, text and license plate reading, and even the transformation of faces into anime-style cartoons. Additionally, it can generate text summaries and perform automatic background removal from images.

The Server offers a straightforward HTTP REST API for integration into a facility or workflow. “For instance, integrating scene detection in your app is as simple as making a JavaScript call to the server’s API. This makes it a bit more universal than a proprietary standalone AI framework,” says Kammes.
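The same kind of call can be made from any language that speaks HTTP, not just JavaScript. The sketch below, in Python, shows the general shape; the host, port, route path, and content type here are assumptions for illustration, so check the server’s own API documentation for the actual endpoints and request format.

```python
import json
import urllib.request

def build_endpoint(host: str, port: int, route: str) -> str:
    """Assemble a REST endpoint URL for a given analysis route."""
    return f"http://{host}:{port}/v1/{route.strip('/')}"

def detect_scene(image_path: str, host: str = "localhost", port: int = 32168) -> dict:
    """POST an image to a (hypothetical) scene-detection route and parse the JSON reply."""
    url = build_endpoint(host, port, "vision/scene")
    with open(image_path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Because the interface is plain HTTP plus JSON, the same request works from a shell script, an editing-panel plug-in, or a render-farm job, which is what makes the approach more universal than a proprietary standalone AI framework.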

It also allows for extensive customization and the addition of new modules to suit specific needs.

Finally, Kammes highlights Pinokio, “a playground for you to experiment with the latest and greatest in generative AI.”

Pinokio is a self-contained browser that allows you to install and run various analytical and generative AI applications and models without knowing how to code. It does this by taking GitHub code repositories (called repos) and automating the complex setup of terminals, clones, and environment settings. “With Pinokio, it’s all about easy one-click installation and deployment, all within its web browser,” Kammes insists. “It enables you to experiment with various AI services before they go mainstream.”
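For a sense of what that one-click automation replaces, here is a small Python sketch of the manual chores involved in standing up a typical AI repo: clone, create a virtual environment, install dependencies, launch. The repo URL and `app.py` entry point are placeholders, and this is an illustration of the general workflow, not Pinokio’s actual script format.

```python
def setup_plan(repo_url: str, app_dir: str) -> list[str]:
    """Return the shell steps a one-click installer performs behind the scenes."""
    return [
        f"git clone {repo_url} {app_dir}",                              # fetch the repo
        f"python -m venv {app_dir}/venv",                               # isolated environment
        f"{app_dir}/venv/bin/pip install -r {app_dir}/requirements.txt",  # dependencies
        f"{app_dir}/venv/bin/python {app_dir}/app.py",                  # launch the app
    ]
```

Each of those steps is a place where a non-coder can get stuck, which is why bundling them behind a single click lowers the barrier to trying new tools.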

It’s already chock-full of diverse AI applications to play with, from image manipulation with Stable Diffusion to voice cloning and AI-generated video tools. “Pinokio helps to democratize access to AI tools by combining ease of use with a growing list of modules. As AI continues to grow in various sectors, platforms like this are vital in empowering users to explore and leverage AI’s full potential. The cool part is that these models are constantly being developed and refined by the community,” Kammes says.

“Plus, since it runs locally and it’s free, you can learn and experiment without being charged per revision. Every week there are more analytical and generative AI tools being developed and pushed to market.”
