23 Sep

How A Major Media Company Uses Machine Learning to Make Its Vast Digital Archive Searchable and Accessible

Seb Bulpin

The Company

The company is an American media and entertainment conglomerate, known for its animated films, theme parks, and TV and streaming services.

The Challenge

For almost a century, this organisation has been committed to preserving its archive of content, including drawings and concept artwork related to classic films, to serve as a vault of inspiration and reference for its writers and artists.  

This vast body of content, which comes in many formats and much of which is digital, has to be carefully organised and maintained to remain useful.

Ultimately, every single frame of footage needs to be tagged across a wide range of categories: the characters, the relationships between them, the archetypes the characters portray, whether or not the animals in the film talk, music, and nature, as well as emotional themes and tones.

The challenge is to organise all this content effectively without huge amounts of difficult manual work, which would be impracticable.

The foundation of their system is metadata: snippets of information that describe the stories, scenes and characters in the shows and movies. But tagging every single frame of a movie by hand across all the possible categories would be an impossible task.

The Solution

The organisation is partnering with a leading cloud provider to build machine learning tools that can automatically tag the content with the appropriate metadata. 

The algorithm works through the company's database, looking at what has already been tagged and then tagging the rest of the database in a consistent fashion.
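The article does not say how the tagging model works internally, so the following is only a minimal sketch of the general idea of extending existing tags to untagged content. It assumes each frame has already been turned into a numeric feature vector, and it swaps in a simple nearest-neighbour vote in place of whatever model the team actually uses. The names `propagate_tags`, `TAGGED` and `UNTAGGED`, and all the sample data, are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_tags(tagged, untagged, k=3):
    """Suggest tags for untagged items from their k most similar tagged items.

    tagged:   list of (feature_vector, set_of_tags) pairs (assumed input shape)
    untagged: dict mapping item id -> feature_vector
    A tag is suggested only if more than half of the neighbours carry it.
    """
    suggestions = {}
    for item_id, vector in untagged.items():
        neighbours = sorted(
            tagged, key=lambda pair: cosine(vector, pair[0]), reverse=True
        )[:k]
        votes = {}
        for _, tags in neighbours:
            for tag in tags:
                votes[tag] = votes.get(tag, 0) + 1
        suggestions[item_id] = {
            tag for tag, count in votes.items() if count > len(neighbours) / 2
        }
    return suggestions


# Hypothetical data: two-dimensional vectors stand in for real image features.
TAGGED = [
    ([1.0, 0.0], {"mouse", "comedy"}),
    ([0.9, 0.1], {"mouse", "music"}),
    ([0.8, 0.2], {"mouse", "comedy"}),
    ([0.0, 1.0], {"forest", "nature"}),
    ([0.1, 0.9], {"forest", "nature"}),
]
UNTAGGED = {"frame_a": [0.95, 0.05], "frame_b": [0.05, 0.95]}

if __name__ == "__main__":
    for frame, tags in propagate_tags(TAGGED, UNTAGGED).items():
        print(frame, sorted(tags))
```

The majority vote is what keeps new tags consistent with existing ones in this sketch: a tag carried by only one neighbour is not passed on.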

They started out with traditional machine learning, before incorporating more deep learning as their data set grew. Using elastic cloud compute allows the team to rapidly test new versions of the machine learning model.

A particular success was developing a deep-learning method that allowed a computer to tell animated characters from static representations (e.g. a ‘real’ character and that same character on a poster) and to identify characters in difficult and strange lighting. 

Once the model has a general understanding of a particular character, it can search for that character (with minor adjustments) in any show or movie and tag it appropriately.

With the content correctly tagged, users can quickly find what they need through a search interface. 
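The search interface itself is not described in the article. As an illustration of why consistent tags make lookup fast, here is a minimal sketch of an inverted index, a common structure for tag search that maps each tag to the set of assets carrying it. The function names, asset ids and tags are all hypothetical.

```python
from collections import defaultdict


def build_index(catalogue):
    """Map each lower-cased tag to the set of asset ids that carry it."""
    index = defaultdict(set)
    for asset_id, tags in catalogue.items():
        for tag in tags:
            index[tag.lower()].add(asset_id)
    return index


def search(index, *tags):
    """Return the assets that carry every one of the requested tags."""
    if not tags:
        return set()
    postings = [index.get(tag.lower(), set()) for tag in tags]
    return set.intersection(*postings)


# Hypothetical archive entries.
CATALOGUE = {
    "sketch_001": {"mouse", "concept art"},
    "frame_204": {"mouse", "talking animal", "joyful"},
    "frame_877": {"forest", "nature", "joyful"},
}

if __name__ == "__main__":
    index = build_index(CATALOGUE)
    print(search(index, "mouse", "joyful"))
```

Each query only touches the sets for the requested tags, so the lookup cost depends on how many assets share those tags rather than on the size of the whole archive.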

The Result

The technology allows the company's writers and artists to search through the vast database and effortlessly find any kind of digital content based on a very wide range of descriptions and metadata categories. 

Humans still approve the tags that the algorithm applies, but as the model learns more about the archive, the workload for the technical teams on this project is steadily dropping. That allows them to spend more time improving the algorithm, which in turn makes the search function more effective.
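The approval workflow is not detailed in the article. The sketch below shows one simple way a human-approval step could be modelled: suggested tags pass through a reviewer, only approved tags are kept, and the approval rate gives a rough signal of how much correction the model still needs. The `ReviewQueue` class and the `reviewer` callback are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    """Keeps only the suggested tags that a human reviewer approves."""

    approved: dict = field(default_factory=dict)  # asset id -> set of tags
    accepted: int = 0
    rejected: int = 0

    def review(self, asset_id, suggested_tags, reviewer):
        """Ask the reviewer about each suggested tag and record the decision.

        reviewer is a callable (asset_id, tag) -> bool standing in for a person.
        """
        for tag in sorted(suggested_tags):
            if reviewer(asset_id, tag):
                self.approved.setdefault(asset_id, set()).add(tag)
                self.accepted += 1
            else:
                self.rejected += 1

    def approval_rate(self):
        """Fraction of suggestions approved so far (0.0 if none reviewed)."""
        total = self.accepted + self.rejected
        return self.accepted / total if total else 0.0


if __name__ == "__main__":
    queue = ReviewQueue()
    # Hypothetical reviewer who rejects one wrong suggestion.
    queue.review("frame_204", {"mouse", "dragon"}, lambda _id, tag: tag != "dragon")
    print(queue.approved, queue.approval_rate())
```

In a setup like this, a rising approval rate would correspond to the falling review workload the article describes.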

The project has been successful enough that the technology is now being deployed in other areas of the business. The team are helping another media subsidiary to provide personalised recommendations to its customers based on the metadata of the articles and videos that they consume.
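How those recommendations are computed is not stated. As a rough illustration of metadata-based recommendation, the sketch below builds a profile from the tags of items a customer has consumed and ranks the remaining items by Jaccard overlap with that profile. This is an assumed, simplified technique, and the function names, item ids and tags are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: shared tags divided by all tags across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(consumed, catalogue, top_n=2):
    """Rank unseen items by tag overlap with everything already consumed."""
    profile = set().union(*(catalogue[item] for item in consumed))
    scored = [
        (jaccard(profile, tags), item)
        for item, tags in catalogue.items()
        if item not in consumed
    ]
    # Highest score first; ties broken by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for score, item in scored[:top_n] if score > 0]


# Hypothetical articles and videos with their metadata tags.
CATALOGUE = {
    "article_1": {"animation", "history"},
    "video_2": {"animation", "music"},
    "article_3": {"theme parks", "travel"},
    "video_4": {"animation", "history", "music"},
}

if __name__ == "__main__":
    print(recommend(["article_1"], CATALOGUE))
```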
