- The Client is an American subscription video-on-demand (SVOD) service owned by a global media & entertainment company.
- The client streams content from a popular television service, original programming produced under a large production banner, and content acquired through third-party library deals.
- In the streaming domain, Content is King, and being able to derive insights from all the content determines the platform’s success.
- Metadata is currently available from different sources (internal and external), each with its own representation.
- The CKG team’s main challenge is integrating data from these sources and linking it to create a unified and consistent representation, which will be used to make critical business decisions.
About the client
Incorporated 50 years ago, it is one of the major streaming companies, with more than 50 million subscribers, and it has continuously improved its products and services over the generations.
The client’s Global Data & AI (“DAI”) is a cross-vertical, integrated, full-stack data and analytics platform organization. We focus primarily on end-to-end data pipelines that feed products and services, which inform decisions through analytical modeling and through probabilistic and deterministic trend analysis.
DAI Mission: Build best-in-class data & AI products and solutions to enhance storytelling & experiences for the client’s audiences globally.
Current State Challenges
The client’s organization owns a large volume of content and related metadata, but supporting many business use cases also requires additional data from external sources.
Beyond what is already in our content metadata repository, a great deal of useful information is available on the internet and from third parties, both free and paid.
The challenge is integrating data from these sources and linking them to create a unified and consistent representation.
Content metadata (data about movies/series, talent, etc.) is one of the most foundational datasets, powering use cases such as Content Valuation, Portfolio Optimization, Demand Prediction, Marketing Analytics, Search, and Recommendation.
This unified data representation will help address the following use cases:
- Predict which titles, or kinds of titles, will increase viewership and generate more revenue.
- Find similar titles to use in recommendations that improve the consumer experience.
- Optimize the product offerings on the media & entertainment platform.
- Generate reports that support informed business decisions.
- Optimize marketing efforts.
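Once titles are linked to shared entities such as genres and talent, one simple way to find similar titles for recommendations is to compare their sets of linked entities. The sketch below uses Jaccard similarity (shared neighbors divided by all neighbors); it is a hypothetical illustration, the feature labels are made up, and this document does not say which similarity measure the client’s recommendations actually use.

```python
# Hypothetical sketch: rank titles by Jaccard similarity of the entities
# (genres, talent, etc.) they are linked to. Labels are illustrative only.
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_titles(target: str, features: dict[str, set[str]], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k other titles, ordered by descending similarity to target."""
    base = features[target]
    scores = [
        (title, jaccard(base, linked))
        for title, linked in features.items()
        if title != target
    ]
    # Highest score first; ties broken by title so the order is deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]
```

For example, if `Title A` is linked to `{genre:drama, talent:actor_1, talent:director_1}` and `Title B` to `{genre:drama, talent:actor_1}`, they share two of three distinct entities, so their score is 2/3.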
Why AWS Platform?
AWS is the leading cloud provider. To solve the business challenge described above, we needed a platform that would let us develop and deploy faster, store vast amounts of data (petabytes), and process and transform that data without incurring further costs once the work is done. Amazon S3 meets the storage need, and transient Amazon EMR clusters, provisioned on an as-needed basis and shut down when processing finishes, meet the processing need. AWS also provides Amazon Neptune, a graph database that fits our Knowledge Graph requirements well.
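The transient-cluster pattern can be illustrated with the request accepted by the `run_job_flow` call on boto3’s EMR client. This is a hypothetical sketch: the cluster name, release label, instance types and counts, and script path are assumptions, not the client’s configuration. The setting that makes the cluster transient is `KeepJobFlowAliveWhenNoSteps=False`, which tells EMR to terminate the cluster after its last step completes, so compute charges stop when processing is done.

```python
# Hypothetical sketch of a transient EMR cluster request. The name, release
# label, instance types, and script path are illustrative assumptions.


def transient_emr_request(script_s3_uri: str, release_label: str = "emr-6.15.0") -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow call.

    The cluster runs one Spark step and then terminates, because
    KeepJobFlowAliveWhenNoSteps is False.
    """
    return {
        "Name": "ckg-transform-transient",  # hypothetical cluster name
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Terminate automatically once there are no steps left to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "spark-transform",
                # A failed step also tears the cluster down instead of leaving it idle.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

Submitting it would look like `boto3.client("emr").run_job_flow(**transient_emr_request("s3://example-bucket/jobs/transform.py"))`. Building the request as plain data keeps it easy to review and test without AWS credentials.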
Why Info Services for Implementation?
Info Services helps customers implement scalable, cost-effective solutions for a range of needs, particularly in data and analytics.
Info Services brought the niche capabilities required for streaming initiatives, including highly skilled AWS Solutions Architects and data engineers who successfully implemented the solution. The company also invests in technologies and employees to create a challenging environment that fosters its associates’ professional growth.
The solution was architected following best practices and the AWS Well-Architected Framework.
- Ingestion (ETL):
For ingestion, a separate data pipeline is built for each ingestion source. Each pipeline downloads the source’s data dumps to S3, transforms them with Spark jobs, and loads the relevant data into Snowflake tables.
- Normalize and Unify:
Once the data is loaded into Snowflake, various algorithms are applied to normalize and link records from the different sources into a unified dataset.
- Visualize and Query:
The unified dataset is then loaded into the Amazon Neptune graph database for visualization and querying.
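The three stages above can be sketched end to end as follows. This is a minimal, hypothetical illustration written in plain Python rather than Spark, Snowflake SQL, or a Neptune client so that it runs standalone. The source names and field mappings in `SOURCE_FIELD_MAPS`, the exact-match rule on normalized title plus release year (one simple record-linkage technique; the actual unification algorithms are not described here), and the `Title`/`SourceRecord`/`HAS_SOURCE` graph labels are all assumptions. The last stage writes the Gremlin CSV layout (`~id`, `~label`, `~from`, `~to`, and typed property headers such as `name:String`) that Neptune’s bulk loader accepts.

```python
# Hypothetical end-to-end sketch of the three stages; source names, fields,
# the matching rule, and graph labels are illustrative assumptions.
from __future__ import annotations

import csv
import io
import re
import unicodedata

# Stage 1 (Ingestion): per-source field mappings into one common layout.
SOURCE_FIELD_MAPS = {
    "internal_repo": {"content_id": "source_id", "title_name": "title", "release_yr": "release_year"},
    "vendor_feed": {"id": "source_id", "name": "title", "year": "release_year"},
}


def to_common_schema(source: str, raw: dict) -> dict:
    """Rename a raw record's fields; fields missing from the record become None."""
    row = {target: raw.get(field) for field, target in SOURCE_FIELD_MAPS[source].items()}
    row["source"] = source
    return row


# Stage 2 (Normalize and Unify): exact match on (normalized title, release year).
def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower()).split())


def unify(rows: list[dict]) -> dict[tuple, list[dict]]:
    """Group common-schema rows that share a normalized title and release year."""
    groups: dict[tuple, list[dict]] = {}
    for row in rows:
        key = (normalize_title(row["title"]), row["release_year"])
        groups.setdefault(key, []).append(row)
    return groups


# Stage 3 (Visualize and Query): Gremlin-format CSV for Neptune's bulk loader.
def to_neptune_csv(groups: dict[tuple, list[dict]]) -> tuple[str, str]:
    """Write one Title vertex per unified group, one SourceRecord vertex per
    input row, and a HAS_SOURCE edge between them. Returns (vertex_csv, edge_csv)."""
    vbuf, ebuf = io.StringIO(), io.StringIO()
    vw = csv.writer(vbuf, lineterminator="\n")
    ew = csv.writer(ebuf, lineterminator="\n")
    vw.writerow(["~id", "~label", "name:String", "releaseYear:Int"])
    ew.writerow(["~id", "~from", "~to", "~label"])
    for n, ((_, year), members) in enumerate(groups.items()):
        title_id = f"title-{n}"
        vw.writerow([title_id, "Title", members[0]["title"], year])
        for row in members:
            record_id = f"{row['source']}:{row['source_id']}"
            vw.writerow([record_id, "SourceRecord", row["title"], row["release_year"]])
            ew.writerow([f"{title_id}->{record_id}", title_id, record_id, "HAS_SOURCE"])
    return vbuf.getvalue(), ebuf.getvalue()
```

Exact matching on title and year will miss retitled or misdated records, so a real unification step would likely add fuzzy matching or external identifiers. The two CSV strings would then be written to S3 and submitted to Neptune’s bulk loader.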
Results and Benefits
CKG has become the go-to source for all the data science teams to get unified content in one place and use it for their machine learning algorithms.
In addition, with the merger between the client and a large media & entertainment company in place, CKG plays a vital role in managing all first-party and third-party content metadata and in providing stakeholders with descriptive and statistical metadata in a unified schema, with the help of an ontology.
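The role of an ontology in keeping the schema unified can be illustrated with a deliberately small, hypothetical example: the ontology lists which relationships are allowed between which entity classes, and incoming edges are checked against it before they enter the unified schema. The class and relation names below are assumptions, and a production ontology would typically be far richer (for example, expressed in RDF/OWL) than a set of allowed triples.

```python
# Hypothetical mini-ontology: allowed (subject class, relation, object class)
# patterns. Class and relation names are illustrative assumptions.
from __future__ import annotations

ONTOLOGY: set[tuple[str, str, str]] = {
    ("Title", "HAS_GENRE", "Genre"),
    ("Title", "FEATURES", "Talent"),
    ("Series", "HAS_EPISODE", "Title"),
}


def invalid_edges(node_classes: dict[str, str], edges: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Return the (from_id, relation, to_id) edges whose class pattern the
    ontology does not allow; nodes with no known class never match."""
    bad = []
    for src, rel, dst in edges:
        pattern = (node_classes.get(src), rel, node_classes.get(dst))
        if pattern not in ONTOLOGY:
            bad.append((src, rel, dst))
    return bad
```

Rejecting edges such as a `Genre` that `FEATURES` a `Title` keeps first-party and third-party metadata consistent with a single agreed model.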