Transforming Data Engineering with Databricks for a Leading Media and Entertainment Company
Enabling unified data processing with cost-efficient storage, streamlined workflows, and enhanced performance.

Executive Summary
The client, an American multinational media and entertainment company, is renowned for its extensive content library and pioneering data strategies. A leader in cable television and streaming services, it offers a diverse array of original programming, feature films, documentaries, and more. For over 30 years, it has maintained a strong presence in the industry; with a workforce of 35,000 operating across multiple countries, it has established itself as a truly global company. This case study highlights the migration of data engineering pipelines from a combined AWS EMR and Snowflake architecture to a unified Databricks platform. The migration aimed to unify the data engineering approach across the company, reduce costs, and provide a single interface for data processing.

Challenges
High Storage and Compute Costs: Storing unstructured data in Snowflake proved 10x costlier, and keeping the Bronze and Silver layers there pushed costs higher still.
Performance Issues: Complex queries, especially self-joins, caused performance bottlenecks, and the lack of transparency into query execution hindered optimization.
Inefficient Resource Management: The inability to finely control compute resources led to operational inefficiencies.
Data Consistency: Keeping data identical across Snowflake and Databricks during the migration was crucial to avoid disruptions.

Solution
Phase 1 (Bronze and Silver Layers): Data pipelines that previously wrote to Snowflake were redirected to write to Databricks, using External Delta Tables on Amazon S3 to optimize storage costs.
Phase 2 (Gold Layer): To keep client applications stable, data was written to both Snowflake and Databricks during the transition period. This dual-write strategy ensured no immediate impact on analytics and BI applications.
Client Migration Strategy: Downstream clients were able to validate Gold layer data in Databricks before fully migrating their applications. Once the migration succeeded, Snowflake was decommissioned.
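The dual-write and validation steps can be illustrated with a minimal Python sketch. It is not the client's implementation: `InMemorySink`, `dual_write`, `table_fingerprint`, and `tables_match` are hypothetical names, and in-memory lists stand in for the Snowflake and Databricks Gold tables. The check compares a row count plus an order-independent checksum (a sum of per-row SHA-256 digests), so two copies of a table match regardless of row or column order.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

MOD = 2 ** 256  # keep the summed checksum within 256 bits


@dataclass
class InMemorySink:
    """Hypothetical stand-in for a Gold table on either platform."""
    name: str
    rows: List[Dict] = field(default_factory=list)

    def write(self, batch: Iterable[Dict]) -> None:
        self.rows.extend(batch)


def dual_write(batch: List[Dict], legacy: InMemorySink, new: InMemorySink) -> None:
    """Write the same batch to the legacy target and the new target.

    The legacy target is written first so existing BI consumers are never
    behind the new platform during the transition.
    """
    legacy.write(batch)
    new.write(batch)


def row_digest(row: Dict) -> int:
    """Stable hash of one row; sort_keys makes column order irrelevant."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return int(hashlib.sha256(payload.encode("utf-8")).hexdigest(), 16)


def table_fingerprint(rows: Iterable[Dict]) -> Dict[str, object]:
    """Order-independent summary: row count plus summed row digests."""
    count = 0
    checksum = 0
    for row in rows:
        count += 1
        # Summing (not XOR-ing) keeps duplicate rows from cancelling out.
        checksum = (checksum + row_digest(row)) % MOD
    return {"row_count": count, "checksum": format(checksum, "064x")}


def tables_match(a: InMemorySink, b: InMemorySink) -> bool:
    """True when both tables have the same count and checksum."""
    return table_fingerprint(a.rows) == table_fingerprint(b.rows)


if __name__ == "__main__":
    snowflake_gold = InMemorySink("snowflake_gold")
    databricks_gold = InMemorySink("databricks_gold")
    batch = [{"title_id": 1, "views": 120}, {"title_id": 2, "views": 75}]
    dual_write(batch, snowflake_gold, databricks_gold)
    print(tables_match(snowflake_gold, databricks_gold))  # True
```

In a real migration these summaries would more likely be computed inside each platform with SQL aggregates rather than by pulling rows into Python, and a production dual-write would also need retries and idempotent writes so that a failure on one side does not leave the two copies diverged; the sketch omits both.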

