Transforming Data Engineering with Databricks for a Leading Media and Entertainment Company


Enabling unified data processing with cost-efficient storage, streamlined workflows, and enhanced performance.

Executive Summary

The client, an American multinational media and entertainment company, is renowned for its extensive content library and pioneering data strategies. As a leader in cable television and streaming services, it offers a diverse array of original programming, feature films, documentaries, and more. For over 30 years, it has maintained a strong presence in the industry, with a workforce of 35,000 operating across multiple countries. This case study describes the migration of its data engineering pipelines from a combined AWS EMR and Snowflake architecture to a unified Databricks platform. The migration aimed to unify the data engineering approach across the company, reduce costs, and provide a single interface for data processing.



Challenges

High Storage and Compute Costs: Storing unstructured data in Snowflake proved 10x costlier, and keeping the Bronze and Silver layers there increased costs further.
Performance Issues: Complicated queries, especially self-joins, caused performance bottlenecks, and the lack of transparency into query execution hindered optimization.
Inefficient Resource Management: The inability to finely control compute resources led to operational inefficiencies.
Data Consistency: Data in Snowflake and Databricks had to remain identical during the migration to avoid disruptions.

Solution

Phase 1: Bronze and Silver Layers: Data pipelines that previously wrote to Snowflake were redirected to write to Databricks, using external Delta tables on Amazon S3 to optimize storage costs.
Phase 2: Gold Layer: To maintain client application stability, data was written to both Snowflake and Databricks during the transition period. This dual-write strategy ensured no immediate impact on analytics and BI applications.
Client Migration Strategy: Clients could validate Gold layer data in Databricks before fully migrating their applications. Once the migration succeeded, Snowflake was decommissioned.
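One way to check that dual-written Gold layer data stays identical in both systems is to compare a row count and an order-independent checksum per table. The sketch below is illustrative only and is not the client's actual validation method. It assumes the rows have already been fetched from Snowflake and Databricks as tuples of plain Python values with matching column order and types; the function names `table_fingerprint` and `tables_match` are hypothetical.

```python
import hashlib
from typing import Iterable, Sequence, Tuple

Row = Sequence[object]


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row_count, checksum) where the checksum ignores row order."""
    count = 0
    combined = 0
    for row in rows:
        # Join column values with a separator so ("ab", "c") and ("a", "bc")
        # do not serialize to the same string. repr() is type-sensitive:
        # 1 and 1.0 hash differently, so both sides must use the same types.
        encoded = "\x1f".join(repr(value) for value in row).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        # Addition modulo 2**256 is commutative, so the result does not depend
        # on row order, while a duplicated or missing row still changes it.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def tables_match(snowflake_rows: Iterable[Row], databricks_rows: Iterable[Row]) -> bool:
    """True when both sides have the same row count and checksum."""
    return table_fingerprint(snowflake_rows) == table_fingerprint(databricks_rows)


if __name__ == "__main__":
    # Hypothetical sample rows; the same data returned in a different order.
    snowflake = [(1, "drama", 120), (2, "news", 45)]
    databricks = [(2, "news", 45), (1, "drama", 120)]
    print(tables_match(snowflake, databricks))
```

For large tables, the same count-plus-checksum comparison would normally be pushed down into each engine as an aggregate query rather than computed on fetched rows; the in-memory version here only illustrates the idea.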

Results

Cost Reduction: The migration cut costs by up to 40%, lowered infrastructure and orchestration overhead, and simplified data pipelines, reducing friction.
Unified Data Engineering Workflows: Replacing AWS EMR and Airflow with Databricks Workflows cut pipeline execution times and simplified management within a unified ecosystem.
Enhanced Performance: Fine-grained resource control and optimized queries addressed the earlier performance bottlenecks.
Improved Client Experience: The phased migration and dual-write strategy kept data consistently and reliably available.
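To illustrate what orchestration and resource control can look like once pipelines run as Databricks jobs, the sketch below builds a job definition in the shape accepted by the Databricks Jobs API 2.1: notebook tasks chained with `depends_on`, all running on one autoscaling job cluster. The case study does not describe the client's actual job configuration, so everything here is an assumption: the helper name `build_job_spec`, the notebook paths, the runtime version, the node type, and the worker limits are illustrative values only.

```python
from typing import Dict, List


def build_job_spec(
    name: str,
    notebook_paths: List[str],
    spark_version: str,
    node_type_id: str,
    min_workers: int,
    max_workers: int,
) -> Dict:
    """Build a job payload that runs the notebooks one after another."""
    cluster_key = "shared_cluster"  # hypothetical key naming the job cluster
    tasks = []
    previous_key = None
    for index, path in enumerate(notebook_paths):
        task_key = f"step_{index}"
        task = {
            "task_key": task_key,
            "job_cluster_key": cluster_key,
            "notebook_task": {"notebook_path": path},
        }
        # Each task after the first waits for the one before it.
        if previous_key is not None:
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = task_key
    return {
        "name": name,
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    "spark_version": spark_version,
                    "node_type_id": node_type_id,
                    # Autoscaling bounds cap how much compute the job can use.
                    "autoscale": {
                        "min_workers": min_workers,
                        "max_workers": max_workers,
                    },
                },
            }
        ],
        "tasks": tasks,
    }


if __name__ == "__main__":
    # All argument values below are illustrative, not taken from the case study.
    spec = build_job_spec(
        name="bronze_to_gold",
        notebook_paths=["/pipelines/bronze", "/pipelines/silver", "/pipelines/gold"],
        spark_version="13.3.x-scala2.12",
        node_type_id="i3.xlarge",
        min_workers=1,
        max_workers=4,
    )
    print([task["task_key"] for task in spec["tasks"]])
```

Keeping the task graph and the cluster sizing in one definition is what replaces a separate scheduler plus cluster provisioning step; the payload would be submitted through the Jobs API or an equivalent deployment tool.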

Technical Info

Industry: General
