Media & Entertainment

Platform Modernisation & £485k Annual Savings

ITV

Led platform engineering for the ITV Common Platform, which hosts ITVX, and cut annual costs by £485k through infrastructure modernisation.

£485k Total annual cost savings
~100 EKS clusters managed
~30 Platform engineers supported
2000+ Monitoring checks migrated
Technologies
AWS, EKS, Terraform, Terragrunt, Prometheus, Grafana, Loki, Tempo, GitHub Actions, Jenkins, OpenTelemetry, Sloth, Apache DevLake, Python, Ruby

The Challenge

ITV’s Common Platform powers ITVX (formerly ITV Hub), ITV News, and various OTT projects. The platform had accumulated technical debt across several areas:

  • Legacy logging: Puppet-managed ELK stack with high operational overhead and licensing costs
  • Fragmented monitoring: Sensu/Uchiwa setup requiring manual configuration for over 2000 checks
  • Inefficient load balancing: Hundreds of Classic Load Balancers (CLBs) for individual services
  • Limited developer self-service: Engineers dependent on platform team for common infrastructure tasks

Our Approach

Logging Infrastructure Overhaul

We designed and implemented a migration from the legacy ELK stack to Loki hosted on EKS. This involved:

  • Architecting a scalable Loki deployment with appropriate retention policies
  • Developing migration tooling to ensure zero data loss during transition
  • Creating Grafana dashboards to maintain feature parity with Kibana
  • Training development teams on LogQL and the new observability stack
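
To give a flavour of the LogQL that teams moved to, here is a minimal sketch of a helper that builds an error-rate query. The label names (`namespace`, `app`), their values, and the helper itself are hypothetical and not taken from the ITV platform. Label values are assumed to contain no quotes or backslashes.

```python
def logql_error_rate(labels: dict[str, str], needle: str, window: str = "5m") -> str:
    """Build a LogQL metric query: per-second rate of log lines containing `needle`.

    Assumes label values and `needle` need no escaping (no quotes or backslashes).
    """
    # Sort the labels so the generated query is stable across runs.
    selector = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    # {selector} picks the log streams, |= filters lines, rate() counts them per second.
    return f'sum(rate({{{selector}}} |= "{needle}" [{window}]))'


if __name__ == "__main__":
    # Hypothetical stream labels, for illustration only.
    print(logql_error_rate({"namespace": "web", "app": "api"}, "error"))
```

A query of this shape can back a Grafana panel, or serve as the expression of a log-based alert once a threshold comparison is appended.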

Monitoring Modernisation

We migrated monitoring from Sensu/Uchiwa to Prometheus and Alertmanager:

  • Converted 2000+ legacy checks to Prometheus recording rules and alerting rules
  • Implemented alerts for EKS and AWS services (RDS, Lambda, SQS)
  • Established self-service alerting through Kubernetes CRDs
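
The check conversion above can be sketched roughly as follows. This is an illustration rather than the tooling used on the project: the `LegacyCheck` shape, the metric name, and the "hold for two intervals" policy are all assumptions. Only the output keys (`alert`, `expr`, `for`, `labels`, `annotations`) follow the standard Prometheus alerting rule format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyCheck:
    """Hypothetical, simplified view of one legacy threshold check."""
    name: str              # e.g. "check-disk-usage"
    metric: str            # Prometheus metric assumed to replace the check
    threshold: float       # value above which the alert should fire
    interval_seconds: int  # how often the legacy check ran
    team: str              # owning team, used for alert routing


def to_alert_rule(check: LegacyCheck) -> dict:
    """Build one Prometheus alerting rule (as a dict) from a legacy check."""
    # CamelCase alert names are a common Prometheus convention.
    alert_name = "".join(part.capitalize() for part in check.name.split("-"))
    return {
        "alert": alert_name,
        "expr": f"{check.metric} > {check.threshold:g}",
        # Assumed policy: the condition must hold for two legacy intervals.
        "for": f"{check.interval_seconds * 2}s",
        "labels": {"severity": "warning", "team": check.team},
        "annotations": {"summary": f"{check.name} breached {check.threshold:g}"},
    }


def to_rule_group(group_name: str, checks: list[LegacyCheck]) -> dict:
    """Wrap the converted rules in the top-level structure of a rules file."""
    return {"groups": [{"name": group_name, "rules": [to_alert_rule(c) for c in checks]}]}
```

Serialising the result of `to_rule_group` to YAML gives a file that Prometheus can load as a rules file. In practice many checks do not map onto a single threshold expression and need hand-written PromQL.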

Internal Developer Platform

We architected the Common Platform internal developer platform (IDP), including:

  • Opinionated Terraform component modules with built-in alerting
  • SLO framework using Sloth for standardised service level objectives
  • Jenkins pipelines enabling developer self-service for metrics (PrometheusRules), scripts (K8s CronJobs), and log-based alerts (LogQL)
  • Actions Runner Controller deployment for GitHub Actions self-hosted runners
  • Migration of internal CI/CD from Jenkins pipelines to GitHub Actions workflows
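
The arithmetic behind an SLO framework like this one can be sketched in a few lines. The functions below are illustrative and are not part of Sloth or of the ITV codebase. They use the standard definitions of error budget and burn rate, with a 30-day window assumed as the default.

```python
def error_budget(objective_percent: float) -> float:
    """Fraction of requests (or time) allowed to fail, e.g. 99.9 gives about 0.001."""
    return 1.0 - objective_percent / 100.0


def burn_rate(error_ratio: float, objective_percent: float) -> float:
    """How fast the budget is being spent; 1.0 uses it up exactly over the window."""
    return error_ratio / error_budget(objective_percent)


def allowed_bad_minutes(objective_percent: float, window_days: int = 30) -> float:
    """Minutes of full outage the objective tolerates over the window."""
    return error_budget(objective_percent) * window_days * 24 * 60


if __name__ == "__main__":
    print(allowed_bad_minutes(99.9))  # budget for a 99.9% objective over 30 days
    print(burn_rate(0.0144, 99.9))    # burn rate while 1.44% of requests fail
```

A 99.9% objective over 30 days leaves roughly 43.2 minutes of error budget. Burn-rate alerts page when that budget is being spent many times faster than the sustainable rate, which is the basis for the multi-window alerting rules that tools such as Sloth generate.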

Distributed Tracing

We rolled out the OpenTelemetry Operator and OpenTelemetry Collector to extend distributed tracing across the platform, with the longer-term goal of also simplifying metric scraping and log shipping.

Backup & Disaster Recovery

Architected and implemented an AWS Backup solution for RDS, S3, and DynamoDB across the entire AWS organisation, ensuring data protection and compliance.

Engineering Intelligence

  • Deployed and configured Apache DevLake to visualise platform DORA metrics for senior stakeholders
  • Automated collection of KPIs to track developer uptake of the IDP
  • Created Python scripts to automate Terraform state migration of services to the new IDP
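
A state migration script of this kind might look roughly like the sketch below. It assumes the simplest case, where resources stay in the same state and only their addresses change as they move into a module. The resource addresses are hypothetical, and the function only builds `terraform state mv` commands without running them.

```python
import shlex


def build_state_mv_commands(address_map: dict[str, str], dry_run: bool = True) -> list[list[str]]:
    """Return one `terraform state mv` argv list per old-to-new address pair."""
    commands = []
    for source, destination in address_map.items():
        argv = ["terraform", "state", "mv"]
        if dry_run:
            # -dry-run reports what would move without changing the state.
            argv.append("-dry-run")
        argv += [source, destination]
        commands.append(argv)
    return commands


def render(commands: list[list[str]]) -> list[str]:
    """Shell-quote each command so indexed addresses like a["b"] survive copy-paste."""
    return [shlex.join(argv) for argv in commands]


if __name__ == "__main__":
    # Hypothetical addresses: a standalone load balancer moving into a module.
    moves = {"aws_lb.service": "module.service.aws_lb.this"}
    for line in render(build_state_mv_commands(moves)):
        print(line)
```

Building the commands as data first makes the plan easy to review, and defaulting to a dry run guards against rewriting state by accident.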

Platform Operations

  • Handled business-as-usual (BAU) EKS operations, including Helm chart upgrades, Kubernetes version upgrades, and vulnerability patching
  • Maintained and improved internal tooling in Ruby, Python, and Bash

Results

The platform modernisation delivered measurable business impact:

  • £450,000/year savings from retiring the legacy ELK stack
  • £35,000/year savings from consolidating CLBs to shared ALBs (completed in 3 weeks)
  • Improved reliability through standardised monitoring and alerting
  • Faster delivery with self-service infrastructure provisioning
  • Reduced toil for the platform team through automation