Results
- Greenfield observability capability established across 40+ Kubernetes clusters - the organisation had no unified monitoring before this engagement
- 864 million metrics per day ingested through a purpose-built network telemetry platform processing ~10,000 datapoints per second, with long-term retention and horizontal scaling
- Developer self-service for dashboards and alerts through Kubernetes-native CRDs, removing the platform team as a bottleneck for observability configuration
- Resilience and alerting completeness built in from day one - not retrofitted after the fact
- Consolidated operating model replacing a fragmented mix of ClickOps, ad-hoc installs, and multiple competing CD tools with a single, standardised GitOps approach
- Unblocked self-hosting of critical third-party platforms (e.g. MuleSoft, Camunda) that the organisation had previously been unable to run reliably on its own infrastructure
- Technical authority for Kubernetes and cloud infrastructure across the platform team, owning architecture, roadmap, and delivery for the platform and observability domain
- Team capability uplift through systematic, hands-on upskilling - materially improving the team’s ability to operate the platform and fulfil service requests independently
The Problem
A major UK telecommunications infrastructure provider had invested in Kubernetes but was experiencing the operational problems that emerge when a multi-cluster estate grows without standardisation:
- No unified observability - teams had no visibility across the estate, so troubleshooting was slow and incident response was reactive rather than structured
- Fragmented deployment tooling - clusters had been set up through a mix of manual configuration, ad-hoc installs, and multiple competing CD tools with no consistent operating model
- Inconsistent cluster management - no standard approach to how clusters were provisioned, configured, or maintained, creating operational risk and making changes difficult
- Large-scale network telemetry requirements - the business needed to ingest and retain very high volumes of network metrics for operational visibility and capacity planning
What We Delivered
Centralised Observability Platform
Designed and operated an organisation-wide observability stack covering metrics, logs, and traces across the entire Kubernetes estate. The architecture was built for resilience and self-service from day one - engineering teams could define their own dashboards and alerts through Kubernetes-native configuration without waiting on the platform team. This gave the organisation production visibility it had never had before.
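The self-service pattern described here is typically delivered through operator-managed custom resources. As an illustrative sketch only - the source names CRDs but not the specific tooling, so this assumes the Prometheus Operator and a hypothetical `checkout` service - a product team could define its own alert alongside its service manifests:

```yaml
# Illustrative sketch: assumes the Prometheus Operator's PrometheusRule CRD.
# The team commits this with its own manifests; the platform reconciles it
# automatically, so no ticket to the platform team is needed.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: checkout-service-alerts    # hypothetical service name
  namespace: checkout
  labels:
    release: prometheus            # label the operator is assumed to select on
spec:
  groups:
    - name: checkout.availability
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{job="checkout",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="checkout"}[5m])) > 0.05
          for: 10m
          labels:
            severity: page
          annotations:
            summary: "Checkout 5xx error rate above 5% for 10 minutes"
```

Because the resource lives in the team's own namespace and repository, dashboards and alerts evolve through the same review process as application code.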
Network Telemetry at Scale
Built a purpose-designed telemetry platform to handle the ingestion of approximately 864 million metrics per day at around 10,000 datapoints per second. The architecture used controlled batching, horizontal scaling, and long-term retention strategies to meet both real-time operational needs and compliance requirements.
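The headline figures are internally consistent - roughly 10,000 datapoints per second sustained over 86,400 seconds is about 864 million per day. Controlled batching at that rate is commonly expressed through the sender's queue configuration; as a sketch only, assuming Prometheus `remote_write` to a long-term store (the source does not name the stack, and the endpoint below is hypothetical):

```yaml
# Illustrative sketch: controlled batching for high-volume ingestion via
# Prometheus remote_write. Endpoint and values are assumptions, not the
# engagement's actual configuration.
remote_write:
  - url: https://metrics-store.example.internal/api/v1/receive
    queue_config:
      capacity: 10000              # samples buffered per shard
      max_shards: 50               # upper bound for horizontal fan-out
      max_samples_per_send: 2000   # batch size per outbound request
      batch_send_deadline: 5s      # flush partial batches after this interval
```

Tuning batch size against the send deadline is what trades off real-time visibility against ingestion efficiency at this volume.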
Platform Consolidation
Standardised the fragmented Kubernetes estate into a single GitOps operating model. This unified how clusters were managed, how services were deployed, and how changes were promoted - replacing a patchwork of manual processes and competing tools. This also unblocked the reliable self-hosting of complex third-party platforms that the organisation had previously been unable to run on its own infrastructure.
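In an operating model like this, every cluster component is declared once in Git and a controller continuously reconciles the cluster against it. As an illustrative sketch - the source does not name the CD tool, so this assumes Argo CD, with hypothetical repository and application names:

```yaml
# Illustrative sketch: assumes Argo CD, which the source does not name.
# Each cluster add-on is declared in Git; the controller reconciles drift,
# replacing ClickOps and ad-hoc installs with auditable, reviewable changes.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: observability-stack        # hypothetical application name
  namespace: argocd
spec:
  project: platform
  source:
    repoURL: https://git.example.internal/platform/clusters.git  # hypothetical repo
    targetRevision: main
    path: addons/observability
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift back to the declared state
```

The `selfHeal` behaviour is what makes manual, out-of-band changes non-durable: the cluster converges back to Git, which is how a patchwork of hand-managed clusters becomes a single auditable estate.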
Technical Leadership and Knowledge Transfer
Served as the technical authority for Kubernetes and cloud infrastructure across the platform team, providing hands-on guidance and architectural decision-making that materially improved the team's ability to operate the platform independently and fulfil service requests without external dependencies.