Biotech / Global Biotech Company

Internal Developer Platform & Observability Migration

Evolving a centralised Internal Developer Platform for a global biotech company - migrating from commercial observability tooling to an open-source stack, introducing frontend monitoring, and standardising platform operations across the board.

LGTM: full Grafana stack deployed
RUM: frontend observability enabled
Technologies
AWS, EKS, Argo CD, Terraform, GitHub Actions, Prometheus, Grafana, Loki, Alloy, Karpenter, OpenTelemetry

Results

  • Reduced observability costs by migrating from commercial tooling to an open-source Grafana LGTM stack - standardising metrics, logs, and traces across the platform
  • End-to-end visibility from browser to backend through real user monitoring integration, enabling correlation between frontend user experience and backend telemetry for the first time
  • Edge and cloud-service visibility by ingesting Cloudflare logs and CloudWatch metrics into the centralised observability platform
  • GitOps-driven self-service for dashboards and alerting - engineering teams configure observability through Kubernetes manifests, not tickets
  • Ephemeral, Kubernetes-backed CI runners replacing previous runner infrastructure, with secure access to internal AWS resources - unblocking IaC and platform CI/CD workflows
  • Reduced infrastructure costs by replacing static auto scaling groups with dynamic node provisioning, eliminating manual capacity tuning and enabling efficient spot instance consolidation
  • Eliminated cross-registry egress bottlenecks by migrating container image hosting to a same-region registry, further enabling cost-efficient node consolidation
  • Unified secrets management across the platform, replacing multiple ad-hoc approaches with a consistent, cloud-native secrets lifecycle
  • Simplified ingress architecture by consolidating multiple ingress controllers into a single load balancer controller, reducing operational complexity and removing unnecessary infrastructure
  • Consolidated platform documentation from fragmented sources into a single, maintainable source of truth
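To illustrate what "Cloudflare log ingestion" involves, here is a minimal sketch that turns Cloudflare Logpush records (newline-delimited JSON from the `http_requests` dataset) into a payload for Loki's `/loki/api/v1/push` endpoint. The case study does not describe the actual pipeline; this sketch assumes the Logpush job emits timestamps as Unix nanoseconds, and the sample records, hostnames and label choices are hypothetical. In practice a collector such as Alloy would normally do this work.

```python
import json
from collections import defaultdict

# Hypothetical sample of Logpush output (http_requests dataset), assuming the
# job is configured to emit EdgeStartTimestamp as Unix epoch nanoseconds.
SAMPLE_LOGPUSH = "\n".join(json.dumps(r) for r in [
    {"EdgeStartTimestamp": 1700000000000000000, "ClientRequestHost": "app.example.com",
     "ClientRequestMethod": "GET", "ClientRequestURI": "/", "EdgeResponseStatus": 200,
     "RayID": "8a1f0000000000a1"},
    {"EdgeStartTimestamp": 1700000000500000000, "ClientRequestHost": "app.example.com",
     "ClientRequestMethod": "GET", "ClientRequestURI": "/api/orders", "EdgeResponseStatus": 200,
     "RayID": "8a1f0000000000a2"},
    {"EdgeStartTimestamp": 1700000001000000000, "ClientRequestHost": "app.example.com",
     "ClientRequestMethod": "POST", "ClientRequestURI": "/api/orders", "EdgeResponseStatus": 503,
     "RayID": "8a1f0000000000a3"},
])


def to_loki_push(ndjson):
    """Group Logpush records into Loki streams keyed by low-cardinality labels."""
    streams = defaultdict(list)
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Only low-cardinality fields become stream labels; high-cardinality
        # fields such as RayID and the URI stay inside the log line itself.
        labels = (
            ("source", "cloudflare"),
            ("host", record["ClientRequestHost"]),
            ("status_class", f"{record['EdgeResponseStatus'] // 100}xx"),
        )
        # Loki expects each entry as [<epoch nanoseconds as a string>, <line>].
        streams[labels].append([str(record["EdgeStartTimestamp"]), line])
    return {
        "streams": [
            {"stream": dict(labels), "values": values}
            for labels, values in streams.items()
        ]
    }


if __name__ == "__main__":
    print(json.dumps(to_loki_push(SAMPLE_LOGPUSH), indent=2))
```

Keeping request IDs and paths out of the label set matters because every distinct label combination creates a separate Loki stream; those fields remain searchable through LogQL line filters instead.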

The Problem

A global biotech company had built a centralised Internal Developer Platform, but several areas needed improvement to support the organisation’s growing engineering needs:

  • High observability costs - commercial monitoring tooling was expensive and the organisation needed a more cost-effective approach without losing visibility
  • No frontend observability - no way to correlate user-facing issues with backend telemetry, making it difficult to diagnose problems that originated at the browser level
  • CI/CD limitations - existing runner infrastructure lacked secure access to internal AWS resources, blocking IaC and platform automation workflows
  • Static capacity management - fixed auto scaling groups required manual tuning, leading to over-provisioning or capacity shortfalls
  • Fragmented platform operations - multiple approaches to secrets management, ingress routing, and documentation had accumulated over time without standardisation

What We Delivered

Observability Re-architecture

Migrated the platform’s observability stack from commercial tooling to a fully open-source Grafana LGTM stack. Standardised collection of metrics, logs, and traces through a unified telemetry agent, with integrations for edge traffic and cloud service metrics. Introduced self-service dashboards and alerting through GitOps, removing the platform team as a bottleneck for observability configuration.
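A sketch of what "dashboards and alerting through GitOps" can look like from a team's side: two Kubernetes manifests that live in Git and are synced by Argo CD. The case study does not name the mechanisms, so this assumes (1) Grafana runs with a dashboard sidecar that discovers ConfigMaps labelled `grafana_dashboard: "1"` (the convention used by the kube-prometheus-stack chart), and (2) the cluster has the Prometheus Operator `PrometheusRule` CRD, or a component that syncs those rules into the ruler. The team name, namespace and metric names are hypothetical.

```python
import json


def dashboard_configmap(name, namespace, dashboard):
    """ConfigMap a Grafana dashboard sidecar can discover by label (assumed convention)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumed sidecar label; it must match what the Grafana deployment watches.
            "labels": {"grafana_dashboard": "1"},
        },
        # Each data key is written out by the sidecar as a dashboard JSON file.
        "data": {f"{name}.json": json.dumps(dashboard)},
    }


def alert_rule(name, namespace, alert, expr, for_="5m", severity="warning"):
    """Prometheus Operator PrometheusRule containing a single alerting rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [{
                "name": f"{name}.rules",
                "rules": [{
                    "alert": alert,
                    "expr": expr,
                    "for": for_,
                    "labels": {"severity": severity},
                }],
            }]
        },
    }


# Hypothetical "checkout" team: both manifests are committed to the team's repo.
manifests = [
    dashboard_configmap("checkout-overview", "checkout",
                        {"title": "Checkout overview", "panels": []}),
    alert_rule(
        "checkout-errors", "checkout", "CheckoutHighErrorRate",
        # Hypothetical metric: fire when more than 5% of requests return 5xx.
        'sum(rate(http_requests_total{service="checkout",code=~"5.."}[5m]))'
        ' / sum(rate(http_requests_total{service="checkout"}[5m])) > 0.05',
    ),
]

if __name__ == "__main__":
    for manifest in manifests:
        print(json.dumps(manifest, indent=2))
```

The design point is that the platform team owns the discovery mechanism once, and product teams only ever touch their own manifests, so a new dashboard or alert is a pull request rather than a ticket.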

Extended observability to the user edge by introducing real user monitoring, enabling the organisation to correlate frontend performance with backend telemetry for the first time.
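Correlating frontend and backend telemetry usually depends on the browser agent propagating a trace context with outgoing requests, so that backend spans and logs share the trace ID recorded by the RUM session. The case study does not say which RUM agent or propagation format was used; this sketch assumes the W3C Trace Context `traceparent` header (version `00`) and shows how such a header is generated and validated.

```python
import re
import secrets

# W3C Trace Context, version 00 layout: version-traceid-parentid-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled=True):
    """Create a traceparent header, as a browser agent might attach to a fetch/XHR."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid.

    Handles the version-00 layout only.
    """
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" is forbidden, and all-zero IDs are invalid per the spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    sampled = bool(int(flags, 16) & 0x01)
    return trace_id, parent_id, sampled


if __name__ == "__main__":
    header = new_traceparent()
    print(header, parse_traceparent(header))
```

For example, `parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")` yields the trace ID `4bf92f3577b34da6a3ce929d0e0e4736` with the sampled flag set. If the backend logs that trace ID alongside its own spans, a slow page load seen in RUM can be followed straight into the matching backend trace and logs.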

CI/CD and Infrastructure Optimisation

Implemented ephemeral, Kubernetes-backed CI runners with secure access to internal AWS resources, unblocking platform automation workflows. Replaced static capacity management with dynamic node provisioning, eliminating manual tuning and reducing infrastructure costs through efficient spot instance usage. Migrated container image hosting to a same-region registry to remove cross-registry egress bottlenecks.
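Karpenter appears in the technology list, so dynamic node provisioning with spot consolidation is most likely expressed as a Karpenter `NodePool`. The following is a minimal sketch of such a manifest against the Karpenter v1 API (`karpenter.sh/v1`); the pool name, `EC2NodeClass` name, CPU limit and consolidation delay are assumptions, not the values used on this platform.

```python
import json


def karpenter_nodepool(name, node_class="default", cpu_limit="1000",
                       capacity_types=("spot", "on-demand"), consolidate_after="1m"):
    """Karpenter v1 NodePool allowing spot and on-demand capacity, with consolidation."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        # With both capacity types allowed, Karpenter favours spot
                        # and can fall back to on-demand when spot is unavailable.
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": list(capacity_types)},
                        {"key": "kubernetes.io/arch", "operator": "In",
                         "values": ["amd64"]},
                    ],
                    # Points at an EC2NodeClass (AMI, subnets, security groups) defined elsewhere.
                    "nodeClassRef": {"group": "karpenter.k8s.aws",
                                     "kind": "EC2NodeClass", "name": node_class},
                }
            },
            # Upper bound on total CPU this pool may provision.
            "limits": {"cpu": cpu_limit},
            "disruption": {
                # Replace or remove nodes that are empty or underutilised.
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": consolidate_after,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(karpenter_nodepool("general-purpose"), indent=2))
```

Compared with a fixed auto scaling group, nodes here are sized to the pending pods and removed when consolidation finds a cheaper packing, which is where the "no manual capacity tuning" outcome comes from.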

Platform Standardisation

Unified secrets management across the platform by replacing multiple ad-hoc approaches with a single, consistent lifecycle. Simplified ingress architecture by consolidating multiple controllers into one. Consolidated fragmented operational and architectural documentation into a single, maintainable source of truth.
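One common way to consolidate ingress on EKS is the AWS Load Balancer Controller's IngressGroup feature, where Ingresses that share an `alb.ingress.kubernetes.io/group.name` annotation are merged onto a single ALB. The case study does not name the controller, so this is an assumption; the sketch also assumes an IngressClass named `alb`, and the service names, hosts and group name are hypothetical. The `alb_count` helper is a rough model for illustration, not part of any controller API.

```python
import json

GROUP_ANNOTATION = "alb.ingress.kubernetes.io/group.name"


def alb_ingress(name, namespace, host, service, port, group="platform-shared"):
    """Ingress for the (assumed) AWS Load Balancer Controller; group=None means ungrouped."""
    annotations = {
        "alb.ingress.kubernetes.io/scheme": "internet-facing",
        "alb.ingress.kubernetes.io/target-type": "ip",
    }
    if group:
        # Ingresses sharing this group name are merged onto one ALB by the controller.
        annotations[GROUP_ANNOTATION] = group
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace, "annotations": annotations},
        "spec": {
            "ingressClassName": "alb",
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": port}}},
                }]},
            }],
        },
    }


def alb_count(ingresses):
    """Rough model: one ALB per distinct group name, plus one per ungrouped Ingress."""
    groups, ungrouped = set(), 0
    for ing in ingresses:
        group = ing["metadata"].get("annotations", {}).get(GROUP_ANNOTATION)
        if group:
            groups.add(group)
        else:
            ungrouped += 1
    return len(groups) + ungrouped


if __name__ == "__main__":
    shared = [
        alb_ingress("orders", "orders", "orders.example.com", "orders-api", 8080),
        alb_ingress("portal", "portal", "portal.example.com", "portal-web", 80),
    ]
    print(json.dumps(shared, indent=2))
    print("ALBs needed:", alb_count(shared))
```

Grouping trades some isolation for fewer load balancers: all members share one ALB's listeners and limits, so settings such as the scheme annotation have to agree across the group.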