The Challenge
Novonesis, a global biotech leader, needed to evolve their Internal Developer Platform to:
- Reduce observability costs while maintaining comprehensive visibility
- Improve CI/CD reliability with secure access to internal AWS resources
- Optimise infrastructure costs through better capacity management
- Standardise operations across fragmented documentation and tooling
Our Approach
Observability Migration
Re-architected platform observability by migrating from Datadog to a Grafana LGTM stack (Loki, Grafana, Tempo, Mimir):
- Grafana Alloy for standardised collection of metrics, logs, and traces
- Cloudflare log ingestion for edge visibility
- CloudWatch integration for AWS service metrics
- GitOps-driven self-service dashboards and alerting via Kubernetes manifests
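As an illustration of the GitOps-driven dashboards item, the sketch below wraps a dashboard's JSON model in a Kubernetes ConfigMap that a team could commit to Git. The source does not say which mechanism was used; this assumes the ConfigMap convention of the Grafana Helm chart's dashboard sidecar, which by default picks up ConfigMaps labelled `grafana_dashboard`. The function name, dashboard title, and namespace are hypothetical.

```python
import json


def dashboard_configmap(name: str, namespace: str, dashboard: dict) -> dict:
    """Wrap a Grafana dashboard JSON model in a Kubernetes ConfigMap manifest."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Assumption: the Grafana Helm chart's dashboard sidecar is enabled
            # and watches for this label (its default label name).
            "labels": {"grafana_dashboard": "1"},
        },
        # The sidecar loads each data key as a dashboard file.
        "data": {f"{name}.json": json.dumps(dashboard, indent=2, sort_keys=True)},
    }


if __name__ == "__main__":
    # Hypothetical dashboard owned by a hypothetical product team.
    dashboard = {"title": "Checkout service", "uid": "checkout-overview", "panels": []}
    manifest = dashboard_configmap("checkout-overview", "team-checkout", dashboard)
    # JSON is valid YAML, so the output can be committed and applied as-is.
    print(json.dumps(manifest, indent=2))
```

Keeping dashboards as manifests means they go through the same review and rollback process as any other platform change.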
Extended observability to the user edge by integrating Grafana Faro for Real User Monitoring (RUM), enabling correlation between frontend user experience and backend telemetry.
CI/CD Improvements
Deployed GitHub's Actions Runner Controller (ARC):
- Ephemeral, Kubernetes-backed CI runners
- Secure access to internal AWS resources
- Unblocked IaC and platform CI/CD workflows
- Reduced runner maintenance overhead
Infrastructure Optimisation
Introduced Karpenter for dynamic node provisioning:
- Replaced static Auto Scaling Groups
- Eliminated manual capacity tuning
- Improved cluster utilisation
- Reduced infrastructure costs through efficient spot instance consolidation
Migrated container image hosting from GHCR to ECR:
- Removed cross-registry egress bottlenecks
- Enabled efficient Karpenter-driven consolidation
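To make the idea of consolidation concrete, the following is a deliberately simplified sketch: it drains a node when all of its pods fit into the spare CPU of the remaining nodes. It is not Karpenter's actual algorithm, which also weighs memory, instance pricing, scheduling constraints, and disruption budgets. The `Node` class, `consolidate` function, and all pod names and sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_capacity: int  # millicores
    pods: dict[str, int] = field(default_factory=dict)  # pod name -> CPU request

    @property
    def free(self) -> int:
        return self.cpu_capacity - sum(self.pods.values())


def consolidate(nodes: list[Node]) -> list[Node]:
    """Greedily drain lightly loaded nodes whose pods fit on the other nodes.

    Mutates the given nodes and returns the ones that are still needed.
    """
    remaining = list(nodes)
    # Consider the least-loaded nodes first: they are the cheapest to drain.
    for candidate in sorted(nodes, key=lambda n: sum(n.pods.values())):
        others = [n for n in remaining if n is not candidate]
        free = {n.name: n.free for n in others}  # trial copy of spare capacity
        placement: dict[str, Node] = {}
        # Place the largest pods first (first-fit decreasing).
        for pod, cpu in sorted(candidate.pods.items(), key=lambda kv: -kv[1]):
            target = next((n for n in others if free[n.name] >= cpu), None)
            if target is None:
                break  # at least one pod does not fit: keep this node
            free[target.name] -= cpu
            placement[pod] = target
        else:
            # Every pod found a home: move them and retire the node.
            for pod, target in placement.items():
                target.pods[pod] = candidate.pods[pod]
            candidate.pods.clear()
            remaining = others
    return remaining


if __name__ == "__main__":
    cluster = [
        Node("a", 4000, {"api": 1000}),
        Node("b", 4000, {"web": 1500}),
        Node("c", 4000, {"db": 2000, "cache": 1000}),
    ]
    print([n.name for n in consolidate(cluster)])  # ['b', 'c']
```

In this example node `a` is retired because its single pod fits on node `b`; nodes `b` and `c` stay because neither can absorb the other's workload.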
Platform Standardisation
- Unified secrets management by standardising on External Secrets, replacing several ad-hoc approaches
- Simplified ingress architecture by consolidating multiple ingress controllers onto the AWS Load Balancer Controller
- Standardised documentation by introducing MkDocs and consolidating fragmented docs into a single source of truth
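For the secrets item, a minimal sketch of what a standardised secret declaration could look like, assuming the External Secrets Operator's `ExternalSecret` resource. The `apiVersion` depends on the installed operator version, and the store name, remote key, and function name are hypothetical rather than taken from the Novonesis platform.

```python
import json


def external_secret(name: str, namespace: str, store: str,
                    remote_key: str, keys: list[str]) -> dict:
    """Build an ExternalSecret manifest that syncs selected properties
    of one remote secret into a Kubernetes Secret of the same name."""
    return {
        # Assumption: v1beta1 is served by the installed operator;
        # newer releases also serve external-secrets.io/v1.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            "target": {"name": name},
            "data": [
                {"secretKey": key, "remoteRef": {"key": remote_key, "property": key}}
                for key in keys
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical store and secret path.
    manifest = external_secret(
        name="orders-db",
        namespace="team-orders",
        store="aws-secrets-manager",
        remote_key="prod/orders/db",
        keys=["username", "password"],
    )
    print(json.dumps(manifest, indent=2))
```

With a single declaration format, every team requests secrets the same way, and the backing store can change without touching application manifests.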
Results
- Cost-effective observability through migration from commercial tooling to an open-source Grafana stack
- End-to-end visibility from the user's browser to backend services via RUM integration
- Reliable CI/CD with ephemeral runners and secure AWS access
- Reduced infrastructure costs through Karpenter and spot instance optimisation
- Consistent operations with unified secrets, ingress, and documentation