The Challenge
MONY Group’s Data Platform organisation faced significant infrastructure challenges:
- Slow IaC execution: Terraform plan/apply operations took too long, slowing development velocity
- Unreliable CI/CD: VM-based GitHub runners were costly, difficult to maintain, and frequently failed
- Security concerns: Long-lived Service Account JSON keys were used for GCP authentication, leaving credentials open to leakage and misuse
- Fragmented infrastructure: No consistent patterns for Terraform module usage or state management
Our Approach
IaC Rearchitecture
Introduced Terragrunt to modernise the IaC setup:
- State management: Centralised and standardised Terraform state configuration
- Provider management: Consistent provider versioning across all modules
- DRY configuration: Eliminated repetition through Terragrunt’s hierarchy
- Modular architecture: Introduced re-usable, versioned, opinionated Terraform modules
The migration required careful planning to move 3500+ existing Terraform resources to the new architecture without service disruption.
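As a rough illustration of the state-management and DRY points above, the Python sketch below models two ideas: deriving each module's state prefix from its directory path (similar in spirit to Terragrunt's `path_relative_to_include()`), and merging inputs down a configuration hierarchy so that lower levels override higher ones. The bucket name, directory layout, and function names are hypothetical and not taken from the project.

```python
from pathlib import PurePosixPath


def state_prefix(root: str, module_dir: str) -> str:
    """Return the module directory relative to the repository root.

    Using the relative path as the state prefix gives every module a
    unique, predictable location in one shared state bucket.
    """
    return str(PurePosixPath(module_dir).relative_to(PurePosixPath(root)))


def backend_config(root: str, module_dir: str, bucket: str) -> dict:
    """Build a GCS-style backend block (bucket + prefix) for one module."""
    return {"bucket": bucket, "prefix": state_prefix(root, module_dir)}


def merge_inputs(*layers: dict) -> dict:
    """Merge input layers in order; later (more specific) layers win."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical layout: live/<environment>/<module>
cfg = backend_config("live", "live/prod/bigquery", "example-tf-state")
inputs = merge_inputs(
    {"region": "europe-west2", "labels_team": "data-platform"},  # root level
    {"environment": "prod"},                                      # environment level
    {"region": "europe-west1"},                                   # module override
)
```

Here `cfg` resolves to the prefix `prod/bigquery`, and the module-level `region` overrides the root default, which is the behaviour a hierarchical configuration is meant to give without repeating the backend block in every module.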
CI/CD Transformation
Implemented GitHub Actions Runner Controller (ARC) on GKE:
- Autoscaling runners: Kubernetes-native runners that scale based on workflow demand
- Replaced Docker builds: Migrated to Kaniko for secure, daemonless container builds
- Reduced costs: Eliminated always-on VM infrastructure
- Improved reliability: Ephemeral runners ensure clean, consistent build environments
Migrated 500+ existing workflows to the new runner infrastructure.
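The demand-based scaling described above can be pictured with a deliberately simplified model: the number of runners follows the number of queued and running jobs, clamped between a floor and a ceiling. This is not ARC's actual algorithm, and the function name, parameter names, and default limits are assumptions made for illustration.

```python
def desired_runners(
    pending_jobs: int,
    running_jobs: int,
    min_runners: int = 0,
    max_runners: int = 20,
) -> int:
    """Return how many ephemeral runners should exist right now.

    One runner per job (pending or running), never below the floor and
    never above the ceiling. A floor of zero means no idle cost when no
    workflows are queued.
    """
    if min_runners > max_runners:
        raise ValueError("min_runners must not exceed max_runners")
    demand = pending_jobs + running_jobs
    return max(min_runners, min(max_runners, demand))
```

With a floor of zero, an idle period costs nothing, which is the contrast with always-on VM runners; the ceiling caps spend during bursts at the expense of some queueing.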
Security Improvements
Replaced Service Account JSON keys with Workload Identity Federation between GitHub Actions and GCP:
- Keyless authentication: No more long-lived credentials to manage or rotate
- Reduced attack surface: Eliminated risk of key exposure
- Audit trail: Better visibility into authentication events
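To make the keyless setup more concrete, the sketch below builds the two identifiers such a configuration typically revolves around: the full resource name of the workload identity provider that a GitHub workflow references, and the `principalSet` IAM member that limits impersonation to a single repository. The project number, pool ID, provider ID, and repository are hypothetical, and the repository-scoped member assumes the provider maps the token's repository claim to `attribute.repository`.

```python
def provider_resource_name(project_number: str, pool_id: str, provider_id: str) -> str:
    """Full resource name of a workload identity pool provider."""
    return (
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )


def repository_principal_set(project_number: str, pool_id: str, repository: str) -> str:
    """IAM member covering every identity from one GitHub repository.

    Assumes the provider maps the OIDC token's repository claim to
    attribute.repository; `repository` is in "owner/name" form.
    """
    return (
        "principalSet://iam.googleapis.com/"
        f"projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/attribute.repository/{repository}"
    )


# Hypothetical values for illustration only.
provider = provider_resource_name("123456789", "github-pool", "github-provider")
member = repository_principal_set("123456789", "github-pool", "example-org/example-repo")
```

Granting a service account role to `member` rather than distributing a JSON key means access is tied to short-lived tokens from one repository, so there is nothing long-lived to rotate or leak.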
Network Architecture
Designed and implemented an HA VPN architecture between GCP and AWS:
- Migrated workloads to use the new VPN setup
- Consolidated multiple VPNs, reducing complexity and cost
Results
- 65% faster Terraform plan/apply execution times
- 3500+ resources successfully migrated to new IaC architecture
- 500+ workflows running on reliable Kubernetes-based runners
- Improved security through keyless authentication with Workload Identity Federation
- Reduced costs from VPN consolidation and ephemeral runner infrastructure
- Team enablement for ~15 Data Engineers on GCP services, IaC, and GKE