Results
- 65% reduction in Terraform plan/apply times by rearchitecting the IaC setup and introducing proper state and provider management
- 3500+ Terraform resources successfully migrated to the new architecture without service disruption
- 500+ CI workflows migrated from unreliable, costly VM-based runners to autoscaling Kubernetes-backed runners
- Eliminated long-lived credentials by replacing service account JSON keys with keyless workload identity federation, removing a significant security risk
- Reduced VPN costs and complexity by redesigning the GCP/AWS HA VPN architecture, consolidating multiple VPNs and migrating workloads to the new setup
- Introduced IaC best practices, including reusable, versioned, opinionated infrastructure modules, giving the team a foundation they could build on independently
- ~15 Data Engineers trained on GCP services, IaC patterns, and Kubernetes deployments through hands-on support and knowledge transfer
The Problem
The Data Platform organisation within a financial services company had infrastructure that was holding the team back:
- Slow Terraform execution - plan and apply operations were taking excessive time, slowing development velocity and making iteration painful
- Unreliable CI/CD - VM-based runners were costly, difficult to maintain, prone to failure, and unable to autoscale
- Security risk from long-lived credentials - service account JSON keys were used for GCP authentication, creating exposure risk with no rotation strategy
- No consistent IaC patterns - Terraform usage had grown organically without module reuse, state management standards, or provider versioning, which made changes risky and time-consuming
- Expensive and complex networking - multiple VPN connections between GCP and AWS without consolidation
What We Delivered
IaC Rearchitecture
Introduced a modern IaC architecture with proper state management, provider versioning, and DRY configuration patterns. Established reusable, versioned, opinionated infrastructure modules that the team could adopt as a standard. Planned and executed the migration of 3500+ existing Terraform resources to the new architecture without service disruption, which reduced plan/apply times by 65%.
CI/CD Transformation
Replaced unreliable VM-based runners with autoscaling, Kubernetes-native CI runners on GKE. Migrated 500+ existing workflows to the new infrastructure, replacing traditional daemon-based container builds with secure, daemonless builds. The result was lower cost, higher reliability, and a clean, ephemeral build environment for every workflow run.
Security Improvements
Eliminated long-lived service account credentials by implementing keyless workload identity federation between GCP and GitHub. This removed the risk of key exposure, eliminated credential rotation overhead, and improved the audit trail for authentication events.
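To illustrate what "keyless" means here, the following is a minimal sketch of the token exchange that workload identity federation generally relies on: a short-lived OIDC token issued by the CI platform is swapped at Google's Security Token Service for a short-lived access token, so no JSON key is ever stored. The project number, pool ID, provider ID, and token value below are hypothetical placeholders, not details from this engagement, and the sketch only builds the request rather than sending it.

```python
# Sketch only: builds the Security Token Service (STS) exchange request
# used by GCP workload identity federation. All identifiers passed in
# the example at the bottom are hypothetical placeholders.

STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"


def workload_identity_audience(project_number: str, pool_id: str, provider_id: str) -> str:
    """Return the audience string that identifies a workload identity provider."""
    return (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )


def build_sts_exchange_request(oidc_token: str, audience: str) -> dict:
    """Build the token-exchange payload (OAuth 2.0 Token Exchange, RFC 8693).

    The payload would be POSTed to STS_TOKEN_URL; the response contains a
    short-lived federated access token instead of a long-lived key.
    """
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": oidc_token,  # short-lived OIDC token from the CI run
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    audience = workload_identity_audience("123456789", "ci-pool", "github-provider")
    payload = build_sts_exchange_request("example-oidc-jwt", audience)
    print(STS_TOKEN_URL)
    print(payload["audience"])
```

In practice this exchange is usually handled by an official authentication action or client library rather than hand-written code. The sketch is meant only to show that the credential presented is an ephemeral, per-run token rather than a stored key.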
Network Architecture
Redesigned the GCP/AWS VPN architecture for high availability, consolidating multiple VPN connections and migrating workloads to the new setup, which reduced both complexity and cost.