GitOps Is the Right Model - But Not Before Your Platform Is Ready

GitOps gives you auditable, drift-free Kubernetes deployments. But scaled too early, it enforces inconsistency. A phased guide to GitOps adoption, repo structure, secrets management, and when to use Argo CD vs Flux.

Tags: GitOps, Kubernetes, Platform Engineering, Architecture

GitOps is the right model.

The mistake is scaling it before the platform is ready.

Used well, GitOps gives you controlled, auditable changes, reproducible environments, and continuous drift correction. Used without enough standardisation, it does something else entirely: it continuously enforces inconsistency.

What GitOps Actually Promises

The core idea is simple. Your Git repository is the single source of truth for your infrastructure and application state. A reconciliation controller (Argo CD, Flux, or similar) continuously compares the desired state in Git with the actual state in your cluster and corrects any drift.

This gives you several things that are genuinely valuable:

  • Auditability - every change is a commit. Who changed what, when, and why is in the Git history.
  • Reproducibility - you can rebuild any environment from the repository alone. No snowflakes, no manual steps.
  • Drift correction - if someone makes an ad hoc change directly in the cluster, the controller reverts it. The repo always wins.
  • Review workflows - changes go through pull requests. You get code review on infrastructure changes for free.
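
To make the reconciliation idea concrete, here is a minimal sketch in Python. It is not how Argo CD or Flux are implemented: the `State` type, the resource names, and the function names are all hypothetical, and it assumes pruning is switched on so that resources missing from Git are deleted (in real controllers, pruning and self-healing are configurable).

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified model: each resource is a name mapped to a spec dict.
State = Dict[str, dict]


def plan_reconciliation(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) pairs needed to make actual match desired."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    # Assumption: pruning is enabled, so anything not in Git is removed.
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return sorted(actions)


def reconcile(desired: State, actual: State) -> State:
    """Apply the plan to a copy of actual. The repo wins on every conflict."""
    result = dict(actual)
    for action, name in plan_reconciliation(desired, actual):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result


# Desired state as committed to Git (illustrative values).
desired = {"web": {"replicas": 3}, "api": {"replicas": 2}}
# Actual state after someone scaled "web" by hand and left a debug pod behind.
actual = {"web": {"replicas": 5}, "debug-pod": {"replicas": 1}}

plan = plan_reconciliation(desired, actual)
print(plan)
# [('create', 'api'), ('delete', 'debug-pod'), ('update', 'web')]
```

The manual scale-up and the stray debug pod are both undone, which is exactly the behaviour that helps when the repo is right and hurts when it is wrong.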

These properties matter at scale. When you’re managing dozens of clusters and hundreds of services, you need a single mechanism for change management.

The problem isn’t the model. It’s the timing.

Where GitOps Goes Wrong

GitOps fails when it’s scaled across teams before the platform underneath it is standardised. And this happens more often than teams realise.

Enforcing inconsistency at scale

If every team structures its repos differently, defines Helm values differently, and uses different deployment patterns, GitOps won’t standardise any of that. It will take local inconsistency and turn it into estate-wide inconsistency, continuously reconciled.

What you end up with is:

  • 30 application repos, each with a slightly different directory structure
  • Helm values files that define the same thing in different ways
  • Some teams using Kustomize, some using raw manifests, some using Helm
  • Environment promotion handled differently per team
  • The reconciliation controller faithfully enforcing all of it

GitOps hasn’t solved your consistency problem. It’s automated it.

Repo structure becomes the new bottleneck

Without clear standards, teams spend significant time debating how to structure their GitOps repositories. Should it be a monorepo or multi-repo? Where do environment overrides go? How do you handle shared configuration? What’s the branching strategy for promotions?

These are all legitimate questions. But if every team answers them differently, the platform team ends up supporting multiple patterns - which is the opposite of what GitOps was supposed to achieve.

Drift correction becomes a source of incidents

Drift correction is one of GitOps’ strongest features. It’s also the most dangerous if your desired state isn’t correct.

We’ve seen cases where:

  • A team committed a misconfiguration to Git. The controller dutifully applied it across multiple environments before anyone noticed.
  • An emergency fix was applied directly to a cluster. The controller reverted it within minutes, re-breaking production.
  • A shared base configuration was updated without understanding which applications inherited from it. Dozens of services were affected simultaneously.

When your reconciliation controller is working correctly, mistakes propagate faster than they would with manual deployments. That’s a feature when your desired state is right. It’s a serious risk when it isn’t.
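
One way to reduce the shared-base risk is to compute which applications inherit from a base before merging a change to it. The sketch below assumes you can produce an inventory of "which config inherits from which"; the `INHERITS_FROM` mapping and every path in it are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical inventory: each config maps to the bases it inherits from directly.
INHERITS_FROM: Dict[str, List[str]] = {
    "base/common": [],
    "base/web": ["base/common"],
    "apps/checkout": ["base/web"],
    "apps/search": ["base/web"],
    "apps/batch-jobs": ["base/common"],
    "apps/legacy": [],
}


def affected_by(changed: str, inherits_from: Dict[str, List[str]]) -> Set[str]:
    """Return every config that inherits from `changed`, directly or transitively."""
    # Invert the edges so we can walk from a base to its dependants.
    dependants: Dict[str, List[str]] = {}
    for child, parents in inherits_from.items():
        for parent in parents:
            dependants.setdefault(parent, []).append(child)

    seen: Set[str] = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for child in dependants.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


impact = sorted(affected_by("base/common", INHERITS_FROM))
print(impact)
# ['apps/batch-jobs', 'apps/checkout', 'apps/search', 'base/web']
```

Posting a list like this on the pull request gives reviewers the blast radius before the controller applies the change.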

A Maturity Model for GitOps Adoption

The pattern that works in production isn’t “adopt GitOps everywhere on day one.” It’s a phased approach that builds on standardisation.

Phase 1 - Platform components and cluster bootstrap

Start with the things the platform team controls directly: cluster add-ons, ingress controllers, monitoring agents, policy controllers, namespace provisioning. These are already standardised because one team owns them.

This is where GitOps delivers value immediately:

  • Cluster bootstrap from a single repository
  • Consistent add-on versions across all clusters
  • Drift correction on infrastructure components (not application workloads)
  • A proven reconciliation workflow before any application team touches it

At this phase, you’re building muscle memory. The platform team learns how the reconciliation controller behaves, how to handle secrets in GitOps workflows, and how to structure repositories for multi-cluster management.

Phase 2 - Prove patterns on a small number of services

Pick two or three application teams that are willing to adopt GitOps. Work with them closely to define:

  • A standard repo structure - where manifests live, how environments are separated, where overrides go
  • A deployment promotion model - how changes move from dev to staging to production
  • A values management pattern - what’s in the base, what’s per-environment, what’s per-team
  • A secrets strategy - how secrets are referenced in Git without being stored there (Sealed Secrets, External Secrets, SOPS)
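
The values management pattern is easier to agree on when the layering rule is written down. Here is a minimal sketch of one possible rule, assuming later layers override earlier ones and that the order is base, then team, then environment. That precedence is an assumption for illustration, as are the function names and the example values.

```python
from copy import deepcopy
from typing import Any, Dict

Values = Dict[str, Any]


def merge_values(base: Values, override: Values) -> Values:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings, so merge key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale in this sketch.
            merged[key] = deepcopy(value)
    return merged


def render_values(*layers: Values) -> Values:
    """Merge layers in order; later layers take precedence over earlier ones."""
    result: Values = {}
    for layer in layers:
        result = merge_values(result, layer)
    return result


# Illustrative layers: shared base, per-team, per-environment.
base = {"replicas": 1, "resources": {"cpu": "100m", "memory": "128Mi"}}
team = {"labels": {"team": "payments"}}
production = {"replicas": 3, "resources": {"memory": "512Mi"}}

values = render_values(base, team, production)
print(values)
```

Production overrides the replica count and memory, while the CPU setting still comes from the base. Whatever rule you choose, the point is that every team uses the same one.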

The goal isn’t just to get these teams deployed via GitOps. It’s to produce a template that other teams can adopt without reinventing the structure.

Phase 3 - Standardise, then scale

Once you have a proven pattern from Phase 2, codify it:

  • A scaffolding tool or template repo that generates the correct structure for new services
  • Documentation that covers the common workflows (deploying, rolling back, promoting, handling emergencies)
  • Clear guidelines on what’s allowed to diverge and what isn’t
  • Guardrails that enforce the standard (admission policies, CI checks on repo structure)
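
A CI check on repo structure can be very small. The sketch below assumes a Kustomize-style layout with one overlay per environment. The paths in `REQUIRED_PATHS` are hypothetical, and your own standard from Phase 2 would replace them.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical standard layout; substitute the structure agreed in Phase 2.
REQUIRED_PATHS = [
    "base/kustomization.yaml",
    "overlays/dev/kustomization.yaml",
    "overlays/staging/kustomization.yaml",
    "overlays/production/kustomization.yaml",
]


def missing_paths(repo_root: Path, required: List[str] = REQUIRED_PATHS) -> List[str]:
    """Return the required files that do not exist under repo_root."""
    return [rel for rel in required if not (repo_root / rel).is_file()]


# Demo: build a throwaway repo that lacks the production overlay.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in REQUIRED_PATHS[:3]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    problems = missing_paths(root)

print(problems)
# ['overlays/production/kustomization.yaml']
```

In a pipeline, the job would fail the build whenever the returned list is non-empty, so a non-conforming repo is caught at pull request time rather than at sync time.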

Only then do you scale GitOps across the wider engineering organisation. At this point, new teams aren’t making structural decisions - they’re adopting a pattern that’s already proven.

Phase 4 - Operate and evolve

GitOps at scale requires ongoing platform work:

  • Monitoring reconciliation health - are controllers failing silently? Are sync times increasing? Are there resources stuck in a degraded state?
  • Managing controller upgrades - Argo CD and Flux both have their own upgrade lifecycles. At scale, this is non-trivial.
  • Handling emergency overrides - there needs to be a documented process for when someone needs to bypass GitOps temporarily (and a mechanism to detect and resolve the resulting drift).
  • Deprecating old patterns - as the standard evolves, old repository structures need to be migrated. This is the same lifecycle management challenge that applies to any platform capability.
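
For the reconciliation health item, the basic check is the same whichever controller you run: flag anything degraded and anything that hasn’t synced recently. The sketch below is illustrative only. `AppStatus` is a hypothetical record you would populate from your controller’s API or metrics, and the 30-minute threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class AppStatus:
    # Hypothetical record; fill it from whatever your controller exposes.
    name: str
    healthy: bool
    last_successful_sync: datetime


def needs_attention(
    statuses: List[AppStatus],
    now: datetime,
    max_sync_age: timedelta = timedelta(minutes=30),  # assumed threshold
) -> List[str]:
    """Return one message per application that is degraded or has a stale sync."""
    flagged = []
    for status in statuses:
        if not status.healthy:
            flagged.append(f"{status.name}: degraded")
        elif now - status.last_successful_sync > max_sync_age:
            flagged.append(f"{status.name}: stale sync")
    return flagged


# Fixed timestamp so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
statuses = [
    AppStatus("checkout", True, now - timedelta(minutes=5)),
    AppStatus("search", True, now - timedelta(hours=2)),
    AppStatus("billing", False, now - timedelta(minutes=1)),
]

alerts = needs_attention(statuses, now)
print(alerts)
# ['search: stale sync', 'billing: degraded']
```

The stale-sync case is the one that catches controllers failing silently: the application looks healthy, but nothing has been reconciled for hours.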

Common Architectural Decisions

Monorepo vs multi-repo

There’s no universally correct answer, but there are trade-offs:

Monorepo (all manifests in one repository):

  • Easier to enforce consistency and review cross-cutting changes
  • Simpler to manage for the platform team
  • Can become unwieldy as the number of services grows
  • A single broken commit can affect multiple services

Multi-repo (each service has its own manifests):

  • Gives teams more autonomy over their deployment lifecycle
  • Scales better in terms of repository size and CI performance
  • Harder to enforce consistency - each repo can drift from the standard
  • More repositories for the platform team to support

A common middle ground is an app-of-apps pattern: a central repository defines which applications exist and points to their individual repositories. This gives you central visibility with distributed ownership.

Helm vs Kustomize vs raw manifests

Pick one and standardise on it. The specific choice matters less than consistency. Supporting all three means supporting three different templating models, three different override patterns, and three different debugging workflows.

For most organisations:

  • Helm works well when you need parameterised, reusable charts with complex value hierarchies
  • Kustomize works well when you want to patch base manifests per environment without templating complexity
  • Raw manifests work for simple cases but become painful as environment-specific configuration grows

Secrets in GitOps

Secrets can’t live in Git as plaintext. The common patterns:

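  • Sealed Secrets - encrypt secrets client-side, store the encrypted version in Git, decrypt in-cluster. Simple, but each sealed secret can only be decrypted by the controller that holds the matching private key. By default that means one key pair per cluster, which gets awkward across many clusters.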
  • External Secrets Operator - reference secrets from an external backend (Vault, AWS Secrets Manager) via a Kubernetes custom resource stored in Git. The actual secret value never touches Git.
  • SOPS - encrypt specific values within YAML files. Works with various key management backends. More flexible than Sealed Secrets but more complex.

External Secrets Operator is a common choice for production platforms because it integrates cleanly with existing secrets backends and doesn’t require you to manage a separate set of encryption keys.
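
Whichever pattern you choose, it helps to have a guardrail that stops plain Kubernetes Secrets reaching the repo. Here is a rough sketch, assuming manifests have already been parsed into dictionaries (for example by a YAML loader in CI). The function name and sample manifests are hypothetical, and treating a top-level `sops` key as "encrypted" is a simplifying assumption rather than a full verification.

```python
from typing import Iterable, List


def plaintext_secrets(manifests: Iterable[dict]) -> List[str]:
    """Return the names of core Secret manifests that do not look SOPS-encrypted."""
    found = []
    for manifest in manifests:
        # Core Secrets are apiVersion "v1", kind "Secret". SealedSecret and
        # ExternalSecret are custom resources with other kinds, so they pass.
        is_core_secret = (
            manifest.get("apiVersion") == "v1" and manifest.get("kind") == "Secret"
        )
        # Assumption: SOPS-encrypted files carry a top-level "sops" metadata key.
        looks_encrypted = "sops" in manifest
        if is_core_secret and not looks_encrypted:
            found.append(manifest.get("metadata", {}).get("name", "<unnamed>"))
    return found


# Illustrative manifests, already parsed into dicts.
manifests = [
    {"apiVersion": "v1", "kind": "Secret",
     "metadata": {"name": "db-password"},
     "stringData": {"password": "example-only"}},
    {"apiVersion": "external-secrets.io/v1beta1", "kind": "ExternalSecret",
     "metadata": {"name": "db-password-ref"}},
    {"apiVersion": "v1", "kind": "Secret",
     "metadata": {"name": "api-key"},
     "sops": {"version": "example"}},
    {"apiVersion": "apps/v1", "kind": "Deployment",
     "metadata": {"name": "web"}},
]

violations = plaintext_secrets(manifests)
print(violations)
# ['db-password']
```

A check like this complements the secrets strategy rather than replacing it: it only catches the obvious mistake of committing a raw Secret.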

When Not to Use GitOps

GitOps isn’t always the right answer:

  • Very early-stage platforms where the deployment model is still being defined. Adopt GitOps for platform components, but don’t force it on application teams until you have a standard to offer them.
  • Workloads with extremely rapid iteration cycles, where the loop of commit, PR review, merge, and sync adds unacceptable latency. Some teams need to deploy dozens of times per day, and for them a direct push model is faster.
  • Environments where the reconciliation model creates more risk than it reduces - for example, when desired state is frequently wrong and drift correction causes more incidents than it prevents.

The test is simple: is GitOps making your deployments more reliable and auditable, or is it adding ceremony without improving outcomes?

The Takeaway

GitOps works best when it has something consistent to enforce. Scaling it before that consistency exists doesn’t create order - it automates disorder.

Start with platform components. Prove patterns with a small number of teams. Standardise the structure. Then scale.

The technology is mature. The tooling is production-grade. The failure mode isn’t GitOps itself - it’s adopting the model before the platform is ready to support it.

If your GitOps setup is enforcing inconsistency rather than correcting it, that’s a problem we can help with.