How big should a platform team be?

There is no correct ratio. The right size depends on how much complexity the platform carries - how many clusters, deployment patterns, tools, and exceptions the team manages. Some small teams support hundreds of engineers because their platforms have low variance. Some large teams struggle to support fifty because the platform is inconsistent.

What determines platform team size?

Complexity surface, not headcount ratio. The key factors are the number of supported deployment patterns, cluster count, tooling surface area, number of exceptions to standard paths, and how much self-service the platform provides versus manual support.

How do you reduce platform team load without hiring?

Reduce variance. Consolidate deployment patterns, standardise tooling, eliminate exceptions, and invest in self-service. Each exception removed and each pattern standardised reduces ongoing support load more than adding another engineer.

There Is No Correct Platform Team Size

There is no correct platform team size.

The question keeps coming up: “What is the right ratio of platform engineers to developers?” 1:10? 1:20? 1:50?

It’s the wrong question.

Platform teams don’t scale linearly with the number of engineers they support. They scale with how standardised the platform is, how much variation they allow, and how much they own versus delegate.

We’ve seen small platform teams supporting hundreds of engineers. We’ve seen large platform teams struggling to support fifty. The difference wasn’t talent. It was design.

Why Ratios Don’t Work

The ratio model assumes that platform engineering effort scales proportionally with the number of engineers using the platform. It doesn’t.

A platform team supporting 200 engineers on a well-standardised platform with three deployment patterns, one observability stack, and clear self-service workflows might need five people. A team supporting 50 engineers across six different deployment models, three monitoring tools, and constant ad hoc requests might need fifteen.

The variable isn’t how many people use the platform. It’s how much complexity the platform carries.

What actually drives platform team size

Variation - every additional pattern the platform supports requires ongoing maintenance, documentation, on-call coverage, and upgrade management. Running two ingress controllers means two sets of runbooks, two upgrade lifecycles, and two sets of edge cases to understand.

Manual intervention - if deploying a new service, creating a namespace, or provisioning infrastructure requires a platform engineer to do something manually, the team’s capacity is directly constrained by the number of requests.

Ownership breadth - a platform team that owns everything from CI pipelines to Kubernetes clusters to observability to developer tooling has a fundamentally different workload than one focused on infrastructure only.

Incident surface - more components, more clusters, and more variation mean more things that can break in more ways. On-call burden scales with the number of distinct failure modes, not the number of users.

Technical debt - legacy patterns that haven’t been deprecated, workarounds that became permanent, and systems that need replacement all consume platform engineering time without delivering new value.

The Two Platform Team Archetypes

In practice, most platform teams fall into one of two patterns:

The scaling team

This team has invested in standardisation and self-service. Their work looks like:

A small number of well-defined, well-documented patterns
Self-service workflows for common operations (new service, new environment, new namespace)
Golden paths that handle 90% of use cases without platform team involvement
Clear boundaries - the platform team owns the platform, not every team’s deployment problems
Automated guardrails (admission policies, CI checks) that enforce standards without human review

This team can grow its user base without proportionally growing its headcount. Adding 50 more engineers to the platform doesn’t meaningfully increase the team’s workload if those engineers are using the standard patterns.

The scaling team typically operates at a ratio of 1:40 or beyond: one platform engineer for every 40 or more developers.

The absorbing team

This team has become the place where operational complexity goes to live. Their work looks like:

Multiple ways to do the same thing, all of which are “supported”
Manual steps in common workflows that require a platform engineer
Frequent one-off requests that don’t fit existing patterns
Tribal knowledge instead of documentation
The platform team sits in the critical path for most changes

This team’s workload grows with every new user, every new service, and every new exception. They feel permanently understaffed because the demand is directly coupled to usage.

The absorbing team typically operates at a ratio of 1:10 or worse (one platform engineer for every ten developers, or fewer) and still feels stretched.

How to Evaluate Your Platform Team’s Size

Rather than asking “how many people do we need?”, ask these questions:

How much of the platform team’s time is spent on recurring requests?

Track it for two weeks. If more than 30% of the team’s time goes to tickets, deployments, and ad hoc requests, the problem isn’t headcount - it’s a lack of self-service. Every recurring request is a missing automation.
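
As a rough sketch of that two-week measurement, the Python below totals a time log and reports the share spent on recurring requests. The log entries, the category names, and the choice of which categories count as recurring are all illustrative assumptions; only the 30% threshold comes from the text above.

```python
# Hypothetical two-week time log for a platform team: (category, hours).
# The categories and hours are made up for illustration.
TIME_LOG = [
    ("tickets", 46.0),
    ("manual_deployments", 22.0),
    ("ad_hoc_requests", 30.0),
    ("roadmap_work", 150.0),
    ("on_call", 32.0),
]

# Which categories count as recurring request work is an assumption.
RECURRING = {"tickets", "manual_deployments", "ad_hoc_requests"}
THRESHOLD = 0.30  # the 30% rule of thumb


def recurring_share(log, recurring=RECURRING):
    """Return the fraction of logged hours spent on recurring request work."""
    total = sum(hours for _, hours in log)
    if total == 0:
        return 0.0
    recurring_hours = sum(hours for category, hours in log if category in recurring)
    return recurring_hours / total


def needs_self_service(log, threshold=THRESHOLD):
    """True when recurring work takes more than `threshold` of logged time."""
    return recurring_share(log) > threshold


share = recurring_share(TIME_LOG)
print(f"Recurring requests: {share:.0%} of logged time")
if needs_self_service(TIME_LOG):
    print("Above the threshold: look for missing self-service before adding headcount.")
```

Even a spreadsheet would do the same job. The point is to measure the share before arguing about headcount.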

How many supported patterns exist for common operations?

Count the number of distinct ways teams deploy services, manage configuration, handle secrets, and access infrastructure. Each additional pattern is a multiplier on operational load. If you have five ways to deploy a service, you have roughly five times the maintenance, documentation, and on-call surface.
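
One way to run that count, sketched in Python under the assumption that you can export a service inventory with one field per operation. The inventory, the field names, and the tool names are hypothetical examples, not a recommendation.

```python
from collections import Counter

# Hypothetical service inventory; in practice this might be exported
# from a service catalogue. Field and tool names are illustrative.
SERVICES = [
    {"name": "checkout", "deploy": "helm", "secrets": "vault"},
    {"name": "search", "deploy": "helm", "secrets": "vault"},
    {"name": "billing", "deploy": "kustomize", "secrets": "sealed-secrets"},
    {"name": "reports", "deploy": "raw-manifests", "secrets": "vault"},
    {"name": "legacy-api", "deploy": "ssh-script", "secrets": "env-file"},
]


def pattern_counts(services, operation):
    """Count how many services use each pattern for one operation."""
    return Counter(service[operation] for service in services)


def summarise(services, operations):
    """Map each operation to its pattern counts."""
    return {op: pattern_counts(services, op) for op in operations}


for operation, counts in summarise(SERVICES, ["deploy", "secrets"]).items():
    print(f"{operation}: {len(counts)} distinct patterns")
    for pattern, used_by in counts.most_common():
        print(f"  {pattern}: {used_by} service(s)")
```

Patterns used by only one or two services are usually the first candidates for migration or deprecation.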

What would happen if you said no to exceptions for three months?

Most platform teams allow exceptions because saying no feels like obstruction. But exceptions compound. Each one adds a pattern to support, an edge case to handle, and a deviation from the standard.

If you hypothetically froze exceptions for three months, would the platform still serve 90% of use cases? If yes, the exceptions are adding cost without proportionate value. If no, the standard patterns have gaps that need addressing.
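
The 90% question can be approximated from the same kind of inventory. The sketch below assumes a hypothetical set of standard patterns and treats each service as one "use case", which is a simplification; the data is illustrative.

```python
# Hypothetical standard ("golden path") patterns per operation.
STANDARD = {"deploy": {"helm"}, "secrets": {"vault"}}

# Hypothetical inventory: four services on the standard path, one exception.
SERVICES = [
    {"name": "checkout", "deploy": "helm", "secrets": "vault"},
    {"name": "search", "deploy": "helm", "secrets": "vault"},
    {"name": "billing", "deploy": "helm", "secrets": "vault"},
    {"name": "reports", "deploy": "helm", "secrets": "vault"},
    {"name": "legacy-api", "deploy": "ssh-script", "secrets": "env-file"},
]


def on_standard_path(service, standard=STANDARD):
    """True if the service uses a standard pattern for every operation."""
    return all(service.get(op) in allowed for op, allowed in standard.items())


def standard_coverage(services, standard=STANDARD):
    """Fraction of services that need no exception at all."""
    if not services:
        return 0.0
    covered = sum(1 for service in services if on_standard_path(service, standard))
    return covered / len(services)


coverage = standard_coverage(SERVICES)
exceptions = [s["name"] for s in SERVICES if not on_standard_path(s)]
print(f"Standard-path coverage: {coverage:.0%}; exceptions: {exceptions}")
```

In this made-up data the coverage falls below 90%, which by the reasoning above would point to a gap in the standard patterns rather than to a headcount problem.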

How often does the platform team sit in the critical path?

Map the common developer workflows - deploying a service, creating an environment, debugging a production issue. For each one, identify whether the platform team is required or optional.

Every workflow where the platform team is required is a bottleneck. The goal is to move the platform team out of the critical path for routine operations and into an enabling role for complex ones.
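
The mapping exercise can be as simple as a small table. A minimal Python sketch follows, with hypothetical workflows and flags; the "migrate a cluster" entry is added only as an example of a complex, non-routine operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    routine: bool            # routine operation, as opposed to a complex one
    platform_required: bool  # must a platform engineer act for it to finish?


# Hypothetical answers from a workflow-mapping session.
WORKFLOWS = [
    Workflow("deploy a service", routine=True, platform_required=False),
    Workflow("create an environment", routine=True, platform_required=True),
    Workflow("debug a production issue", routine=True, platform_required=True),
    Workflow("migrate a cluster", routine=False, platform_required=True),
]


def bottlenecks(workflows):
    """Routine workflows that still require the platform team."""
    return [w.name for w in workflows if w.routine and w.platform_required]


for name in bottlenecks(WORKFLOWS):
    print(f"Bottleneck: {name}")
```

Complex workflows that need the platform team are left out of the list on purpose, since that is the enabling role the team should keep.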

What’s the on-call burden?

If your platform team is getting paged frequently, the question isn’t whether you need more people on the rotation. It’s why the platform generates that many alerts. High on-call burden is usually a symptom of insufficient standardisation, missing automation, or unresolved reliability problems.

Reducing the Need for Headcount

The path to a smaller, more effective platform team isn’t cutting people. It’s reducing the complexity they need to manage.

Constrain variation ruthlessly

This is the single highest-leverage change a platform team can make. Fewer supported patterns mean less maintenance, less documentation, less on-call surface, and less cognitive load.

Practically, this means:

Defining a standard deployment model and migrating teams to it
Deprecating legacy patterns with firm timelines
Saying no to exceptions unless there’s a genuine technical requirement that the standard cannot satisfy
Making the standard path easier than the exception path

Invest in self-service

Every operation that requires a platform engineer to perform manually is a scalability constraint. Common candidates for self-service:

New service provisioning
Environment creation
Namespace and resource quota management
Access and permissions management
Certificate provisioning
DNS record management

The implementation doesn’t need to be complex. A well-structured Terraform module with a PR-based workflow, a simple internal CLI tool, or even a documented kubectl command that teams run themselves can each eliminate a category of manual work.
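
To show how small such a tool can be, here is a minimal Python sketch of the core of a namespace self-service helper. It builds a Kubernetes Namespace and ResourceQuota as JSON. The function name, the default quota values, and the `owner` label are assumptions made for illustration, not an established convention.

```python
import json
import re

# Kubernetes namespace names must be DNS-1123 labels: lowercase
# alphanumerics and '-', at most 63 characters, alphanumeric at both ends.
NAME_RE = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")


def namespace_manifests(team, cpu="4", memory="8Gi", pods="20"):
    """Build a Namespace plus a ResourceQuota for one team (hypothetical defaults)."""
    if not NAME_RE.fullmatch(team):
        raise ValueError(f"invalid namespace name: {team!r}")
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team, "labels": {"owner": team}},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "default-quota", "namespace": team},
        "spec": {
            "hard": {"requests.cpu": cpu, "requests.memory": memory, "pods": pods}
        },
    }
    # A v1 List lets both objects be applied in one step.
    return {"apiVersion": "v1", "kind": "List", "items": [namespace, quota]}


if __name__ == "__main__":
    print(json.dumps(namespace_manifests("payments"), indent=2))
```

The JSON output can be committed to a repository for a PR-based workflow or piped to `kubectl apply -f -`. Either way, no platform engineer has to write the manifest by hand.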

Define clear ownership boundaries

Platform teams that own “everything infrastructure” inevitably absorb work that belongs elsewhere. Clear boundaries reduce scope creep:

The platform team owns the platform - the shared infrastructure, tooling, and standards
Application teams own their applications - deployment configuration, resource tuning, application-level monitoring
A clear interface between the two - the golden path, the self-service tools, the documentation

When a request comes in, the first question should be “does this belong to the platform, or to the team’s application?” Not every infrastructure-adjacent problem is a platform team problem.

Automate guardrails, not approvals

Policy enforcement shouldn’t require a human in the loop. Admission controllers, CI pipeline checks, and automated compliance scanning can enforce standards at the point of change rather than through manual review.

This shifts the platform team from a gatekeeper role to a standards-setting role. They define the rules; the automation enforces them.
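
A CI-side guardrail can be a short script. The Python sketch below checks a Deployment manifest, already parsed into a dictionary, against two hypothetical rules: a required `team` label, and CPU and memory limits on every container. The rules and example manifests are assumptions. In-cluster enforcement would be handled separately by an admission controller.

```python
REQUIRED_LABELS = ("team",)          # hypothetical policy
REQUIRED_LIMITS = ("cpu", "memory")  # hypothetical policy


def violations(manifest):
    """Return a list of human-readable policy violations for a Deployment."""
    problems = []
    if manifest.get("kind") != "Deployment":
        return problems  # this sketch only checks Deployments
    metadata = manifest.get("metadata", {})
    name = metadata.get("name", "<unnamed>")
    labels = metadata.get("labels", {})
    for label in REQUIRED_LABELS:
        if label not in labels:
            problems.append(f"{name}: missing required label '{label}'")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        cname = container.get("name", "<unnamed>")
        for resource in REQUIRED_LIMITS:
            if resource not in limits:
                problems.append(f"{name}/{cname}: no {resource} limit set")
    return problems


def deployment(name, labels, limits):
    """Build a minimal Deployment dictionary for the examples below."""
    container = {"name": "app", "image": "example/app:1.0", "resources": {"limits": limits}}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {"template": {"spec": {"containers": [container]}}},
    }


GOOD = deployment("checkout", {"team": "payments"}, {"cpu": "500m", "memory": "256Mi"})
BAD = deployment("legacy-api", {}, {"cpu": "500m"})

for problem in violations(BAD):
    print(problem)
```

In a pipeline, the script would exit non-zero whenever `violations` returns anything, so the change is rejected without a platform engineer reviewing it.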

The Takeaway

The right platform team size isn’t a number or a ratio. It’s a function of how much complexity the platform carries and how much of the team’s capacity is consumed by variation, manual work, and exception handling.

Well-designed platforms reduce the need for platform involvement over time. That’s how a small team supports a large organisation - not by working harder, but by designing a platform that requires less human intervention to operate.

If your platform team feels permanently understaffed, the answer probably isn’t more people. It’s less variance.
