Why AI Inference Is Creating a New Kind of Platform Engineer

Companies are hiring for AI inference platform work using a dozen different job titles. The role exists - it just doesn't have a name yet. Here's what it looks like, why it's emerging, and why the label matters.

Most AI Platform Work Starts in the Wrong Layer

Most organisations building AI platforms start with the model layer and hope the infrastructure sorts itself out. It doesn't. Here's why starting with the platform layer produces better outcomes.

GPU Spend Is a Platform Problem, Not a Model Problem

GPU costs spiral when there's no platform governance. The fix isn't cheaper models - it's the same discipline platform teams already apply to CPU and memory: quotas, right-sizing, visibility, and accountability.

Your Kubernetes Platform Isn't Ready for AI Inference Workloads

Most Kubernetes platforms weren't designed for AI inference workloads. GPU scheduling, latency-sensitive serving, cost governance, and operational models all need rethinking before inference hits production.

GitOps Is the Right Model - But Not Before Your Platform Is Ready

GitOps gives you auditable, drift-free Kubernetes deployments. But rolled out too early, it enforces whatever inconsistency the platform already has. A phased guide to GitOps adoption, repo structure, secrets management, and when to use Argo CD vs Flux.

There Is No Correct Platform Team Size

The right platform team size isn't a ratio. It's a function of how much complexity the platform carries. Here's why some small teams support hundreds of engineers while large teams struggle with fifty.

Nobody Decided to Have 100 Kubernetes Clusters

Kubernetes cluster sprawl is one of the most expensive problems in platform engineering. A decision framework for multi-cluster management, consolidation, and when a new cluster is actually justified.

How Platform Teams End Up With Six-Figure Observability Bills

A practical comparison of Datadog vs the open-source LGTM stack (Loki, Grafana, Tempo, Mimir). How observability costs spiral, what the migration looks like, and when open source is the right move.

If You Rotated Every Credential Today, Something Would Break

Secrets sprawl makes credential rotation dangerous instead of routine. A practical guide to consolidating Kubernetes secrets management, building a rotation strategy, and eliminating credentials scattered across five systems.

Platform Teams Don't Get Cut Because They're Not Valuable

Platform teams get cut because they can't prove value, not because they lack it. How product ownership drives adoption, the metrics that matter, and how to make platform engineering visible to leadership.

Welcome to the KubeWright Blog

Lessons from real platform engineering engagements - the decisions that matter, the mistakes that cost, and what actually works at scale.