Blog
Insights, tutorials, and lessons learned from our platform engineering work.
Why AI Inference Is Creating a New Kind of Platform Engineer
Companies are hiring for AI inference platform work under a dozen different titles. The role exists - it just doesn't have a name yet. Here's what it looks like, why it's emerging, and why the label matters.
Most AI Platform Work Starts in the Wrong Layer
Most organisations building AI platforms start with the model layer and hope the infrastructure sorts itself out. It doesn't. Here's why starting with the platform layer produces better outcomes.
GPU Spend Is a Platform Problem, Not a Model Problem
GPU costs spiral when there's no platform governance. The fix isn't cheaper models - it's the same discipline platform teams already apply to CPU and memory: quotas, right-sizing, visibility, and accountability.
Your Kubernetes Platform Isn't Ready for AI Inference Workloads
Most Kubernetes platforms weren't designed for AI inference workloads. GPU scheduling, latency-sensitive serving, cost governance, and operational models all need rethinking before inference hits production.
GitOps Is the Right Model - But Not Before Your Platform Is Ready
GitOps gives you auditable, drift-free Kubernetes deployments. But if you scale it too early, it enforces the inconsistencies you already have. A phased guide to GitOps adoption, repo structure, secrets management, and when to use Argo CD vs Flux.
There Is No Correct Platform Team Size
The right platform team size isn't a ratio. It's a function of how much complexity the platform carries. Here's why some small teams support hundreds of engineers while large teams struggle with fifty.
Nobody Decided to Have 100 Kubernetes Clusters
Kubernetes cluster sprawl is one of the most expensive problems in platform engineering. A decision framework for multi-cluster management, consolidation, and when a new cluster is actually justified.
How Platform Teams End Up With Six-Figure Observability Bills
A practical comparison of Datadog vs the open-source LGTM stack (Loki, Grafana, Tempo, Mimir). How observability costs spiral, what the migration looks like, and when open source is the right move.
If You Rotated Every Credential Today, Something Would Break
Secrets sprawl makes credential rotation dangerous instead of routine. A practical guide to consolidating Kubernetes secrets management, building a rotation strategy, and eliminating credentials scattered across five systems.
Platform Teams Don't Get Cut Because They're Not Valuable
Platform teams get cut because they can't prove value, not because they lack it. How product ownership drives adoption, the metrics that matter, and how to make platform engineering visible to leadership.
Welcome to the KubeWright Blog
Lessons from real platform engineering engagements - the decisions that matter, the mistakes that cost, and what actually works at scale.