Autoscaling that works: HPA, VPA and Karpenter in practice

Autoscaling sounds like one feature. It is three, operating at different layers, and they will happily undermine each other if you wire them up without thinking. We run all three, but each owns a clear lane.

HPA: more pods, the one you start with

The Horizontal Pod Autoscaler adds and removes pod replicas based on a metric. CPU is the default and it is fine for a first pass, but most real services scale better on a custom or external metric: requests per second, queue depth, p95 latency. The trap is scaling on CPU for a service that is actually I/O-bound: CPU stays flat while latency climbs, so the HPA never sees a reason to add pods.
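The I/O-bound trap is easier to see with numbers. The sketch below applies the replica formula from the Kubernetes HPA documentation, ceil(currentReplicas * currentValue / targetValue), with the default 10% tolerance. The service, its replica count and the metric values are hypothetical, and the real controller also handles missing metrics and unready pods, which this ignores.

```python
import math

def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Replica count from the documented HPA formula (simplified)."""
    ratio = current_value / target_value
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# Hypothetical I/O-bound service running 4 replicas:
# CPU sits at 35% against a 70% target, while each pod
# takes 180 requests/s against a 100 requests/s target.
by_cpu = desired_replicas(4, current_value=35, target_value=70)
by_rps = desired_replicas(4, current_value=180, target_value=100)
print(by_cpu, by_rps)  # prints: 2 8
```

On the CPU signal the HPA would halve the deployment at the exact moment the request rate says it should double.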

- Set a sane minReplicas (we rarely go below 2 or 3) so a cold scale-up does not page you
- Tune the stabilization window so the HPA does not flap on noisy metrics
- Scale on the signal that actually correlates with pain, usually latency or queue depth, not raw CPU
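To make the stabilization window concrete, here is a minimal model of the scale-down side: the autoscaler remembers recent recommendations and acts on the highest one still inside the window, so one quiet sample cannot drop replicas. The class name, the 60 second window and the sample values are assumptions for illustration, and this is a sketch of the idea, not the controller's code.

```python
from collections import deque

class ScaleDownStabilizer:
    """Simplified model of a scale-down stabilization window."""

    def __init__(self, window_seconds: float, min_replicas: int = 2):
        self.window_seconds = window_seconds
        self.min_replicas = min_replicas
        self._history = deque()  # (timestamp_seconds, raw_recommendation)

    def stabilize(self, now: float, raw_recommendation: int) -> int:
        self._history.append((now, raw_recommendation))
        # Forget recommendations that have aged out of the window.
        while self._history[0][0] < now - self.window_seconds:
            self._history.popleft()
        # Act on the highest recent recommendation, never below the floor.
        return max(self.min_replicas, max(r for _, r in self._history))

stabilizer = ScaleDownStabilizer(window_seconds=60, min_replicas=2)
# Hypothetical noisy recommendations as (seconds, replicas).
noisy = [(0, 6), (15, 3), (30, 6), (45, 2), (60, 5), (100, 2), (170, 1)]
stabilized = [stabilizer.stabilize(t, r) for t, r in noisy]
print(stabilized)  # prints: [6, 6, 6, 6, 6, 5, 2]
```

The raw signal bounces between 2 and 6, but the stabilized count only steps down once the high readings have aged out, and it stops at the minReplicas floor.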

VPA: right-sizing requests, carefully

The Vertical Pod Autoscaler watches usage and recommends or sets CPU and memory requests. We almost never run it in Auto mode on the same workload as an HPA that scales on CPU or memory, because the two will fight: the HPA measures utilization as a percentage of the pod's request, so when VPA shrinks the request, utilization jumps with no change in load and the HPA adds replicas. We use VPA in recommendation mode to find honest request values, then bake those into the manifests.

VPA in recommendation mode is the cheapest right-sizing audit you will ever run. VPA in Auto mode next to an HPA is a fight you booked yourself.
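The arithmetic behind that fight fits in a few lines. This sketch assumes an HPA targeting 70% CPU utilization and a pod using a steady 300 millicores. All numbers are hypothetical, and the HPA tolerance band is left out to keep it short.

```python
import math

def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """Utilization as the HPA sees it: usage relative to the request."""
    return 100.0 * usage_millicores / request_millicores

def hpa_desired(current_replicas: int, utilization: float, target: float) -> int:
    """Documented HPA formula, without the tolerance check."""
    return math.ceil(current_replicas * utilization / target)

usage = 300  # millicores per pod, unchanged throughout

before = cpu_utilization_percent(usage, request_millicores=500)  # 60%
after = cpu_utilization_percent(usage, request_millicores=350)   # about 85.7%

replicas_before = hpa_desired(4, before, target=70)
replicas_after = hpa_desired(4, after, target=70)
print(replicas_before, replicas_after)  # prints: 4 5
```

Nothing about the traffic changed. VPA lowered the request from 500m to 350m, utilization crossed the target, and the HPA added a pod. Spreading the same load over more pods then lowers per-pod usage, which can push the next VPA recommendation down again.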

Karpenter: the node layer

HPA and VPA move pods. Eventually you run out of room and need more nodes. Karpenter watches for unschedulable pods and starts provisioning right-sized nodes within seconds, then consolidates and tears them down when load drops. Compared with the older Cluster Autoscaler, which is tied to fixed node groups, it picks instance types per workload and bin-packs far tighter, which is where the cloud bill actually moves.

- Let Karpenter choose from a broad range of instance types so it can find spare and cheaper capacity
- Use consolidation to repack underused nodes, but set disruption budgets so it does not evict everything at once
- For stateless work that tolerates interruption, mix spot and on-demand capacity with a clear fallback
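A toy example shows why choosing the node shape per workload moves the bill. The sketch below packs pending pods with first-fit decreasing and compares a fixed node shape against picking the cheapest shape from a small catalog. This is not Karpenter's scheduling logic, which also weighs affinity, topology, daemonsets and live pricing. The shape names, sizes and prices are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeShape:
    name: str
    cpu: float             # vCPU
    memory: float          # GiB
    hourly_cost_cents: int  # hypothetical price

@dataclass(frozen=True)
class Pod:
    cpu: float
    memory: float

def nodes_needed(pods, shape):
    """First-fit decreasing; returns the node count, or None if a pod cannot fit."""
    free = []  # remaining [cpu, memory] per node opened so far
    for pod in sorted(pods, key=lambda p: (p.cpu, p.memory), reverse=True):
        if pod.cpu > shape.cpu or pod.memory > shape.memory:
            return None
        for node in free:
            if node[0] >= pod.cpu and node[1] >= pod.memory:
                node[0] -= pod.cpu
                node[1] -= pod.memory
                break
        else:
            # No open node has room, so open a new one.
            free.append([shape.cpu - pod.cpu, shape.memory - pod.memory])
    return len(free)

def cheapest_plan(pods, shapes):
    """Return (total_cents_per_hour, shape_name, node_count) for the cheapest shape."""
    plans = []
    for shape in shapes:
        count = nodes_needed(pods, shape)
        if count is not None:
            plans.append((count * shape.hourly_cost_cents, shape.name, count))
    return min(plans) if plans else None

general = NodeShape("general-4cpu", cpu=4, memory=16, hourly_cost_cents=20)
compute = NodeShape("compute-8cpu", cpu=8, memory=16, hourly_cost_cents=34)
pending = [Pod(cpu=1.5, memory=2) for _ in range(5)]

fixed_group_nodes = nodes_needed(pending, general)
best = cheapest_plan(pending, [general, compute])
print(fixed_group_nodes, best)  # prints: 3 (34, 'compute-8cpu', 1)
```

With only the 4 vCPU shape available, two pods fit per node and a full vCPU is stranded on each, so five pods need three nodes at 60 cents an hour. Letting the planner pick the 8 vCPU shape fits all five on one node for 34 cents.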

How the three sit together

Our rule of thumb: HPA owns replica count, VPA informs request sizing offline, Karpenter owns nodes. Keep their jobs separate and they cooperate. Let them overlap and you will spend an afternoon explaining why a service scaled itself into a thrashing loop while the node count oscillated underneath it.

João Matos

