Autoscaling sounds like one feature. It is three, operating at different layers, and they will happily undermine each other if you wire them up without thinking. We run all three, but each owns a clear lane.
HPA: more pods, the one you start with
The Horizontal Pod Autoscaler adds and removes pod replicas based on a metric. CPU utilization is the usual starting point and it is fine for a first pass, but most real services scale better on a custom or external metric: requests per second, queue depth, p95 latency. The trap is scaling on CPU for a service that is actually I/O-bound: CPU stays flat while latency climbs, so the HPA never adds the replicas you need.
- Set a sane minReplicas (we rarely go below 2 or 3) so a cold scale-up does not page you
- Tune the stabilization window so the HPA does not flap on noisy metrics
- Scale on the signal that actually correlates with pain, usually latency or queue depth, not raw CPU
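To make the bullets above concrete, here is a minimal sketch of the replica math the HPA documents: desired replicas are the current count scaled by the ratio of observed metric to target, rounded up, with a tolerance band (0.1 by default) inside which nothing changes. The function names, the min/max defaults, and the sample numbers are ours for illustration, not part of any Kubernetes API, and the real controller handles more cases (missing metrics, unready pods, scaling policies).

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=20, tolerance=0.1):
    """Approximate the documented HPA rule: ceil(current * metric / target)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured floor and ceiling (illustrative defaults).
    return max(min_replicas, min(max_replicas, wanted))


def stabilized_scale_down(recent_recommendations):
    """Scale-down stabilization: act on the highest recommendation in the window."""
    return max(recent_recommendations)


# 4 replicas, 180 requests/s per pod against a target of 100 per pod.
print(desired_replicas(4, 180, 100))
# A noisy series of recommendations collected during the window.
print(stabilized_scale_down([8, 3, 5]))
```

The second function is why a longer stabilization window damps flapping: one low reading cannot shrink the deployment while a higher recommendation is still inside the window.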
VPA: right-sizing requests, carefully
The Vertical Pod Autoscaler watches usage and recommends or sets CPU and memory requests. We almost never run it in Auto mode on the same workload as the HPA, because the two will fight: the HPA measures CPU utilization as a percentage of the pod's request, so when VPA shrinks the request, the same usage reads as higher utilization and the HPA adds replicas. We use VPA in recommendation mode (updateMode: "Off") to find honest request values, then bake those into the manifests.
VPA in recommendation mode is the cheapest right-sizing audit you will ever run. VPA in Auto mode next to an HPA is a fight you booked yourself.
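A small sketch of why the two fight, assuming an HPA that targets CPU utilization. Utilization is usage divided by request, so lowering the request raises the percentage even though the pod is doing exactly the same work. The helper names and the numbers (300m of usage, requests of 500m and 350m, a 60% target) are made up for illustration.

```python
import math


def cpu_utilization_pct(usage_millicores, request_millicores):
    """Utilization as the HPA sees it: usage as a percentage of the request."""
    return 100.0 * usage_millicores / request_millicores


def hpa_replicas(current_replicas, utilization_pct, target_pct):
    """Core HPA ratio rule, without tolerance or min/max clamping."""
    return math.ceil(current_replicas * utilization_pct / target_pct)


usage = 300  # millicores per pod, unchanged throughout

before = cpu_utilization_pct(usage, 500)  # request before VPA acts
after = cpu_utilization_pct(usage, 350)   # request after VPA shrinks it

print(hpa_replicas(4, before, 60))  # utilization on target, count holds
print(hpa_replicas(4, after, 60))   # same load, but the HPA now scales out
```

Nothing about the traffic changed between the two calls. Only the denominator moved, which is why we keep VPA's output offline and change requests deliberately.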
Karpenter: the node layer
HPA and VPA move pods. Eventually you run out of room and need more nodes. Karpenter watches for unschedulable pods, reacts within seconds by provisioning right-sized nodes, then consolidates and tears them down when load drops. Compared to the older Cluster Autoscaler tied to fixed node groups, it picks instance types per workload and bin-packs far more tightly, which is where the cloud bill actually moves.
- Let Karpenter choose from a broad range of instance types and families so it can find spare, cheaper capacity
- Use consolidation to repack underused nodes, but set disruption budgets so it does not evict everything at once
- Mix spot and on-demand: put stateless work that tolerates interruption on spot, with a clear fallback to on-demand
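The idea behind picking instance types per workload can be sketched as a toy selection: sum the requests of the pending pods, then take the cheapest instance type that fits them. This is not Karpenter's actual algorithm, which also accounts for daemonsets, system overhead, topology, and packing across several nodes. The catalog entries, their names, and their prices are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu_millicores: int
    memory_mib: int
    hourly_price: float  # hypothetical price, illustration only


# Hypothetical catalog; a broader catalog gives the selector more room.
CATALOG = [
    InstanceType("example-small", 2000, 4096, 0.10),
    InstanceType("example-medium", 4000, 8192, 0.19),
    InstanceType("example-large", 8000, 16384, 0.37),
]


def cheapest_fit(pending_pods, catalog):
    """Return the cheapest single instance type that fits all pending pods, or None."""
    need_cpu = sum(cpu for cpu, _ in pending_pods)
    need_mem = sum(mem for _, mem in pending_pods)
    candidates = [
        t for t in catalog
        if t.cpu_millicores >= need_cpu and t.memory_mib >= need_mem
    ]
    return min(candidates, key=lambda t: t.hourly_price, default=None)


# Each pending pod is (cpu_millicores, memory_mib).
pending = [(500, 1024), (1500, 2048), (1000, 1024)]
choice = cheapest_fit(pending, CATALOG)
print(choice.name if choice else "no single instance type fits")
```

With a fixed node group you would get whatever size the group was built with. Here the pods total 3000m CPU and 4096 MiB, so the small type is ruled out on CPU and the medium one wins on price.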
How the three sit together
Our rule of thumb: HPA owns replica count, VPA informs request sizing offline, Karpenter owns nodes. Keep their jobs separate and they cooperate. Let them overlap and you will spend an afternoon explaining why a service scaled itself into a thrashing loop while the node count oscillated underneath it.