Control Plane Scalability — Request Rate Limits, Watch Connection Limits, Burst Handling
Scale Limits
API Server Limits
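Per API Server instance (rough ballpark figures; real limits depend on control plane machine size, request mix, and flags):
├─ Request rate: ~1,000 requests/sec sustained
├─ Peak rate: ~5,000 requests/sec (short bursts only, not sustainable)
├─ Concurrent connections: ~10,000 watch connections
├─ Object count: <100,000 objects cached
└─ Query complexity: large LIST result sets are slow (paginate with limit/continue)
etcd Limits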
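├─ Write rate: ~1,000 writes/sec (ballpark; highly sensitive to disk fsync latency)
├─ Object count: 1-10 million (depends on object size)
├─ Request (transaction) size: up to 1.5 MiB by default (--max-request-bytes)
├─ Key count: unlimited in theory
└─ Database size: capped by the backend quota (--quota-backend-bytes: 2 GB default, 8 GB suggested maximum), not only by disk
Watch Connection Limits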
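The API server's watcher gauge is exported as one sample per resource (one line per label set), so getting a single number for a replica means summing the samples. A minimal Go sketch, assuming Prometheus text exposition format with no trailing timestamps; the helper name `sumMetric` is hypothetical, and the metric name is passed in because it can differ between Kubernetes versions.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

// sumMetric adds every sample of the metric called name in Prometheus text
// exposition format, e.g.
//
//	apiserver_registered_watchers{group="",kind="Pod",version="v1"} 42
//
// Comment lines and other metrics are skipped. Assumption: samples carry no
// trailing timestamp, so the value is the last whitespace-separated field.
func sumMetric(text, name string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") {
			continue
		}
		// Exact name match, with or without a label set.
		if !strings.HasPrefix(line, name+"{") && !strings.HasPrefix(line, name+" ") {
			continue
		}
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		panic(err)
	}
	total, err := sumMetric(string(data), "apiserver_registered_watchers")
	if err != nil {
		panic(err)
	}
	fmt.Printf("registered watchers on this replica: %.0f\n", total)
}
```

Pipe `kubectl get --raw /metrics` into the program. Like the raw query, it only sees the API server replica that answered the request.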
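```bash
# Check current watch count (the metric name may differ across Kubernetes versions;
# the result covers only the API server replica that served this request)
kubectl get --raw /metrics | grep apiserver_registered_watchers
# GKE default: ~10,000 watches per API server (approximate)
# Typical: 4-8 API server replicas (varies by provider and cluster size)
# Total capacity ≈ replicas × per-replica limit: 4 × 10,000 = 40,000 up to 8 × 10,000 = 80,000 watches
```
Optimization: Shared ListWatch (Informers)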
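Sharing a watch is a fan-out: one upstream stream per resource type, delivered to many in-process handlers. client-go's shared informers (`SharedInformerFactory`) implement this in practice; the Go sketch here is not their code, only the core idea, with a hypothetical `Event` type and a channel standing in for the upstream watch.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for a watch event.
type Event struct {
	Type string // "ADDED", "MODIFIED", or "DELETED"
	Name string
}

// Fanout delivers each event from one upstream stream to every registered
// handler. Minimal sketch of the shared-informer idea, not client-go code.
type Fanout struct {
	mu       sync.RWMutex
	handlers []func(Event)
}

// AddHandler registers a local subscriber; no new upstream watch is opened.
func (f *Fanout) AddHandler(h func(Event)) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.handlers = append(f.handlers, h)
}

// Run drains the single upstream channel until it is closed, calling every
// handler for each event, in registration order.
func (f *Fanout) Run(upstream <-chan Event) {
	for ev := range upstream {
		f.mu.RLock()
		for _, h := range f.handlers {
			h(ev)
		}
		f.mu.RUnlock()
	}
}

func main() {
	var f Fanout
	seen := make([]int, 100) // events observed by each of 100 handlers
	for i := range seen {
		i := i
		f.AddHandler(func(Event) { seen[i]++ })
	}
	upstream := make(chan Event, 2) // stands in for the one upstream watch
	upstream <- Event{Type: "ADDED", Name: "pod-a"}
	upstream <- Event{Type: "DELETED", Name: "pod-a"}
	close(upstream)
	f.Run(upstream)
	fmt.Println("handler 0 saw", seen[0], "events; handler 99 saw", seen[99])
}
```

Handlers here run synchronously, so one slow handler delays the rest; client-go instead gives each registered handler its own buffer.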
Instead of: 100 clients, each opening its own watch on the same resource
Use: one shared watch (for example, a client-go shared informer) with 100 event handlers
Result: 1 upstream watch, 100 local subscribers
Burst Handling
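During a burst the API server cannot buffer without limit for every watcher. The kube-apiserver watch cache instead terminates watchers that stay too far behind, and the client re-establishes the watch. The Go sketch models that policy with a bounded channel per watcher: a full buffer ends the watch instead of blocking the producer. The `watcher` type and buffer sizes are hypothetical, and the real server waits briefly before giving up rather than terminating immediately.

```go
package main

import "fmt"

// watcher models one client's watch stream with a bounded buffer
// (hypothetical type; sizes are illustrative).
type watcher struct {
	buf        chan int // pending events, identified by sequence number
	terminated bool     // set once the watch is closed for lagging
}

func newWatcher(size int) *watcher {
	return &watcher{buf: make(chan int, size)}
}

// deliver never blocks the producer: if the buffer is full, the watch is
// terminated and the client must reconnect (re-listing if its position is
// no longer available on the server).
func (w *watcher) deliver(ev int) {
	if w.terminated {
		return
	}
	select {
	case w.buf <- ev:
	default:
		w.terminated = true
		close(w.buf) // already-buffered events stay readable
	}
}

func main() {
	roomy := newWatcher(2000) // large enough for the whole burst
	tight := newWatcher(100)  // falls behind
	for ev := 0; ev < 1000; ev++ { // burst of 1,000 changes, nobody reading
		roomy.deliver(ev)
		tight.deliver(ev)
	}
	fmt.Println("roomy terminated:", roomy.terminated, "tight terminated:", tight.terminated)
}
```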
Burst Spike Recovery
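Scenario: a Deployment is scaled to 1,000 Pods at once
Time 0: kubectl scale deployment <name> --replicas=1000
Time 1-5s: 1,000 Pod objects created (illustrative timing)
├─ Scheduler: 1,000 scheduling decisions
├─ kubelets (across nodes): 1,000 Pod starts
└─ API Server: 1,000+ state changes (creates, bindings, status updates)
API Server buffers fill:
├─ The watch cache keeps a bounded window of recent changes (roughly ~10,000; depends on version and configuration)
├─ The oldest changes fall out of the window; watchers that lag too far are disconnected and reconnect, re-listing (HTTP 410 Gone) if their resourceVersion has left the window
└─ Load stabilizes once controllers, the scheduler, and kubelets catch up
Capacity Planning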
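etcd commits a write only when a strict majority (quorum) of members agrees, so the member count determines how many failures the cluster survives. The arithmetic as a small Go sketch (function names are illustrative):

```go
package main

import "fmt"

// quorum is the strict majority of members that must agree to commit.
func quorum(members int) int { return members/2 + 1 }

// faultTolerance is how many members can fail while a quorum remains.
func faultTolerance(members int) int { return members - quorum(members) }

func main() {
	for n := 1; n <= 7; n++ {
		fmt.Printf("members=%d quorum=%d tolerates=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```

A 4-member cluster tolerates no more failures than a 3-member one, and 2 members tolerate none, which is why etcd clusters use odd sizes.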
Small Cluster (dev/test)
Nodes: 3
Pods: 100-500
Objects: 1,000-5,000
API Server: Single instance
etcd: 1 instance (no HA; a 2-member cluster tolerates no more failures than 1 and needs both members up)
Medium Cluster (staging)
Nodes: 50-100
Pods: 5,000-10,000
Objects: 50,000-100,000
API Server: 3-4 instances (HA)
etcd: 3 instances (quorum of 2, tolerates 1 failure)
Large Cluster (production)
Nodes: 1,000+ (upstream Kubernetes is tested up to 5,000)
Pods: 100,000+ (upstream tested limit: 150,000 in total)
Objects: 500,000-1,000,000
API Server: 8+ instances
etcd: 5-7 instances (odd sizes; tolerate 2-3 failures)
Failure at Scale
Symptom: API Latency Spike
Cause: High request rate → queue backlog
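Recovery: auto-scaling API servers (where supported); API Priority and Fairness (APF) queues excess requests and rejects overflow with HTTP 429
Typical recovery time: 5-30 minutes
Symptom: etcd Database Full (space quota exceeded)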
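Cause: unbounded database growth; once the backend quota is exceeded, etcd raises a NOSPACE alarm and accepts only reads and deletes
Recovery: compact old revisions, defragment, then clear the alarm (etcdctl compact, etcdctl defrag, etcdctl alarm disarm); delete or archive stale objects
Typical recovery time: 1-2 hours (manual intervention)
Prevention: monitor database size against the quota (e.g. the etcd_mvcc_db_total_size_in_bytes metric) and set --quota-backend-bytes deliberately
Workarounds for Overload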
Workaround 1: Shard Cluster
Split into multiple clusters:
- Cluster A: 50,000 objects
- Cluster B: 50,000 objects
- Cross-cluster services: federation or the Multi-Cluster Services (MCS) API
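Sharding needs a placement rule that every tool agrees on. One simple option, shown as a Go sketch with a hypothetical `clusterFor` helper and made-up cluster names, hashes the namespace with FNV-1a and takes the result modulo the number of clusters.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// clusterFor deterministically maps a namespace to one of clusters using
// FNV-1a modulo the cluster count. Hypothetical helper; clusters must be
// non-empty and listed in the same order everywhere.
func clusterFor(namespace string, clusters []string) string {
	h := fnv.New32a()
	h.Write([]byte(namespace)) // hash.Hash.Write never returns an error
	return clusters[h.Sum32()%uint32(len(clusters))]
}

func main() {
	clusters := []string{"cluster-a", "cluster-b"} // hypothetical names
	for _, ns := range []string{"payments", "search", "batch-jobs"} {
		fmt.Println(ns, "->", clusterFor(ns, clusters))
	}
}
```

Plain modulo hashing moves most namespaces when a cluster is added or removed; an explicit namespace-to-cluster map or consistent hashing avoids that reshuffle.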
Workaround 2: Dedicated Control Plane
Some GKE offerings provide a dedicated control plane for exclusive use.
Workaround 3: Rate Limiting on Client
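```go
// Client-side rate limiting (assumes a controller-runtime client).
import (
	"context"
	"golang.org/x/time/rate"
	"sigs.k8s.io/controller-runtime/pkg/client"
)
func createAll(ctx context.Context, c client.Client, items []client.Object) error {
	limiter := rate.NewLimiter(100, 10) // 100 requests/sec, burst of 10
	for _, item := range items {
		if err := limiter.Wait(ctx); err != nil { // blocks until a token frees up or ctx ends
			return err
		}
		if err := c.Create(ctx, item); err != nil {
			return err
		}
	}
	return nil
}
```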
Summary
- API Server: ~1000 req/sec sustained, ~10k watch connections
- etcd: ~1000 writes/sec, 1-10M object limit
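- Burst: temporary overload absorbed by buffering and queuing; lagging watchers reconnect
- Capacity planning: scale API Server/etcd for expected load (odd etcd member counts)
- Workarounds: sharding (with federation/MCS), dedicated control plane, client-side rate limiting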