Control Plane Scalability — Request Rate Limits, Watch Connection Limits, Burst Handling

Scale Limits

API Server Limits

Per API Server instance:
├─ Request rate: ~1000 requests/sec sustained
├─ Peak rate: ~5000 requests/sec (short bursts only, not sustainable)
├─ Concurrent connections: ~10,000 watch connections
├─ Object count: <100,000 objects cached
└─ Query complexity: Large result sets (e.g. unpaginated LIST calls) are slow

etcd Limits

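├─ Write rate: ~1000 writes/sec
├─ Object count: 1-10 million (depends on object size)
├─ Request size: ~1.5 MiB per request by default in upstream etcd (--max-request-bytes)
├─ Key count: Unlimited in theory
└─ Database size: Bounded by the backend quota (--quota-backend-bytes, 2 GiB by default in upstream etcd) and by disk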

Watch Connection Limits

bash
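# Check current watch count
kubectl get --raw /metrics | grep apiserver_registered_watchers
# If that metric is absent on your Kubernetes version (it was deprecated in
# favor of apiserver_longrunning_requests), try:
kubectl get --raw /metrics | grep apiserver_longrunning_requests | grep WATCH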

# GKE default: ~10,000 per API Server
# Typical: 4-8 API Server replicas
# Total capacity: ~40,000-80,000 watches
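
The capacity arithmetic above can be sketched in a few lines of Go. This is only an illustration of the multiplication, assuming the per-server figure of ~10,000 watches quoted above; the function names are hypothetical and the numbers are planning estimates rather than hard limits.

```go
package main

import "fmt"

// totalWatchCapacity estimates cluster-wide watch capacity as
// replicas * perServerLimit. Both inputs are planning estimates.
func totalWatchCapacity(replicas, perServerLimit int) int {
	return replicas * perServerLimit
}

// utilization returns the fraction of the estimated capacity in use.
// It returns 0 when the capacity is 0 to avoid dividing by zero.
func utilization(currentWatches, replicas, perServerLimit int) float64 {
	capacity := totalWatchCapacity(replicas, perServerLimit)
	if capacity == 0 {
		return 0
	}
	return float64(currentWatches) / float64(capacity)
}

func main() {
	for _, replicas := range []int{4, 8} {
		fmt.Printf("%d replicas: ~%d watches\n", replicas, totalWatchCapacity(replicas, 10000))
	}
	fmt.Printf("utilization at 30000 watches on 4 replicas: %.2f\n", utilization(30000, 4, 10000))
}
```

Comparing the observed watch count from the metrics endpoint against this estimate gives a rough sense of remaining headroom.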

Optimization: ListWatch Batching

Instead of: 100 clients, each opening its own watch
Use: One shared watch (e.g. a client-go shared informer) with 100 handlers
Result: 1 upstream watch, 100 local subscribers
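
The fan-out idea can be illustrated with the standard library alone. The sketch below is not the client-go informer implementation; `Event`, `Broadcaster`, and `deliveredCount` are hypothetical names used only to show one upstream stream feeding many local handlers.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for a watch event.
type Event struct {
	Kind string
	Name string
}

// Broadcaster consumes one upstream event stream and fans each event
// out to every registered local handler.
type Broadcaster struct {
	mu       sync.RWMutex
	handlers []func(Event)
}

// Subscribe registers a local handler; it opens no new upstream watch.
func (b *Broadcaster) Subscribe(h func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.handlers = append(b.handlers, h)
}

// Run reads the single upstream channel until it is closed and delivers
// each event to a snapshot of the current handlers.
func (b *Broadcaster) Run(upstream <-chan Event) {
	for ev := range upstream {
		b.mu.RLock()
		handlers := append([]func(Event){}, b.handlers...)
		b.mu.RUnlock()
		for _, h := range handlers {
			h(ev)
		}
	}
}

// deliveredCount wires n handlers to one upstream stream and returns
// the total number of handler invocations.
func deliveredCount(n int, events []Event) int {
	b := &Broadcaster{}
	count := 0
	for i := 0; i < n; i++ {
		b.Subscribe(func(Event) { count++ }) // handlers run sequentially in Run
	}
	upstream := make(chan Event, len(events))
	for _, ev := range events {
		upstream <- ev
	}
	close(upstream)
	b.Run(upstream)
	return count
}

func main() {
	events := []Event{{Kind: "Pod", Name: "a"}, {Kind: "Pod", Name: "b"}}
	fmt.Println(deliveredCount(100, events)) // prints 200: 2 events x 100 handlers
}
```

Handlers here run synchronously, so one slow handler delays the rest; a real shared watch would probably queue events per subscriber.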

Burst Handling

Burst Spike Recovery

Scenario: A Deployment is scaled to 1000 Pods at once

Time 0: kubectl scale deployment <name> --replicas=1000
Time 1-5s: 1000 Pods created
     ├─ Scheduler: 1000 scheduling decisions
     ├─ kubelet: 1000 Pod starts
     └─ API Server: Processing 1000+ state changes

API Server event buffer fills:
├─ Buffers up to ~10,000 pending changes
├─ Older changes are dropped; clients that fall behind must re-list and re-watch
└─ Load eventually stabilizes as clients catch up
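
The drop-oldest behavior described above can be sketched as a bounded buffer. This is a simplified model, not the API Server's actual watch cache; `EventBuffer`, `Since`, and `ErrTooOld` are hypothetical names, and versions are simple counters starting at 1.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTooOld signals that events after the requested version were dropped.
var ErrTooOld = errors.New("requested version no longer buffered; re-list and re-watch")

// EventBuffer keeps the most recent `capacity` events. Each event gets a
// monotonically increasing version, starting at 1.
type EventBuffer struct {
	capacity int
	events   []string // events[i] has version first+i
	first    uint64   // version of events[0]
	next     uint64   // version the next added event will receive
}

// NewEventBuffer returns a buffer holding at least one event.
func NewEventBuffer(capacity int) *EventBuffer {
	if capacity < 1 {
		capacity = 1
	}
	return &EventBuffer{capacity: capacity, first: 1, next: 1}
}

// Add appends an event, dropping the oldest one when the buffer is full.
func (b *EventBuffer) Add(ev string) {
	if len(b.events) == b.capacity {
		b.events = b.events[1:]
		b.first++
	}
	b.events = append(b.events, ev)
	b.next++
}

// Since returns all events with a version greater than v, or ErrTooOld
// if some of those events have already been dropped.
func (b *EventBuffer) Since(v uint64) ([]string, error) {
	if v+1 < b.first {
		return nil, ErrTooOld
	}
	if v+1 >= b.next {
		return nil, nil // client is already caught up
	}
	return append([]string(nil), b.events[v+1-b.first:]...), nil
}

func main() {
	buf := NewEventBuffer(3)
	for i := 1; i <= 5; i++ {
		buf.Add(fmt.Sprintf("pod-%d created", i))
	}
	if evs, err := buf.Since(2); err == nil {
		fmt.Println("caught up with", len(evs), "events")
	}
	if _, err := buf.Since(1); err != nil {
		fmt.Println("client too far behind:", err)
	}
}
```

A client that receives the too-old error has to fetch the full current state again, which is why a large burst can trigger a second wave of LIST traffic.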

Capacity Planning

Small Cluster (dev/test)

Nodes: 3
Pods: 100-500
Objects: 1,000-5,000
API Server: Single instance
etcd: 1 instance (a second member adds no fault tolerance; use 3 if HA is needed)

Medium Cluster (staging)

Nodes: 50-100
Pods: 5,000-10,000
Objects: 50,000-100,000
API Server: 3-4 instances (HA)
etcd: 3 instances (quorum)

Large Cluster (production)

Nodes: 1,000+
Pods: 100,000+
Objects: 500,000-1,000,000
API Server: 8+ instances
etcd: 5 or 7 instances (odd member count, for quorum)
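
The etcd instance counts in these tiers follow from quorum arithmetic: a cluster of n members needs floor(n/2) + 1 members to agree, so an even member count tolerates no more failures than the odd count just below it. A minimal sketch, with hypothetical function names:

```go
package main

import "fmt"

// quorum returns how many members must agree: floor(n/2) + 1.
func quorum(members int) int {
	return members/2 + 1
}

// faultTolerance returns how many members can fail while quorum holds.
func faultTolerance(members int) int {
	return members - quorum(members)
}

func main() {
	for _, n := range []int{1, 2, 3, 4, 5, 6, 7} {
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

For example, 3 members tolerate 1 failure and 5 tolerate 2, while 2 members tolerate none and 4 tolerate only 1.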

Failure at Scale

Symptom: API Latency Spike

Cause: High request rate → queue backlog
Recovery: API Server auto-scaling; API Priority and Fairness (APF) sheds excess requests with HTTP 429
Timeline: 5-30 minutes to recover
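
On the client side, requests rejected during such a spike are usually retried with backoff rather than immediately. The sketch below assumes rejections arrive as HTTP 429; `callWithBackoff` and the `call` function it takes are hypothetical, and a production client would likely also honor any Retry-After header.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// ErrGaveUp is returned when every attempt was throttled.
var ErrGaveUp = errors.New("still throttled after max attempts")

// callWithBackoff invokes call until it returns a status other than 429
// or maxAttempts is reached, doubling the delay after each throttled try.
// call is a hypothetical function that performs one API request.
func callWithBackoff(call func() (int, error), maxAttempts int, base time.Duration) (int, error) {
	delay := base
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err := call()
		if err != nil {
			return 0, err // transport errors are not retried in this sketch
		}
		if status != http.StatusTooManyRequests {
			return status, nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return http.StatusTooManyRequests, ErrGaveUp
}

func main() {
	calls := 0
	// fake simulates a server that throttles the first two requests.
	fake := func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusTooManyRequests, nil
		}
		return http.StatusOK, nil
	}
	status, err := callWithBackoff(fake, 5, 10*time.Millisecond)
	fmt.Println(status, err, calls) // prints: 200 <nil> 3
}
```

Adding random jitter to the delay would help keep many clients from retrying in lockstep.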

Symptom: etcd Disk Full

Cause: Unbounded database growth
Recovery: Compaction and defragmentation; deleting or archiving old objects
Timeline: 1-2 hours (manual intervention)
Prevention: Monitor database and disk usage, set quotas
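
The monitoring step can be reduced to a threshold check against the configured quota. The sketch below is illustrative only: `usageLevel` is a hypothetical helper, and the 70% and 90% thresholds and the 8 GiB quota are assumed values, not recommendations from the source.

```go
package main

import "fmt"

// usageLevel classifies database usage against a quota. warnAt and
// critAt are fractions between 0 and 1 (hypothetical thresholds).
func usageLevel(dbBytes, quotaBytes int64, warnAt, critAt float64) string {
	if quotaBytes <= 0 {
		return "unknown"
	}
	frac := float64(dbBytes) / float64(quotaBytes)
	switch {
	case frac >= critAt:
		return "critical"
	case frac >= warnAt:
		return "warning"
	default:
		return "ok"
	}
}

func main() {
	const gib = int64(1) << 30
	quota := 8 * gib // assumed quota, for illustration only
	for _, used := range []int64{2 * gib, 6 * gib, 15 * gib / 2} {
		fmt.Printf("%.1f GiB used: %s\n",
			float64(used)/float64(gib), usageLevel(used, quota, 0.7, 0.9))
	}
}
```

Alerting at the warning level leaves time to compact and defragment before writes start failing.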

Workarounds for Overload

Workaround 1: Shard Cluster

Split into multiple clusters:

  • Cluster A: 50,000 objects
  • Cluster B: 50,000 objects
  • Cross-cluster services: Federation or Multi-Cluster Services (MCS)

Workaround 2: Dedicated Control Plane

Some GKE offerings provide a dedicated control plane for exclusive use.

Workaround 3: Rate Limiting on Client

go
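// Client-side rate limiting using golang.org/x/time/rate
// (import "golang.org/x/time/rate").
limiter := rate.NewLimiter(100, 10) // 100 requests/sec, burst of 10

// items is assumed to be a slice: range yields (index, value), so discard the index.
for _, item := range items {
    // Wait blocks until a token is available or ctx is cancelled.
    if err := limiter.Wait(ctx); err != nil {
        return err
    }
    // Assumes client.Create returns an error; do not ignore it.
    if err := client.Create(item); err != nil {
        return err
    }
}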

Summary

  • API Server: ~1000 req/sec sustained, ~10k watch connections
  • etcd: ~1000 writes/sec, 1-10M object limit
  • Burst: Temporary overload handled via queuing
  • Capacity planning: Scale API Server/etcd for expected load
  • Workarounds: Sharding (with federation/MCS), dedicated control plane, client-side rate limiting