API Priority and Fairness (APF) — Flow Schemas, Priority Levels, Rate Limiting
Why APF Matters
A control plane that crashes under API overload is a nightmare scenario. A single misbehaving workload that spams requests can overwhelm the API server and bring the entire cluster down.
API Priority and Fairness addresses this:
- Prevent cascade failure: Prioritize critical requests
- Fair allocation: Prevent single client starving others
- Predictable performance: defined service levels for different request types
Problem: Unbounded Request Rate
Without rate limiting:
Buggy workload: 10,000 requests/sec
Good workload: 100 requests/sec
↓
API server tries to handle 10,100 req/sec
↓
Overload
↓
All requests become slow
↓
Cluster effectively frozen
Solution: Prioritize and rate-limit by flow.
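To make the contrast concrete, here is a small Python sketch comparing a server that serves clients in proportion to what they send with one that divides capacity fairly between flows. The 1,000 req/sec capacity, the function names, and the max-min fair-share rule are assumptions chosen for illustration; APF's actual mechanism relies on concurrency limits and fair queuing rather than this exact formula.

```python
def proportional_share(demands, capacity):
    """No flow isolation: each client is served in proportion to what it sends."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)
    return {name: capacity * d / total for name, d in demands.items()}


def max_min_fair_share(demands, capacity):
    """Max-min fairness: small demands are met in full, the rest split what remains."""
    allocation = {}
    remaining = capacity
    pending = dict(demands)
    while pending:
        fair = remaining / len(pending)
        # Flows asking for no more than the current fair share are fully satisfied.
        satisfied = {n: d for n, d in pending.items() if d <= fair}
        if not satisfied:
            # Every remaining flow wants more than the fair share: split evenly.
            for n in pending:
                allocation[n] = fair
            break
        for n, d in satisfied.items():
            allocation[n] = d
            remaining -= d
            del pending[n]
    return allocation


demands = {"buggy": 10_000, "good": 100}  # req/sec, from the example above
unfair = proportional_share(demands, capacity=1_000)
fair = max_min_fair_share(demands, capacity=1_000)
print(round(unfair["good"], 1), fair["good"], fair["buggy"])  # 9.9 100 900.0
```

Without isolation the well-behaved workload gets under 10 of its 100 req/sec; with a fair split it is served in full and only the buggy workload is throttled.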
APF Architecture
Components
        Incoming Request
                ↓
┌────────────────────────────────┐
│ Flow Classification            │
│ Determine: which PriorityLevel?│
│ Determine: which flow?         │
└────────────────────────────────┘
                ↓
┌────────────────────────────────┐
│ PriorityLevel Queue            │
│ Allocate slots based on level  │
└────────────────────────────────┘
                ↓
┌────────────────────────────────┐
│ FlowLevelTokenBucket           │
│ Per-flow rate limiting         │
└────────────────────────────────┘
                ↓
Either: Execute request or Reject (429)
Priority Levels
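The "allocate slots based on level" step in the diagram amounts to dividing total capacity between priority levels according to relative shares. The Python sketch below works through that arithmetic with the illustrative figures used in this section (1,000 req/sec split 15/5/50/20/10). The function name and the rounding-up choice are assumptions; upstream Kubernetes documents a similar proportional rule, applied to concurrent request slots rather than requests per second.

```python
import math


def allocate(total_capacity, shares):
    """Split total capacity across priority levels in proportion to their shares."""
    total_shares = sum(shares.values())
    return {
        level: math.ceil(total_capacity * share / total_shares)
        for level, share in shares.items()
    }


shares = {
    "system": 15,
    "leader-election": 5,
    "workload-high": 50,
    "workload-low": 20,
    "catch-all": 10,
}
limits = allocate(1000, shares)
for level, limit in limits.items():
    print(f"{level:16s} {limit}")
```

Because the shares are relative, raising one level's share shrinks every other level's slice unless total capacity grows as well.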
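Default priority levels (upstream Kubernetes also defines node-high, global-default, and exempt):
yaml
system           # Critical system traffic, such as requests from nodes (highest)
leader-election  # Leader-election requests from control plane components
workload-high    # Important user workloads
workload-low     # Best-effort workloads
catch-all        # Everything else (lowest)
Each level is allocated a share of total capacity (illustrative figures):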
Total API capacity: 1000 req/sec
system: 150 req/sec (15%)
leader-election: 50 req/sec (5%)
workload-high: 500 req/sec (50%)
workload-low: 200 req/sec (20%)
catch-all: 100 req/sec (10%)
Flow Schemas
What is a Flow?
A flow is a group of requests from the same source, classified by a FlowSchema:
yaml
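# Example: requests from the scheduler
# (assumes it authenticates as the kube-system/default service account)
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: scheduler-flow
spec:
  matchingPrecedence: 100
  priorityLevelConfiguration:
    name: system             # Assign to the system level
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: default
        namespace: kube-system
    resourceRules:
    - verbs: ["get", "list", "watch"]
      apiGroups: [""]        # core API group (pods)
      resources: ["pods"]
      namespaces: ["*"]
# Effect: matching scheduler requests → "system" priority level
Flow Matching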
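Precedence-based matching can be sketched in a few lines of Python. This is for intuition only and is not the API server's code: the `Schema` and `Request` types, the single-subject match, and the `catch-all` fallback are simplifying assumptions. It does follow two documented rules: the lowest `matchingPrecedence` value is evaluated first, and ties are broken by schema name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    name: str
    precedence: int        # lower number is evaluated first
    subject: str           # e.g. "system:serviceaccount:kube-system:default"
    verbs: frozenset
    resources: frozenset
    priority_level: str


@dataclass(frozen=True)
class Request:
    user: str
    verb: str
    resource: str


def classify(request, schemas, fallback="catch-all"):
    """Return (schema name, priority level) of the first schema that matches."""
    for s in sorted(schemas, key=lambda s: (s.precedence, s.name)):
        if (request.user == s.subject
                and request.verb in s.verbs
                and request.resource in s.resources):
            return s.name, s.priority_level
    return None, fallback


schemas = [
    Schema("batch", 500, "system:serviceaccount:jobs:batch-runner",
           frozenset({"list"}), frozenset({"pods"}), "workload-low"),
    Schema("scheduler-flow", 100, "system:serviceaccount:kube-system:default",
           frozenset({"get", "list", "watch"}), frozenset({"pods"}), "system"),
]
req = Request("system:serviceaccount:kube-system:default", "get", "pods")
print(classify(req, schemas))  # ('scheduler-flow', 'system')
```

A request that matches no schema falls through to the fallback level, which is why an overly broad low-precedence schema can capture traffic intended for a later one.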
Requests are matched against FlowSchemas in order of matchingPrecedence (lowest value first); the first match wins:
Request (get pods):
├─ ServiceAccount: scheduler
├─ Verb: get
├─ Resource: pods
↓
Check FlowSchemas (in order):
├─ Schema 1: (no match)
├─ Schema 2: scheduler-flow (MATCH!)
├─ PriorityLevel: system
├─ Flow name: scheduler-get-pods
└─ Assign request to this flow
Token Bucket Algorithm
Mechanism
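As a simplified model, each flow has a token bucket (APF itself enforces concurrency limits with fair queuing, but the bucket is a useful intuition for rate limiting):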
Flow: user-workload-update-pods
Bucket: 100 tokens
Refill rate: 10 tokens/sec
Time 0: Bucket full (100 tokens)
Request 1: -1 token (99 left)
Request 2: -1 token (98 left)
...
Request 100: -1 token (0 left)
Request 101: REJECTED (no tokens)
After 1 sec: +10 tokens (10 left)
Request 102: -1 token (9 left)
Burst Allowance
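The bucket trace above can be reproduced with a few lines of Python. This is a minimal sketch with a simulated clock, not APF's implementation; the class and method names are made up for illustration. The same class covers the burst case in this section by setting the capacity to the burst size.

```python
class TokenBucket:
    """Token bucket driven by a simulated clock (no real sleeping)."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # the bucket starts full

    def advance(self, seconds):
        """Move the clock forward and refill, never exceeding capacity."""
        self.tokens = min(self.capacity, self.tokens + seconds * self.refill_per_sec)

    def try_acquire(self):
        """Spend one token if available; otherwise the request is rejected."""
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=100, refill_per_sec=10)
results = [bucket.try_acquire() for _ in range(101)]
print(results.count(True), results[-1])      # 100 False
bucket.advance(1.0)                          # one second later: +10 tokens
print(bucket.try_acquire(), bucket.tokens)   # True 9.0

burst = TokenBucket(capacity=20, refill_per_sec=10)
print(sum(burst.try_acquire() for _ in range(30)))  # 20
```

The capacity bounds how large a burst can be, while the refill rate bounds the sustained throughput once the bucket has been drained.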
Token buckets allow bursts:
Refill rate: 10 tokens/sec
Burst size: 20 tokens
Sustained: 10 requests/sec
Burst: up to 20 requests back-to-back (until the bucket is empty)
After burst: Must wait for tokens to refill
GKE APF Configuration
Check Current Config
bash
# List priority levels
kubectl get prioritylevelconfigurations
# List flow schemas
kubectl get flowschemas
# Inspect a specific priority level
kubectl describe prioritylevelconfiguration workload-high
Default PriorityLevels
yaml
# Illustrative summary, not a literal manifest: the real API expresses
# capacity as nominalConcurrencyShares and has no requestsPerSecond field.
# system - control plane + critical
system:
  requestsPerSecond: 150
  concurrentRequests: 30
  handSize: 6
# leader-election - leader-election leases
leader-election:
  requestsPerSecond: 50
  concurrentRequests: 10
# workload-high - important apps
workload-high:
  requestsPerSecond: 500
  concurrentRequests: 100
Common Rejections & Debugging
Symptom: "429 Too Many Requests"
bash
# Check APF metrics
kubectl get --raw /metrics | grep apiserver_flowcontrol
# Typical metrics:
#   apiserver_flowcontrol_request_concurrency_limit
#   apiserver_flowcontrol_current_inqueue_requests
#   apiserver_flowcontrol_rejected_requests_total
Root Cause Analysis
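To find which flow is being rejected, the rejection counter can be grouped by its labels. The Python sketch below parses the Prometheus text output of the metrics endpoint and sums rejections per priority level and flow schema. The sample text is made up for illustration, and the parser is deliberately simple: it assumes label values contain no escaped quotes or closing braces.

```python
import re
from collections import Counter

LINE = re.compile(
    r'^apiserver_flowcontrol_rejected_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def rejected_by_flow(metrics_text):
    """Sum rejected requests per (priority_level, flow_schema), largest first."""
    totals = Counter()
    for line in metrics_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # comments, blank lines, and other metrics
        labels = dict(LABEL.findall(m.group("labels")))
        key = (labels.get("priority_level", "?"), labels.get("flow_schema", "?"))
        totals[key] += float(m.group("value"))
    return totals.most_common()


# Hypothetical sample; in practice this would be the output of
# `kubectl get --raw /metrics`.
sample = """
# TYPE apiserver_flowcontrol_rejected_requests_total counter
apiserver_flowcontrol_rejected_requests_total{flow_schema="buggy-workload",priority_level="workload-low",reason="queue-full"} 1200
apiserver_flowcontrol_rejected_requests_total{flow_schema="buggy-workload",priority_level="workload-low",reason="time-out"} 30
apiserver_flowcontrol_rejected_requests_total{flow_schema="tenant-a",priority_level="workload-high",reason="queue-full"} 4
"""
for (level, schema), count in rejected_by_flow(sample):
    print(f"{level:14s} {schema:16s} {count:.0f}")
```

The counters are cumulative since the API server started, so comparing two snapshots taken a few minutes apart gives a better picture of what is being rejected right now.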
1. Check which flow is rejected
   - Metrics carry flow_schema and priority_level labels
2. Identify the request source
   - Is it a legitimate workload or a runaway client?
3. Check allocation
   - Is the priority level under-provisioned?
4. Consider tuning
   - Increase the priority level's concurrency shares
   - Adjust flow classification
Solution Patterns
Pattern 1: Misbehaving Workload
yaml
# If a buggy workload is spamming the API server, isolate it in a low priority level
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: buggy-workload
spec:
  priorityLevelConfiguration:
    name: workload-low
  matchingPrecedence: 50
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: buggy-app
        namespace: default
    resourceRules:
    - verbs: ["list"]
      apiGroups: [""]        # core API group (pods)
      resources: ["pods"]
      namespaces: ["*"]      # all namespaces
Pattern 2: Under-provisioned Priority Level
yaml
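# If workload-high is consistently rejecting requests, raise its capacity
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: workload-high
  annotations:
    # Without this, the API server reverts edits to a built-in level
    apf.kubernetes.io/autoupdate-spec: "false"
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 80   # relative share of server concurrency; upstream default is 40
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 10
        queueLengthLimit: 50
Production Patterns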
Pattern 1: Multi-Tenant Isolation
yaml
# Each tenant gets its own flow schema
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: tenant-a
spec:
  priorityLevelConfiguration:
    name: workload-high
  matchingPrecedence: 80
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: tenant-a
        name: default
    resourceRules:           # a rule needs resourceRules or nonResourceRules
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
---
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: tenant-b
spec:
  priorityLevelConfiguration:
    name: workload-low
  matchingPrecedence: 80
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: tenant-b
        name: default
    resourceRules:
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]
Pattern 2: Interactive vs Batch
yaml
# Interactive (low latency requirement)
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: interactive
spec:
  priorityLevelConfiguration:
    name: workload-high
  matchingPrecedence: 100
  rules:
  - subjects:
    - kind: Group
      group:
        name: "system:authenticated"
    resourceRules:
    - verbs: ["get", "list"]
      apiGroups: [""]
      resources: ["pods"]
      namespaces: ["*"]
---
# Batch (throughput matters more than latency)
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: batch-jobs
spec:
  priorityLevelConfiguration:
    name: workload-low
  matchingPrecedence: 50
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: jobs
        name: batch-runner
    resourceRules:
    - verbs: ["create", "update", "list"]
      apiGroups: ["*"]
      resources: ["*"]       # the original rule did not restrict resources
      namespaces: ["*"]
Monitoring APF
Key Metrics
apiserver_flowcontrol_rejected_requests_total
- Requests rejected due to rate limit
apiserver_flowcontrol_current_executing_requests
- Requests currently executing, to compare against the level's limit
apiserver_flowcontrol_request_queue_length_after_enqueue
- Queue backlog
apiserver_flowcontrol_watch_count_samples
- Watch connections count
Alerting
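The alert rule in this section compares the per-second rejection rate over a five-minute window against 0.1. As rough intuition for what that expression measures, the Python sketch below computes a rate from raw counter samples. It is a simplification and not PromQL's exact `rate()` algorithm, which also extrapolates to the window edges; the sample values and function names are invented for illustration.

```python
def per_second_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example an API server restart),
        # so the new value counts as the increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def should_fire(samples, threshold=0.1):
    """True when the average rejection rate over the window exceeds the threshold."""
    return per_second_rate(samples) > threshold


# One sample per minute over five minutes: 60 rejections in 300 seconds.
samples = [(0, 100), (60, 110), (120, 125), (180, 125), (240, 140), (300, 160)]
print(per_second_rate(samples), should_fire(samples))  # 0.2 True
```

The rule's `for: 5m` clause adds a second condition that is not modelled here: the expression has to stay above the threshold for five minutes before the alert fires.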
yaml
# Alert if rejections are increasing
- alert: APIServerFlowControlRejections
  expr: |
    rate(apiserver_flowcontrol_rejected_requests_total[5m]) > 0.1
  for: 5m
  annotations:
    summary: "API Server flow control rejections detected"
Performance Impact
With APF
Request spike (100,000 req/sec):
├─ Requests classified into flows
├─ Tokens allocated fairly
├─ Excess requests queued or rejected
├─ API server stable
└─ User-facing latency: increased but predictable
Without APF (legacy max-in-flight limits only)
Same spike:
├─ All requests treated equally
├─ First ones succeed, later ones timeout
├─ API server might crash
└─ Complete cluster unavailability
Summary
- APF prevents overload: Fair allocation across flows
- Priority levels: system, workload-high, workload-low, catch-all
- Flow classification: Match requests by subject/resource/verb
- Token bucket: Burst allowance + sustained rate limiting
- Monitoring: Track rejections, queue depth
- Tuning: Adjust priority level capacity or flow classification