API Priority and Fairness (APF) — Flow Schemas, Priority Levels, Rate Limiting

Why APF Matters

A control plane crash caused by API overload is a nightmare scenario. A single misbehaving workload (for example, one that spams requests) can overwhelm the API server and take the entire cluster down with it.

API Priority and Fairness addresses this in three ways:

Prevent cascading failure: Prioritize critical requests
Fair allocation: Prevent a single client from starving others
Predictable performance: SLAs for different request types

Problem: Unbounded Request Rate

Without rate limiting:

Buggy workload: 10,000 requests/sec
Good workload: 100 requests/sec
     ↓
API Server tries to handle 10,100 req/sec
     ↓
Overload
     ↓
All requests become slow
     ↓
Cluster effectively frozen

Solution: Prioritize & rate-limit by flow.
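
The arithmetic behind the diagram above can be sketched in a few lines. This is a deliberately simplified model, not the API server's actual behaviour: it assumes a hypothetical capacity of 1,000 req/sec and that an overloaded server ends up serving each client in proportion to its demand. The function name served_without_isolation is made up for illustration.

```python
def served_without_isolation(capacity, demands):
    """Split capacity in proportion to demand (no per-flow fairness)."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)  # everything fits, nobody is throttled
    return {name: capacity * d / total for name, d in demands.items()}


# Demands taken from the diagram; the 1,000 req/sec capacity is an assumption.
demands = {"buggy": 10_000, "good": 100}
served = served_without_isolation(1_000, demands)
for name, rate in served.items():
    print(f"{name}: {rate:.1f} req/sec served of {demands[name]} requested")
```

Under these assumptions the well-behaved workload gets under 10 req/sec of the 100 it asked for, even though its own demand is far below the server's capacity.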

APF Architecture

Components

Incoming Request
     ↓
┌────────────────────────────────┐
│ Flow Classification            │
│ Determine: which PriorityLevel?│
│ Determine: which flow?         │
└────────────────────────────────┘
     ↓
┌────────────────────────────────┐
│ PriorityLevel Queue            │
│ Allocate slots based on level  │
└────────────────────────────────┘
     ↓
┌────────────────────────────────┐
│ FlowLevelTokenBucket           │
│ Per-flow rate limiting         │
└────────────────────────────────┘
     ↓
Either: Execute request or Reject (429)

Priority Levels

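Default priority levels (upstream Kubernetes defaults; confirm what your cluster runs with kubectl get prioritylevelconfigurations):

yaml

exempt          # Not limited at all (e.g. system:masters)
system          # Requests from nodes (system:nodes group)
node-high       # Node health updates
leader-election # Leader-election requests from built-in controllers
workload-high   # Other requests from built-in controllers
workload-low    # Requests from other service accounts (typically Pods)
global-default  # All other traffic, e.g. interactive kubectl use
catch-all       # Safety net for anything left unmatched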

Each level is allocated a share of the API server's capacity. The figures below are illustrative, not actual defaults:

Total API capacity: 1000 req/sec

system:          150 req/sec (15%)
leader-election:  50 req/sec (5%)
workload-high:   500 req/sec (50%)
workload-low:    200 req/sec (20%)
catch-all:       100 req/sec (10%)
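
The split above is a proportional division of capacity by share. A minimal sketch follows, assuming the illustrative percentages from the list are used directly as share weights. The helper name nominal_limits is hypothetical, and rounding up is a choice made here; upstream APF applies the same kind of proportional split to concurrency seats rather than to requests per second.

```python
def nominal_limits(total, shares):
    """Divide `total` capacity among levels in proportion to their shares."""
    share_sum = sum(shares.values())
    # Integer ceiling division, so every level gets a whole-number limit.
    return {
        level: (total * share + share_sum - 1) // share_sum
        for level, share in shares.items()
    }


# Share weights copied from the illustrative percentages above.
shares = {
    "system": 15,
    "leader-election": 5,
    "workload-high": 50,
    "workload-low": 20,
    "catch-all": 10,
}
limits = nominal_limits(1000, shares)
for level, limit in limits.items():
    print(f"{level}: {limit}")
```

Because the split is relative, raising one level's share lowers the portion every other level receives unless total capacity grows too.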

Flow Schemas

What is a Flow?

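A flow is a group of requests from the same source, classified by a FlowSchema:

yaml

# Example: requests from the scheduler
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: scheduler-flow
spec:
  priorityLevelConfiguration:
    name: system             # Assign to the system level
  matchingPrecedence: 100
  distinguisherMethod:
    type: ByUser             # One flow per requesting user
  rules:
  - subjects:
    - kind: User
      user:
        name: system:kube-scheduler
    resourceRules:
    - verbs: ["get", "list", "watch"]
      apiGroups: [""]
      resources: ["pods"]
      namespaces: ["*"]

# Effect: all matching scheduler requests → "system" priority level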

Flow Matching

Requests are matched against FlowSchemas in order of matchingPrecedence (lowest value first); the first match wins:

Request (get pods):
  ├─ Subject: scheduler
  ├─ Verb: get
  └─ Resource: pods
     ↓
Check FlowSchemas (in precedence order):
  ├─ Schema 1: (no match)
  └─ Schema 2: scheduler-flow (MATCH!)
     ├─ PriorityLevel: system
     ├─ Flow: scheduler-flow + flow distinguisher (e.g. the user name)
     └─ Assign request to this flow
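
The matching walk above can be sketched as a first-match search over schemas sorted by precedence. This is a toy model, not the API server's implementation: the Schema class and classify function are hypothetical, subjects are reduced to plain strings, and the second schema in the example is invented for contrast. Ties on precedence are broken by name here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    name: str
    precedence: int          # lower value is checked first
    priority_level: str
    subjects: frozenset
    verbs: frozenset
    resources: frozenset


def classify(schemas, subject, verb, resource):
    """Return (schema name, priority level) of the first matching schema."""
    for s in sorted(schemas, key=lambda s: (s.precedence, s.name)):
        if subject in s.subjects and verb in s.verbs and resource in s.resources:
            return s.name, s.priority_level
    return "catch-all", "catch-all"  # nothing matched: fall through


schemas = [
    Schema("all-pod-readers", 9000, "workload-low",
           frozenset({"scheduler", "my-app"}), frozenset({"get", "list"}),
           frozenset({"pods"})),
    Schema("scheduler-flow", 100, "system",
           frozenset({"scheduler"}), frozenset({"get", "list", "watch"}),
           frozenset({"pods"})),
]

print(classify(schemas, "scheduler", "get", "pods"))
print(classify(schemas, "my-app", "get", "pods"))
print(classify(schemas, "my-app", "delete", "pods"))
```

Both schemas match the scheduler's request, but scheduler-flow wins because its precedence value is lower.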

Token Bucket Algorithm

Mechanism

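Each flow has a token bucket. Note that upstream APF itself limits the number of concurrently executing requests per priority level and queues the excess fairly per flow; the token bucket below is a simplified model of how a limit with burst behaves: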

Flow: user-workload-update-pods

Bucket: 100 tokens
Refill rate: 10 tokens/sec

Time 0: Bucket full (100 tokens)
Request 1: -1 token (99 left)
Request 2: -1 token (98 left)
...
Request 100: -1 token (0 left)
Request 101: REJECTED (no tokens)

After 1 sec: +10 tokens (10 left)
Request 102: -1 token (9 left)
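
The trace above can be reproduced with a small simulation. This is a generic token bucket sketch, not code from the API server; the class name TokenBucket and its allow method are hypothetical, and time is passed in explicitly so the run is deterministic.

```python
class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # bucket starts full
        self.last = 0.0                # time of the previous call, in seconds

    def allow(self, now):
        """Spend one token at time `now`; return False if the bucket is empty."""
        elapsed = now - self.last
        # Refill for the elapsed time, never exceeding the bucket size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=100, refill_per_sec=10)
first_100 = [bucket.allow(now=0.0) for _ in range(100)]  # requests 1..100
request_101 = bucket.allow(now=0.0)                      # bucket is empty
request_102 = bucket.allow(now=1.0)                      # 10 tokens refilled
print(all(first_100), request_101, request_102, bucket.tokens)
```

The first 100 requests pass, request 101 is rejected, and one second later request 102 passes and leaves 9 tokens, matching the trace.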

Burst Allowance

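Token buckets allow bursts: the bucket size caps how many requests can be served back to back, while the refill rate caps sustained throughput.

Refill rate: 10 tokens/sec
Burst size: 20 tokens

Sustained: 10 requests/sec
Burst: up to 20 requests at once (while tokens last)

After a burst: requests must wait for tokens to refill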
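
To see how a burst decays into the sustained rate, the sketch below steps through time in 100 ms ticks with whole tokens. It is an illustrative model under stated assumptions: one token is refilled per tick (10 tokens/sec), a client offers 5 requests per tick (50 req/sec), and the function name accepted_per_second is hypothetical.

```python
def accepted_per_second(burst, refill_per_tick, offered_per_tick, seconds,
                        ticks_per_sec=10):
    """Count accepted requests in each second for a constant offered load."""
    tokens = burst  # bucket starts full
    accepted = []
    for _ in range(seconds):
        count = 0
        for _ in range(ticks_per_sec):
            served = min(tokens, offered_per_tick)
            tokens -= served
            count += served
            # Refill at the end of the tick, capped at the burst size.
            tokens = min(burst, tokens + refill_per_tick)
        accepted.append(count)
    return accepted


result = accepted_per_second(burst=20, refill_per_tick=1, offered_per_tick=5,
                             seconds=3)
print(result)
```

In this model the first second absorbs the stored burst on top of the refill, and every later second settles at the refill rate of 10 requests.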

GKE APF Configuration

Check Current Config

bash

# List priority levels
kubectl get prioritylevelconfigurations

# List flow schemas
kubectl get flowschemas

# Inspect a specific priority level
kubectl describe prioritylevelconfiguration workload-high

Default PriorityLevels

yaml

# Upstream suggested defaults, abbreviated; verify against your cluster
# system - requests from nodes
system:
  nominalConcurrencyShares: 30
  handSize: 6

# leader-election - controller leader election
leader-election:
  nominalConcurrencyShares: 10

# workload-high - built-in controllers
workload-high:
  nominalConcurrencyShares: 40

Common Rejections & Debugging

Symptom: "429 Too Many Requests"

bash

# Check APF metrics
kubectl get --raw /metrics | grep apiserver_flowcontrol

# Typical metrics (exact names vary by Kubernetes version):
#   apiserver_flowcontrol_nominal_limit_seats (request_concurrency_limit on older versions)
#   apiserver_flowcontrol_current_inqueue_requests
#   apiserver_flowcontrol_rejected_requests_total

Root Cause Analysis

1. Check which flow is being rejected
   - Rejection metrics carry flow_schema, priority_level, and reason labels

2. Identify the request source
   - Is it a legitimate workload or a runaway client (for example, a tight retry or list loop)?

3. Check allocation
   - Is the priority level under-provisioned?

4. Consider tuning
   - Increase the level's concurrency share
   - Adjust flow classification
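
Step 1 can be partly automated by summing the rejection counter per priority level and flow schema. The sketch below parses Prometheus text-format lines; the sample values are invented for illustration, and the helper name rejections_by_flow is hypothetical.

```python
import re
from collections import Counter

# Made-up sample in Prometheus text format; real data would come from /metrics.
SAMPLE = '''\
# TYPE apiserver_flowcontrol_rejected_requests_total counter
apiserver_flowcontrol_rejected_requests_total{flow_schema="service-accounts",priority_level="workload-low",reason="queue-full"} 42
apiserver_flowcontrol_rejected_requests_total{flow_schema="service-accounts",priority_level="workload-low",reason="time-out"} 3
apiserver_flowcontrol_rejected_requests_total{flow_schema="global-default",priority_level="global-default",reason="queue-full"} 1
'''

LINE = re.compile(
    r'^apiserver_flowcontrol_rejected_requests_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def rejections_by_flow(metrics_text):
    """Sum rejected requests per (priority_level, flow_schema) pair."""
    totals = Counter()
    for line in metrics_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # comments and unrelated metrics
        labels = dict(LABEL.findall(match.group("labels")))
        key = (labels.get("priority_level"), labels.get("flow_schema"))
        totals[key] += float(match.group("value"))
    return totals


totals = rejections_by_flow(SAMPLE)
for (level, schema), count in totals.most_common():
    print(f"{level}/{schema}: {count:.0f} rejected")
```

Saving the output of kubectl get --raw /metrics to a file and passing its contents to the function gives the same breakdown for a live cluster.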

Solution Patterns

Pattern 1: Misbehaving Workload

yaml

# If a buggy workload is spamming the API, isolate it in a low priority level
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: buggy-workload
spec:
  priorityLevelConfiguration:
    name: workload-low
  matchingPrecedence: 50
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: buggy-app
        namespace: default
    resourceRules:
    - verbs: ["list"]
      apiGroups: [""]
      resources: ["pods"]
      namespaces: ["*"]

Pattern 2: Under-provisioned Priority Level

yaml

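# If workload-high is consistently rejecting requests, raise its share
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: PriorityLevelConfiguration
metadata:
  name: workload-high
  annotations:
    apf.kubernetes.io/autoupdate-spec: "false"  # Stop the API server from reverting the change
spec:
  type: Limited
  limited:
    nominalConcurrencyShares: 80   # Raised from the upstream default of 40
    limitResponse:
      type: Queue
      queuing:
        queues: 128
        handSize: 8
        queueLengthLimit: 50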

Production Patterns

Pattern 1: Multi-Tenant Isolation

yaml

# Each tenant gets own flow
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: tenant-a
spec:
  priorityLevelConfiguration:
    name: workload-high
  matchingPrecedence: 80
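  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: tenant-a
        name: default
    resourceRules:          # A rule needs resourceRules or nonResourceRules to match anything
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]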

---
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: tenant-b
spec:
  priorityLevelConfiguration:
    name: workload-low
  matchingPrecedence: 80
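  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: tenant-b
        name: default
    resourceRules:          # A rule needs resourceRules or nonResourceRules to match anything
    - verbs: ["*"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]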

Pattern 2: Interactive vs Batch

yaml

# Interactive (low latency requirement)
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: interactive
spec:
  priorityLevelConfiguration:
    name: workload-high
  matchingPrecedence: 100
  rules:
  - subjects:
    - kind: Group
      group:
        name: "system:authenticated"
    resourceRules:
    - verbs: ["get", "list"]
      apiGroups: [""]
      resources: ["pods"]
      namespaces: ["*"]

---
# Batch (throughput important)
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
kind: FlowSchema
metadata:
  name: batch-jobs
spec:
  priorityLevelConfiguration:
    name: workload-low
  matchingPrecedence: 50
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        namespace: jobs
        name: batch-runner
    resourceRules:
    - verbs: ["create", "update", "list"]
      apiGroups: ["*"]
      resources: ["*"]
      namespaces: ["*"]

Monitoring APF

Key Metrics

apiserver_flowcontrol_rejected_requests_total
  - Requests rejected by APF, labeled by flow_schema, priority_level, and reason

apiserver_flowcontrol_current_executing_requests
  - Requests currently executing, per priority level and flow schema

apiserver_flowcontrol_request_queue_length_after_enqueue
  - Queue length observed each time a request is enqueued (backlog)

apiserver_flowcontrol_watch_count_samples
  - Watcher counts sampled for mutating requests

Alerting

yaml

# Alert if rejections increasing
- alert: APIServerFlowControlRejections
  expr: |
    rate(apiserver_flowcontrol_rejected_requests_total[5m]) > 0.1
  for: 5m
  annotations:
    summary: "API Server flow control rejections detected"
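
To get a feel for the 0.1 threshold in the rule above: it is an average per-second rate over the 5-minute window, which works out to roughly 30 rejections per window. The sketch below is a simplification, since Prometheus's rate() also handles counter resets and extrapolates to the window edges; the counter values and the helper name per_second_rate are hypothetical.

```python
def per_second_rate(first_value, last_value, window_seconds):
    """Average per-second increase of a counter over a window."""
    return (last_value - first_value) / window_seconds


WINDOW = 5 * 60   # the [5m] range in the alert expression, in seconds
THRESHOLD = 0.1   # rejections per second

# Counter went from 1,000 to 1,045: 45 rejections in 5 minutes.
fires = per_second_rate(1_000, 1_045, WINDOW) > THRESHOLD
# Counter went from 1,000 to 1,020: 20 rejections in 5 minutes.
quiet = per_second_rate(1_000, 1_020, WINDOW) > THRESHOLD
print(fires, quiet)
```

The for: 5m clause then requires the condition to hold continuously for a further five minutes before the alert fires.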

Performance Impact

With APF

Request spike (100,000 req/sec):
  ├─ Requests classified into flows
  ├─ Tokens allocated fairly
  ├─ Excess requests queued or rejected
  ├─ API server stable
  └─ User-facing latency: increased but predictable
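
"Allocated fairly" can be made concrete with max-min fairness: small flows get everything they ask for, and the leftover capacity is split among the larger ones. The sketch below is the textbook progressive-filling algorithm, offered as an approximation of what fair queuing aims for rather than as APF's actual code; the function name max_min_fair and the 1,000-unit capacity are assumptions.

```python
def max_min_fair(capacity, demands):
    """Give each flow min(demand, equal share), redistributing leftovers."""
    allocation = {}
    remaining = capacity
    # Serve the smallest demands first so their unused share is passed on.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        fair_share = remaining / len(pending)
        name, demand = pending.pop(0)
        allocation[name] = min(demand, fair_share)
        remaining -= allocation[name]
    return allocation


allocation = max_min_fair(1_000, {"buggy": 10_000, "good": 100})
for name, amount in allocation.items():
    print(f"{name}: {amount:.0f}")
```

Here the well-behaved flow receives its full demand of 100 and the noisy flow is capped at the remaining 900, so the spike is contained within the flow that caused it.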

Without APF (legacy max-in-flight limits only)

Same spike:
  ├─ All requests treated equally
  ├─ First ones succeed, later ones timeout
  ├─ API server might crash
  └─ Complete cluster unavailability

Summary

APF prevents overload: Fair allocation across flows
Priority levels: system, leader-election, workload-high, workload-low, catch-all, among others
Flow classification: Match requests by subject/resource/verb
Token bucket: Burst allowance + sustained rate limiting
Monitoring: Track rejections, queue depth
Tuning: Adjust priority level capacity or flow classification
