Control Plane Component Architecture: API Server, Scheduler, Controller-Manager, Cloud-Controller-Manager

Why Understand Each Component

When a container fails to start, a pod is stuck in Pending, or a service is not reconciled, the problem can sit in any component. Understanding the role of each one helps you:

  1. Narrow down root cause: know where to look in logs
  2. Predict failure modes: know which component failing causes which impact
  3. Plan resource allocation: control plane components have different CPU/memory needs
  4. Design HA patterns: know the dependencies so you can avoid a single point of failure

High-Level Architecture

┌──────────────────────────────────────────────────┐
│               Control Plane Nodes                │
│  (Replicated across 3 zones for HA)              │
├──────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌──────────────┐  ┌──────────┐ │
│  │  API Server │  │  Scheduler   │  │Controller│ │
│  │  (multiple) │  │ (leader el.) │  │ Manager  │ │
│  └─────┬───────┘  └──────────────┘  └──────┬───┘ │
│        │                                   │     │
│        └────────────────┬──────────────────┘     │
│                         │                        │
│                   ┌─────▼─────┐                  │
│                   │   etcd    │                  │
│                   │(consensus)│                  │
│                   └───────────┘                  │
│                                                  │
│  ┌────────────────────────────────────────────┐  │
│  │  Cloud-Controller-Manager                  │  │
│  │  (GCP-specific reconcilers)                │  │
│  └────────────────────────────────────────────┘  │
└──────────────────────┬───────────────────────────┘
                       │
        ┌──────────────┴──────────────┐
        │                             │
   ┌────▼─────────┐            ┌──────▼─────┐
   │  Worker      │            │  Worker    │
   │  Nodes       │            │  Nodes     │
   │              │            │            │
   │ ┌──────────┐ │            │┌──────────┐│
   │ │ kubelet  │ │            ││ kubelet  ││
   │ │(runs     │ │            ││(watches  ││
   │ │pods)     │ │            ││control   ││
   │ │          │ │            ││plane)    ││
   │ └──────────┘ │            │└──────────┘│
   └──────────────┘            └────────────┘

1. API Server — The Heart of Kubernetes

Role and Responsibilities

The API Server is the single entry point for all Kubernetes operations:

  • REST endpoint that all clients (kubectl, controllers, kubelets) call
  • State storage gateway: reads from and writes to etcd
  • Request validation: syntax checking, schema validation
  • Admission control: webhooks, mutations
  • Watch provider: streams changes to clients

Lifecycle

Every request passes through the API Server along this path:

1. TLS Termination

2. Authentication (certificate/token/OIDC)

3. Authorization (RBAC, ABAC, Node, Webhook)

4. Admission (webhooks, policies)

5. etcd Write/Read

6. Response Serialization

7. Stream/Return to Client

Chapter 11 covers this in detail, but it is important to know the overall flow.
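
The ordering of the stages above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Request` type, the `TOKENS` and `ALLOWED` tables, and the plain dict that stands in for etcd are assumptions made for illustration, not the API Server's real data structures. It only shows that a request is rejected at the first stage that fails, and that reads skip admission.

```python
from dataclasses import dataclass, field

# Hypothetical data standing in for real authenticators, RBAC rules and etcd.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ALLOWED = {("alice", "create", "pods"), ("alice", "get", "pods"), ("bob", "get", "pods")}


@dataclass
class Request:
    token: str
    verb: str
    resource: str
    obj: dict = field(default_factory=dict)


def handle(request, store):
    """Run a request through the stages in order; stop at the first failure."""
    # Authentication: who is calling?
    user = TOKENS.get(request.token)
    if user is None:
        return 401, "Unauthorized"
    # Authorization: may this user perform this verb on this resource?
    if (user, request.verb, request.resource) not in ALLOWED:
        return 403, "Forbidden"
    if request.verb == "create":
        # Admission: mutate first (add a default label), then validate.
        request.obj.setdefault("labels", {}).setdefault("created-by", user)
        if "name" not in request.obj:
            return 422, "admission denied: name is required"
        # Persist (the dict plays the role of etcd).
        store[request.obj["name"]] = request.obj
        return 201, "Created"
    # Reads skip admission and go straight to storage.
    return 200, f"{len(store)} object(s)"
```

Calling `handle(Request("token-bob", "create", "pods", {"name": "web"}), {})` fails at the authorization stage, so neither admission nor storage is touched.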

Tunable Configuration (GKE Context)

On GKE, API Server configuration is limited, but a few options exist:

bash
# Create a cluster with hardened authentication settings (GKE exposes only a few such options)
gcloud container clusters create my-cluster \
  --no-issue-client-certificate \
  --no-enable-basic-auth  # only meaningful on versions before 1.19, where basic auth still exists

# Send API server logs to Cloud Logging
gcloud container clusters update my-cluster \
  --logging=SYSTEM,WORKLOAD,API_SERVER

Common Failure Modes

| Symptom | Possible Cause | Debug |
|---|---|---|
| 502 / 503 errors from kubectl | API Server overload or crash | Check GCP Cloud Logging |
| Watch connection drops | API Server restart or etcd issue | Check watch reconnect logs |
| Slow API responses | Admission webhook timeouts | Check webhook latency |
| Certificate errors | CA rotation race condition | Check kube-apiserver logs |

Performance Characteristics

  • Request rate limit: on the order of thousands of requests per second per server, depending on load
  • Watch connections: at most a few thousand concurrent watches per server
  • Burst capacity: limited by etcd backend latency
  • Network bandwidth: response size matters (large objects slow responses down)

2. Scheduler — Pod Placement Decision Maker

Role and Responsibilities

The scheduler decides which Pod runs on which Node. This is not a trivial decision.

    Input: Pending Pod
         ↓
    ┌────────────────────────────────┐
    │ Filtering Phase                │
    │ - Sufficient resources?        │
    │ - Node affinity/anti-affinity? │
    │ - Taints & tolerations OK?     │
    │ - PVC bindings available?      │
    └────────────────────────────────┘
         ↓ (reduced node set)
    ┌────────────────────────────────┐
    │ Scoring Phase                  │
    │ - Resource utilization         │
    │ - Affinity preferences         │
    │ - Image locality (optimize)    │
    │ - Other plugin scores          │
    └────────────────────────────────┘
         ↓ (ranked nodes)
    ┌────────────────────────────────┐
    │ Binding Phase                  │
    │ - Post Binding via API Server  │
    │ - kubelet watch picks it up    │
    └────────────────────────────────┘
         ↓
    Output: Pod bound to Node
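
The three phases can be sketched in a few lines of Python. This is a simplified illustration, not the kube-scheduler's plugin framework: the `Node` and `Pod` classes, the single "most free CPU" scoring rule, and the taint model (plain string sets) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int          # free CPU in millicores
    free_mem_mi: int         # free memory in MiB
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu_m: int
    mem_mi: int
    tolerations: set = field(default_factory=set)


def feasible(pod, node):
    """Filtering: the node must fit the requests and every taint must be tolerated."""
    fits = pod.cpu_m <= node.free_cpu_m and pod.mem_mi <= node.free_mem_mi
    return fits and node.taints <= pod.tolerations


def score(pod, node):
    """Scoring: prefer the node left with the most free CPU after placement."""
    return node.free_cpu_m - pod.cpu_m


def schedule(pod, nodes):
    """Return the chosen node name, or None when the Pod stays Pending."""
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: score(pod, n))
    # Binding: reserve the resources on the chosen node.
    best.free_cpu_m -= pod.cpu_m
    best.free_mem_mi -= pod.mem_mi
    return best.name
```

A Pod for which `schedule` returns `None` corresponds to the "stuck in Pending" case: no node survived the filtering phase, so scoring never ran.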

Leader Election

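The scheduler runs as a single active instance; the other replicas are standbys:

bash
# Only one scheduler is actively scheduling at any time
kubectl get pods -n kube-system -l component=kube-scheduler
# On a self-managed cluster you will see 3 replicas, but only 1 is the leader.
# On GKE the control plane Pods are managed by Google and are not listed.

# Check the current leader
kubectl get lease -n kube-system kube-scheduler -o yaml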

Implications:

  • If the leader scheduler crashes, a new leader is elected within ~5-15 seconds
  • Pods cannot be scheduled during the transition
  • Leader election uses a Lease object that is stored in etcd through the API Server
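
The lease mechanism can be sketched as follows. This is a minimal model under stated assumptions: the `Lease` class, the 15-second duration, and the `try_acquire_or_renew` helper are hypothetical, and the real implementation additionally relies on optimistic concurrency when updating the Lease object so that two candidates cannot both win.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str = ""
    renew_time: float = 0.0
    duration_s: float = 15.0   # assumed lease duration


def try_acquire_or_renew(lease, identity, now):
    """Return True if `identity` is the leader after this attempt."""
    expired = now - lease.renew_time > lease.duration_s
    if lease.holder in ("", identity) or expired:
        lease.holder = identity
        lease.renew_time = now
        return True
    return False   # someone else holds a lease that is still valid
```

A standby that keeps calling `try_acquire_or_renew` takes over only once the current holder has stopped renewing for longer than the lease duration, which is where the failover delay comes from.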

Scheduling Queues

The scheduler maintains queues of pending pods:

┌───────────────┐
│ Pending       │  Active queue:
│   Pods        │  Pods under scheduling
├───────────────┤
│ Back-off      │  Pods retrying after an
│   Pods        │  earlier scheduling failure
├───────────────┤
│ Unschedulable │  Pods whose last attempt failed
│   Pods        │  for reasons a retry alone cannot fix
└───────────────┘

Pods move between queues based on:

  • Retry with exponential backoff (prevents scheduler thrashing)
  • Event-based triggers (new node → requeue unschedulable pods)
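
The backoff rule can be written as a one-line formula. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the 1-second initial delay and 10-second cap are values assumed here to match what the upstream scheduler configuration documents as defaults; check them against your Kubernetes version.

```python
def backoff_seconds(attempts, initial=1.0, maximum=10.0):
    """Delay before the next retry after `attempts` failed attempts (attempts >= 1)."""
    return min(initial * 2 ** (attempts - 1), maximum)


# Delays for the first six failed attempts: the value doubles until it hits the cap.
delays = [backoff_seconds(n) for n in range(1, 7)]
```

With these assumed values, `delays` is `[1.0, 2.0, 4.0, 8.0, 10.0, 10.0]`: a Pod that keeps failing is retried less and less often, but never less often than once per cap interval.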

Common Failure Modes

| Symptom | Possible Cause | Debug |
|---|---|---|
| Pod stuck in Pending | Insufficient resources or node selector mismatch | kubectl describe pod shows the pending reason |
| Pods scheduled unevenly | Score plugins misconfigured | Check scheduler logs |
| Scheduling delays (10s+) | etcd latency or webhook delays | Monitor scheduler latency metrics |
| Preemption thrashing | PriorityClass misconfiguration | Check preempted pod patterns |

Production Considerations

Scheduling latency SLO: target < 5s at the 99th percentile (from pod creation to scheduled)

bash
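# kubectl top nodes shows node CPU/memory usage only; it does not report scheduling latency
kubectl top nodes

# For scheduling latency, query Prometheus metrics instead:
# scheduler_scheduling_attempt_duration_seconds   (histogram, per scheduling attempt)
# scheduler_pending_pods                          (gauge, per queue)
# kube_pod_status_ready{condition="true"}         (from kube-state-metrics, if installed)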
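
To make the p99 target concrete, here is a small sketch that checks a list of latency samples against the SLO. The sample values and function names are made up for illustration, and the nearest-rank method is one of several percentile definitions; a Prometheus histogram would estimate the percentile from buckets instead.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_s, target_s=5.0, p=99):
    """True when the p-th percentile latency is under the target."""
    return percentile(latencies_s, p) < target_s


# Hypothetical samples: 98 fast placements, one slow, one very slow.
healthy = [0.2] * 98 + [3.0, 12.0]
# Hypothetical samples: three very slow placements out of 100.
degraded = [0.2] * 97 + [12.0] * 3
```

In the `healthy` list a single 12-second outlier does not break the SLO because it sits above the 99th percentile, while three such outliers in `degraded` do.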

3. Controller-Manager — Reconciliation Engine

Role and Responsibilities

The Controller-Manager runs a collection of controllers that continuously reconcile state:

Expected State (YAML)
        ↓
   Reconciler
        ↓
   Actual State
        ↓
   Compare ≠?
        ↓
   Take Action (create, update, delete resources)
        ↓
   Loop back every N seconds

Built-in Controllers (Partial List)

| Controller | Reconciles | Action |
|---|---|---|
| Deployment | Desired → actual Pods | Creates/updates ReplicaSets |
| ReplicaSet | Desired → actual Pod count | Creates/deletes Pods |
| StatefulSet | Ordered Pods, stable identities | Manages Pod lifecycle and ordering |
| DaemonSet | Pod on every node | Creates one Pod per eligible node |
| Job | Run-to-completion | Creates Pods, tracks completion |
| Endpoints (for Services) | Endpoint discovery | Updates endpoints as Pods change |
| PersistentVolume | Storage binding | Binds claims to volumes |

Reconciliation Loop Pattern

All controllers follow a level-triggered design:

go
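// Simplified sketch of a reconciliation loop. The three helpers are
// placeholders; real controllers use client-go informers and work queues.
package main

import "time"

type State struct{ Replicas int }

func getDesiredState(namespace, name string) State { return State{Replicas: 3} } // spec stored in etcd
func getCurrentState(namespace, name string) State { return State{Replicas: 2} } // observed in the cluster
func takeCorrectiveAction(desired, actual State)   {}                            // create/update/delete

func reconcileLoop(namespace, name string, resyncInterval time.Duration) {
	for {
		desired := getDesiredState(namespace, name)
		actual := getCurrentState(namespace, name)

		// Act only when the actual state has drifted from the desired state
		if desired != actual {
			takeCorrectiveAction(desired, actual)
		}

		// Wait for the next reconciliation (~2-15 minutes depending on the controller)
		time.Sleep(resyncInterval)
	}
}

func main() { reconcileLoop("default", "web", 10*time.Minute) }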

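Advantage: robust to missed events; the cluster converges to the desired state (eventual consistency).

Disadvantage: latency between desired and actual state can reach minutes when a change is only picked up by the periodic resync.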
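
The level-triggered property is easiest to see on a concrete case. The sketch below models a ReplicaSet-style reconcile step in Python; the function name, the action tuples, and the "delete newest name first" rule are assumptions made for illustration only. The point is that the result depends solely on the current desired and observed state, not on which events led there.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions that move the observed Pods toward the desired count."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        return [("create", None)] * diff
    if diff < 0:
        # Delete the surplus, highest name first (an arbitrary choice for this sketch).
        surplus = sorted(running_pods, reverse=True)[:-diff]
        return [("delete", name) for name in surplus]
    return []
```

Because the function only compares levels, it gives the same answer whether the controller saw every intermediate event or missed several of them, and running it again after the actions are applied yields an empty list.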

Leader Election for Reconcilers

Like the scheduler, only one controller-manager instance is active:

bash
kubectl get lease -n kube-system kube-controller-manager

This prevents race conditions (multiple instances trying to reconcile the same resource).

Common Failure Modes

| Symptom | Possible Cause | Debug |
|---|---|---|
| Deployments not scaling | ReplicaSet controller issue | Check controller-manager logs |
| StatefulSet Pods out of order | Ordering logic bug or concurrent updates | Check StatefulSet ordinals |
| PVCs not binding | PersistentVolume controller issue | Check PVC status |
| Stuck finalizers | Controller crashed before cleanup | Manual intervention needed |

Production Considerations

Reconciliation latency: typically 30-60 seconds (loop cycle)

  • Deployment created → kubelet sees Pod spec → Pod starts ≈ 5-10s
  • Pod deleted → controller loop notices → finalizers run ≈ 10-30s
  • Custom controller performance depends on the implementation

4. Cloud-Controller-Manager — GCP-Specific Reconcilers

Role and Responsibilities

The Cloud-Controller-Manager reconciles cloud resources; on GKE it runs the GCP-specific controllers:

| Resource | Controller | Action |
|---|---|---|
| Service (type: LoadBalancer) | Service controller | Creates GCP Load Balancer |
| Ingress | Ingress controller | Creates GCP HTTP LB |
| PersistentVolume | Volume controller | Provisions GCP Disks |
| Node | Node controller | Syncs node state with GCP |

Example: Service LoadBalancer Reconciliation

yaml
# User creates LoadBalancer service
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer  # ← triggers cloud-controller
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp

# Cloud-controller-manager:
# 1. Watches Service resource
# 2. Sees type: LoadBalancer
# 3. Calls GCP APIs: create Load Balancer, Backend Service
# 4. Assigns external IP
# 5. Updates Service status.loadBalancer.ingress[].ip
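
The five steps in the comments above can be mimicked with a small Python sketch. `FakeCloud` and `reconcile_service` are hypothetical names, the Service is a plain dict rather than a real API object, and the IP comes from a documentation-only address range; none of this is the actual cloud provider code.

```python
class FakeCloud:
    """Hypothetical stand-in for the cloud provider API."""

    def __init__(self):
        self.load_balancers = {}
        self._next = 1

    def ensure_load_balancer(self, name, port):
        # Idempotent: asking for an existing load balancer returns the same IP.
        if name not in self.load_balancers:
            self.load_balancers[name] = {"ip": f"203.0.113.{self._next}", "port": port}
            self._next += 1
        return self.load_balancers[name]["ip"]


def reconcile_service(service, cloud):
    """Create the load balancer if needed and record its IP in the Service status."""
    if service["spec"].get("type") != "LoadBalancer":
        return service  # nothing to do for other Service types
    port = service["spec"]["ports"][0]["port"]
    ip = cloud.ensure_load_balancer(service["metadata"]["name"], port)
    service.setdefault("status", {})["loadBalancer"] = {"ingress": [{"ip": ip}]}
    return service
```

Making `ensure_load_balancer` idempotent is the important design choice: the controller can safely reconcile the same Service again after a restart without creating a second load balancer.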

GKE-Specific Behavior

Cloud-Controller-Manager handles:

  • Node taints: Autopilot vs Standard taints
  • Network routes: managing VPC routes
  • Service IP allocation: external IP assignment for LoadBalancer Services (ClusterIPs are allocated by the API Server)
  • Persistent Volume provisioning: disk creation

Leader Election

Like scheduler/controller-manager, CCM runs in HA:

bash
kubectl get lease -n kube-system cloud-controller-manager

Common Failure Modes

| Symptom | Possible Cause | Debug |
|---|---|---|
| Service stuck pending (LoadBalancer type) | CCM not running or GCP API throttling | Check CCM logs |
| External IPs not assigned | Network quota exhausted | Check GCP quota |
| PersistentVolumes not provisioning | GCP Disk quota or regional constraint | Check CCM logs |

Inter-Component Communication

Dependency Chain

API Server ← (all components watch)

    ├→ Scheduler (watches unscheduled Pods)
    ├→ Controller-Manager (watches all resources)
    └→ Cloud-Controller-Manager (watches cloud-specific resources)
    
All write back via API Server

Watch Mechanism

Components do not poll etcd. They use the watch API:

bash
# Conceptually, every controller holds a watch open; the kubectl equivalent is:
kubectl get pods --all-namespaces --field-selector=status.phase=Pending --watch --output-watch-events
# Streams events: ADDED, MODIFIED, DELETED

# This is a huge performance difference compared with polling

Implication: if a watch connection drops, the component sees no events for a few seconds until it reconnects; it then resumes from its last known resourceVersion or re-lists.
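
Resuming after a dropped connection can be sketched with a toy event log. `FakeApiServer` and `Watcher` are hypothetical classes written for this illustration; the real API Server keeps only a bounded event history, so a client whose resourceVersion is too old must re-list instead of resuming.

```python
class FakeApiServer:
    """Hypothetical event log; each event carries an increasing resourceVersion."""

    def __init__(self):
        self.events = []

    def record(self, kind, name):
        self.events.append({"rv": len(self.events) + 1, "type": kind, "name": name})

    def watch(self, since_rv):
        # Return every event newer than the version the client has already seen.
        return [e for e in self.events if e["rv"] > since_rv]


class Watcher:
    def __init__(self):
        self.last_rv = 0
        self.seen = []

    def sync(self, server):
        """(Re)connect and consume everything after the last seen resourceVersion."""
        for event in server.watch(self.last_rv):
            self.seen.append((event["type"], event["name"]))
            self.last_rv = event["rv"]
```

Events recorded while the watcher was disconnected are delivered on the next `sync` call, in order and without duplicates, because the client remembers the last resourceVersion it processed.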


Scaling Control Plane Components

API Server Scaling

GKE auto-scales API Server instances based on:

  • Request rate
  • Concurrent connections
  • etcd throughput

You cannot manually add API Server replicas — Google manages this.

Controller-Manager Scaling

Controllers run single-instance active (others hot-standby). Cannot scale horizontally beyond 1 active.

Workaround for custom controllers: run a separate control plane outside GKE (an enterprise-grade setup).

Scheduler Scaling

The scheduler is likewise single-instance active. The bottleneck for scheduling rate is the compute resources of the scheduler instance.


Monitoring Control Plane Components

GKE exposes metrics through a Prometheus endpoint:

bash
# API server latency
apiserver_request_duration_seconds

# Scheduler latency
scheduler_scheduling_attempt_duration_seconds
scheduler_pod_scheduling_duration_seconds

# Controller-manager work queue depth
workqueue_depth

# Cloud-controller-manager (upstream GCE provider metric; availability on GKE may vary)
cloudprovider_gce_api_request_duration_seconds

Enable control plane metrics:

bash
gcloud container clusters update my-cluster \
  --monitoring=SYSTEM,API_SERVER,SCHEDULER,CONTROLLER_MANAGER


Summary

  • API Server: entry point for all Kubernetes operations, gateway to etcd
  • Scheduler: makes Pod → Node placement decisions, runs single-instance active (standby replicas)
  • Controller-Manager: runs reconciliation loops for Kubernetes resources (Deployments, StatefulSets, Services, etc.)
  • Cloud-Controller-Manager: GCP-specific reconcilers (LoadBalancer Services, PV provisioning, Node sync)
  • All components communicate through the API Server and its watch API; only the API Server talks to etcd
  • GKE manages scaling, HA, and updates; these are not the customer's responsibility
  • Understanding component roles helps you debug production issues more systematically