Control Plane Component Architecture: API Server, Scheduler, Controller-Manager, Cloud-Controller-Manager
Why Understand Each Component
When a container fails to start, a pod is stuck in Pending, or a Service is not being reconciled, the problem can sit in any component. Understanding the role of each one helps you:
- Narrow down root cause: know which logs to look at
- Predict failure modes: know which component failure causes which impact
- Plan resource allocation: control plane components have different CPU/memory needs
- Design HA patterns: know the dependencies so you can avoid single points of failure
High-Level Architecture
┌──────────────────────────────────────────────────┐
│               Control Plane Nodes                │
│        (Replicated across 3 zones for HA)        │
├──────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌──────────────┐ ┌────────────┐  │
│ │ API Server  │ │  Scheduler   │ │ Controller │  │
│ │ (multiple)  │ │ (leader el.) │ │  Manager   │  │
│ └──────┬──────┘ └───────┬──────┘ └─────┬──────┘  │
│        │                │              │         │
│        └────────────────┼──────────────┘         │
│                         │                        │
│                   ┌─────▼─────┐                  │
│                   │   etcd    │                  │
│                   │(consensus)│                  │
│                   └───────────┘                  │
│                                                  │
│ ┌──────────────────────────────────────────────┐ │
│ │           Cloud-Controller-Manager           │ │
│ │          (GCP-specific reconcilers)          │ │
│ └──────────────────────────────────────────────┘ │
└─────────────────────────┬────────────────────────┘
                          │
           ┌──────────────┴──────────────┐
           │                             │
    ┌──────▼───────┐              ┌──────▼───────┐
    │ Worker Nodes │              │ Worker Nodes │
    │ ┌──────────┐ │              │ ┌──────────┐ │
    │ │ kubelet  │ │              │ │ kubelet  │ │
    │ │ (runs    │ │              │ │ (watches │ │
    │ │  pods)   │ │              │ │  control │ │
    │ │          │ │              │ │  plane)  │ │
    │ └──────────┘ │              │ └──────────┘ │
    └──────────────┘              └──────────────┘
1. API Server: The Heart of Kubernetes
Role and Responsibilities
The API Server is the single entry point for all Kubernetes operations:
- REST endpoint that all clients (kubectl, controllers, kubelets) call
- State storage gateway: reads from and writes to etcd
- Request validation: syntax checking, schema validation
- Admission control: webhooks, mutations
- Watch provider: streams changes to clients
Lifecycle
Every request passes through the API Server along this path:
1. TLS Termination
↓
2. Authentication (certificate/token/OIDC)
↓
3. Authorization (RBAC, ABAC, Webhook)
↓
4. Admission (webhooks, policies)
↓
5. etcd Write/Read
↓
6. Response Serialization
↓
7. Stream/Return to Client
Chapter 11 covers this in detail, but it is important to know the overall flow.
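The request path above can be sketched as a chain of stages in which any stage may reject the request before anything is stored. This is a toy model and not the kube-apiserver implementation: the `Request` type, the `TOKENS` and `ROLES` tables, the label added during admission, and the status codes chosen are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for real credentials and RBAC rules.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}
ROLES = {"alice": {("create", "pods")}, "bob": set()}


@dataclass
class Request:
    token: str
    verb: str
    resource: str
    obj: dict = field(default_factory=dict)


class Rejected(Exception):
    """Raised by a stage to stop the pipeline with an HTTP-like status."""

    def __init__(self, status: int, reason: str):
        super().__init__(reason)
        self.status = status


def authenticate(req: Request) -> str:
    user = TOKENS.get(req.token)
    if user is None:
        raise Rejected(401, "unknown token")
    return user


def authorize(user: str, req: Request) -> None:
    if (req.verb, req.resource) not in ROLES.get(user, set()):
        raise Rejected(403, f"{user} cannot {req.verb} {req.resource}")


def admit(req: Request) -> dict:
    # Mutating step: add a default label. Validating step: require a name.
    obj = dict(req.obj)
    obj.setdefault("labels", {"admitted": "true"})
    if "name" not in obj:
        raise Rejected(422, "metadata.name is required")
    return obj


def handle(req: Request, store: dict) -> int:
    """Run the stages in order; only a fully admitted object is persisted."""
    try:
        user = authenticate(req)
        authorize(user, req)
        obj = admit(req)
    except Rejected as err:
        return err.status
    store[obj["name"]] = obj  # stands in for the etcd write
    return 201


store = {}
print(handle(Request("token-alice", "create", "pods", {"name": "web"}), store))  # 201
print(handle(Request("token-bob", "create", "pods", {"name": "db"}), store))     # 403
print(handle(Request("bad-token", "create", "pods", {"name": "x"}), store))      # 401
```

The ordering is the point of the sketch: authentication failures never reach authorization, and nothing is written to the store unless every earlier stage has passed.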
Configuration You Can Tune (GKE Context)
On GKE, API Server configuration is limited, but a few options are available:
# Create a cluster without legacy client-certificate authentication
# (flag names change between gcloud releases; confirm with --help)
# Basic auth no longer exists on current GKE versions, so it needs no flag
gcloud container clusters create my-cluster \
  --no-issue-client-certificate
# Send API Server logs to Cloud Logging
gcloud container clusters update my-cluster \
  --logging=SYSTEM,WORKLOAD,API_SERVER
Common Failure Modes
| Symptom | Possible Cause | Debug |
|---|---|---|
| 502 / 503 errors from kubectl | API Server overload or crash | Check GCP Cloud Logging |
| Watch connection drops | API Server restart or etcd issue | Check watch reconnect logs |
| Slow API responses | Admission webhooks timeout | Check webhook latency |
| Certificate errors | CA rotation race condition | Check kube-apiserver logs |
Performance Characteristics
- Request rate: on the order of thousands of requests/sec per server, depending on load
- Watch connections: at most a few thousand concurrent watches per server
- Burst capacity: limited by etcd backend latency
- Network bandwidth: response size matters (large objects slow responses down)
2. Scheduler: Pod Placement Decision Maker
Role and Responsibilities
The Scheduler decides which Pod runs on which Node. This is not a trivial decision.
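A minimal sketch of why placement is not trivial: nodes that violate hard constraints are removed first (filtering), and the remaining nodes are ranked (scoring). The `Node` and `Pod` types, the taint model, and the "most free capacity left" scoring rule are illustrative assumptions, not kube-scheduler's actual plugins or weights. Resource requests are assumed to be positive.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float          # cores still unallocated
    free_mem: float          # GiB still unallocated
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu: float
    mem: float
    tolerations: set = field(default_factory=set)


def feasible(pod: Pod, node: Node) -> bool:
    """Filtering: hard constraints that a node must satisfy."""
    fits = node.free_cpu >= pod.cpu and node.free_mem >= pod.mem
    tolerated = node.taints <= pod.tolerations  # every taint is tolerated
    return fits and tolerated


def score(pod: Pod, node: Node) -> float:
    """Scoring: prefer the node left with the largest free fraction."""
    cpu_left = (node.free_cpu - pod.cpu) / node.free_cpu
    mem_left = (node.free_mem - pod.mem) / node.free_mem
    return (cpu_left + mem_left) / 2


def schedule(pod: Pod, nodes: list):
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None  # the Pod stays Pending
    return max(candidates, key=lambda n: score(pod, n)).name


nodes = [
    Node("node-a", free_cpu=1.0, free_mem=2.0),
    Node("node-b", free_cpu=4.0, free_mem=8.0),
    Node("node-c", free_cpu=8.0, free_mem=16.0, taints={"gpu"}),
]
print(schedule(Pod("web", cpu=0.5, mem=1.0), nodes))   # node-b
print(schedule(Pod("big", cpu=16.0, mem=1.0), nodes))  # None
```

Here `node-c` has the most capacity but is filtered out for `web` because of its taint, which shows why filtering has to run before scoring.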
Input: Pending Pod
                ↓
┌────────────────────────────────┐
│ Filtering Phase                │
│ - Sufficient resources?        │
│ - Node affinity/anti-affinity? │
│ - Taints & tolerations OK?     │
│ - PVC bindings available?      │
└────────────────────────────────┘
                ↓ (reduced node set)
┌────────────────────────────────┐
│ Scoring Phase                  │
│ - Resource utilization         │
│ - Affinity preferences         │
│ - Image locality (optimize)    │
│ - Other plugin scores          │
└────────────────────────────────┘
                ↓ (ranked nodes)
┌────────────────────────────────┐
│ Binding Phase                  │
│ - Write binding via API Server │
│ - kubelet watches, starts Pod  │
└────────────────────────────────┘
                ↓
Output: Pod bound to Node
Leader Election
The Scheduler runs as a single active instance (the others are standbys):
# Only one scheduler is actively scheduling at any time.
# On self-managed clusters this lists the replicas (GKE hides control plane Pods):
kubectl get pods -n kube-system -l component=kube-scheduler
# You will see 3 replicas, but only 1 is the leader
# Check which instance holds the leader lease
kubectl get lease -n kube-system kube-scheduler -o yaml
Implication:
- If the leader scheduler crashes, a new leader is elected within ~5-15 seconds
- Pods cannot be scheduled during the transition
- Leader election uses a Lease object, which the API Server persists in etcd
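The failover window described above follows from how a lease expires. The sketch below models one lease record that candidates try to acquire or renew. The `Lease` type, the function name, and the 15-second duration are assumptions for illustration. A real implementation also relies on the API Server's optimistic concurrency to make the update atomic, which this in-memory model does not show.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str = ""
    renew_time: float = 0.0
    duration: float = 15.0  # seconds; assumed value for illustration


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` is the leader after this attempt."""
    expired = now - lease.renew_time > lease.duration
    if lease.holder in ("", candidate) or expired:
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False


lease = Lease()
print(try_acquire_or_renew(lease, "scheduler-a", now=0.0))   # True: lease was empty
print(try_acquire_or_renew(lease, "scheduler-b", now=5.0))   # False: a still holds it
print(try_acquire_or_renew(lease, "scheduler-a", now=10.0))  # True: a renews
# scheduler-a crashes here and stops renewing.
print(try_acquire_or_renew(lease, "scheduler-b", now=20.0))  # False: only 10s elapsed
print(try_acquire_or_renew(lease, "scheduler-b", now=26.0))  # True: 16s > 15s
```

Between the crash and the moment the lease expires, nobody is leader, which is the period in which no Pods are scheduled.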
Scheduling Queues
The Scheduler maintains queues of pending pods:
┌───────────────┐
│ Pending       │ Active queue:
│ Pods          │ Pods under scheduling
├───────────────┤
│ Back-off      │ Pods retrying after an
│ Pods          │ earlier scheduling failure
├───────────────┤
│ Unschedulable │ Pods whose last attempt failed
│ Pods          │ for reasons a plain retry cannot fix
└───────────────┘
Pods move between queues based on:
- Exponential retry backoff (prevents scheduler thrashing)
- Event-based triggers (new node → requeue unschedulable)
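The two movement rules can be sketched with a small in-memory model. The class and method names are hypothetical, the 1-second initial and 10-second maximum backoff are assumed values, and the real scheduler's queueing logic is more involved than this.

```python
def backoff_seconds(attempts: int, initial: float = 1.0, maximum: float = 10.0) -> float:
    """Delay before retry number `attempts` (1-based), doubling up to a cap."""
    return min(initial * 2 ** (attempts - 1), maximum)


class SchedulingQueues:
    def __init__(self):
        self.active = []            # ready for a scheduling attempt
        self.backoff = {}           # pod -> earliest retry time
        self.unschedulable = set()  # waiting for a cluster change
        self.attempts = {}

    def add(self, pod: str) -> None:
        self.active.append(pod)

    def attempt_failed(self, pod: str, now: float, retryable: bool) -> None:
        """Record a failed attempt for a pod already popped from `active`."""
        self.attempts[pod] = self.attempts.get(pod, 0) + 1
        if retryable:
            self.backoff[pod] = now + backoff_seconds(self.attempts[pod])
        else:
            self.unschedulable.add(pod)

    def tick(self, now: float) -> None:
        """Move pods whose backoff has elapsed back to the active queue."""
        for pod in [p for p, t in self.backoff.items() if t <= now]:
            del self.backoff[pod]
            self.active.append(pod)

    def cluster_event(self) -> None:
        """For example a new node joined: give unschedulable pods another try."""
        self.active.extend(sorted(self.unschedulable))
        self.unschedulable.clear()


print([backoff_seconds(n) for n in range(1, 7)])  # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

The cap keeps a repeatedly failing pod from waiting indefinitely, while the doubling keeps it from consuming scheduling cycles that other pods could use.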
Common Failure Modes
| Symptom | Possible Cause | Debug |
|---|---|---|
| Pod stuck in Pending | Insufficient resources or node selector mismatch | kubectl describe pod shows pending reason |
| Pods scheduled unevenly | Score plugins misconfigured | Check scheduler logs |
| Scheduling delays (10s+) | etcd latency or webhook delays | Monitor scheduler latency metrics |
| Preemption thrashing | PriorityClass misconfiguration | Check preempted pod patterns |
Production Considerations
Scheduling latency SLO: target < 5s at the 99th percentile (from pod creation to scheduled)
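As a worked example of checking that SLO, the sketch below computes a nearest-rank 99th percentile over a list of latencies and compares it with the 5-second target. The sample latencies and the helper names `percentile` and `meets_slo` are made up for illustration. In practice the figures would come from the scheduler's latency histograms.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies, pct=99, target_seconds=5.0):
    return percentile(latencies, pct) < target_seconds


# Hypothetical pod-creation-to-scheduled latencies, in seconds.
latencies = [0.2] * 95 + [1.5, 2.0, 3.0, 4.0, 12.0]
print(percentile(latencies, 99))  # 4.0
print(meets_slo(latencies))       # True
```

A single 12-second outlier in 100 samples does not break a p99 target, but a cluster of slow placements would, which is why the SLO is stated as a percentile and not as a maximum.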
# Monitor scheduling latency (GKE exposes these metrics when control plane metrics are enabled)
kubectl top nodes # rough proxy only: node resource usage
# More detailed: check Prometheus metrics
# scheduler_pending_pods
# scheduler_pod_scheduling_duration_seconds
3. Controller-Manager: Reconciliation Engine
Role and Responsibilities
The Controller-Manager runs a collection of controllers that continuously reconcile state:
Expected State (YAML)
↓
Reconciler
↓
Actual State
↓
Compare ≠?
↓
Take Action (create, update, delete resources)
↓
Loop back every N seconds (or on the next watch event)
Built-in Controllers (Partial List)
| Controller | Reconciles | Action |
|---|---|---|
| Deployment | Desired → actual Pods | Creates/updates ReplicaSet |
| ReplicaSet | Desired → actual Pod count | Creates/deletes Pods |
| StatefulSet | Ordered Pods, stable identities | Manages pod lifecycle + order |
| DaemonSet | Pod on every node | Schedules pods per node |
| Job | Run-to-completion | Creates Pods, tracks completion |
| EndpointSlice (for Services) | Endpoint discovery | Updates endpoints as Pods change |
| PersistentVolume | Storage binding | Claims → volumes |
Reconciliation Loop Pattern
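All controllers follow a level-triggered design:
// Sketch of the loop every controller runs. getDesiredState, getCurrentState,
// and takeCorrectiveAction are hypothetical stand-ins for API Server calls.
package main

import "time"

func getDesiredState(namespace, name string) int { return 3 } // replicas in spec
func getCurrentState(namespace, name string) int { return 2 } // replicas observed
func takeCorrectiveAction(desired, actual int)   {}           // create or delete Pods

func main() {
	const resyncInterval = 10 * time.Minute // ~2-15 minutes depending on the controller
	for {
		// Get desired state from the object's spec (stored in etcd)
		desired := getDesiredState("default", "web")
		// Get current state from the cluster
		actual := getCurrentState("default", "web")
		// Compare
		if desired != actual {
			// Take action so the actual state matches the desired state
			takeCorrectiveAction(desired, actual)
		}
		// Sleep until the next reconciliation
		time.Sleep(resyncInterval)
	}
}
Advantage: robust to missed events; eventual consistency is guaranteed
Disadvantage: latency between desired and actual state can reach minutes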
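To see why a level-triggered loop tolerates missed events, here is a small runnable sketch of a ReplicaSet-like reconciler. The function name, the pod naming scheme, and the in-memory `pods` list are assumptions for illustration. A real controller would read and write through the API Server.

```python
def reconcile(desired_replicas: int, pods: list, next_id: int) -> int:
    """Drive `pods` toward `desired_replicas`; return the next unused pod id."""
    while len(pods) < desired_replicas:   # too few: create
        pods.append(f"web-{next_id}")
        next_id += 1
    while len(pods) > desired_replicas:   # too many: delete
        pods.pop()
    return next_id


pods, next_id = [], 0
next_id = reconcile(3, pods, next_id)
print(pods)  # ['web-0', 'web-1', 'web-2']

# A pod disappears and the controller never receives the delete event.
pods.remove("web-1")

# The next pass compares state, not events, so the gap is still repaired.
next_id = reconcile(3, pods, next_id)
print(pods)  # ['web-0', 'web-2', 'web-3']

# Running it again with nothing to fix changes nothing (idempotent).
next_id = reconcile(3, pods, next_id)
print(pods)  # ['web-0', 'web-2', 'web-3']
```

Because each pass looks only at the current difference, the loop needs no record of what happened in between, and that is where the robustness comes from.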
Leader Election for Reconcilers
As with the scheduler, only one controller-manager instance is active:
kubectl get lease -n kube-system kube-controller-manager
This prevents race conditions (multiple instances trying to reconcile the same resource).
Common Failure Modes
| Symptom | Possible Cause | Debug |
|---|---|---|
| Deployments not scaling | ReplicaSet controller issue | Check controller-manager logs |
| StatefulSet Pods out of order | Ordering logic bug or concurrent updates | Check StatefulSet ordinals |
| PVCs not binding | PersistentVolume controller issue | Check PVC status |
| Stuck finalizers | Controller crashed before cleanup | Manual intervention needed |
Production Considerations
Reconciliation latency: Typical 30-60 seconds (loop cycle)
- Deployment created → kubelet sees Pod spec → Pod starts ≈ 5-10s
- Pod deleted → controller loop recognizes → finalizers run ≈ 10-30s
- Custom controller performance depends on the implementation
4. Cloud-Controller-Manager: GCP-Specific Reconcilers
Role and Responsibilities
The Cloud-Controller-Manager runs the cloud-provider-specific controllers. On GKE it reconciles GCP resources:
| Resource | Controller | Action |
|---|---|---|
| Service (type: LoadBalancer) | Service controller | Creates GCP Load Balancer |
| Ingress | GKE Ingress controller (a separate control plane component) | Creates GCP HTTP(S) LB |
| PersistentVolume | Volume controller (Compute Engine PD CSI driver on current versions) | Provisions GCP Disks |
| Node | Node controller | Syncs node state with GCP |
Example: Service LoadBalancer Reconciliation
# User creates LoadBalancer service
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer  # ← triggers cloud-controller
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp
# Cloud-controller-manager:
# 1. Watches Service resource
# 2. Sees type: LoadBalancer
# 3. Calls GCP APIs: create Load Balancer, Backend Service
# 4. Assigns external IP
# 5. Updates Service status.loadBalancer.ingress[].ip
GKE-Specific Behavior
Cloud-Controller-Manager handles:
- Node taints: Autopilot vs Standard taints
- Network routes: managing VPC routes
- Service IP allocation: external IP assignment for LoadBalancer Services (ClusterIPs are allocated by the API Server)
- Persistent Volume provisioning: disk creation
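The five LoadBalancer steps listed in the Service example can be sketched as a small reconcile function. `FakeCloud` is a hypothetical stand-in for the GCP load balancer API, the IP addresses come from a documentation-only range, and the Service is a plain dictionary, not a real API object.

```python
class FakeCloud:
    """Hypothetical stand-in for the GCP load balancer API."""

    def __init__(self):
        self.load_balancers = {}
        self.calls = 0

    def ensure_load_balancer(self, name: str, port: int) -> str:
        self.calls += 1
        if name not in self.load_balancers:
            ip = f"203.0.113.{len(self.load_balancers) + 10}"  # documentation range
            self.load_balancers[name] = {"ip": ip, "port": port}
        return self.load_balancers[name]["ip"]


def reconcile_service(service: dict, cloud: FakeCloud) -> None:
    if service["spec"]["type"] != "LoadBalancer":
        return  # other Service types need no cloud resources
    status = service.setdefault("status", {}).setdefault("loadBalancer", {})
    if status.get("ingress"):
        return  # already provisioned
    ip = cloud.ensure_load_balancer(
        service["metadata"]["name"], service["spec"]["ports"][0]["port"]
    )
    status["ingress"] = [{"ip": ip}]


service = {
    "metadata": {"name": "my-service"},
    "spec": {"type": "LoadBalancer", "ports": [{"port": 80, "targetPort": 8080}]},
}
cloud = FakeCloud()
reconcile_service(service, cloud)
reconcile_service(service, cloud)  # second pass is a no-op
print(service["status"]["loadBalancer"]["ingress"], cloud.calls)
# [{'ip': '203.0.113.10'}] 1
```

While the cloud call has not yet succeeded, `status.loadBalancer.ingress` stays empty, which is what a Service stuck in pending looks like from the cluster side.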
Leader Election
Like the scheduler and the controller-manager, the CCM uses leader election for HA:
kubectl get lease -n kube-system cloud-controller-manager
Common Failure Modes
| Symptom | Possible Cause | Debug |
|---|---|---|
| Service stuck pending (LoadBalancer type) | CCM not running or GCP API throttling | Check CCM logs |
| External IPs not assigned | Network quota exhausted | Check GCP quota |
| PersistentVolumes not provisioning | GCP Disk quota or regional constraint | Check CCM logs |
Inter-Component Communication
Dependency Chain
API Server ← (all components watch)
    ↓
    ├→ Scheduler (watches unscheduled Pods)
    ├→ Controller-Manager (watches all resources)
    └→ Cloud-Controller-Manager (watches cloud-specific resources)
All write back via API Server
Watch Mechanism
Components don't poll etcd. They use the API Server's watch API:
# Under the hood, every controller does the equivalent of:
kubectl get pods --all-namespaces --watch --output-watch-events --field-selector=status.phase=Pending
# Streams events: ADDED, MODIFIED, DELETED
# This is a huge performance difference compared with polling
Implication: if a watch connection drops, the component misses events for a few seconds until it reconnects.
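A dropped watch does not have to mean lost changes if the client remembers the last version it saw and asks for everything newer when it reconnects. The sketch below models that with an in-memory event log. `EventLog`, `Watcher`, and the integer version counter are hypothetical simplifications of the API Server's watch and resourceVersion mechanics.

```python
class EventLog:
    """Hypothetical in-memory stand-in for the API Server's change history."""

    def __init__(self):
        self.events = []  # (resource_version, type, name)

    def record(self, event_type: str, name: str) -> None:
        self.events.append((len(self.events) + 1, event_type, name))

    def watch(self, since: int = 0):
        """Yield every event newer than `since`, oldest first."""
        for version, event_type, name in self.events:
            if version > since:
                yield version, event_type, name


class Watcher:
    def __init__(self, log: EventLog):
        self.log = log
        self.last_seen = 0
        self.cache = set()

    def sync(self) -> None:
        """(Re)connect and replay everything after the last seen version."""
        for version, event_type, name in self.log.watch(self.last_seen):
            if event_type == "DELETED":
                self.cache.discard(name)
            else:  # ADDED or MODIFIED
                self.cache.add(name)
            self.last_seen = version


log = EventLog()
watcher = Watcher(log)
log.record("ADDED", "pod-a")
watcher.sync()

# Connection drops; changes keep happening on the server.
log.record("ADDED", "pod-b")
log.record("DELETED", "pod-a")

# After reconnecting, the watcher resumes from version 1 and catches up.
watcher.sync()
print(sorted(watcher.cache), watcher.last_seen)  # ['pod-b'] 3
```

The cost of a dropped connection in this model is delay, not inconsistency: the cache is stale until the replay finishes.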
Scaling Control Plane Components
API Server Scaling
GKE auto-scales API Server instances based on:
- Request rate
- Concurrent connections
- etcd throughput
You cannot manually add API Server replicas — Google manages this.
Controller-Manager Scaling
Controllers run with a single active instance (the others are hot standbys). You cannot scale horizontally beyond 1 active instance.
Workaround for custom controllers: run a separate control plane outside GKE (though this is an enterprise-grade setup).
Scheduler Scaling
Similarly, only one instance is active. The bottleneck for scheduling rate is the compute resources of that scheduler instance.
Monitoring Control Plane Components
GKE exposes control plane metrics in Prometheus format:
# API server latency
apiserver_request_duration_seconds
# Scheduler latency
scheduler_scheduling_attempt_duration_seconds
scheduler_pod_scheduling_duration_seconds
# Controller-manager work queue depth
workqueue_depth
# Cloud-controller-manager (GCE cloud provider; verify the name on your cluster)
cloudprovider_gce_api_request_duration_seconds
Enable control plane metrics:
gcloud container clusters update my-cluster \
  --monitoring=SYSTEM,API_SERVER,SCHEDULER,CONTROLLER_MANAGER
Reference Documentation
The technical information in this section comes from:
- Kubernetes Control Plane Components
- GCP GKE Control Plane Architecture
- Kubernetes Scheduler Documentation
- Kubernetes Controller-Manager
Summary
- API Server: entry point for all Kubernetes operations, gateway to etcd
- Scheduler: makes Pod → Node placement decisions, runs single-instance active (standby replicas)
- Controller-Manager: runs reconciliation loops for Kubernetes resources (Deployments, StatefulSets, Services, etc.)
- Cloud-Controller-Manager: GCP-specific reconcilers (LoadBalancer Services, Node sync, routes)
- All components communicate through the API Server and its watch API; only the API Server talks to etcd
- GKE manages control plane scaling, HA, and updates; these are not the customer's responsibility
- Understanding component roles helps you debug production issues more systematically