
Chapter 5: GKE Control Plane Internals - Stateful Systems at Scale

Why This Chapter Matters

The control plane is the "brain" of every Kubernetes cluster. Understanding its internal architecture, how it works, and its limitations is essential to:

  • Debug scheduling failures: a Pod stuck in Pending is not always caused by a lack of node resources
  • Diagnose API server latency: sometimes the latency comes not from etcd but from admission webhooks or rate limiting
  • Plan cluster upgrades: knowing the control plane SLA, release channels, and version skew policy helps you avoid unexpected downtime
  • Optimize control plane performance: capacity planning for the API server, controller manager, and scheduler requires understanding the real bottlenecks

Worse still, a misconfigured control plane can make the entire cluster unstable, and the symptoms can be very subtle.

Prerequisites

  • Kubernetes fundamentals (Pods, Services, Deployments, RBAC)
  • Basic understanding of distributed systems (consistency, replication)
  • Experience with kubectl commands

Depth Level

5/5: This is the most in-depth chapter in the handbook. After completing this chapter, you will:

  • Understand the request path from a kubectl call → API server → etcd → controller reconciliation
  • Know how to debug control plane issues systematically
  • Be able to design cluster topology for high availability

Subtopics

1. GKE Managed Control Plane Model — Standard vs Autopilot

Explains GCP's managed control plane model:

  • What Google manages (infrastructure, HA, upgrades)
  • What the customer manages (node pools, workload configs)
  • Differences between Standard and Autopilot
  • Implications cho upgrade process, resource limits, pricing

2. Kiến Trúc Control Plane Components — API Server, Scheduler, Controller-Manager, Cloud-Controller-Manager

What each component is, why it exists, and how the components communicate:

  • API Server: entry point for all Kubernetes operations
  • Scheduler: decides Pod placement (which node each Pod is bound to)
  • Controller-Manager: reconciliation loops for core Kubernetes objects
  • Cloud-Controller-Manager: GCP-specific controllers (Service type LoadBalancer, node lifecycle, routes)
  • Lifecycle, dependencies, failure modes

3. etcd vs Spanner Backend — GKE State Storage & Consistency Model

Understanding state storage in GKE:

  • etcd classical model (quorum-based)
  • Spanner backend (Google's distributed database)
  • Strong consistency guarantees
  • Latency implications ở scale
  • Backup & recovery strategies

4. etcd Architecture Deep Dive — Quorum, Replication, Watch Mechanism, Compaction

A deep dive into etcd internals:

  • Quorum decision mechanics
  • Replication log, snapshotting
  • Watch mechanism (how changes propagate to clients)
  • Compaction schedule (maintaining database size)
  • etcd performance limits
  • Key vs value size constraints
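
The quorum arithmetic behind etcd's Raft-based replication fits in a few lines. This is a generic illustration of majority quorums, not GKE configuration: GKE does not expose etcd member counts, and the function names below are hypothetical.

```python
def quorum_size(members: int) -> int:
    """Strict majority of voting members needed to commit a write or elect a leader."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """Members that can be lost while a majority is still reachable."""
    return members - quorum_size(members)


for n in (1, 3, 4, 5):
    print(f"{n} members -> quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that 4 members tolerate the same single failure as 3 while needing a larger majority, which is why odd member counts are the usual recommendation.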

5. Watch Caching & API Server Local Cache — Stale Reads, Reconnection Behavior

Caching mechanisms that reduce etcd load:

  • API server maintains local cache
  • Watch semantics (stale vs fresh reads)
  • Cache invalidation & reconciliation
  • Network interruption impact
  • When to rely on cache vs etcd
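
A toy model of how the `resourceVersion` parameter on a LIST decides between the watch cache and storage. Assumptions: upstream semantics where `resourceVersion="0"` means any version is acceptable (so the cache may answer with stale data) and an unset value asks for the most recent data. `ToyApiServer` and its fields are hypothetical, and newer Kubernetes releases can serve consistent lists from the watch cache after confirming it is up to date, so "storage" here is a simplification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyApiServer:
    """Hypothetical model: the watch cache may lag behind storage (etcd)."""
    storage_revision: int  # latest revision committed in etcd
    cache_revision: int    # revision the watch cache has caught up to
    served_from: List[str] = field(default_factory=list)

    def list_revision(self, resource_version: str = "") -> int:
        """Return the revision a LIST would observe under simplified semantics."""
        if resource_version == "0":
            # "Any" version is acceptable: answer from the cache, possibly stale.
            self.served_from.append("cache")
            return self.cache_revision
        if resource_version == "":
            # Unset: "most recent" (consistent read), modeled as a storage read.
            self.served_from.append("storage")
            return self.storage_revision
        raise NotImplementedError("exact / not-older-than semantics are not modeled")


api = ToyApiServer(storage_revision=105, cache_revision=100)
print(api.list_revision("0"))  # 100: stale by five revisions
print(api.list_revision())     # 105: up to date
```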

6. Kubernetes Informer Pattern — List-Watch Protocol, Local Cache Resync, Re-sync Intervals

The pattern used by the built-in Kubernetes controllers and most custom controllers:

  • List-watch protocol mechanics
  • Local in-memory cache
  • Resync intervals (why, when, impact)
  • Event queue processing
  • Best practices for custom controllers
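
The list-watch flow can be sketched as a toy class. This is not client-go's API: `ToyInformer` and its methods are hypothetical, events are plain tuples, and error handling (for example re-listing after a watch fails with HTTP 410 Gone because its resourceVersion was compacted) is omitted.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Tuple

# (event type, "namespace/name" key, object, resourceVersion)
Event = Tuple[str, str, dict, int]


class ToyInformer:
    """List-watch into a local cache plus a work queue of keys to reconcile."""

    def __init__(self) -> None:
        self.cache: Dict[str, dict] = {}
        self.last_rv = 0
        self.queue: Deque[str] = deque()

    def on_list(self, items: Dict[str, dict], rv: int) -> None:
        # Initial LIST: replace the whole cache and remember where the watch resumes.
        self.cache = dict(items)
        self.last_rv = rv
        self.queue.extend(items)

    def on_watch(self, events: Iterable[Event]) -> None:
        # WATCH from last_rv: apply each delta instead of re-reading everything.
        for event_type, key, obj, rv in events:
            if event_type == "DELETED":
                self.cache.pop(key, None)
            else:  # ADDED or MODIFIED
                self.cache[key] = obj
            self.last_rv = rv
            self.queue.append(key)

    def resync(self) -> None:
        # Periodic resync: re-enqueue every cached key from memory, no API call.
        self.queue.extend(self.cache)


inf = ToyInformer()
inf.on_list({"default/web": {"replicas": 1}}, rv=10)
inf.on_watch([
    ("MODIFIED", "default/web", {"replicas": 2}, 11),
    ("ADDED", "default/api", {"replicas": 1}, 12),
])
inf.resync()
print(inf.last_rv, len(inf.queue))  # 12 5
```

Resync here replays the in-memory cache without contacting the API server, similar in spirit to client-go's resync: it is a periodic safety net for controller logic, not a re-list.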

7. Controller Reconciliation Loops — Level-Triggered vs Edge-Triggered Design

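The core mechanism Kubernetes uses to maintain desired state:

  • Reconciliation loop pattern
  • Level-triggered (robust): act on the current observed state, so a missed event is corrected on the next pass
  • Edge-triggered (efficient): react to individual change events; a missed event can leave state wrong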
  • Failure modes & retry logic
  • Handling race conditions
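
A minimal level-triggered reconcile function for a ReplicaSet-like object. The function name and the string actions are hypothetical; the point is that it compares desired and observed state on every call instead of tracking which event arrived, so a dropped or duplicated event cannot leave it permanently wrong.

```python
from typing import List


def reconcile(desired: int, observed_pods: List[str]) -> List[str]:
    """Level-triggered: compute actions only from current desired vs observed state."""
    diff = desired - len(observed_pods)
    if diff > 0:
        # Too few Pods: ask for `diff` new ones (names would be assigned elsewhere).
        return ["create"] * diff
    if diff < 0:
        # Too many Pods: delete the surplus (here simply the last ones listed).
        return [f"delete {name}" for name in observed_pods[diff:]]
    # Converged: running reconcile again is a harmless no-op (idempotent).
    return []


print(reconcile(3, ["web-a"]))                    # ['create', 'create']
print(reconcile(1, ["web-a", "web-b", "web-c"]))  # ['delete web-b', 'delete web-c']
```

An edge-triggered variant would instead adjust a counter on each "scaled up" or "Pod deleted" event, so losing a single event leaves it off by one until something else corrects it.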

8. API Priority and Fairness (APF) — Flow Schemas, Priority Levels, Rate Limiting

Request prioritization & rate limiting:

  • APF architecture (priority levels, flow schemas)
  • How GKE prioritizes different request types
  • Tuning APF rules
  • Burst allowances
  • Debugging APF rejections
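
Upstream APF splits the server's total concurrency limit across priority levels in proportion to their nominal concurrency shares: NominalCL(i) = ceil(ServerCL × NCS(i) / sum of all NCS). The sketch below only evaluates that formula. The 600 total assumes the upstream defaults `--max-requests-inflight=400` plus `--max-mutating-requests-inflight=200`; the share values are illustrative, and GKE sizes the control plane for you, so real limits may differ.

```python
import math
from typing import Dict


def nominal_concurrency_limits(server_cl: int, shares: Dict[str, int]) -> Dict[str, int]:
    """NominalCL(i) = ceil(ServerCL * NCS(i) / sum of NCS over all limited levels)."""
    total = sum(shares.values())
    return {level: math.ceil(server_cl * ncs / total) for level, ncs in shares.items()}


# Illustrative priority levels and shares, not GKE's actual configuration.
limits = nominal_concurrency_limits(
    600, {"system": 30, "workload-high": 40, "workload-low": 100, "catch-all": 5}
)
print(limits)  # {'system': 103, 'workload-high': 138, 'workload-low': 343, 'catch-all': 18}
```

When all seats of a level are busy, new requests wait in that level's queues (flows are assigned to queues by shuffle sharding) and are rejected with HTTP 429 if the queues are full or the wait times out.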

9. Admission Control Pipeline — MutatingAdmissionWebhook → ValidatingAdmissionWebhook → etcd Write

Request processing before persistence:

  • Admission webhook execution order
  • Failure policies & timeout implications
  • Webhook reliability & anti-patterns
  • How a misconfigured webhook can block API requests across the entire cluster
  • Debugging admission failures
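
The ordering and `failurePolicy` behavior can be modeled in plain Python. This is a hypothetical sketch: hooks are local functions rather than HTTPS webhooks, `WebhookUnavailable` stands in for a timeout or connection error, reinvocation is not modeled, and validating webhooks run sequentially here although the API server calls them in parallel.

```python
from typing import Callable, List, Tuple

# (hook, failurePolicy) where failurePolicy is "Fail" or "Ignore".
Hook = Tuple[Callable[[dict], dict], str]


class WebhookUnavailable(Exception):
    """Stands in for a webhook timeout or connection error."""


class Rejected(Exception):
    """The request is denied and nothing is written to storage."""


def admit(obj: dict, mutating: List[Hook], validating: List[Hook]) -> dict:
    # 1) Mutating webhooks run one at a time; each sees the previous one's output.
    for hook, policy in mutating:
        try:
            obj = hook(obj)
        except WebhookUnavailable:
            if policy == "Fail":
                raise Rejected("mutating webhook unavailable and failurePolicy=Fail")
            # failurePolicy=Ignore: keep going with the object unchanged.
    # (The real API server validates the object schema between the two phases.)
    # 2) Validating webhooks can only accept or reject; their return value is ignored.
    for hook, policy in validating:
        try:
            hook(obj)
        except WebhookUnavailable:
            if policy == "Fail":
                raise Rejected("validating webhook unavailable and failurePolicy=Fail")
    # 3) Only an object that passed every phase would be persisted.
    return obj


def add_team_label(obj: dict) -> dict:
    return {**obj, "labels": {"team": "payments"}}


def require_labels(obj: dict) -> dict:
    if "labels" not in obj:
        raise Rejected("labels are required")
    return obj


def unreachable(obj: dict) -> dict:
    raise WebhookUnavailable("timeout")


print(admit({"name": "web"}, [(add_team_label, "Fail")], [(require_labels, "Fail")]))
print(admit({"name": "web"}, [(unreachable, "Ignore")], []))
```

With `failurePolicy: Fail`, the same unreachable webhook would reject every matching request, which is how one broken webhook can stall writes cluster-wide.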

10. Mutating Admission Policies — CEL-Based Policies, Webhook Alternatives

Modern CEL-based policy enforcement:

  • CEL (Common Expression Language) for policies
  • Mutation rules vs webhook comparison
  • Policy lifecycle & debugging
  • Integration with existing webhooks
  • Performance implications

11. API Server Request Lifecycle — Authentication → Authorization → Admission → Storage

Full path của một API request:

  • TLS termination
  • Authentication (certificates, tokens, OIDC)
  • Authorization (RBAC evaluation)
  • Admission (webhooks, policies)
  • etcd storage write
  • Response serialization & return
  • Where latency actually comes from
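
To find where latency comes from, time each stage of the request pipeline separately. The sketch uses hypothetical stage functions and a simulated 50 ms admission webhook; TLS termination and response serialization are left out, and in a real cluster the equivalent data comes from API server metrics and traces rather than code like this.

```python
import time
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[dict], dict]]


def run_pipeline(stages: List[Stage], request: dict) -> Dict[str, float]:
    """Run stages in order and record milliseconds spent in each one.

    A stage may raise to stop the request early, the way a failed authentication
    (401), a denied authorization (403) or an admission rejection would.
    """
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        request = stage(request)
        timings[name] = (time.perf_counter() - start) * 1000.0
    return timings


def passthrough(request: dict) -> dict:
    return request


def slow_admission_webhook(request: dict) -> dict:
    time.sleep(0.05)  # simulate a 50 ms round trip to an admission webhook
    return request


timings = run_pipeline(
    [("authentication", passthrough), ("authorization", passthrough),
     ("admission", slow_admission_webhook), ("storage", passthrough)],
    {"verb": "create", "resource": "pods"},
)
print(max(timings, key=timings.get))  # admission
```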

12. Control Plane Scalability — Request Rate Limits, Watch Connection Limits, Burst Handling

Understanding control plane scale limits:

  • API server connection limits
  • Watch subscription limits
  • Request rate limits (global vs per-priority)
  • Burst allowances
  • How clusters fail at scale
  • Capacity planning
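
Burst allowances are commonly expressed as a token bucket. The sketch below models client-side throttling of the kind configured by client-go's `QPS` and `Burst` settings (defaults 5 and 10 in `rest.Config`). It is a simplified, non-blocking illustration with an injected clock (client-go's limiter waits instead of returning `False`), and it is not how server-side APF works: APF limits concurrent requests per priority level rather than request rate.

```python
class TokenBucket:
    """Sustains `qps` requests per second while allowing bursts of up to `burst`."""

    def __init__(self, qps: float, burst: int, now: float = 0.0) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full: an initial burst is allowed
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Take one token if available; `now` is injected so the example is deterministic."""
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, never beyond the burst size.
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=5, burst=10)
print(sum(bucket.try_acquire(now=0.0) for _ in range(15)))  # 10: the other 5 are throttled
print(bucket.try_acquire(now=0.2))  # True: 0.2 s at 5 QPS refills one token
```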

13. Control Plane Connectivity — DNS-Based vs IP-Based Endpoint, Authorized Networks

Network aspects of control plane access:

  • How kubelets connect to API server
  • DNS-based endpoint (what it means in GKE)
  • IP-based endpoint (Autopilot)
  • Authorized networks (IP-based allowlist)
  • High availability implications
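
Authorized networks behave like a CIDR allowlist on the control plane endpoint. The check below illustrates only the matching logic with Python's standard `ipaddress` module; GKE enforces the real allowlist on Google's side, `is_authorized` is a hypothetical helper, and the addresses come from documentation-only ranges (RFC 5737).

```python
import ipaddress
from typing import Iterable


def is_authorized(source_ip: str, authorized_cidrs: Iterable[str]) -> bool:
    """True if source_ip falls inside at least one authorized CIDR block."""
    address = ipaddress.ip_address(source_ip)
    # Mixing IPv4 and IPv6 is safe: `in` simply returns False across versions.
    return any(address in ipaddress.ip_network(cidr) for cidr in authorized_cidrs)


# Hypothetical allowlist: an office range plus a single CI runner address.
AUTHORIZED = ["203.0.113.0/24", "198.51.100.7/32"]
print(is_authorized("203.0.113.42", AUTHORIZED))  # True
print(is_authorized("198.51.100.8", AUTHORIZED))  # False
```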

14. Private Cluster Control Plane — Private Endpoint, Cloud NAT, Node → Control-Plane Access

Control plane security & network isolation:

  • Private control plane endpoint
  • Cloud NAT for node egress
  • Authorized networks enforcement
  • Jump hosts / IAP patterns
  • When to use private clusters

15. Credential Rotation & Zero-Downtime Updates — SSL Certificates, CA Rotation, IP Rotation

Operational continuity when credentials change:

  • SSL certificate lifecycle
  • CA rotation mechanics
  • API endpoint IP changes (when they happen)
  • Zero-downtime rotation strategies
  • Monitoring & alerting

16. Control Plane SLA, Release Channels, & Versioning Policy

Availability, upgrade cadence, version support:

  • Regional vs zonal cluster SLA (99.95% vs 99.5%)
  • Release channels (Rapid, Regular, Stable, Extended)
  • GKE versioning (minor, patch, support windows)
  • Version skew policy (control plane ↔ kubelet compatibility)
  • Auto-upgrade mechanics
  • Breaking changes & deprecations
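
Two pieces of arithmetic from this subtopic can be checked quickly: the downtime budget implied by an SLA percentage, assuming a 30-day month, and the upstream Kubernetes kubelet skew rule (kubelet never newer than kube-apiserver and at most three minor versions older, as documented since Kubernetes 1.28). GKE publishes its own supported node and control plane version combinations, which may be stricter, so treat `kubelet_skew_ok` as a hypothetical helper rather than a GKE check.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


def downtime_budget_minutes(sla_percent: float, minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of unavailability allowed before the monthly SLA is missed."""
    return minutes * (1 - sla_percent / 100)


def kubelet_skew_ok(apiserver: str, kubelet: str, max_older_minors: int = 3) -> bool:
    """Upstream rule: kubelet <= kube-apiserver, and at most N minor versions older.

    Accepts GKE-style versions such as "1.31.2-gke.1000"; only major.minor matter.
    """
    a_major, a_minor = (int(part) for part in apiserver.split(".")[:2])
    k_major, k_minor = (int(part) for part in kubelet.split(".")[:2])
    return a_major == k_major and 0 <= a_minor - k_minor <= max_older_minors


print(round(downtime_budget_minutes(99.95), 1))  # 21.6  (regional control plane)
print(round(downtime_budget_minutes(99.5), 1))   # 216.0 (zonal control plane)
print(kubelet_skew_ok("1.31.2-gke.1000", "1.28.9-gke.500"))  # True
print(kubelet_skew_ok("1.30.5-gke.100", "1.31.1-gke.200"))   # False: kubelet is newer
```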

Learning Path

Start here if you:

Are new to Kubernetes

  1. Subtopic 2 (Components overview)
  2. Subtopic 6 (Informer pattern - foundation)
  3. Subtopic 7 (Reconciliation loops - how Kubernetes works)
  4. → Then return to the other subtopics

Are a platform engineer dealing with production issues

  1. Subtopic 11 (Request lifecycle - debug latency)
  2. Subtopic 8 (APF - rate limiting rejections)
  3. Subtopic 9 (Admission control - webhook issues)
  4. Subtopic 12 (Scalability - overload symptoms)

Are designing cluster architecture

  1. Subtopic 3 (State storage - etcd vs Spanner)
  2. Subtopic 13 (Connectivity - network design)
  3. Subtopic 14 (Private clusters - security design)
  4. Subtopic 16 (SLA & versioning - reliability planning)

Are debugging control plane problems

  1. Subtopic 11 (Request lifecycle)
  2. Subtopic 7 (Reconciliation loops - stuck controllers)
  3. Subtopic 4 (etcd internals - state corruption)
  4. Subtopic 5 (Watch caching - stale reads)

Key Concepts Summary

| Concept | Why It Matters | Where It's Explained |
|---|---|---|
| Watch Mechanism | The core mechanism by which clients are notified of state changes | Subtopics 4, 5, 6 |
| Reconciliation Loop | How Kubernetes reaches the desired state | Subtopic 7 |
| Admission Control | The mechanism that validates and mutates requests before they are stored | Subtopics 9, 10 |
| API Priority & Fairness | How the control plane avoids being overwhelmed | Subtopic 8 |
| Request Lifecycle | The path of an API call and where latency appears | Subtopic 11 |
| etcd Consistency | Ensures consistent state across the control plane | Subtopics 3, 4 |
| Version Skew | Compatibility between control plane and kubelet versions | Subtopic 16 |

Official References

All content in this chapter is drawn from:

Each subtopic includes specific citations inline.


Next Steps After This Chapter

After completing this chapter, you will be ready for:

  • Chapter 6: GKE Node Lifecycle (how kubelets interact with the control plane)
  • Chapter 7: GKE Networking (service discovery, DNS)
  • Chapter 8: GKE Scheduler (detailed scheduling algorithm)
  • Chapter 10: Admission Control & Policy Enforcement (deeper into admission)
  • Chapter 12: Observability (monitoring control plane metrics)