Chapter 5: GKE Control Plane Internals (Stateful Systems at Scale)
Why This Chapter Matters
The control plane is the "brain" of every Kubernetes cluster. Understanding its internal architecture, how it operates, and where its limits lie is a prerequisite for:
- Debugging scheduling failures: a Pod stuck in Pending is not always caused by insufficient node resources
- Diagnosing API server latency: latency sometimes comes not from etcd but from admission webhooks or rate limiting
- Planning cluster upgrades: knowing the control plane SLA, release channels, and version skew policy helps avoid unexpected downtime
- Optimizing control plane performance: capacity planning for the API server, controller manager, and scheduler requires understanding the real bottlenecks
Worse still, a misconfiguration that affects the control plane can destabilize the entire cluster, and the symptoms can be very subtle.
Prerequisites
- Kubernetes fundamentals (Pods, Services, Deployments, RBAC)
- Basic understanding of distributed systems (consistency, replication)
- Experience with kubectl commands
Depth Level
5/5: this is the most in-depth chapter in the handbook. By the end of it, you will:
- Understand the request path from a kubectl call → API server → etcd → controller reconciliation
- Know how to debug control plane issues systematically
- Be able to design a cluster topology for high availability
Subtopics
1. GKE Managed Control Plane Model: Standard vs Autopilot
How GCP's managed control plane model works:
- What Google manages (infrastructure, HA, upgrades)
- What the customer manages (node pools, workload configs)
- Differences between Standard and Autopilot
- Implications for the upgrade process, resource limits, and pricing
2. Control Plane Component Architecture: API Server, Scheduler, Controller-Manager, Cloud-Controller-Manager
What each component is, why it exists, and how the components communicate:
- API Server: entry point for all Kubernetes operations
- Scheduler: decides Pod placement
- Controller-Manager: reconciliation loops for core Kubernetes objects
- Cloud-Controller-Manager: GCP-specific controllers (node, route, and Service load balancer controllers)
- Lifecycle, dependencies, failure modes
3. etcd vs Spanner Backend: GKE State Storage & Consistency Model
How state is stored in GKE:
- etcd classical model (quorum-based)
- Spanner backend (Google's distributed database)
- Strong consistency guarantees
- Latency implications at scale
- Backup & recovery strategies
4. etcd Architecture Deep Dive: Quorum, Replication, Watch Mechanism, Compaction
A deep dive into etcd internals:
- Quorum decision mechanics
- Replication log, snapshotting
- Watch mechanism (how changes propagate to clients)
- Compaction schedule (maintaining database size)
- etcd performance limits
- Key vs value size constraints
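The quorum mechanics listed above come down to simple majority arithmetic. The following is a minimal sketch of that arithmetic only; the function names are hypothetical and nothing here calls etcd itself.

```python
def quorum_size(members):
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members):
    """Number of members that can fail while a majority stays reachable."""
    return members - quorum_size(members)


for n in (1, 3, 4, 5):
    print(f"members={n} quorum={quorum_size(n)} tolerates={fault_tolerance(n)}")
```

Note that going from 3 to 4 members raises the quorum size without raising the number of tolerated failures, which is why odd member counts are the usual choice.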
5. Watch Caching & API Server Local Cache: Stale Reads, Reconnection Behavior
Caching mechanisms that reduce etcd load:
- API server maintains local cache
- Watch semantics (stale vs fresh reads)
- Cache invalidation & reconciliation
- Network interruption impact
- When to rely on cache vs etcd
6. Kubernetes Informer Pattern: List-Watch Protocol, Local Cache, Resync Intervals
The pattern used throughout Kubernetes controllers:
- List-watch protocol mechanics
- Local in-memory cache
- Resync intervals (why, when, impact)
- Event queue processing
- Best practices for custom controllers
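The list-watch flow above can be sketched with an in-memory model. This is an illustration under stated assumptions, not client-go: the `Informer` and `Event` classes are hypothetical, resource versions are modeled as plain integers (real clients treat them as opaque), and no network calls are made.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    type: str              # "ADDED", "MODIFIED", or "DELETED"
    key: str               # namespace/name, e.g. "default/web"
    obj: dict
    resource_version: int  # simplified: an integer instead of an opaque string


class Informer:
    def __init__(self):
        self.cache = {}          # local in-memory copy of the objects
        self.queue = deque()     # keys waiting for a worker
        self._queued = set()     # deduplicates keys already in the queue
        self.resource_version = 0

    def _enqueue(self, key):
        if key not in self._queued:
            self._queued.add(key)
            self.queue.append(key)

    def list(self, objects, resource_version):
        """Initial LIST: replace the cache with a full snapshot."""
        self.cache = dict(objects)
        self.resource_version = resource_version
        for key in self.cache:
            self._enqueue(key)

    def on_watch_event(self, event):
        """WATCH: apply one incremental change to the cache."""
        if event.type == "DELETED":
            self.cache.pop(event.key, None)
        else:
            self.cache[event.key] = event.obj
        self.resource_version = event.resource_version
        self._enqueue(event.key)

    def resync(self):
        """Periodic resync: re-enqueue every cached key, no API call needed."""
        for key in self.cache:
            self._enqueue(key)

    def pop(self):
        key = self.queue.popleft()
        self._queued.discard(key)
        return key


informer = Informer()
informer.list({"default/web": {"replicas": 1}}, resource_version=10)
informer.on_watch_event(Event("MODIFIED", "default/web", {"replicas": 3}, 11))
informer.on_watch_event(Event("ADDED", "default/api", {"replicas": 2}, 12))
print(list(informer.queue), informer.cache["default/web"])
```

The queue holds keys rather than events, so two changes to the same object collapse into one unit of work and the worker always reads the latest state from the cache.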
7. Controller Reconciliation Loops: Level-Triggered vs Edge-Triggered Design
The core mechanism Kubernetes uses to maintain desired state:
- Reconciliation loop pattern
- Level-triggered (robust): act on the current observed state, so a missed event does no harm
- Edge-triggered (efficient): react only to change events, so a missed event is lost
- Failure modes & retry logic
- Handling race conditions
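A level-triggered reconciler can be sketched as a pure function of desired and observed state. The example below is a simplified illustration; `reconcile` and the pod naming scheme are hypothetical, and a real controller would issue API calls instead of returning strings.

```python
def reconcile(desired_replicas, running):
    """Return the actions needed to make `running` match `desired_replicas`.

    The decision depends only on the current state, never on which events
    were seen, so running it again after a missed event still converges.
    """
    diff = desired_replicas - len(running)
    if diff > 0:
        return [f"create pod-{len(running) + i}" for i in range(diff)]
    if diff < 0:
        return [f"delete {name}" for name in running[diff:]]
    return []


print(reconcile(3, ["pod-0"]))
print(reconcile(3, ["pod-0", "pod-1", "pod-2"]))
print(reconcile(1, ["pod-0", "pod-1", "pod-2"]))
```

Because the function is idempotent for a given observed state, retrying after a failure or a periodic resync is safe, which is the main robustness argument for the level-triggered design.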
8. API Priority and Fairness (APF): Flow Schemas, Priority Levels, Rate Limiting
Request prioritization & rate limiting:
- APF architecture (priority levels, flow schemas)
- How GKE prioritizes different request types
- Tuning APF rules
- Burst allowances
- Debugging APF rejections
9. Admission Control Pipeline: MutatingAdmissionWebhook → ValidatingAdmissionWebhook → etcd Write
Request processing before persistence:
- Admission webhook execution order
- Failure policies & timeout implications
- Webhook reliability & anti-patterns
- How misconfigured webhooks can take down a cluster
- Debugging admission failures
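The ordering and failure-policy behavior above can be sketched as a small in-process pipeline. This is a simplified model under stated assumptions: `Webhook`, `admit`, and `WebhookUnavailable` are hypothetical names, webhooks are plain Python callables rather than HTTPS endpoints, and validators run one after another here for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


class WebhookUnavailable(Exception):
    """Stands in for a timeout or connection error while calling a webhook."""


@dataclass
class Webhook:
    name: str
    handler: Callable[[dict], object]  # mutators return a dict, validators a bool
    failure_policy: str = "Fail"       # "Fail" or "Ignore", as in failurePolicy


def admit(obj, mutating, validating):
    """Run mutating hooks, then validating hooks; return (allowed, object, reason)."""
    current = dict(obj)
    for hook in mutating:
        try:
            current = hook.handler(current)
        except WebhookUnavailable:
            if hook.failure_policy == "Fail":
                return False, obj, f"{hook.name}: unavailable, failurePolicy=Fail"
            # "Ignore": skip this hook and keep the object unchanged
    for hook in validating:
        try:
            allowed = hook.handler(current)
        except WebhookUnavailable:
            if hook.failure_policy == "Fail":
                return False, obj, f"{hook.name}: unavailable, failurePolicy=Fail"
            continue  # "Ignore": treat the unreachable validator as a pass
        if not allowed:
            return False, obj, f"{hook.name}: denied"
    return True, current, "admitted"  # only now would the object be persisted


def add_team_label(o):
    return {**o, "labels": {**o.get("labels", {}), "team": "platform"}}


def has_team_label(o):
    return "team" in o.get("labels", {})


def unreachable(o):
    raise WebhookUnavailable()


pod = {"kind": "Pod"}
print(admit(pod, [Webhook("add-label", add_team_label)],
            [Webhook("check-label", has_team_label)]))
print(admit(pod, [], [Webhook("policy", unreachable, "Fail")]))
print(admit(pod, [], [Webhook("policy", unreachable, "Ignore")]))
```

The second call shows the outage pattern: an unreachable webhook with a `Fail` policy rejects every request it matches, so a webhook that matches broadly can block writes across the cluster.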
10. Mutating Admission Policies: CEL-Based Policies, Webhook Alternatives
Modern CEL-based policy enforcement:
- CEL (Common Expression Language) for policies
- Mutation rules vs webhook comparison
- Policy lifecycle & debugging
- Integration with existing webhooks
- Performance implications
11. API Server Request Lifecycle: Authentication → Authorization → Admission → Storage
The full path of an API request:
- TLS termination
- Authentication (certificates, tokens, OIDC)
- Authorization (RBAC evaluation)
- Admission (webhooks, policies)
- etcd storage write
- Response serialization & return
- Where latency actually comes from
12. Control Plane Scalability: Request Rate Limits, Watch Connection Limits, Burst Handling
Understanding the control plane's scale limits:
- API server connection limits
- Watch subscription limits
- Request rate limits (global vs per-priority)
- Burst allowances
- How clusters fail at scale
- Capacity planning
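The relationship between a sustained rate limit and a burst allowance can be illustrated with a generic token bucket. This is a sketch of the general technique, not the API server's implementation: APF works with concurrency shares and queues, while client-side QPS and burst settings behave along the lines shown here. The `TokenBucket` class is hypothetical and takes the clock as an argument so the behavior is deterministic.

```python
class TokenBucket:
    def __init__(self, rate, burst, now=0.0):
        self.rate = rate            # tokens refilled per second (sustained rate)
        self.burst = burst          # bucket capacity (largest allowed burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now):
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=5, burst=10)
burst_results = [bucket.allow(0.0) for _ in range(12)]
print(burst_results.count(True), "allowed,", burst_results.count(False), "rejected")
print("one second later:", bucket.allow(1.0))
```

Twelve requests arriving at the same instant exhaust the burst of ten and the remaining two are rejected; after one second the bucket has refilled at the sustained rate and requests are accepted again.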
13. Control Plane Connectivity: DNS-Based vs IP-Based Endpoint, Authorized Networks
Network aspects of control plane access:
- How kubelets connect to API server
- DNS-based endpoint (what it means in GKE)
- IP-based endpoints (external and internal)
- Authorized networks (IP-based allowlist)
- High availability implications
14. Private Cluster Control Plane: Private Endpoint, Cloud NAT, Node → Control-Plane Access
Control plane security & network isolation:
- Private control plane endpoint
- Cloud NAT for node egress
- Authorized networks enforcement
- Jump hosts / IAP patterns
- When to use private clusters
15. Credential Rotation & Zero-Downtime Updates: SSL Certificates, CA Rotation, IP Rotation
Operational continuity when credentials change:
- SSL certificate lifecycle
- CA rotation mechanics
- API endpoint IP changes (when they occur)
- Zero-downtime rotation strategies
- Monitoring & alerting
16. Control Plane SLA, Release Channels, & Versioning Policy
Availability, upgrade cadence, version support:
- Regional vs zonal cluster SLA (99.95% vs 99.5%)
- Release channels (Rapid, Regular, Stable, Extended)
- GKE versioning (minor, patch, support windows)
- Version skew policy (control plane ↔ kubelet compatibility)
- Auto-upgrade mechanics
- Breaking changes & deprecations
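The SLA percentages and the version skew rule above both reduce to small checks. The sketch below works through them under stated assumptions: a 30-day month for the downtime budget, and a maximum minor-version skew passed in as a parameter because the allowed value has changed across Kubernetes releases and should be confirmed against the current policy. The function names are hypothetical.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Downtime budget implied by an availability percentage over `days`."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


def parse_minor(version):
    """Extract (major, minor) from strings such as 'v1.29.4' or '1.29.4-gke.100'."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def kubelet_skew_ok(control_plane, kubelet, max_minor_skew):
    """True if the kubelet is not newer than the control plane and not
    more than `max_minor_skew` minor versions older (assumed rule shape)."""
    cp_major, cp_minor = parse_minor(control_plane)
    k_major, k_minor = parse_minor(kubelet)
    if cp_major != k_major:
        return False
    return 0 <= cp_minor - k_minor <= max_minor_skew


for name, sla in (("regional", 99.95), ("zonal", 99.5)):
    print(f"{name}: {allowed_downtime_minutes(sla):.1f} minutes per 30 days")

print(kubelet_skew_ok("1.30.1", "1.28.5", max_minor_skew=2))
print(kubelet_skew_ok("1.30.1", "1.27.0", max_minor_skew=2))
print(kubelet_skew_ok("1.29.0", "1.30.0", max_minor_skew=2))
```

Over a 30-day month, 99.95% leaves a budget of about 21.6 minutes while 99.5% leaves 216 minutes, a tenfold difference that is easy to overlook when comparing the two percentages directly.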
Learning Path
Start here if you:
Are new to Kubernetes
- Subtopic 2 (Components overview)
- Subtopic 6 (Informer pattern - foundation)
- Subtopic 7 (Reconciliation loops - how Kubernetes works)
- → Then return to the other subtopics
Are a platform engineer facing production issues
- Subtopic 11 (Request lifecycle - debug latency)
- Subtopic 8 (APF - rate limiting rejections)
- Subtopic 9 (Admission control - webhook issues)
- Subtopic 12 (Scalability - overload symptoms)
Are designing a cluster architecture
- Subtopic 3 (State storage - etcd vs Spanner)
- Subtopic 13 (Connectivity - network design)
- Subtopic 14 (Private clusters - security design)
- Subtopic 16 (SLA & versioning - reliability planning)
Are debugging control plane problems
- Subtopic 11 (Request lifecycle)
- Subtopic 7 (Reconciliation loops - stuck controllers)
- Subtopic 4 (etcd internals - state corruption)
- Subtopic 5 (Watch caching - stale reads)
Key Concepts Summary
| Concept | Why It Matters | Where It's Explained |
|---|---|---|
| Watch Mechanism | The primary way clients are notified of state changes | Subtopics 4, 5, 6 |
| Reconciliation Loop | How Kubernetes converges on the desired state | Subtopic 7 |
| Admission Control | Validates and mutates requests before they are persisted | Subtopics 9, 10 |
| API Priority & Fairness | Keeps the control plane from being overwhelmed | Subtopic 8 |
| Request Lifecycle | The path of an API call and where latency arises | Subtopic 11 |
| etcd Consistency | Guarantees consistent state across the control plane | Subtopics 3, 4 |
| Version Skew | Compatibility between control plane and kubelet versions | Subtopic 16 |
Official References
All content in this chapter is drawn from official references.
Each subtopic carries its own specific inline citations.
Next Steps After This Chapter
After completing this chapter, you will be ready for:
- Chapter 6: GKE Node Lifecycle (how kubelets interact with the control plane)
- Chapter 7: GKE Networking (service discovery, DNS)
- Chapter 8: GKE Scheduler (detailed scheduling algorithm)
- Chapter 10: Admission Control & Policy Enforcement (deeper into admission)
- Chapter 12: Observability (monitoring control plane metrics)