GKE Managed Control Plane Model — Standard vs Autopilot

Why This Matters in Production

When you create a GKE cluster, you are not just choosing machine types; you are choosing an entire operating model. What Google manages, what you manage, and which constraints you accept all depend on this decision.

A common mistake: many teams assume "Standard cluster = you manage everything" and "Autopilot = fully managed". The reality is more nuanced. For example:

Standard cluster: You manage node pools, node scaling, and OS patching, but Google still manages the control plane, and you have no direct access to the API server binary
Autopilot: Google manages node pools, scaling, and security policy, but you still have to configure workload resources correctly, otherwise Pods can be rejected

Understanding this boundary determines:

Cost model (Spot VMs, committed use discounts, reservations)
Upgrade timeline (you control it vs Google controls it)
Feature availability (some advanced features are available in only one mode)
Troubleshooting approach (the debug surface area differs)

Control Plane Is a Managed Service: What That Really Means

Before diving into Standard vs Autopilot, one fundamental point: in GKE, the control plane is ALWAYS a managed service. Google manages:

Availability: In regional clusters the control plane is replicated across multiple zones in the region. Zonal clusters run a single control plane replica in one zone, so the API server can be briefly unavailable during upgrades or a zonal outage
Updates: Control plane patches are applied on a rolling basis and are largely transparent to workloads
Monitoring: Google monitors API server health, etcd consistency, and scheduler performance
Scaling: Control plane components scale automatically. The control plane still runs on machines, but that scaling is not visible to you

What you do not manage:

You do not SSH into control plane nodes
You do not tune etcd parameters directly
You do not install custom admission plugins into the API server itself (admission webhooks you register run as your own workloads)
You do not modify API server flags (only a limited set of options is exposed at cluster creation)

Standard Cluster Model

Definition

A Standard cluster is the model in which Google manages the control plane and you fully manage the node pools.

Google Manages (Control Plane)

Component	Details
API Server	Deployed, scaled, and kept highly available by Google
etcd	Replicated backend, backups, disaster recovery
Scheduler	Runs on the control plane; no configuration needed
Controller-Manager	Google runs the set of built-in controllers
Updates	Automatic patches; rollout timing follows the cluster's release channel
Monitoring	Google monitors CPU, memory, and latency

You Manage (Data Plane)

Component	Details
Node Pools	Creation, scaling, machine types
Node OS	Container-Optimized OS (COS) versions, patches (automatic by default)
Security	Node-level security policies, workload permissions
Network	VPC configuration, firewall rules
Storage	PersistentVolume provisioning, volumes
Add-ons	DNS, logging, monitoring agent configuration

Production Patterns in Standard

Regional (Multi-Zone) HA Cluster

bash

# A Standard cluster is a good fit when you need flexibility,
# for example custom node pools per workload type.
# In a regional cluster, --num-nodes is the node count per zone.

gcloud container clusters create my-cluster \
  --region us-central1 \
  --num-nodes 3 \
  --machine-type n2-standard-4

# Add a specialized pool later (autoscaling limits are also per zone)
gcloud container node-pools create gpu-pool \
  --cluster=my-cluster \
  --region us-central1 \
  --machine-type a2-highgpu-1g \
  --num-nodes 0 \
  --enable-autoscaling \
  --min-nodes 0 --max-nodes 10

Benefits:

Node pools configured for the exact need (GPU, high-memory, etc.)
A separate autoscaling policy per pool
Committed use discounts and reservations tuned per machine family

Tradeoff: You have to monitor node health, patch windows, and OS issues.

Cluster Autoscaler + HPA

In a Standard cluster, autoscaling has two layers:

Cluster Autoscaler (CA): adds nodes when Pods are pending and removes nodes that are underutilized
Horizontal Pod Autoscaler (HPA): scales replicas based on metrics

yaml

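# Deploy an application that uses HPA + CA
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3  # initial count; the HPA takes over afterwards
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp  # placeholder image name
        resources:
          requests:
            cpu: 500m      # HPA CPU utilization is measured against this request
            memory: 256Mi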

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

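Behavior:

When average CPU utilization exceeds 70% of the requested CPU, the HPA adds replicas
If the existing nodes have no room, the CA adds nodes
Conversely, when traffic drops, the HPA scales down and the CA removes nodes once they have been unneeded for about 10 minutes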
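
The HPA half of this behavior follows the replica formula from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The sketch below is a simplified model under stated assumptions: it applies the default 10% tolerance and the min/max bounds from the manifest above, and it ignores stabilization windows, Pod readiness, and missing metrics. The function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=3, max_replicas=100, tolerance=0.1):
    """Simplified HPA replica calculation (assumption: no stabilization window)."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds declared in the HorizontalPodAutoscaler spec.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(desired_replicas(3, 140, 70))   # CPU at twice the target
    print(desired_replicas(10, 35, 70))   # traffic dropped by half
    print(desired_replicas(5, 72, 70))    # inside the tolerance band
```

If the extra replicas do not fit on the current nodes, they stay Pending, and that is the signal the CA reacts to.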

Autopilot Cluster Model

Definition

An Autopilot cluster is the model in which Google manages both the control plane and the node infrastructure; you manage only the workloads.

Google Manages (Control Plane + Infrastructure)

Component	Details
Control Plane	Fully managed, same as Standard
Node Pools	Automated creation, scaling, optimization
Node Selection	Automatic machine type selection based on workload
OS & Patches	Fully automated node upgrades (Pods are rescheduled as nodes are replaced)
Security	Hardened security constraints enforced at admission, RBAC built-in
Networking	VPC, firewall, DNS configuration
Logging & Monitoring	Built-in, opinionated stack

You Manage (Workloads Only)

Component	Details
Pod Definitions	spec, containers, resources
Deployments, Services	Application configuration
IAM	Who can access cluster
Namespaces	Logical organization

Constraints You Need to Understand

1. Resource Ratio Enforcement

Autopilot validates resource requests on every Pod submission. The CPU:memory ratio must fit the range allowed by the preset profiles.

yaml

# ❌ OUT OF RATIO - CPU too small for the memory; Autopilot adjusts the request or rejects the Pod
apiVersion: v1
kind: Pod
metadata:
  name: imbalanced
spec:
  containers:
  - name: app
    image: myapp
    resources:
      requests:
        cpu: 100m      # too small!
        memory: 4Gi    # 4 GiB of memory needs at least 500m CPU

---
# ✅ ACCEPTED
apiVersion: v1
kind: Pod
metadata:
  name: balanced
spec:
  containers:
  - name: app
    image: myapp
    resources:
      requests:
        cpu: 500m      # ratio within the allowed range
        memory: 2Gi

Ratio rules (simplified; the exact allowed ranges depend on the compute class and GKE version, so check the current Autopilot documentation):

Balanced: 1 CPU : 3.5 - 4 GB memory
Scale-out: 1 CPU : 8 GB memory (for the web tier)
Performance: 1 CPU : 1 GB memory (for latency-sensitive workloads)
Memory-optimized: 1 CPU : 16 GB memory

If a Pod spec does not fit any profile, Autopilot will:

Try to auto-adjust the requests (mutating webhook)
Reject the Pod if it cannot adjust them
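
The adjust-then-reject flow can be sketched as a small model. This is not Autopilot's actual webhook logic: the function name `fit_to_ratio`, the choice to raise whichever request is too low, the CPU cap, and the 1 to 4 GiB-per-vCPU range (upper bound loosely taken from the Balanced line above, lower bound assumed) are all illustrative assumptions.

```python
import math


def fit_to_ratio(cpu_m, mem_mib, min_gib_per_cpu=1.0, max_gib_per_cpu=4.0,
                 max_cpu_m=30000):
    """Return (cpu_m, mem_mib) adjusted upward to fit the allowed ratio.

    All bounds are illustrative assumptions, not Autopilot's real limits.
    """
    mem_gib = mem_mib / 1024
    gib_per_cpu = mem_gib / (cpu_m / 1000)
    if gib_per_cpu > max_gib_per_cpu:
        # Too much memory per vCPU: raise the CPU request.
        cpu_m = math.ceil(mem_gib / max_gib_per_cpu * 1000)
    elif gib_per_cpu < min_gib_per_cpu:
        # Too little memory per vCPU: raise the memory request.
        mem_mib = math.ceil(cpu_m / 1000 * min_gib_per_cpu * 1024)
    if cpu_m > max_cpu_m:
        # Nothing left to adjust: model this as a rejection.
        raise ValueError("request cannot be adjusted into the allowed range")
    return cpu_m, mem_mib
```

Under these assumed bounds, the imbalanced request from the example (100m CPU, 4Gi memory) would be raised to 1000m CPU, while 500m with 2Gi passes through unchanged. The practical consequence is that an adjusted Pod is billed and scheduled for more than you wrote in the manifest.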

2. Privileged Workload Restrictions

Autopilot has an opinionated security posture:

yaml

# ❌ WILL BE REJECTED - privileged container
apiVersion: v1
kind: Pod
metadata:
  name: privileged-example
spec:
  containers:
  - name: privileged-app
    image: myapp  # placeholder image name
    securityContext:
      privileged: true  # not allowed

---
# ✅ ACCEPTED - baseline security
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example
spec:
  containers:
  - name: app
    image: myapp  # placeholder image name
    securityContext:
      runAsNonRoot: true
      readOnlyRootFilesystem: true

Exception: Google allowlists some partner workloads (database engines, service meshes). If you need privileged mode for anything else, you have to request approval from Google.

3. Node Pool Abstraction

In Autopilot, "node pools" are a managed abstraction rather than something you create:

bash

# In Autopilot, node pools are a managed resource
gcloud container node-pools list --cluster=my-autopilot-cluster

# Illustrative output (actual pool names vary):
# default-pool (managed by Google)
# system-pool (for system components, managed by Google)

Many teams try to create custom node pools in Autopilot:

bash

# ❌ NOT POSSIBLE - Autopilot controls node pool creation
gcloud container node-pools create custom-pool \
  --cluster=my-autopilot-cluster  # ERROR

Workaround: use compute classes to control the hardware profile:

yaml

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
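  nodeSelector:
    cloud.google.com/compute-class: Accelerator        # GPU-capable nodes
    cloud.google.com/gke-accelerator: nvidia-tesla-t4  # example GPU type; pick one available in your region
  containers:
  - name: ml-job
    image: ml-framework:latest
    resources:
      limits:
        nvidia.com/gpu: 1  # extended resources such as GPUs must be set under limits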

4. Network Constraints

Autopilot enforces certain networking rules:

Load balancing is container-native (Pod IPs as NEG endpoints)
hostPort is a restricted feature (it must be enabled explicitly)
DaemonSets run only on worker nodes, not on system nodes

Direct Comparison: Standard vs Autopilot

Aspect	Standard	Autopilot
Control Plane	Managed	Managed
Node Pools	Manual create/configure	Automated, opinionated
Node Selection	You can specify the machine type	Automatic, validated ratio
OS Updates	Configurable window	Automated (maintenance windows still apply)
Security	Flexible (as needed)	Hardened by default
Resource Constraints	Flexible	Strict ratio enforcement
Privileged Workloads	Full support	Limited/approved only
Scaling	Granular control	Simplified, automatic
Cost Transparency	Clear per node	Aggregate, per Pod
Learning Curve	Steeper	Gentler
Operational Toil	Higher	Lower
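
The "Cost Transparency" row deserves a worked example: Standard bills for node capacity whether or not Pods use it, while Autopilot's default billing follows Pod resource requests. The sketch below compares the two under loud assumptions: every price is a made-up placeholder, only CPU is modeled, and the names `standard_hourly_cost` and `autopilot_hourly_cost` are hypothetical. Substitute real prices before drawing any conclusion.

```python
def standard_hourly_cost(node_count, price_per_node_hour):
    """Standard: pay for every node, regardless of how full it is."""
    return node_count * price_per_node_hour


def autopilot_hourly_cost(pod_cpu_requests_m, price_per_vcpu_hour):
    """Autopilot (Pod-based billing): pay for the CPU that Pods request."""
    total_vcpu = sum(pod_cpu_requests_m) / 1000
    return total_vcpu * price_per_vcpu_hour


if __name__ == "__main__":
    pods = [500] * 12  # twelve Pods requesting 500m each = 6 vCPU
    # Placeholder prices, NOT real GCP pricing.
    print(standard_hourly_cost(node_count=3, price_per_node_hour=0.20))
    print(autopilot_hourly_cost(pods, price_per_vcpu_hour=0.05))
```

The shape of the comparison matters more than the numbers: the Standard figure moves with node count, the Autopilot figure with the sum of requests, so over-requesting is paid for directly on Autopilot.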

Production Anti-Patterns

Anti-Pattern 1: Choosing Autopilot Because of the "Fully Managed" Misconception

Mistake: "Autopilot means Google manages everything, zero ops overhead"

Reality: Autopilot manages only the infrastructure. Workload reliability, scaling strategy, cost optimization, and disaster recovery remain your responsibility.

Solution: Treat Autopilot as opinionated infrastructure, not a silver bullet. You still need:

Load testing & capacity planning
Cost monitoring
Incident response practices
Backup strategies

Anti-Pattern 2: Pushing Strict Resource Limits onto Autopilot

Mistake: "Autopilot enforces the ratio, so I can run at 100% resource utilization"

Reality: Autopilot validation is an admission check, not runtime enforcement. Pods can still be OOM-killed or CPU-throttled if actual usage spikes.

Solution: Set requests conservatively and keep headroom:

yaml

# Conservative approach
requests:
  cpu: 250m    # 250m : 1Gi is a 1:4 CPU:memory ratio
  memory: 1Gi  # room for spikes
limits:        # on Autopilot versions without bursting support, limits are set equal to requests
  cpu: 500m
  memory: 2Gi
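
One way to arrive at conservative values is to derive them from observed usage rather than guessing. The sketch below assumes you already have per-Pod peak (for example p95) CPU and memory figures from your monitoring system; the function name `requests_with_headroom`, the 30% headroom default, and the 50m / 64Mi rounding steps are hypothetical choices, not GKE recommendations.

```python
import math


def requests_with_headroom(peak_cpu_m, peak_mem_mib, headroom=0.3,
                           cpu_step_m=50, mem_step_mib=64):
    """Return (cpu_m, mem_mib) requests padded above observed peak usage."""
    def round_up(value, step):
        # Round up to the next multiple of `step` so requests stay tidy.
        return int(math.ceil(value / step) * step)

    cpu_m = round_up(peak_cpu_m * (1 + headroom), cpu_step_m)
    mem_mib = round_up(peak_mem_mib * (1 + headroom), mem_step_mib)
    return cpu_m, mem_mib
```

For a Pod that peaks around 180m CPU and 750 MiB of memory, this yields 250m and 1024 MiB, which matches the conservative requests shown above.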

Anti-Pattern 3: Avoiding Standard "Because Autopilot Is Simpler"

Mistake: Choosing Autopilot even though the workload needs flexibility

Reality:

Autopilot does not expose every feature that is available in Standard
Some use cases (GPU clusters, mixed-architecture deployments) can be a better fit for Standard
Standard gives granular control for specialized needs

Solution: Choose based on workload characteristics:

Choose Autopilot if: web/API service, standard compute, no special OS needs
Choose Standard if: GPU/TPU, custom kernel, specialized networking, mixed architectures
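
The selection rule above can be written down as a small checklist function. This is only a sketch of the criteria listed in this section: the `Workload` fields and the function name `recommend_mode` are hypothetical, and a real decision would also weigh cost, team experience, and feature support in the GKE version you run.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload's infrastructure needs."""
    needs_gpu_or_tpu: bool = False
    needs_custom_kernel: bool = False
    needs_specialized_networking: bool = False
    mixed_architectures: bool = False


def recommend_mode(w: Workload) -> str:
    """Return "Standard" if any specialized need is present, else "Autopilot"."""
    specialized = (w.needs_gpu_or_tpu or w.needs_custom_kernel
                   or w.needs_specialized_networking or w.mixed_architectures)
    return "Standard" if specialized else "Autopilot"
```

A plain web/API service described by `Workload()` maps to Autopilot, while `Workload(needs_custom_kernel=True)` maps to Standard. Encoding the rule this way makes the criteria explicit and reviewable instead of leaving the choice to habit.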

Implications for Later Chapters

The choice of model (Standard vs Autopilot) affects these chapters:

Chapter 6 (Node Lifecycle): Node repairs and upgrades differ per model
Chapter 8 (Scheduler): Scheduling constraints depend on the node pool model
Chapter 9 (Autoscaling): Autopilot autoscaling is fully automated; Standard requires setup
Chapter 12 (Control Plane Scalability): Scaling patterns depend on the node model

Summary

The GKE control plane is ALWAYS managed by Google; it is not a trade-off between Standard and Autopilot
Standard gives flexibility: you fully manage node pools, which is better for specialized workloads
Autopilot gives simplicity: opinionated defaults plus resource validation, which is better for standard web/API deployments
The choice is not binary: you can run a mix (some clusters Standard, some Autopilot) according to workload needs
Production success means choosing the model that fits the workload characteristics, not the one that "looks easier"
