Autopilot Resource Enforcement

Why Autopilot needs to enforce resource requests

In Standard clusters, resource requests are optional (although omitting them is not recommended). The scheduler uses requests to decide placement, but nothing prevents you from deploying a Pod without requests.

Autopilot works on a different model: Google bills by Pod resources, and Google must guarantee that the workload can run. This creates a hard requirement:

Google needs to know exactly how many resources to provision for your workload
Without requests, Autopilot cannot calculate the node capacity required
The per-Pod-resource billing model requires requests to be defined

Therefore, Autopilot always ensures that a Pod has valid resource requests, by accepting, modifying, or rejecting your Pod spec.

Processing flow when a Pod is submitted

When you apply a Pod manifest to an Autopilot cluster, the system performs the following steps in order:

Pod submitted
     │
     ▼
[1] Check compute class
     │ ← If not specified → use General Purpose
     ▼
[2] Check for missing requests
     │ ← If missing → apply the compute class defaults
     ▼
[3] Check CPU:memory ratio
     │ ← If violated → scale up the smaller resource
     ▼
[4] Check minimum values
     │ ← If below minimum → raise to minimum
     ▼
[5] Check maximum values
     │ ← If above maximum → REJECT (400 error)
     ▼
Pod spec is modified and persisted

Important: Steps 2-4 happen silently, with no clear warning unless you inspect the Pod events. Step 5 is an explicit error.
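The five steps can be modelled as one small function. This is an illustrative sketch only, not GKE's actual admission logic: `ClassRules`, `admit`, and `Rejected` are hypothetical names, the numbers are the General Purpose (non-bursting) values quoted on this page, step 1 is represented by the default `rules` argument, and rounding to valid increments is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClassRules:
    """Hypothetical holder for one compute class's rules (CPU in vCPU, memory in GiB)."""
    default_cpu: float
    default_mem: float
    min_gib_per_vcpu: float
    max_gib_per_vcpu: float
    min_cpu: float
    min_mem: float
    max_cpu: float
    max_mem: float


# General Purpose, non-bursting values as listed in this document.
GENERAL_PURPOSE = ClassRules(0.5, 2.0, 1.0, 6.5, 0.25, 0.5, 30.0, 110.0)


class Rejected(Exception):
    """Raised when a request exceeds the class maximum (step 5)."""


def admit(cpu: Optional[float], mem: Optional[float],
          rules: ClassRules = GENERAL_PURPOSE) -> Tuple[float, float]:
    # Step 2: fill in missing requests with the class defaults.
    cpu = rules.default_cpu if cpu is None else cpu
    mem = rules.default_mem if mem is None else mem
    # Step 3: scale up whichever resource is too small for the allowed ratio.
    if mem > cpu * rules.max_gib_per_vcpu:
        cpu = mem / rules.max_gib_per_vcpu
    elif mem < cpu * rules.min_gib_per_vcpu:
        mem = cpu * rules.min_gib_per_vcpu
    # Step 4: raise anything that is below the minimum.
    cpu, mem = max(cpu, rules.min_cpu), max(mem, rules.min_mem)
    # Step 5: values above the maximum are rejected, never scaled down.
    if cpu > rules.max_cpu or mem > rules.max_mem:
        raise Rejected(f"cpu={cpu}, memory={mem} GiB exceeds the class maximum")
    return cpu, mem


print(admit(None, None))   # defaults applied
print(admit(4.0, 1.0))     # memory raised to satisfy the 1:1 floor
print(admit(0.1, 0.125))   # both raised to the minimum
```

Note the asymmetry the sketch makes visible: steps 2-4 only ever increase values, while step 5 is the only one that can fail.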

To see what Autopilot adjusted:

bash

kubectl describe pod POD_NAME | grep -A 20 "Containers:"
# Compare with the original manifest to see which resources were changed

Alternatively, inspect the Pod annotations:

bash

kubectl get pod POD_NAME -o jsonpath='{.metadata.annotations}' | python3 -m json.tool

Autopilot adds the autopilot.gke.io/resource-adjustment annotation when it adjusts resources.
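If the annotation value is JSON with the original and adjusted container specs, the changes can be listed programmatically. The sketch below assumes a payload with `input.containers` and `output.containers` arrays; that schema, the `sample` value, and the `request_changes` helper are assumptions for illustration and should be checked against what your cluster actually writes.

```python
import json

# Hypothetical annotation value; the schema is an assumption, not a documented contract.
sample = json.dumps({
    "input": {"containers": [
        {"name": "app", "requests": {"cpu": "100m", "memory": "128Mi"}}]},
    "output": {"containers": [
        {"name": "app", "requests": {"cpu": "250m", "memory": "512Mi"}}]},
    "modified": True,
})


def request_changes(annotation_value: str) -> dict:
    """Return {container: {resource: (before, after)}} for every changed request."""
    data = json.loads(annotation_value)
    before = {c["name"]: c.get("requests", {}) for c in data["input"]["containers"]}
    changes = {}
    for container in data["output"]["containers"]:
        old = before.get(container["name"], {})
        diff = {res: (old.get(res), new)
                for res, new in container.get("requests", {}).items()
                if old.get(res) != new}
        if diff:
            changes[container["name"]] = diff
    return changes


print(request_changes(sample))
```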

Default resource requests by compute class

When a Pod does not declare resources, Autopilot applies defaults:

Regular containers (not DaemonSet)

Compute Class	Default CPU	Default memory	Default ephemeral storage
General Purpose	0.5 vCPU	2 GiB	1 GiB
Balanced	0.5 vCPU	4 GiB	1 GiB
Scale-Out	0.5 vCPU	2 GiB	1 GiB

DaemonSet containers

DaemonSets have lower defaults because they run on every node and should not take up too much capacity:

Resource	Default
CPU	50 mCPU
Memory	100 MiB
Ephemeral storage	100 MiB

Min/Max requests by compute class

General Purpose (General-purpose / E-series VMs)

Resource	Min (bursting clusters)	Min (non-bursting)	Max
CPU	50 mCPU	250 mCPU	30 vCPU
Memory	52 MiB	512 MiB	110 GiB
Ephemeral storage	10 MiB	10 MiB	10 GiB

Bursting clusters: GKE 1.32.3+ with the container-optimized compute platform

Balanced (N-series VMs: N2, N2D, N4)

Resource	Min	Max
CPU	0.25 vCPU	222 vCPU
Memory	0.5 GiB	851 GiB
Ephemeral storage	10 MiB	10 GiB

Scale-Out (T2D, T2A: AMD EPYC and ARM)

Resource	Min	Max (x86/AMD64)	Max (ARM64)
CPU	0.25 vCPU	54 vCPU	43 vCPU
Memory	1 GiB	216 GiB	172 GiB

Scale-Out requires a fixed 1:4 CPU:memory ratio, unlike the other classes.

Performance (C2, C3, M-series VMs)

The Performance class does not strictly enforce minimum requests; the maximum depends on the selected machine series:

Machine Series	Max CPU	Max Memory
C2	58 vCPU	218 GiB
C3	176 vCPU	1408 GiB
C3D	360 vCPU	2880 GiB
M4	224 vCPU	5952 GiB

Accelerator class (GPU workloads)

Limits depend on the accelerator type:

GPU	Count	CPU	Memory	Ephemeral
NVIDIA A100 (40GB)	1	11 vCPU	74 GiB	1 GiB
NVIDIA A100 (40GB)	8	94 vCPU	632 GiB	—
NVIDIA H100 (80GB)	8	206 vCPU	1,795 GiB	5,250 GiB
NVIDIA L4	1	23 vCPU	52 GiB	—
TPU v5p	—	280 vCPU	448 GiB	56 TiB

CPU:Memory ratio enforcement

Autopilot does not allow arbitrary CPU:memory ratios. Each compute class has its own ratio rule:

Compute Class	Ratio (CPU:Memory)	Behavior on violation
General Purpose	1:1 to 1:6.5	Scales up the lacking resource
Balanced	1:1 to 1:8	Scales up the lacking resource
Scale-Out	Fixed 1:4	Scales up to reach exactly 1:4
Performance	Not enforced	Pass through
Accelerator	Not enforced	Pass through

Examples of ratio violations

Case 1: Memory too high relative to CPU

yaml

resources:
  requests:
    cpu: "100m"     # 0.1 vCPU
    memory: "2Gi"   # 2 GiB → ratio = 1:20, violates the 1:6.5 maximum

Autopilot raises CPU to reach a valid ratio:

Required: 2 GiB / 6.5 = 0.308 vCPU minimum
Autopilot sets CPU = 0.308 vCPU (or rounds up to the nearest valid increment)

Case 2: CPU too high relative to memory (General Purpose)

yaml

resources:
  requests:
    cpu: "4"      # 4 vCPU
    memory: "1Gi" # 1 GiB → ratio = 4:1, violates the 1:1 minimum

Autopilot raises memory: 4 vCPU × 1 (minimum ratio) = 4 GiB of memory.

Case 3: Scale-Out with the wrong ratio

yaml

# nodeSelector: cloud.google.com/compute-class: Scale-Out
resources:
  requests:
    cpu: "2"
    memory: "4Gi"  # Ratio 1:2, needs 1:4

Autopilot raises memory to 8 GiB (2 vCPU × 4 = 8 GiB).
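The three cases follow the same arithmetic, which can be checked against a manifest before deploying. The sketch below is a rough model, not Autopilot's implementation: `cpu_to_vcpu`, `mem_to_gib`, and `fix_ratio` are hypothetical helpers, only the `m`, `Mi`, and `Gi` suffixes are parsed, and results are rounded to three decimals for display rather than to Autopilot's valid increments.

```python
def cpu_to_vcpu(quantity: str) -> float:
    """Parse a CPU quantity such as "100m" or "2" into vCPU."""
    return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)


def mem_to_gib(quantity: str) -> float:
    """Parse a memory quantity with a Mi or Gi suffix into GiB."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2])
    if quantity.endswith("Mi"):
        return float(quantity[:-2]) / 1024
    raise ValueError(f"unsupported memory quantity: {quantity}")


def fix_ratio(cpu: str, memory: str, min_ratio: float, max_ratio: float):
    """Return (vCPU, GiB) after scaling up the smaller resource.

    min_ratio and max_ratio are GiB of memory per vCPU. Pass the same value
    for both to model a fixed ratio such as Scale-Out's 1:4.
    """
    c, m = cpu_to_vcpu(cpu), mem_to_gib(memory)
    if m > c * max_ratio:      # too much memory per vCPU: raise CPU
        c = m / max_ratio
    elif m < c * min_ratio:    # too little memory per vCPU: raise memory
        m = c * min_ratio
    return round(c, 3), round(m, 3)


print(fix_ratio("100m", "2Gi", 1, 6.5))  # Case 1 → (0.308, 2.0)
print(fix_ratio("4", "1Gi", 1, 6.5))     # Case 2 → (4.0, 4.0)
print(fix_ratio("2", "4Gi", 4, 4))       # Case 3 → (2.0, 8.0)
```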

Annotations to control which container receives the added resources

When Autopilot increases resources to satisfy the ratio, by default the first container in the Pod spec receives the entire increase. With multiple containers, this is often not what you want.

Use an annotation to specify the recipients:

yaml

metadata:
  annotations:
    # Distribute evenly across all containers
    autopilot.gke.io/additional-containers: '["container-1", "container-2"]'

Or have one specific container receive all of it:

yaml

metadata:
  annotations:
    autopilot.gke.io/primary-container: "my-app"

Maximum values and reject behavior

When Pod requests exceed the maximum of the compute class, Autopilot rejects the Pod with an error:

Error: INVALID_ARGUMENT: The following resource requests exceeded the maximum allowed: CPU (requested: X, max: Y)

Unlike Standard clusters (where the scheduler queues the Pod until enough node capacity is available), Autopilot rejects immediately. This is a design decision: Autopilot commits to provisioning nodes, but if a request exceeds every feasible node configuration, there is no way to fulfill it.

Common reasons a Pod is rejected:

CPU request exceeds the compute class maximum
Memory request exceeds the maximum
Total CPU across all containers in the Pod exceeds the maximum node capacity
Invalid GPU count
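Because the limit applies to the Pod as a whole, a pre-flight check has to sum the containers first. The following sketch assumes the class maximums from the tables on this page act as Pod-level totals; `CLASS_MAX` and `max_violations` are hypothetical names, and GPU count validation is not covered.

```python
from typing import Dict, List

# Maximums per compute class as (vCPU, GiB), copied from the tables in this
# document. Scale-Out uses the x86 column.
CLASS_MAX = {
    "General Purpose": (30.0, 110.0),
    "Balanced": (222.0, 851.0),
    "Scale-Out": (54.0, 216.0),
}


def max_violations(containers: List[Dict[str, float]], compute_class: str) -> List[str]:
    """List which Pod-level totals exceed the class maximum (empty list means OK)."""
    max_cpu, max_mem = CLASS_MAX[compute_class]
    total_cpu = sum(c["cpu"] for c in containers)
    total_mem = sum(c["memory"] for c in containers)
    problems = []
    if total_cpu > max_cpu:
        problems.append(f"CPU (requested: {total_cpu}, max: {max_cpu})")
    if total_mem > max_mem:
        problems.append(f"memory (requested: {total_mem}, max: {max_mem})")
    return problems


# Each container fits on its own, but together they exceed General Purpose.
pod = [{"cpu": 20.0, "memory": 40.0}, {"cpu": 16.0, "memory": 32.0}]
print(max_violations(pod, "General Purpose"))
print(max_violations(pod, "Balanced"))
```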

Ephemeral storage and local SSD

Ephemeral storage (used for emptyDir, container logs, and node-local cache) has its own rules:

yaml

resources:
  requests:
    ephemeral-storage: "5Gi"
  limits:
    ephemeral-storage: "10Gi"

The maximum ephemeral storage for the General Purpose and Balanced classes is 10 GiB. If a workload needs more (for example, model weights for AI inference), you must use the Performance class or nodes with Local SSD.

With Local SSD nodes, maximum = (number of SSDs × 375 GiB) − system overhead.

Init containers and sidecar containers

Init containers are handled differently from regular containers:

If an init container has no requests: Autopilot sets its requests to the sum of the requests of all application containers.

yaml

# Pod with 2 app containers: 1 vCPU + 2 vCPU = 3 vCPU total
# Init container has no requests → Autopilot sets 3 vCPU for the init container

Billing for Pods with init containers: Autopilot bills by the maximum of:

The sum of requests of all app containers running concurrently
The requests of the largest init container

This means a heavy init container can raise the bill even if it only runs for a few seconds.
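The billing rule above reduces to a single `max`. A minimal sketch for one resource (CPU in vCPU here), where `effective_request` is a hypothetical helper name:

```python
from typing import Iterable


def effective_request(app_requests: Iterable[float],
                      init_requests: Iterable[float]) -> float:
    """max(sum of app container requests, largest init container request)."""
    return max(sum(app_requests), max(init_requests, default=0.0))


print(effective_request([1.0, 2.0], []))     # no init containers → 3.0
print(effective_request([1.0, 2.0], [8.0]))  # heavy init container dominates → 8.0
print(effective_request([1.0, 2.0], [0.5]))  # light init container is free → 3.0
```

Keeping init container requests at or below the app containers' total therefore avoids paying for capacity that is only used during startup.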

Burst vs non-burst clusters

GKE 1.32.3+ introduces the container-optimized compute platform with bursting capability:

Characteristic	Non-burst (traditional)	Burst-capable
CPU rounding	Rounds up to 0.25 vCPU	Allows less than 0.25 vCPU
CPU limits	= requests (if not set)	Pod can burst above requests
Min CPU	250 mCPU	50 mCPU
Min memory	512 MiB	52 MiB
Architecture	x86 only	x86 and ARM

Burst-capable clusters suit microservices with spiky traffic: the Pod runs steadily at low CPU but can burst when needed, without paying for unused reserved capacity.
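The CPU rounding and minimum rows of the table can be expressed as a small function. This is a sketch under the assumption that non-burst clusters round up to the next multiple of 250 mCPU and burst-capable clusters apply only the 50 mCPU floor; `adjusted_cpu_millicores` is a hypothetical name.

```python
import math


def adjusted_cpu_millicores(requested_m: int, burst_capable: bool) -> int:
    """Apply the minimum and rounding rules from the table (values in mCPU)."""
    if burst_capable:
        return max(requested_m, 50)                # 50 mCPU floor, no rounding
    rounded = math.ceil(requested_m / 250) * 250   # round up to the next 0.25 vCPU
    return max(rounded, 250)                       # 250 mCPU floor


for m in (30, 100, 300):
    print(m, adjusted_cpu_millicores(m, False), adjusted_cpu_millicores(m, True))
```

A 100 mCPU request becomes 250 mCPU on a non-burst cluster but stays at 100 mCPU on a burst-capable one, which is where the cost difference for small services comes from.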

Common pitfalls with resource enforcement

Pitfall 1: Overlooking automatic resource modification

yaml

# You deploy:
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"

# Autopilot actually runs with:
resources:
  requests:
    cpu: "250m"    # raised to minimum
    memory: "512Mi" # raised to minimum

You are paying for 250m CPU and 512 MiB, not 100m and 128 MiB. Many teams are shocked by their Autopilot bill because they are unaware of this.

Fix: Check the actual resource allocation after deploying:

bash

kubectl get pod POD_NAME -o json | jq '.spec.containers[].resources'

Pitfall 2: Setting limits lower than adjusted requests

yaml

resources:
  requests:
    cpu: "100m"
  limits:
    cpu: "200m"

If Autopilot raises the request to 250m but the limit stays at 200m, the result is limit < request: Kubernetes will either reject the Pod outright or produce an unintended Burstable QoS class.

Rule: In Autopilot, if you set limits, make sure limits ≥ the minimum request of the compute class.
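The rule can be turned into a quick manifest check. A minimal sketch, assuming the 250 mCPU non-bursting minimum quoted on this page as the default; `limit_below_adjusted_request` is a hypothetical helper.

```python
def limit_below_adjusted_request(request_m: int, limit_m: int,
                                 class_min_m: int = 250) -> bool:
    """True when the declared CPU limit ends up below the request once the
    request has been raised to the class minimum (all values in mCPU)."""
    adjusted_request = max(request_m, class_min_m)
    return limit_m < adjusted_request


print(limit_below_adjusted_request(100, 200))  # the manifest above → True
print(limit_below_adjusted_request(100, 500))  # limit leaves headroom → False
```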

Pitfall 3: Forgetting that Autopilot bills by requests, not actual usage

A Pod with high requests but low actual CPU usage is still charged in full. Autopilot does not "refund" you when a Pod is idle. This is an important difference from Cloud Run's "per second of actual compute" billing model.

Implication: VPA (Vertical Pod Autoscaler) matters more in Autopilot for optimizing requests against cost.
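One way to see whether requests are worth tuning is to compute how much of the billed request sits idle. The sketch below is a simple ratio with no pricing data; `idle_share` is a hypothetical helper, and the usage figure would come from your own monitoring.

```python
def idle_share(requested_vcpu: float, used_vcpu: float) -> float:
    """Fraction of the billed CPU request that is not being used."""
    if requested_vcpu <= 0:
        raise ValueError("requested_vcpu must be positive")
    # Usage above the request (bursting) counts as zero idle, not negative.
    return max(0.0, 1.0 - used_vcpu / requested_vcpu)


print(idle_share(2.0, 0.5))  # 0.75: three quarters of the billed CPU is idle
```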
