Quota Management: Limiting Mechanisms and Coping Strategies
Why Quota Management Matters
Quotas are an underappreciated aspect of GCP operations. When you hit a quota, the request fails with an error like this:
gcloud compute instances create test-vm
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Invalid value for field 'resource.projectId': 'my-project'.
Project 'my-project' exceeds quota for 'CPUS' in region 'us-central1'.
Consequences:
- Production deployments blocked (cannot scale)
- CI/CD pipelines fail (cannot create test environments)
- Cost-related quotas hit → billing shocks
- Cascading failures across microservices
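A pipeline that needs to react to this failure, rather than just abort, can pull the metric and region out of the error text. The following is a minimal sketch: the regular expression is modeled only on the sample message shown above, and `QuotaError` and `parse_quota_error` are hypothetical names, so real gcloud output may need a looser pattern.

```python
import re
from typing import NamedTuple, Optional


class QuotaError(NamedTuple):
    """Details extracted from a quota-exceeded message (hypothetical type)."""
    project: str
    metric: str
    region: str


# Pattern modeled on the sample error text above; real messages may differ.
_QUOTA_RE = re.compile(
    r"Project '(?P<project>[^']+)' exceeds quota for '(?P<metric>[^']+)' "
    r"in region '(?P<region>[^']+)'"
)


def parse_quota_error(stderr_text: str) -> Optional[QuotaError]:
    """Return quota details if the text contains a quota-exceeded message."""
    match = _QUOTA_RE.search(stderr_text)
    if match is None:
        return None
    return QuotaError(**match.groupdict())
```

A CI job could call `parse_quota_error` on the captured stderr and route quota failures to a different alert than ordinary deployment errors.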
Scale considerations:
- Small org (10 projects): Rarely hit quotas
- Medium org (100 projects): Need careful management
- Large org (1000+ projects): Quota management becomes critical infrastructure
Quota Hierarchy
Unlike IAM policies (which are inherited), quotas in GCP are managed independently at each level:
Organization
├── Project-level quotas (primary)
├── Folder-level quotas (aggregate view)
└── Organization-level quotas (capacity planning)
Project-Level Quotas
Project-level quotas are the primary enforcement point. Each project has its own independent quota limits:
my-project-prod:
- CPUs (us-central1): 24
- CPUs (us-east1): 0
- External IPs: 5
- Cloud Storage: 100 TB
- Requests/minute: 10,000
my-project-staging:
- CPUs (us-central1): 4
- CPUs (us-east1): 0
- External IPs: 1
- Cloud Storage: 10 TB
- Requests/minute: 1,000
Project quotas are independent: hitting a quota in project A does not affect project B.
Folder-Level Quotas
Folders provide an aggregate view of quotas:
Folder/Engineering (aggregated):
- CPUs across all projects: 100 (24+4+... from child projects)
- External IPs across all projects: 10
Limitation: Folder-level quotas do not enforce additional limits; they only aggregate. Enforcement happens at the project level.
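The split between aggregation and enforcement can be sketched in a few lines. This is an illustrative model only: the dictionaries stand in for usage and limit data you would fetch yourself, and `aggregate_folder_usage` and `can_allocate` are hypothetical helpers, not GCP APIs.

```python
from collections import Counter
from typing import Dict

Usage = Dict[str, Dict[str, int]]  # project -> metric -> amount


def aggregate_folder_usage(project_usage: Usage) -> Dict[str, int]:
    """Sum per-project usage into a folder-wide view (reporting only)."""
    totals: Counter = Counter()
    for usage in project_usage.values():
        totals.update(usage)  # adds each metric's amount to the running total
    return dict(totals)


def can_allocate(project: str, metric: str, amount: int,
                 usage: Usage, limits: Usage) -> bool:
    """Enforcement is per project: only that project's usage and limit matter."""
    used = usage[project].get(metric, 0)
    limit = limits[project].get(metric, 0)
    return used + amount <= limit
```

Note that `can_allocate` never consults the folder total, which mirrors the limitation described above.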
Organization-Level Quotas
Organization quotas are global capacity limits:
Organization:
- Max Projects: 10,000
- Max Folders: 100,000
Organization quotas are very rarely hit (except at massive scale). They mainly serve as capacity planning tools.
Types of Quotas
GCP quotas fall into several categories:
1. Allocation Quotas
A fixed amount of resources that a project can allocate:
Example:
- Max 24 CPUs in the project
- Max 10 persistent disks
- Max 100 GB Cloud SQL storage
Behavior:
- Once allocated, resource "uses up" quota
- Even if resource idle, quota still consumed
- Deallocating resource (delete VM) releases quota
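The allocate-and-release behavior above can be modeled with a small counter. This is a toy sketch for reasoning about the semantics, not how GCP implements quotas; the class name `AllocationQuota` is made up for illustration.

```python
class AllocationQuota:
    """Toy model of an allocation quota: usage persists until released."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def allocate(self, amount: int) -> None:
        """Consume quota; idle resources still count against the limit."""
        if self.used + amount > self.limit:
            raise RuntimeError(
                f"quota exceeded: {self.used} + {amount} > {self.limit}"
            )
        self.used += amount

    def release(self, amount: int) -> None:
        """Deleting a resource returns its quota to the pool."""
        self.used = max(0, self.used - amount)
```

With a limit of 24 CPUs, six 4-CPU VMs fill the quota, a seventh fails, and deleting any one VM makes room again.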
2. Rate Quotas
Requests per time period:
Example:
- 10 requests/minute for the Compute Engine API
- 1,000 requests/second for the Cloud Storage API
Behavior:
- Reset window: per-minute, per-second, per-day
- Rate-limited requests get 429 (Too Many Requests) error
- Temporary: the quota resets when the time window passes
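Because rate-limit errors clear on their own once the window passes, a common client-side response is to retry with exponential backoff and jitter. The sketch below assumes a hypothetical `RateLimited` exception that your client raises on a 429 response; the delays and attempt count are illustrative defaults, not values from any GCP library.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception representing an HTTP 429 response."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("max_attempts must be at least 1")
```

The `sleep` parameter exists so that tests can substitute a recorder instead of actually waiting.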
3. Concurrent Quotas
Simultaneous in-flight operations:
Example:
- Max 100 concurrent VMs being created
- Max 50 concurrent BigQuery jobs
Behavior:
- Temporary limit
- Resets when operation completes
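One way to stay under a concurrent quota is to cap in-flight operations on the client side. The following sketch uses a semaphore as the gate: a slot is taken when an operation starts and returned when it completes. `run_with_concurrency_cap` is a hypothetical helper, and the tasks are plain callables standing in for real API calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def run_with_concurrency_cap(
    tasks: Sequence[Callable[[], T]], max_in_flight: int
) -> List[T]:
    """Run tasks in threads, never allowing more than max_in_flight at once."""
    gate = threading.BoundedSemaphore(max_in_flight)

    def guarded(task: Callable[[], T]) -> T:
        with gate:  # blocks while max_in_flight operations are already running
            return task()  # slot is released as soon as the task finishes

    workers = max(1, min(len(tasks), max_in_flight * 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in the returned results
        return list(pool.map(guarded, tasks))
```

Setting `max_in_flight` somewhat below the actual quota leaves headroom for other clients sharing the same project.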
Quota Exhaustion Scenarios
Scenario 1: Rapid Scaling
# Application auto-scales during a traffic spike
# Each autoscale step attempts to create VMs (1 vCPU each in this example)
while traffic > threshold:
    create_vm()  # Fails once 24 CPUs are allocated
# Result:
# ✓ 24 VMs created
# ✗ 25th VM creation fails
# ✗ Load balancer cannot reach desired replica count
# ✗ Some traffic dropped (degraded service)
Solution: Request a quota increase before the anticipated scale event.
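Before a planned scale event, it helps to compute how many more VMs fit under the current limit and what limit the event would need. This is a minimal sketch with hypothetical helper names; the 20% buffer is an illustrative default, and the usage and limit numbers are inputs you would read from your own project.

```python
def vms_that_fit(cpu_limit: int, cpu_used: int, cpus_per_vm: int) -> int:
    """How many more VMs of the given size fit under the current CPU limit."""
    return max(0, (cpu_limit - cpu_used) // cpus_per_vm)


def quota_needed(cpu_used: int, extra_vms: int, cpus_per_vm: int,
                 buffer_percent: int = 20) -> int:
    """CPU limit required to add extra_vms, with a safety buffer on top."""
    raw = cpu_used + extra_vms * cpus_per_vm
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-raw * (100 + buffer_percent) // 100)
```

If `vms_that_fit` is smaller than the number of replicas the autoscaler may request, `quota_needed` gives the value to ask for ahead of time.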
Scenario 2: Forgotten Resources
T+0: Developer creates test VMs/clusters for testing
T+1 day: Developer forgets to clean up
T+2 weeks: 50 test VMs consuming 100 CPUs
T+3 weeks: Production scaling fails—no quota available
Lesson: Implement resource cleanup (via Cloud Scheduler) or cost alerts
Scenario 3: Regional Quota Exhaustion
Quotas can be regional:
- us-central1: 24 CPUs (at limit)
- us-east1: 0 CPUs (at limit)
- europe-west1: 50 CPUs (available)
Problem: Application requirements specify US regions only
Solution: Either request a quota increase or redesign for multi-region
Quota Management Strategies
Strategy 1: Rightsizing from Start
Size quotas based on:
1. Peak expected load
2. Desired redundancy factor (HA)
3. Buffer for unexpected spikes (20%)
Example:
- Peak load: 10 concurrent jobs
- Desired HA: 2x for failover
- Buffer: 20%
- Required quota: 10 * 2 * 1.2 = 24 jobs
Strategy 2: Request Quota Increases Proactively
# View current CPU usage and limit for a region (CPUS is a regional quota)
gcloud compute regions describe us-central1 --project=PROJECT_ID \
    --flatten='quotas[]' \
    --filter='quotas.metric=CPUS' \
    --format='value(quotas.usage,quotas.limit)'
# Request an increase before hitting the limit (via the Cloud Quotas API).
# Flag names follow the `gcloud beta quotas preferences` surface; confirm them with --help for your gcloud version.
gcloud beta quotas preferences create \
    --project=PROJECT_ID \
    --service=compute.googleapis.com \
    --quota-id=CPUS-per-project-region \
    --dimensions=region=us-central1 \
    --preferred-value=100
Strategy 3: Quota Alerting
from google.cloud import monitoring_v3

def setup_quota_alert(project_id, quota_metric, quota_limit, channel_name, threshold_pct=80):
    """Alert when allocation quota usage exceeds threshold_pct of quota_limit.

    channel_name is the resource name of an existing notification channel
    (projects/PROJECT_ID/notificationChannels/CHANNEL_ID).
    """
    client = monitoring_v3.AlertPolicyServiceClient()
    # Condition: usage > threshold_pct% of the limit, sustained for 5 minutes.
    # The threshold is an absolute count, so it is derived from quota_limit.
    # Adjust the duration to match how often the metric is sampled.
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name=f"{quota_metric} usage alert",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                'metric.type="serviceruntime.googleapis.com/quota/allocation/usage" '
                'AND resource.type="consumer_quota" '
                f'AND metric.label.quota_metric="{quota_metric}"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=quota_limit * threshold_pct / 100,
            duration={"seconds": 300},
        ),
    )
    # Create the alert policy; a combiner is required even for one condition
    policy = monitoring_v3.AlertPolicy(
        display_name=f"Alert: {quota_metric} quota usage",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[condition],
        notification_channels=[channel_name],
        alert_strategy=monitoring_v3.AlertPolicy.AlertStrategy(
            auto_close={"seconds": 86400}
        ),
    )
    created = client.create_alert_policy(name=f"projects/{project_id}", alert_policy=policy)
    print(f"✓ Alert created for {quota_metric}: {created.name}")
Strategy 4: Multi-Project Load Distribution
Distribute workloads across multiple projects to avoid quota exhaustion:
def distribute_workload(projects, workload_units):
    """Assign each unit to the project with the most remaining quota."""
    # get_available_quota() is a placeholder for your own quota lookup
    remaining = {project: get_available_quota(project) for project in projects}
    workloads = {}
    for unit in workload_units:
        project = select_project_with_capacity(remaining)
        workloads.setdefault(project, []).append(unit)
        # Assumes each unit consumes one unit of quota; without this update
        # every unit would land on the same project
        remaining[project] -= 1
    return workloads

def select_project_with_capacity(remaining):
    """Return the project with the highest remaining quota."""
    project = max(remaining, key=remaining.get)
    if remaining[project] <= 0:
        raise RuntimeError("No project has quota available")
    return project
Quota Override (Capping Usage)
Sometimes you intentionally want a lower quota to prevent runaway costs:
# Set a quota preference below the default to cap usage.
# Flag names follow the `gcloud beta quotas preferences` surface; QUOTA_ID is a placeholder
# (list available IDs with `gcloud beta quotas info list --service=compute.googleapis.com`).
gcloud beta quotas preferences create \
    --project=PROJECT_ID \
    --service=compute.googleapis.com \
    --quota-id=QUOTA_ID \
    --preferred-value=2  # Cap external IPs at 2 (normally 5)
# Reason: Cost control, prevent accidental resource allocation
Use cases:
- Development project: Cap resources to prevent cost overruns
- Specific team: Limit their resource consumption
- Cost control: Hard cap for team budgets
Quota Request Process
When you hit a quota limit and need an increase:
T+0: Submit quota request
gcloud beta quotas preferences create \
    --project=PROJECT_ID \
    --service=SERVICE \
    --quota-id=QUOTA_ID \
    --preferred-value=NEW_VALUE
T+few minutes: Google reviews request
- Automated checks (no abuse, reasonable)
- Manual review if high increase
T+minutes to hours: Decision
- Approved ✓
- Denied ✗
- Approved with conditions
Result:
- Approved: Quota increased
- Denied: Reason provided (must contact support)
Tips for approvals:
- Request gradually (don't jump from 10 to 1000)
- Explain business need
- Show usage trends (if possible)
- Reference SLA/uptime requirements
Service-Specific Quota Peculiarities
Compute Engine
Quotas per:
- Region (CPUs, External IPs, Disks)
- Region, shared by all zones in that region (GPUs, Local SSD)
- Global (Images, Snapshots, Security Policies)
Example:
- us-central1: 24 CPUs
- us-central1: 4 GPUs
- Global: 5 snapshots
Cloud Storage
Quotas:
- Per-project storage (unlimited, but billing limit)
- Per-bucket object count (practical: billions)
- API requests: Rate limited (not hard quota)
BigQuery
Quotas per project:
- Concurrent queries: 100
- Query timeout: 6 hours max
- Slot hours: If reserved capacity model
- Data size: Unlimited (billing based)
Testing Quota Behavior
import time

# get_quota, create_vm, delete_vm, call_api, PROJECT_ID, QuotaExceededException, and
# RateLimitedException are placeholders for your own helpers and error types.
CPUS_PER_VM = 4  # n1-standard-4 has 4 vCPUs

def test_quota_enforcement():
    """Test project quota enforcement"""
    # Create VMs until the CPU quota is exceeded (the quota counts CPUs, not VMs)
    project_quota = int(get_quota(PROJECT_ID, "CPUS"))
    vms = []
    try:
        for i in range(project_quota // CPUS_PER_VM + 1):
            vm = create_vm(f"test-vm-{i}", machine_type="n1-standard-4")
            vms.append(vm)
        raise AssertionError("Quota was never enforced")
    except QuotaExceededException as e:
        print(f"✓ Quota enforcement working: {e}")
    finally:
        # Cleanup
        for vm in vms:
            delete_vm(vm)

def test_quota_reset_behavior():
    """Test rate quota reset"""
    # Make API calls up to the rate limit
    for i in range(10001):  # Assuming 10k/min limit
        try:
            call_api()
        except RateLimitedException:
            print(f"✓ Rate limit hit at request {i}")
            break
    # Wait for the window to reset
    time.sleep(65)
    # Verify new requests succeed
    try:
        call_api()
        print("✓ Rate limit reset")
    except RateLimitedException:
        print("✗ Rate limit not reset")
Terraform Quota Management
# This example does not manage quotas in Terraform itself.
# Instead, it sizes resource creation so the required quota is known up front.
resource "google_compute_instance" "app" {
  count = local.desired_vms
  # Apply fails if the CPU quota is exceeded;
  # request the increase first, then re-run apply
  name         = "app-vm-${count.index}"
  machine_type = "n1-standard-4"
  zone         = "us-central1-a"
  # boot_disk and network_interface are required; the values here are placeholders
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }
  network_interface {
    network = "default"
  }
  # Metadata for tracking
  metadata = {
    environment = var.environment
    app         = "production"
  }
}
locals {
  # Calculate required VMs based on load (rounded up to a whole VM)
  desired_vms = ceil(var.expected_load / var.requests_per_vm)
  # Request sufficient quota before creating
  required_quota = local.desired_vms * 4 # 4 CPUs per VM
}
# Output: Verify quota before apply
output "required_quota" {
  value = local.required_quota
}
Anti-Patterns to Avoid
| Anti-pattern | Problem | Solution |
|---|---|---|
| Ignoring quotas | Hit limit suddenly | Monitor proactively |
| Assuming global quota | Regional exhaustion | Check region-specific |
| Not requesting increase | Blocked scaling | Pre-request increases |
| One huge project | Inflexible | Split across projects |
| No alerting | Silent failures | Set up monitoring |
| Manual tracking | Errors | Automate via APIs |
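The "monitor proactively" and "automate via APIs" rows can be combined into a small check that flags quotas approaching their limit. The sketch below assumes the input is a list of dictionaries with `metric`, `limit`, and `usage` keys, which is the shape of the `quotas` field in `gcloud compute regions describe --format=json` output; `quotas_near_limit` and the 80% threshold are illustrative choices.

```python
from typing import Dict, Iterable, List, Union

Quota = Dict[str, Union[str, float]]


def quotas_near_limit(quotas: Iterable[Quota], threshold: float = 0.8) -> List[str]:
    """Return a summary line for each quota at or above the usage threshold."""
    flagged = []
    for quota in quotas:
        limit = float(quota.get("limit", 0))
        usage = float(quota.get("usage", 0))
        # Skip quotas with a zero limit to avoid dividing by zero.
        if limit > 0 and usage / limit >= threshold:
            flagged.append(f"{quota['metric']}: {usage:g}/{limit:g}")
    return flagged
```

Running this on a schedule for each region in use, and alerting on a non-empty result, replaces manual spreadsheet tracking with an automated check.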