Quota Management: Limit Mechanisms and Coping Strategies

Why Quota Management Matters

Quotas are an underappreciated aspect of GCP operations. When you hit a quota, the request fails with an error like this:

gcloud compute instances create test-vm
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
- Invalid value for field 'resource.projectId': 'my-project'. 
  Project 'my-project' exceeds quota for 'CPUS' in region 'us-central1'.

Consequences:

  • Production deployments blocked (cannot scale)
  • CI/CD pipelines fail (cannot create test environments)
  • Cost-related quotas hit → billing shocks
  • Cascading failures across microservices

Scale considerations:

  • Small org (10 projects): Rarely hit quotas
  • Medium org (100 projects): Need careful management
  • Large org (1000+ projects): Quota management is critical infrastructure

Quota Hierarchy

Unlike IAM policies (which are inherited down the hierarchy), quotas in GCP are managed independently at each level:

Organization
├── Project-level quotas (primary)
├── Folder-level quotas (aggregate view)
└── Organization-level quotas (capacity planning)

Project-Level Quotas

Project-level quotas are the primary enforcement point. Each project has its own independent quota limits:

my-project-prod:
  - CPUs (us-central1): 24
  - CPUs (us-east1): 0
  - External IPs: 5
  - Cloud Storage: 100 TB
  - Requests/minute: 10,000

my-project-staging:
  - CPUs (us-central1): 4
  - CPUs (us-east1): 0
  - External IPs: 1
  - Cloud Storage: 10 TB
  - Requests/minute: 1,000

Project quotas are independent: hitting a quota in project A does not affect project B.

Folder-Level Quotas

Folders provide an aggregate view of quotas:

Folder/Engineering (aggregated):
  - CPUs across all projects: 100 (24+4+... from child projects)
  - External IPs across all projects: 10

Limitation: Folder-level quotas do not enforce additional limits; they only aggregate. Enforcement happens at the project level.
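
The aggregate view amounts to a sum over child projects. The sketch below is only an illustration of that idea, not a GCP API call: `aggregate_folder_usage` and the dictionary layout are hypothetical, and the numbers reuse the two example projects above.

```python
from collections import Counter


def aggregate_folder_usage(project_quotas):
    """Sum per-project quota values into one folder-level view.

    project_quotas maps project ID -> {quota name: value}. The result is
    informational only; enforcement still happens per project.
    """
    totals = Counter()
    for quotas in project_quotas.values():
        totals.update(quotas)  # adds values key by key
    return dict(totals)


folder_view = aggregate_folder_usage({
    "my-project-prod": {"CPUS/us-central1": 24, "EXTERNAL_IPS": 5},
    "my-project-staging": {"CPUS/us-central1": 4, "EXTERNAL_IPS": 1},
})
print(folder_view)
```

Because the folder total is derived, raising it means raising the limits of individual projects.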

Organization-Level Quotas

Organization quotas are global capacity limits:

Organization:
  - Max Projects: 10,000
  - Max Folders: 100,000

Organization quotas are very rarely hit (except at massive scale). They are mainly capacity planning tools.

Types of Quotas

GCP quotas fall into several categories:

1. Allocation Quotas

A fixed amount of resources that a project can allocate:

Example:
- Max 24 CPUs in the project
- Max 10 persistent disks
- Max 100 GB Cloud SQL storage

Behavior:

  • Once allocated, resource "uses up" quota
  • Even if resource idle, quota still consumed
  • Deallocating resource (delete VM) releases quota
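
The allocate/release behavior can be modeled in a few lines. This is a toy model, not a GCP client: `AllocationQuota` is a hypothetical class and the limit of 24 reuses the example above.

```python
class AllocationQuota:
    """Toy model of an allocation quota (not a GCP client)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def allocate(self, amount):
        """Reserve quota; raise if the request would exceed the limit."""
        if self.used + amount > self.limit:
            raise RuntimeError(
                f"quota exceeded: {self.used} used + {amount} requested > {self.limit}"
            )
        self.used += amount

    def release(self, amount):
        """Return quota when a resource is deleted."""
        self.used = max(0, self.used - amount)


cpus = AllocationQuota(limit=24)
cpus.allocate(16)  # a 16-vCPU VM consumes quota even while idle
cpus.allocate(8)   # now at the limit
cpus.release(8)    # deleting a VM frees its share
print(cpus.used)
```

Usage depends only on what is allocated, never on how busy the resource is.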

2. Rate Quotas

Requests per time period:

Example:
- 10 requests/minute for the Compute Engine API
- 1,000 requests/second for the Cloud Storage API

Behavior:

  • Reset window: per-minute, per-second, per-day
  • Rate-limited requests get 429 (Too Many Requests) error
  • Temporary: the quota resets when the time window passes
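
Since rate-limit errors clear on their own, a common client-side response is to retry with exponential backoff. A minimal sketch, assuming a hypothetical `RateLimitedError` raised by your own request wrapper when it sees a 429. The delay values are illustrative, and the `sleep` parameter exists only so the function can be exercised without waiting.

```python
import random
import time


class RateLimitedError(Exception):
    """Stand-in for an HTTP 429 response (hypothetical exception type)."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(), retrying on RateLimitedError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of retries; surface the error to the caller
            # Wait 1x, 2x, 4x ... base_delay, plus random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

The jitter spreads out retries from clients that were throttled at the same moment, so they do not all return in the same window.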

3. Concurrent Quotas

Simultaneous in-flight operations:

Example:
- Max 100 concurrent VMs being created
- Max 50 concurrent BigQuery jobs

Behavior:

  • Temporary limit
  • Resets when operation completes
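
One way to stay under a concurrent quota is to cap in-flight operations on the client side. The sketch below uses a semaphore for that. `run_with_concurrency_cap` is a hypothetical helper, and the tasks stand in for whatever long-running operation counts against the quota.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_concurrency_cap(tasks, max_in_flight):
    """Run callables in threads, never exceeding max_in_flight at once."""
    slots = threading.BoundedSemaphore(max_in_flight)

    def guarded(task):
        with slots:  # blocks until an in-flight slot is free
            return task()

    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        # Results come back in the same order as the input tasks
        return list(pool.map(guarded, tasks))
```

A slot is released as soon as a task finishes, which mirrors how the quota itself frees up when an operation completes.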

Quota Exhaustion Scenarios

Scenario 1: Rapid Scaling

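python
# Application auto-scales during a traffic spike.
# Each scale-out step tries to create one single-vCPU VM.

CPU_QUOTA = 24  # regional CPU quota for the project

def create_vm(cpus_in_use):
    """Simulated VM creation that fails once the CPU quota is used up."""
    if cpus_in_use + 1 > CPU_QUOTA:
        raise RuntimeError("Quota 'CPUS' exceeded")
    return cpus_in_use + 1

cpus_in_use = 0
for _ in range(30):  # the autoscaler wants 30 replicas
    try:
        cpus_in_use = create_vm(cpus_in_use)
    except RuntimeError as err:
        print(f"VM creation failed after {cpus_in_use} VMs: {err}")
        break

# Result:
# ✓ 24 VMs created
# ✗ 25th VM creation fails
# ✗ Load balancer cannot reach desired replica count
# ✗ Some traffic dropped (degraded service)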

Solution: Pre-request quota increase before anticipated scale event.

Scenario 2: Forgotten Resources

T+0: Developer creates test VMs/clusters for testing
T+1 day: Developer forgets to clean up
T+2 weeks: 50 test VMs consuming 100 CPUs
T+3 weeks: Production scaling fails because no quota is available

Lesson: Implement resource cleanup (via Cloud Scheduler) or cost alerts
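
The selection step of such a cleanup job might look like the sketch below. It is an illustration under assumptions: the instance dictionaries are a simplified, hypothetical shape (not the Compute Engine API response), and the `env=test` label and 7-day cutoff are example conventions rather than anything GCP defines.

```python
from datetime import datetime, timedelta, timezone


def find_stale_instances(instances, max_age_days=7, now=None):
    """Return names of test instances older than max_age_days.

    Each instance is a dict with 'name', 'labels', and a timezone-aware
    'created' datetime (a simplified, hypothetical shape).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        inst["name"]
        for inst in instances
        if inst.get("labels", {}).get("env") == "test" and inst["created"] < cutoff
    ]
```

A scheduled job could delete the returned instances or, more cautiously, report them to their owners first.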

Scenario 3: Regional Quota Exhaustion

Quotas can be regional:
- us-central1: 24 CPUs (at limit)
- us-east1: 0 CPUs (at limit)
- europe-west1: 50 CPUs (available)

Problem: Application requirements specify US regions only
Solution: Either increase the quota or redesign for multi-region
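
The trade-off can be made concrete with a small region picker. This is a sketch: `pick_region` is a hypothetical helper, and the quota table assumes the europe-west1 limit of 50 is entirely unused.

```python
def pick_region(required_cpus, regional_quotas, allowed_regions):
    """Return the first allowed region with enough free CPU quota, or None.

    regional_quotas maps region -> (limit, usage).
    """
    for region in allowed_regions:
        limit, usage = regional_quotas.get(region, (0, 0))
        if limit - usage >= required_cpus:
            return region
    return None


quotas = {
    "us-central1": (24, 24),
    "us-east1": (0, 0),
    "europe-west1": (50, 0),
}
us_only = pick_region(4, quotas, ["us-central1", "us-east1"])
multi_region = pick_region(4, quotas, ["us-central1", "us-east1", "europe-west1"])
print(us_only, multi_region)
```

With US regions only, the picker finds nothing. Allowing europe-west1 gives it a region with spare quota.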

Quota Management Strategies

Strategy 1: Rightsizing from Start

Size quotas based on:
1. Peak expected load
2. Desired redundancy factor (HA)
3. Buffer for unexpected spikes (20%)

Example:
- Peak load: 10 concurrent jobs
- Desired HA: 2x for failover
- Buffer: 20%
- Required quota: 10 * 2 * 1.2 = 24 jobs
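
The same calculation as a small helper, in case it needs to be repeated per service or per region. `required_quota` is a hypothetical function name, and the defaults mirror the example above.

```python
import math


def required_quota(peak_load, ha_factor=2.0, buffer=0.20):
    """Peak load x redundancy factor x (1 + buffer), rounded up to a whole unit."""
    raw = peak_load * ha_factor * (1 + buffer)
    # Round first so floating-point noise cannot push the ceiling up by one
    return math.ceil(round(raw, 9))


print(required_quota(10))
```

Rounding up matters because quotas are granted in whole units; a result of 16.8 means requesting 17.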

Strategy 2: Request Quota Increases Proactively

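bash
# View current quota limits and usage for a region.
# Compute Engine reports regional quotas (such as CPUS) on the region resource.
gcloud compute regions describe us-central1 --project=PROJECT_ID \
  --flatten='quotas[]' \
  --format='table(quotas.metric,quotas.limit,quotas.usage)'

# Request an increase before hitting the limit (via the Cloud Quotas API).
# Flags follow the `gcloud beta quotas preferences` command group; confirm them
# with `gcloud beta quotas preferences create --help` for your gcloud version,
# and look up quota IDs with `gcloud beta quotas info list`.
gcloud beta quotas preferences create \
  --project=PROJECT_ID \
  --service=compute.googleapis.com \
  --quota-id=CPUS-per-project-region \
  --preferred-value=100 \
  --dimensions=region=us-central1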

Strategy 3: Quota Alerting

python
from google.cloud import monitoring_v3

def setup_quota_alert(project_id, quota_metric, quota_limit,
                      notification_channel, threshold_pct=80):
    """Alert when allocation quota usage exceeds threshold_pct of quota_limit.

    notification_channel is the resource name of an existing channel, e.g.
    projects/PROJECT_ID/notificationChannels/CHANNEL_ID.
    """

    client = monitoring_v3.AlertPolicyServiceClient()

    # The usage metric reports a raw count, so convert the percentage
    # into an absolute threshold
    threshold_value = quota_limit * threshold_pct / 100

    # Create condition: usage > threshold_pct of the limit for 5 minutes
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name=f"{quota_metric} usage alert",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                'metric.type="serviceruntime.googleapis.com/quota/allocation/usage" '
                'AND resource.type="consumer_quota" '
                f'AND metric.label.quota_metric="{quota_metric}"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=threshold_value,
            duration={"seconds": 300},
        ),
    )

    # Create alert policy (a combiner is required even for a single condition)
    policy = monitoring_v3.AlertPolicy(
        display_name=f"Alert: {quota_metric} quota usage",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[condition],
        notification_channels=[notification_channel],
        alert_strategy=monitoring_v3.AlertPolicy.AlertStrategy(
            auto_close={"seconds": 86400}
        ),
    )

    client.create_alert_policy(name=f"projects/{project_id}", alert_policy=policy)
    print(f"✓ Alert created for {quota_metric}")

Strategy 4: Multi-Project Load Distribution

Distribute workloads across multiple projects to avoid quota exhaustion:

python
def distribute_workload(projects, workload_units, get_available_quota):
    """Distribute workload units across projects (one quota unit per workload unit).

    get_available_quota(project) must return the project's free quota. It is
    supplied by the caller because the lookup depends on the quota being tracked.
    """

    # Look up free capacity once, then track it locally as units are placed
    remaining = {project: get_available_quota(project) for project in projects}

    workloads = {}
    for unit in workload_units:
        # Select project with available capacity
        project = select_project_with_capacity(remaining)
        workloads.setdefault(project, []).append(unit)
        remaining[project] -= 1  # this unit now consumes quota

    return workloads

def select_project_with_capacity(remaining):
    """Select the project with the most available quota"""

    project = max(remaining, key=remaining.get)
    if remaining[project] <= 0:
        raise RuntimeError("No project has quota left")
    return project

Quota Override (Capping Usage)

Sometimes you intentionally want a lower quota to prevent runaway costs:

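bash
# Lower the quota value to cap usage: external IPs at 2 (normally 5).
# QUOTA_ID is a placeholder for the external IP quota's ID; list the IDs with
# `gcloud beta quotas info list --service=compute.googleapis.com --project=PROJECT_ID`
gcloud beta quotas preferences create \
  --project=PROJECT_ID \
  --service=compute.googleapis.com \
  --quota-id=QUOTA_ID \
  --preferred-value=2

# Reason: Cost control, prevent accidental resource allocation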

Use cases:

  • Development project: Cap resources to prevent cost overruns
  • Specific team: Limit their resource consumption
  • Cost control: Hard cap for team budgets

Quota Request Process

When you hit a quota limit and need an increase:

T+0: Submit quota request
     gcloud beta quotas preferences create \
       --project=PROJECT_ID \
       --service=SERVICE \
       --quota-id=QUOTA_ID \
       --preferred-value=NEW_VALUE

T+few minutes: Google reviews request
     - Automated checks (no abuse, reasonable)
     - Manual review if high increase

T+minutes to hours: Decision
     - Approved ✓
     - Denied ✗
     - Approved with conditions

Result:
- Approved: Quota increased
- Denied: Reason provided (must contact support)

Tips for approvals:

  • Request gradually (don't jump from 10 to 1000)
  • Explain business need
  • Show usage trends (if possible)
  • Reference SLA/uptime requirements
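
"Request gradually" can be turned into a simple plan of intermediate values. The sketch below assumes a self-imposed rule of at most doubling per request. That factor is an illustrative assumption, not a documented Google policy, and `plan_increase_steps` is a hypothetical helper.

```python
import math


def plan_increase_steps(current, target, max_factor=2.0):
    """Split a quota increase into steps that each grow by at most max_factor."""
    if current <= 0 or max_factor <= 1:
        raise ValueError("current must be positive and max_factor must exceed 1")
    steps = []
    value = current
    while value < target:
        # Grow by the allowed factor, but never overshoot the target
        value = min(target, math.ceil(value * max_factor))
        steps.append(value)
    return steps


print(plan_increase_steps(10, 100))
```

Each step is a separate request, ideally filed once usage data shows the previous increase is actually being consumed.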

Service-Specific Quota Peculiarities

Compute Engine

Quotas per:
- Region (CPUs, External IPs, Disks, GPUs, Local SSD)
- Global (Images, Snapshots, Security Policies)

Example:
- us-central1: 24 CPUs
- us-central1: 4 GPUs
- Global: 5 snapshots

Cloud Storage

Quotas:
- Per-project storage (unlimited, but billing limit)
- Per-bucket object count (practical: billions)
- API requests: Rate limited (not hard quota)

BigQuery

Quotas per project:
- Concurrent queries: 100
- Query timeout: 6 hours max
- Slot hours: If reserved capacity model
- Data size: Unlimited (billing based)

Testing Quota Behavior

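python
import time

# get_quota, create_vm, delete_vm, call_api, PROJECT_ID and the two exception
# types are placeholders for your own test helpers, not library functions.

CPUS_PER_VM = 4  # n1-standard-4 has 4 vCPUs

def test_quota_enforcement():
    """Test project quota enforcement"""

    # Create resources until the CPU quota is exceeded
    # (assumes the test project has no other VMs in the region)
    project_quota = get_quota(PROJECT_ID, "CPUS")
    max_vms = int(project_quota) // CPUS_PER_VM
    vms = []
    quota_hit = False

    try:
        for i in range(max_vms + 1):  # one VM more than the quota allows
            vm = create_vm(f"test-vm-{i}", machine_type="n1-standard-4")
            vms.append(vm)
    except QuotaExceededException as e:
        quota_hit = True
        print(f"✓ Quota enforcement working: {e}")
    finally:
        # Cleanup
        for vm in vms:
            delete_vm(vm)

    assert quota_hit, "created more VMs than the CPU quota should allow"

def test_quota_reset_behavior():
    """Test rate quota reset"""

    # Make API calls up to rate limit
    rate_limited = False
    for i in range(10001):  # Assuming 10k/min limit
        try:
            call_api()
        except RateLimitedException:
            rate_limited = True
            print(f"✓ Rate limit hit at request {i}")
            break
    assert rate_limited, "rate limit was never reached"

    # Wait for window reset (one minute plus a small margin)
    time.sleep(65)

    # Verify new requests succeed
    try:
        call_api()
        print("✓ Rate limit reset")
    except RateLimitedException:
        raise AssertionError("✗ Rate limit not reset")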

Terraform Quota Management

hcl
# This example does not manage quotas in Terraform.
# It sizes the deployment and surfaces the CPU quota it needs,
# so the quota can be checked or raised before apply.

locals {
  # Calculate required VMs based on load (rounded up to whole VMs)
  desired_vms = ceil(var.expected_load / var.requests_per_vm)

  # CPU quota needed in the region: 4 vCPUs per n1-standard-4 VM
  required_quota = local.desired_vms * 4
}

resource "google_compute_instance" "app" {
  count = local.desired_vms

  # terraform apply fails here if the CPU quota is exceeded;
  # increase the quota, then re-run apply

  name         = "app-vm-${count.index}"
  machine_type = "n1-standard-4"
  zone         = "us-central1-a"

  # Required blocks; the image and network are example values
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12"
    }
  }

  network_interface {
    network = "default"
  }

  # Metadata for tracking
  metadata = {
    environment = var.environment
    app         = "production"
  }
}

# Output: Verify quota before apply
output "required_quota" {
  value = local.required_quota
}

Anti-Patterns to Avoid

Anti-pattern              Problem               Solution
Ignoring quotas           Hit limit suddenly    Monitor proactively
Assuming global quota     Regional exhaustion   Check region-specific
Not requesting increase   Blocked scaling       Pre-request increases
One huge project          Inflexible            Split across projects
No alerting               Silent failures       Set up monitoring
Manual tracking           Errors                Automate via APIs
