Resource Manager API: Automation Workflows and Consistency
Why the Resource Manager API Matters
Once GCP deployments grow to hundreds of projects, managing them by hand through the Google Cloud Console no longer scales. The Resource Manager API is the programmatic interface for:
- Create/delete projects
- Move projects between folders
- List resources by hierarchy
- Query resource state
- Apply IAM policies hierarchically
- Manage project labels programmatically
Production reality: Most enterprise GCP organizations use Infrastructure as Code tooling (Terraform, Pulumi, or CloudFormation-style equivalents) to automate resource creation. The Resource Manager API is the backbone of all of this tooling.
Common failure modes:
- Automation scripts that do not account for propagation delays → inconsistent state
- Querying APIs before resources have fully propagated → 404 errors, race conditions
- Assuming operations are synchronous when they are actually asynchronous → partial failures go undetected
- Retry logic that is not implemented properly → silent failures in CI/CD pipelines
Resource Manager API Architecture
The Resource Manager API is split into three main components:
1. Resource API (Control Plane)
Handles project/folder CRUD operations:
# Create project
curl -X POST https://cloudresourcemanager.googleapis.com/v3/projects \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "projectId": "my-new-project",
    "displayName": "My New Project",
    "parent": "folders/1234567890"
  }'
# The response is an Operation object (asynchronous)
{
  "name": "operations/cp.123456789",
  "done": false,
  "createTime": "2026-06-01T10:00:00Z"
}
Important: Project creation is always asynchronous. The API returns an Operation object with done: false, not the project resource itself.
2. Metadata API (Propagation Layer)
After a project is created, its metadata has to propagate to all GCP services:
- API enablement metadata
- IAM policy cache
- Quota systems
- Billing systems
- Service-specific metadata
Propagation timeline (typical):
T+0s: Project created in the control plane
T+0.5-2s: Project visible in the GCP Console
T+2-5s: IAM policies begin propagating
T+5-10s: Most GCP services see the project
T+10-30s: All services fully consistent (eventual consistency)
T+30s+: Caching/propagation complete
3. Query API (Cloud Asset Inventory)
Programmatic queries over the resource hierarchy:
# Search all resources in the organization
gcloud asset search-all-resources \
  --scope=organizations/123456789 \
  --asset-types=compute.googleapis.com/Instance,storage.googleapis.com/Bucket
Behind the scenes, Cloud Asset Inventory maintains an indexed view of all resources, which allows efficient queries. These queries are also subject to eventual consistency, however.
Eventual Consistency Model
GCP does not guarantee strong consistency for resource hierarchy operations. Instead:
- Control plane: Strong consistency (project creation is immediately visible in the API)
- Data plane: Eventual consistency (services see project changes after 5-30s)
- Client libraries: May cache results (additional delay)
Why? Strong consistency would require:
- Synchronizing state across all regions
- Blocking on all dependent services
- Much higher latency (potentially 100s of milliseconds)
Instead, GCP chose high availability plus eventual consistency, which is the right trade-off for the majority of workloads.
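The split between an immediately consistent control plane and lagging downstream services can be illustrated with a toy model. This is only a sketch of the behavior described above, not how GCP is implemented. The class name, the service names, and the lag values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventuallyConsistentDirectory:
    """Toy model: writes land in the control plane at once, and each
    downstream service sees them only after its own propagation lag."""

    service_lag_s: dict[str, float]
    _created_at: dict[str, float] = field(default_factory=dict)

    def create_project(self, project_id: str, now_s: float) -> None:
        self._created_at[project_id] = now_s

    def control_plane_sees(self, project_id: str) -> bool:
        return project_id in self._created_at

    def service_sees(self, service: str, project_id: str, now_s: float) -> bool:
        created = self._created_at.get(project_id)
        if created is None:
            return False
        return now_s - created >= self.service_lag_s[service]


directory = EventuallyConsistentDirectory(
    service_lag_s={"iam": 3.0, "compute": 8.0, "billing": 20.0}  # illustrative lags
)
directory.create_project("my-new-project", now_s=0.0)

for t in (0.0, 5.0, 10.0, 30.0):
    visible = [
        s for s in directory.service_lag_s
        if directory.service_sees(s, "my-new-project", now_s=t)
    ]
    print(
        f"T+{t:>4.1f}s control_plane={directory.control_plane_sees('my-new-project')} "
        f"services={visible}"
    )
```

The point of the model is that a successful control-plane read says nothing about what a dependent service sees at the same moment, which is why automation has to check the service it is about to call.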
Practical Implications
Scenario 1: Project Creation + API Enablement
from google.api_core import exceptions, retry
from google.cloud import resourcemanager_v3, service_usage_v1
# project_id is assumed to be defined by the caller
# Create project (returns a long-running operation)
rm_client = resourcemanager_v3.ProjectsClient()
operation = rm_client.create_project(
    project=resourcemanager_v3.Project(project_id=project_id)
)
operation.result()  # Block until the control plane reports completion
print(f"Created project {project_id}")
# ❌ PROBLEM: Immediate API enablement
serviceusage_client = service_usage_v1.ServiceUsageClient()
request = service_usage_v1.BatchEnableServicesRequest(
    parent=f"projects/{project_id}",
    service_ids=["compute.googleapis.com"],
)
response = serviceusage_client.batch_enable_services(request=request)
# May fail with "project not found" if propagation is not complete
# ✅ BETTER: Retry with backoff while the project is still propagating
@retry.Retry(
    predicate=retry.if_exception_type(exceptions.NotFound),
    initial=2.0,
    deadline=60,
)
def enable_api_with_retry(project_id, api_name):
    # NotFound means the project is not yet visible to the Service Usage API;
    # the Retry decorator sleeps with exponential backoff and calls again.
    request = service_usage_v1.BatchEnableServicesRequest(
        parent=f"projects/{project_id}",
        service_ids=[api_name],
    )
    return serviceusage_client.batch_enable_services(request=request)
enable_api_with_retry(project_id, "compute.googleapis.com")
Scenario 2: Project Move + Networking
from google.api_core import exceptions
from google.cloud import resourcemanager_v3
# Move project to a different folder (v3 returns a long-running operation)
projects_client = resourcemanager_v3.ProjectsClient()
move_request = resourcemanager_v3.MoveProjectRequest(
    name=f"projects/{project_id}",
    destination_parent=f"folders/{new_folder_id}",
)
operation = projects_client.move_project(request=move_request)
operation.result()
# ❌ PROBLEM: Immediately assuming the move is visible everywhere
print(f"Project moved to folder {new_folder_id}")
# At this point:
# - Project metadata shows the new folder (strong consistency)
# - BUT: VPC peering relationships and firewall rules may not be updated yet
# - Networking APIs may still see the project in its old location (stale cache)
# ✅ BETTER: Verify through multiple checks
import time
def verify_project_moved(project_id, expected_folder_id, max_retries=30):
    name = f"projects/{project_id}"
    for attempt in range(max_retries):
        project = projects_client.get_project(name=name)
        # In the v3 API, parent is a string such as "folders/1234567890"
        if project.parent == f"folders/{expected_folder_id}":
            print(f"✓ Metadata updated (attempt {attempt})")
            break
        time.sleep(1)
    else:
        return False  # Metadata never showed the new parent
    # Additional check: the project's IAM policy can be read
    # (a coarse signal only; it does not prove every service has caught up)
    for attempt in range(max_retries):
        try:
            projects_client.get_iam_policy(resource=name)
            return True
        except exceptions.GoogleAPICallError:
            time.sleep(1)
    return False
verify_project_moved(project_id, new_folder_id)
API Quotas & Rate Limiting
The Resource Manager API has global rate limits:
| Operation | Limit |
|---|---|
| Projects created per minute | 5 |
| Projects created per day | 500 (per organization) |
| Folder operations per minute | 60 |
| IAM policy updates per minute | 10 per resource |
Production implication: If automation needs to create 100 projects, it cannot spawn them all in parallel; it has to batch them with delays.
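The arithmetic behind this constraint is worth making explicit. Taking the 5-per-minute figure from the table at face value, 100 projects need 20 batches, so the last batch starts 19 minutes after the first. The sketch below computes that for any limit. Both function names are hypothetical, and it assumes a simple fixed one-minute window, which may not match how the quota is actually metered.

```python
import math


def min_creation_minutes(total_projects: int, per_minute_limit: int) -> int:
    """Minutes of wall-clock pacing needed to stay within a per-minute limit.

    The first batch starts at minute 0, so N batches need N - 1 waits.
    """
    if total_projects <= 0:
        return 0
    batches = math.ceil(total_projects / per_minute_limit)
    return batches - 1


def batch_start_times(total_projects: int, per_minute_limit: int) -> list[tuple[int, int]]:
    """Return (start_minute, batch_size) pairs for a fixed-window schedule."""
    schedule = []
    remaining = total_projects
    minute = 0
    while remaining > 0:
        size = min(per_minute_limit, remaining)
        schedule.append((minute, size))
        remaining -= size
        minute += 1
    return schedule


print(min_creation_minutes(100, 5))  # 19
print(batch_start_times(12, 5))      # [(0, 5), (1, 5), (2, 2)]
```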
# ❌ Too fast - will hit rate limit
for i in range(100):
    create_project(f"project-{i}")
# ✅ Better - batch with delays
import concurrent.futures
from time import sleep
def create_projects_with_rate_limit(project_ids, batch_size=5, delay=60):
    """Create projects in batches; the defaults match the 5-per-minute limit above."""
    batches = [project_ids[i:i + batch_size] for i in range(0, len(project_ids), batch_size)]
    for index, batch in enumerate(batches):
        with concurrent.futures.ThreadPoolExecutor(max_workers=batch_size) as executor:
            futures = [executor.submit(create_project, project_id) for project_id in batch]
            # Wait for batch to complete
            for future in concurrent.futures.as_completed(futures):
                try:
                    future.result()
                except Exception as e:
                    print(f"Error creating project: {e}")
        # Delay between batches (not needed after the last one)
        if index < len(batches) - 1:
            sleep(delay)
create_projects_with_rate_limit([f"proj-{i}" for i in range(100)])
Querying the Hierarchy
Approach 1: Resource Manager API (Legacy)
# List projects in a folder
gcloud projects list \
  --filter="parent.id:FOLDER_ID"
Problems:
- Only lists projects, not folders
- No depth/nesting information
- Does not scale to deep hierarchies
Approach 2: Cloud Asset Inventory (Recommended)
# Query resources by hierarchy
gcloud asset search-all-resources \
  --scope=organizations/ORG_ID \
  --asset-types=cloudresourcemanager.googleapis.com/Project \
  --format="table(name,displayName,parentFullResourceName)"
# Filter by parent folder (FOLDER_NUMBER is the numeric folder ID)
gcloud asset search-all-resources \
  --scope=organizations/ORG_ID \
  --asset-types=cloudresourcemanager.googleapis.com/Project \
  --query="folders:FOLDER_NUMBER" \
  --format="csv(name,displayName,parentFullResourceName)"
Advantages:
- Queryable at any depth
- Supports filtering with query expressions (field matches, wildcards)
- Scalable (uses indexed backend)
- Can query across the whole organization at scale
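Search results come back as a flat list, so nesting has to be rebuilt on the client side. The following is a minimal sketch of that step. The `records` list is hypothetical sample data shaped as (resource name, parent name) pairs, not real API output, and the function names are introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical flat results: (resource name, parent name) pairs.
records = [
    ("folders/100", "organizations/1"),
    ("folders/200", "folders/100"),
    ("projects/11", "folders/100"),
    ("projects/22", "folders/200"),
    ("projects/33", "organizations/1"),
]


def build_children_index(rows):
    """Map each parent name to the list of its direct children."""
    children = defaultdict(list)
    for name, parent in rows:
        children[parent].append(name)
    return children


def descendants(children, root):
    """Depth-first list of (depth, name) for everything under root."""
    out = []
    stack = [(1, child) for child in reversed(children.get(root, []))]
    while stack:
        depth, name = stack.pop()
        out.append((depth, name))
        stack.extend((depth + 1, c) for c in reversed(children.get(name, [])))
    return out


index = build_children_index(records)
for depth, name in descendants(index, "organizations/1"):
    print("  " * depth + name)
```

An explicit stack is used instead of recursion so that very deep hierarchies cannot hit Python's recursion limit.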
API vs Console Discrepancy
Sometimes the GCP Console shows a resource while API queries do not, because of Console caching versus API consistency.
# Console shows project, but:
gcloud projects describe my-project
# Error: Project 'my-project' not found
# Solution: The project exists, but hasn't propagated to Resource Manager API yet
# Wait and retry
Handling Failures & Race Conditions
Idempotency & Project Deletion
Project deletion has special behavior:
T+0: gcloud projects delete my-project
→ Project marked DELETE_REQUESTED
→ Can still be restored
T+0 to T+30d: Project in "soft delete" state
→ Still counts against quota
→ IAM policies still exist
→ Billing stops
T+30d: Project permanently deleted
Production implication: If automation creates a project, deletes it, and then immediately recreates it with the same ID, the call fails because the project ID remains reserved. It is held throughout the soft-delete window, and GCP does not release deleted project IDs for reuse afterwards.
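One way to sidestep the reserved-ID problem in create/delete/recreate workflows is to never reuse an ID at all and append a random suffix instead. The sketch below assumes the documented project ID rules (6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen). `unique_project_id` is a hypothetical helper, not part of any Google library.

```python
import re
import secrets
import string

# Project ID rules: 6-30 chars, lowercase letters/digits/hyphens,
# starts with a letter, does not end with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")
_ALPHABET = string.ascii_lowercase + string.digits


def unique_project_id(base: str, suffix_len: int = 6) -> str:
    """Return base plus a random suffix, validated against the ID rules."""
    suffix = "".join(secrets.choice(_ALPHABET) for _ in range(suffix_len))
    # Trim the base so base + "-" + suffix fits in 30 characters.
    trimmed = base[: 30 - suffix_len - 1].rstrip("-")
    candidate = f"{trimmed}-{suffix}"
    if not PROJECT_ID_RE.match(candidate):
        raise ValueError(f"{candidate!r} is not a valid project ID")
    return candidate


first = unique_project_id("my-project")
second = unique_project_id("my-project")
print(first, second)
```

The suffix makes collisions unlikely but not impossible, so the create call should still handle an already-exists error.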
import google.api_core.exceptions
# create_project, delete_project and get_project are helper functions assumed to exist
# ❌ This will fail
project = create_project("my-project")
delete_project("my-project")
project = create_project("my-project")  # Fails! Project ID reserved
# ✅ Solution 1: Restore the project within the 30-day window (gcloud projects undelete)
# ✅ Solution 2: Use different project IDs
# ✅ Solution 3: Check deletion status first
def safe_create_project(project_id):
    try:
        return create_project(project_id)
    except google.api_core.exceptions.AlreadyExists:
        # Project might be in soft-delete state
        project = get_project(project_id)
        # The v3 API exposes this as `state`; the v1 API called it `lifecycleState`
        if project.state.name == "DELETE_REQUESTED":
            print(f"Project {project_id} is in soft-delete; restore it or pick a new ID.")
        raise
Cross-API Consistency
When resources span projects (e.g., Shared VPC), Resource Manager operations can cause temporary inconsistencies:
T+0s: Move project X to folder Y (host project)
→ Project metadata updated
→ But: firewall rules, routes still cached in the data plane
T+5s: User tries to create a VM in project X → firewall enforcement may be inconsistent
→ VM creation may succeed, but filtering is inconsistent
T+30s: All services see consistent state
Mitigation:
- After moving projects, wait 30-60 seconds before creating dependent resources
- Test cross-project functionality after moves
Monitoring API Health
# Check quota usage (this command reports the project's Compute Engine quotas;
# Resource Manager API quotas are listed on the Console's Quotas page)
gcloud compute project-info describe --project=PROJECT_ID \
  --flatten="quotas[]" \
  --format="table(quotas.metric,quotas.limit,quotas.usage)"
# Monitor Resource Manager API errors (including rate limiting) in logs
gcloud logging read \
  'severity>=ERROR AND protoPayload.serviceName="cloudresourcemanager.googleapis.com"' \
  --limit=50
Terraform with Resource Manager
# Terraform handles most eventual-consistency issues through dependency ordering and provider retries
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}
provider "google" {
  project = "my-terraform-project"
  region  = "us-central1"
}
# google_folder.prod and var.billing_account_id are assumed to be declared elsewhere
resource "google_project" "prod" {
  name            = "Production Project"
  project_id      = "my-prod-project"
  folder_id       = google_folder.prod.id
  billing_account = var.billing_account_id
}
resource "google_project_service" "compute" {
  project = google_project.prod.project_id
  service = "compute.googleapis.com"
  # The reference above already orders this after the project;
  # depends_on makes that ordering explicit
  depends_on = [google_project.prod]
}
# The provider retries many transient errors on its own
Important: The Terraform provider retries eventual-consistency errors internally, but you can still hit edge cases:
# These may race if not careful
resource "google_project" "my_project" {
  # ...
}
resource "google_compute_network" "default" {
  project = google_project.my_project.project_id
  name    = "default"
  # Must wait until the project has fully propagated
  depends_on = [google_project.my_project]
}
Best Practices
Always retry with exponential backoff when creating resources:
from google.api_core import retry
@retry.Retry(initial=1, maximum=60, multiplier=2)
def create_with_retry(project_id):
    return create_project(project_id)
Never assume synchronous operation:
- Project creation → async
- IAM policy propagation → async
- Service enablement → async
Query through consistent channels:
- Prefer Cloud Asset Inventory over the Resource Manager API for hierarchy-wide queries
- Asset Inventory has better indexing, but its results are still eventually consistent
Implement comprehensive monitoring:
- Log all resource manager operations
- Alert on rate limiting
- Track propagation delays
Test automation thoroughly:
- Test project creation + immediate API usage
- Test project moves + dependent resource operations
- Test across regions
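The "project creation + immediate API usage" case can be tested without touching GCP by putting a fake that simulates propagation lag in front of the retry logic. Everything in this sketch is hypothetical: `NotYetVisible`, `retry_with_backoff`, and `FakeServiceUsage` are stand-ins written for illustration, not real client classes, and the backoff uses full jitter as one reasonable choice among several.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotYetVisible(Exception):
    """Stand-in for a 404 returned while a project is still propagating."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 6,
    initial_s: float = 1.0,
    multiplier: float = 2.0,
    maximum_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    delay = initial_s
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except NotYetVisible:
            if attempt == max_attempts:
                raise
            # Full jitter keeps parallel workers from retrying in lockstep.
            sleep(random.uniform(0, delay))
            delay = min(delay * multiplier, maximum_s)
    raise AssertionError("unreachable when max_attempts >= 1")


class FakeServiceUsage:
    """Rejects the first `lag_calls` requests, like a service that has not
    seen the new project yet."""

    def __init__(self, lag_calls: int) -> None:
        self.lag_calls = lag_calls
        self.calls = 0

    def enable(self, project_id: str, service: str) -> str:
        self.calls += 1
        if self.calls <= self.lag_calls:
            raise NotYetVisible(project_id)
        return f"{service} enabled on {project_id}"


def test_enable_right_after_create() -> None:
    fake = FakeServiceUsage(lag_calls=3)
    slept: list[float] = []
    result = retry_with_backoff(
        lambda: fake.enable("my-new-project", "compute.googleapis.com"),
        sleep=slept.append,  # record delays instead of actually sleeping
    )
    assert result == "compute.googleapis.com enabled on my-new-project"
    assert fake.calls == 4 and len(slept) == 3


test_enable_right_after_create()
```

Raising `lag_calls` above `max_attempts` exercises the failure path, which is the case that otherwise tends to surface only in production pipelines.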