
Resource Protection: Locking & Deletion Prevention

Why Resource Protection Is Necessary

Accidental deletion or modification is a major source of production incidents:

Scenario 1: Engineer deletes production database by mistake
Result: Data loss, downtime, regulatory violations

Scenario 2: Buggy Terraform destroy script
Result: Critical infrastructure destroyed

Scenario 3: Malicious actor with edit access
Result: Service sabotage

Scenario 4: Over-permissioned service account
Result: Automation error cascades

GCP Resource Protection mechanisms:

  1. Resource locks (billing accounts, projects)
  2. Deletion protection (resource-specific settings)
  3. Soft-delete windows (projects, backups)
  4. Audit logging (track who deleted what)
  5. IAM controls (restrict who can delete)

Project-Level Protection

Delete Protection

Projects have a built-in soft-delete mechanism:

T+0: gcloud projects delete my-project
     → Status: DELETE_REQUESTED
     → Project still exists but is marked for deletion and is no longer usable

T+0 to T+30 days: Soft-delete window
     → Projects can be undeleted
     → Still counts toward project quota
     → Billing stops

T+30 days: Permanent deletion
     → Project permanently gone
     → Project ID cannot be reused, even after permanent deletion
     → Quota becomes available
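
The timeline above can be sketched as simple date arithmetic. This is a minimal illustration, assuming a fixed 30-day window as described in the timeline; `SOFT_DELETE_WINDOW`, `purge_date`, and `can_undelete` are hypothetical names and not part of any Google Cloud library. Treat the computed date as a planning bound rather than a guarantee of when the purge actually runs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed 30-day soft-delete window, as in the timeline above.
SOFT_DELETE_WINDOW = timedelta(days=30)


def purge_date(deleted_at: datetime) -> datetime:
    """Return the latest time at which the project is permanently deleted."""
    return deleted_at + SOFT_DELETE_WINDOW


def can_undelete(deleted_at: datetime, now: datetime) -> bool:
    """True while the project is still inside the soft-delete window."""
    return deleted_at <= now < purge_date(deleted_at)


deleted_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(purge_date(deleted_at).date())  # 2024-01-31
print(can_undelete(deleted_at, datetime(2024, 1, 20, tzinfo=timezone.utc)))  # True
print(can_undelete(deleted_at, datetime(2024, 2, 1, tzinfo=timezone.utc)))   # False
```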

Advantages:

  • Accidental deletion recovery possible
  • 30-day grace period for restoration
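  • Data is usually recoverable if the deletion is caught early (some resources, such as Cloud Storage data, may be removed before the 30 days end)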

Restore from deletion:

bash
# List deleted projects
gcloud projects list --filter="lifecycleState:DELETE_REQUESTED"

# Restore deleted project
gcloud projects undelete my-project

# Move the restored project under a different folder (if needed)
gcloud beta projects move my-project \
  --folder=NEW_FOLDER_ID

Project Quota for Deletion

Deleted projects still count against quota during grace period:

Organization quota: 10 projects

Scenario:
- Project 1-9: Active
- Project 10: Active
- Quota: 10/10 (at limit)

Delete Project 10:
- T+0 to T+30d: Project 10 in soft-delete (still counts)
- Cannot create Project 11 → would be 11/10
- Must wait until the 30-day window ends, or request a quota increase

Solution: Plan for the 30-day delay before a deleted project stops counting toward quota
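
The scenario above reduces to one comparison. The sketch below is only an illustration of that arithmetic; `can_create_project` is a hypothetical helper, and the numbers are the ones from the scenario.

```python
def can_create_project(quota: int, active: int, pending_delete: int) -> bool:
    """Soft-deleted projects still occupy quota until they are purged."""
    return active + pending_delete < quota


# Quota of 10, projects 1-9 active, project 10 just deleted (still counted).
print(can_create_project(quota=10, active=9, pending_delete=1))  # False

# After the 30-day window, project 10 no longer counts.
print(can_create_project(quota=10, active=9, pending_delete=0))  # True
```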

Resource-Level Protection

Different resources have different protection mechanisms:

Cloud Storage Buckets

bash
# Versioning: Enables object restore
gsutil versioning set on gs://my-bucket

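# Retention policy: objects cannot be deleted or overwritten until they are
# 7 days old, which also blocks deleting the bucket while it holds such objects
# (gsutil accepts s, d, m, y as units; there is no week unit)
# Note: retention policies and versioning have historically been mutually
# exclusive on one bucket, so check the current docs before combining them
gsutil retention set 7d gs://my-bucket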

# Lifecycle policies: Auto-delete old versions
gsutil lifecycle set policy.json gs://my-bucket
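
The `policy.json` file passed to `gsutil lifecycle set` is not shown above. As a rough sketch, the snippet below generates one that deletes noncurrent object versions once a given number of newer versions exist. The choice of three versions and the helper name `build_lifecycle_policy` are assumptions for illustration, not a recommendation from the original text.

```python
import json


def build_lifecycle_policy(keep_newer_versions: int) -> dict:
    """Build a lifecycle config that deletes old noncurrent versions.

    A noncurrent version (isLive = false) is deleted once at least
    `keep_newer_versions` newer versions of the same object exist.
    """
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {
                    "isLive": False,
                    "numNewerVersions": keep_newer_versions,
                },
            }
        ]
    }


policy = build_lifecycle_policy(keep_newer_versions=3)
# Redirect this output to policy.json before running gsutil lifecycle set.
print(json.dumps(policy, indent=2))
```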

Persistent Disks (Compute Engine)

bash
# Create snapshot before deleting disk
gcloud compute disks snapshot my-disk \
  --snapshot-names=my-disk-backup

# Snapshots can restore disk if needed (data recovery)

Cloud SQL Databases

bash
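# Automated backups: set the daily backup window and keep the last 7 backups
gcloud sql instances patch my-instance \
  --backup-start-time=03:00 \
  --retained-backups-count=7

# Restore from a backup (overwrites the data on the target instance)
gcloud sql backups restore BACKUP_ID \
  --restore-instance=my-instance \
  --backup-instance=my-instance

# Point-in-time recovery is a separate operation: it clones the instance
# to a chosen timestamp and requires PITR logging to be enabled beforehand

# Important: by default the 7 most recent automated backups are retained
# Automated backups may be removed together with the instance, so take an
# on-demand backup or an export before deleting an instance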

BigQuery Datasets

bash
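# Default table expiration: new tables in the dataset are deleted
# automatically after this many seconds (7776000 s = 90 days),
# so it is a source of data loss rather than a protection
bq update \
  --default_table_expiration=7776000 \
  project_id:dataset_id

# The default can be overridden per table/job
# Recommendation: Remove default expiration for critical datasets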
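
The expiration value is given in seconds, which is easy to get wrong by a factor of 1000 because the dataset metadata reports it in milliseconds (`defaultTableExpirationMs`). A small sketch of both conversions follows; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds(days: int) -> int:
    """Value to pass to --default_table_expiration for a lifetime in days."""
    return days * SECONDS_PER_DAY


def ms_to_days(expiration_ms: int) -> float:
    """Convert a defaultTableExpirationMs value back to days."""
    return expiration_ms / 1000 / SECONDS_PER_DAY


print(days_to_seconds(90))     # 7776000
print(ms_to_days(7776000000))  # 90.0
```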

IAM-Based Protection

Prevent deletions via IAM

bash
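# Deny deletion of tagged production resources for all principals.
# Deny policies are defined in a JSON file, and permissions use the
# SERVICE_FQDN/RESOURCE.ACTION form. Not every permission is supported
# in deny policies, so verify each one before relying on it.
cat > deny-deletions.json <<'EOF'
{
  "displayName": "Deny deletion of production resources",
  "rules": [
    {
      "denyRule": {
        "deniedPrincipals": ["principalSet://goog/public:all"],
        "deniedPermissions": [
          "compute.googleapis.com/instances.delete",
          "compute.googleapis.com/disks.delete",
          "storage.googleapis.com/buckets.delete",
          "cloudsql.googleapis.com/instances.delete"
        ],
        "denialCondition": {
          "title": "Production only",
          "expression": "resource.matchTag('ORG_ID/environment', 'production')"
        }
      }
    }
  ]
}
EOF

gcloud iam policies create deny-deletions \
  --attachment-point="cloudresourcemanager.googleapis.com%2Fprojects%2FPROJECT_ID" \
  --kind=denypolicies \
  --policy-file=deny-deletions.json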

Role-based access control

bash
# Restrict deletion by granting a custom role that has no delete permissions

# Create custom role: can create and update instances but not delete them
# (the role ID and the project are passed separately)
gcloud iam roles create resourceManager \
  --project=PROJECT_ID \
  --title="Resource Manager" \
  --description="Can create/update but not delete resources" \
  --permissions=compute.instances.create,compute.instances.get,compute.instances.setMetadata

# Grant the custom role instead of Owner/Editor
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=group:developers@company.com \
  --role=projects/PROJECT_ID/roles/resourceManager
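
A custom role only protects resources if no delete permission slips into its permission list. One way to guard against that is a small check run before the role is created or updated. The sketch below is an assumption-laden illustration: `delete_permissions` is a hypothetical helper, and it only looks at permission names, so it cannot detect broader permissions that imply deletion.

```python
def delete_permissions(permissions: list[str]) -> list[str]:
    """Return permissions whose final segment starts with 'delete'."""
    return [p for p in permissions if p.rsplit(".", 1)[-1].startswith("delete")]


role_permissions = [
    "compute.instances.create",
    "compute.instances.get",
    "compute.instances.setMetadata",
]
print(delete_permissions(role_permissions))  # []

# A permission list that should be rejected:
print(delete_permissions(role_permissions + ["compute.instances.delete"]))
# ['compute.instances.delete']
```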

Audit Logging for Deletions

Track who deleted what

bash
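# Query Admin Activity audit logs for instance deletions
# (methodName carries an API version prefix such as "v1.", so match with ":"
#  and do not filter on severity=WARNING, which would hide these entries)
gcloud logging read \
  'logName:"cloudaudit.googleapis.com%2Factivity" AND protoPayload.methodName:"compute.instances.delete"' \
  --limit=20 \
  --format=json

# Filter by resource and time (last 7 days)
gcloud logging read \
  'protoPayload.resourceName="projects/my-project/zones/us-central1-a/instances/my-vm" AND
   protoPayload.methodName:"compute.instances.delete"' \
  --freshness=7d \
  --format=json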
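
The JSON returned by these queries can be reduced to a "who deleted what, and when" list. The following is a minimal sketch that assumes the standard audit log fields `protoPayload.methodName`, `protoPayload.resourceName`, and `protoPayload.authenticationInfo.principalEmail`; the sample entries are invented for illustration and `summarize_deletions` is a hypothetical helper.

```python
import json


def summarize_deletions(entries: list[dict]) -> list[tuple[str, str, str]]:
    """Return (timestamp, principal, resource) for each delete operation."""
    rows = []
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if not payload.get("methodName", "").endswith(".delete"):
            continue
        principal = payload.get("authenticationInfo", {}).get(
            "principalEmail", "unknown"
        )
        rows.append(
            (entry.get("timestamp", ""), principal, payload.get("resourceName", ""))
        )
    return sorted(rows)


# Invented sample shaped like `gcloud logging read --format=json` output.
sample_entries = json.loads("""
[
  {"timestamp": "2024-05-02T10:15:00Z",
   "protoPayload": {
     "methodName": "v1.compute.instances.delete",
     "resourceName": "projects/my-project/zones/us-central1-a/instances/my-vm",
     "authenticationInfo": {"principalEmail": "alice@example.com"}}},
  {"timestamp": "2024-05-02T09:00:00Z",
   "protoPayload": {
     "methodName": "v1.compute.instances.insert",
     "resourceName": "projects/my-project/zones/us-central1-a/instances/other-vm",
     "authenticationInfo": {"principalEmail": "bob@example.com"}}}
]
""")

for timestamp, principal, resource in summarize_deletions(sample_entries):
    print(f"{timestamp}  {principal}  deleted  {resource}")
```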

Alert on deletions

python
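from google.cloud import monitoring_v3

def create_deletion_alert(project_id, notification_channel_names):
    """Alert when production resources are deleted.

    Assumes a user-defined log-based metric named "resource_deletion"
    already exists and counts delete operations on production resources.
    notification_channel_names is a list of existing channel resource names
    ("projects/PROJECT_ID/notificationChannels/CHANNEL_ID").
    """
    client = monitoring_v3.AlertPolicyServiceClient()

    # Condition: the deletion metric rises above zero
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name="Production resource deleted",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            # gce_instance has no "environment" label, so the production
            # scoping has to live in the log-based metric's own filter
            filter=(
                'resource.type="gce_instance" AND '
                'metric.type="logging.googleapis.com/user/resource_deletion"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=0,
            duration={"seconds": 60},
        ),
    )

    # A combiner is required even when there is only one condition
    policy = monitoring_v3.AlertPolicy(
        display_name="Production resource deletion alert",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[condition],
        notification_channels=notification_channel_names,
    )

    return client.create_alert_policy(
        name=f"projects/{project_id}",
        alert_policy=policy,
    )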

Backup Strategies

3-2-1 Backup Rule

For critical data:

- 3 copies: Original + 2 backups
- 2 different media types: Disk + Cloud Storage
- 1 offsite: Different region/project

python
def backup_critical_data(source_disk):
    """Implement 3-2-1 backup for critical data.

    create_snapshot, create_snapshot_in_region and export_snapshot_to_storage
    are placeholders for your own wrappers around the Compute Engine and
    Cloud Storage APIs; they are not library functions.
    """

    # Backup 1: Snapshot (disk format)
    snapshot1 = create_snapshot(source_disk, name="backup-snapshot-1")

    # Backup 2: Cross-regional snapshot
    snapshot2 = create_snapshot_in_region(
        source_disk,
        name="backup-snapshot-2",
        region="us-east1"  # Different region
    )

    # Backup 3: Exported to Cloud Storage (second media type, offsite)
    export_snapshot_to_storage(
        snapshot1,
        bucket="gs://backups-project",
        path="backups/critical-data/"
    )

    return {
        "snapshots": [snapshot1.name, snapshot2.name],
        "storage_export": "gs://backups-project/backups/critical-data/"
    }
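
Whether a given set of copies actually satisfies the 3-2-1 rule can be checked mechanically. The sketch below is one possible encoding of the rule, not an established tool: `BackupCopy` and `satisfies_3_2_1` are hypothetical names, and the media and location labels are free-form strings chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str     # e.g. "disk" or "cloud-storage"
    location: str  # region or project that holds the copy


def satisfies_3_2_1(copies: list[BackupCopy], primary_location: str) -> bool:
    """Check: at least 3 copies, 2 media types, 1 copy away from the primary."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    BackupCopy("source-disk", "disk", "us-central1"),
    BackupCopy("backup-snapshot-1", "disk", "us-central1"),
    BackupCopy("storage-export", "cloud-storage", "us-east1"),
]
print(satisfies_3_2_1(copies, primary_location="us-central1"))      # True
print(satisfies_3_2_1(copies[:2], primary_location="us-central1"))  # False
```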

Immutable Backups

bash
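# Cloud Storage bucket with a locked retention policy (Bucket Lock)
gsutil retention set 30d gs://backups-immutable   # 30 days is an example value
gsutil retention lock gs://backups-immutable      # irreversible once confirmed

# Objects cannot be deleted or overwritten until they reach the retention age
# Even admins cannot shorten or remove the policy once it is locked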
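
"Until retention expires" is measured per object, from the time the object was created. As a rough sketch of that rule, assuming an example retention period and the hypothetical helper names `earliest_deletion` and `is_deletable`:

```python
from datetime import datetime, timedelta, timezone


def earliest_deletion(created_at: datetime, retention: timedelta) -> datetime:
    """First moment at which an object may be deleted or overwritten."""
    return created_at + retention


def is_deletable(created_at: datetime, retention: timedelta, now: datetime) -> bool:
    """True once the object has reached the retention age."""
    return now >= earliest_deletion(created_at, retention)


retention = timedelta(days=30)  # example value, not a recommendation
created = datetime(2024, 3, 1, tzinfo=timezone.utc)

print(earliest_deletion(created, retention).date())  # 2024-03-31
print(is_deletable(created, retention, datetime(2024, 3, 15, tzinfo=timezone.utc)))  # False
print(is_deletable(created, retention, datetime(2024, 4, 1, tzinfo=timezone.utc)))   # True
```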

Disaster Recovery Testing

python
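import time

def dr_test():
    """Periodic DR test: Create resources from backup.

    create_project, restore_from_snapshot, validate_data_integrity and
    schedule_project_deletion are placeholders for your own tooling.
    """

    # Create test project with a unique, timestamped ID
    test_project = create_project(f"dr-test-{int(time.time())}")

    try:
        # Restore from backup
        restore_from_snapshot(
            snapshot="snapshots/critical-data-backup",
            destination_project=test_project
        )

        # Validate restored data; fail loudly if the check does not pass
        if not validate_data_integrity(test_project):
            raise RuntimeError("DR test failed: restored data is not valid")
        print("✓ DR test passed")

    finally:
        # Clean up test project (after retention period), even on failure
        schedule_project_deletion(test_project, delay_days=7)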
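
The data validation step is left abstract above. One common approach, sketched here under the assumption that a manifest of SHA-256 checksums was recorded at backup time, is to compare each restored file against that manifest. `sha256_of` and `validate_restore` are hypothetical helpers, not part of any Google Cloud SDK.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_restore(expected: dict[str, str], restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or whose checksum differs."""
    failures = []
    for rel_path, expected_hash in expected.items():
        candidate = restored_dir / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected_hash:
            failures.append(rel_path)
    return failures
```

An empty list from `validate_restore` means every file in the manifest was restored intact; any other result names the files to investigate.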

Terraform Destruction Protection

hcl
# Prevent accidental terraform destroy

resource "google_compute_instance" "production" {
  name = "production-vm"
  # machine_type, zone, boot_disk and network_interface omitted for brevity

  # Add lifecycle rule: any plan that would destroy this resource fails
  lifecycle {
    prevent_destroy = true
  }
}

# Alternative: Require approval
# The destroy-time provisioner exits non-zero, which aborts the destroy run
# before the instance this resource depends on is touched
resource "null_resource" "approval_gate" {
  triggers = {
    production_vm = google_compute_instance.production.id
  }

  provisioner "local-exec" {
    command = "echo 'Require manual approval before destroy'; exit 1"
    when    = destroy
  }
}

bash
# To actually destroy, first remove prevent_destroy from the configuration
terraform destroy -auto-approve  # Fails while prevent_destroy = true
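
A complementary safeguard is to inspect the plan in CI before anything is applied. The sketch below assumes the plan was exported with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`, and relies on the `resource_changes[].change.actions` field of that JSON. `planned_deletions` is a hypothetical helper and the sample plan is invented.

```python
def planned_deletions(plan: dict) -> list[str]:
    """Return addresses of resources the plan would delete or replace."""
    return [
        change["address"]
        for change in plan.get("resource_changes", [])
        if "delete" in change.get("change", {}).get("actions", [])
    ]


# Invented sample with the same shape as `terraform show -json` output.
sample_plan = {
    "resource_changes": [
        {"address": "google_compute_instance.production",
         "change": {"actions": ["delete"]}},
        {"address": "google_storage_bucket.logs",
         "change": {"actions": ["update"]}},
        {"address": "google_compute_disk.data",
         "change": {"actions": ["delete", "create"]}},  # replacement
    ]
}

to_delete = planned_deletions(sample_plan)
print(to_delete)
# ['google_compute_instance.production', 'google_compute_disk.data']
```

A pipeline could fail the job whenever this list is non-empty and require a manual override to continue.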

Recovery Runbook

bash
#!/bin/bash
# recover-deleted-project.sh
# Usage: ./recover-deleted-project.sh PROJECT_ID
set -euo pipefail

PROJECT_ID="${1:?Usage: $0 PROJECT_ID}"

# Step 1: Check if project is in soft-delete
STATUS=$(gcloud projects describe "$PROJECT_ID" --format='value(lifecycleState)')

if [ "$STATUS" == "DELETE_REQUESTED" ]; then
    echo "✓ Project in soft-delete state"

    # Step 2: Undelete project (set -e aborts here if the restore fails)
    gcloud projects undelete "$PROJECT_ID"
    echo "✓ Project undeleted"

    # Step 3: Make sure required services are enabled before querying them
    gcloud services enable compute.googleapis.com \
      --project="$PROJECT_ID"

    # Step 4: Verify resources
    RESOURCE_COUNT=$(gcloud compute instances list \
      --project="$PROJECT_ID" --format=json | jq 'length')
    echo "✓ Found $RESOURCE_COUNT instances"

    echo "✓ Recovery complete"
else
    echo "✗ Project not in soft-delete (cannot recover)"
    exit 1
fi

Anti-Patterns to Avoid

| Anti-pattern | Problem | Solution |
| --- | --- | --- |
| No backups | Data loss is permanent | Implement 3-2-1 backups |
| Overpermissioned SA | Can delete anything | Restrict IAM to least privilege |
| No audit logging | Cannot trace deletions | Enable audit logging |
| Terraform destroy without safeguards | Accidental destruction | Add lifecycle protection |
| No alert on deletions | Silent failures | Monitor deletion audit logs |
| Backups in same project | Backup deleted with project | Cross-project backups |
