Resource Protection: Locking & Deletion Prevention
Why Resource Protection Is Needed
Accidental deletion or modification is a major source of production incidents:
Scenario 1: Engineer deletes production database by mistake
Result: Data loss, downtime, regulatory violations
Scenario 2: Buggy Terraform destroy script
Result: Critical infrastructure destroyed
Scenario 3: Malicious actor with edit access
Result: Service sabotage
Scenario 4: Over-permissioned service account
Result: Automation error cascades

GCP Resource Protection mechanisms:
- Resource locks (project liens, billing account locks)
- Deletion protection (resource-specific settings)
- Soft-delete windows (projects, backups)
- Audit logging (track who deleted what)
- IAM controls (restrict who can delete)
Project-Level Protection
Delete Protection
Projects have a built-in soft-delete mechanism:
T+0: gcloud projects delete my-project
  → Status: DELETE_REQUESTED
  → Project is shut down and no longer usable, but can still be described and restored
T+0 to T+30 days: Soft-delete window
  → Project can be undeleted
  → Still counts toward project quota
  → Billing stops
T+30 days: Permanent deletion
  → Project permanently gone
  → Project ID cannot be reused, even after permanent deletion
  → Quota becomes available

Advantages:
- Accidental deletion recovery possible
- 30-day grace period for restoration
- Little or no permanent data loss if caught early (some resources, such as Cloud Storage data, may be removed before the 30 days are up)
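The 30-day window can be turned into a concrete deadline for an incident ticket. The following is a minimal sketch using only the Python standard library; the function names and the sample timestamp are illustrative assumptions, and the exact purge time is decided by Google, so treat the result as a planning aid rather than a guarantee.

```python
from datetime import datetime, timedelta, timezone

# Length of the soft-delete window described in this section
SOFT_DELETE_WINDOW = timedelta(days=30)


def purge_deadline(delete_requested_at: datetime) -> datetime:
    """Return the earliest time the project may be permanently deleted."""
    return delete_requested_at + SOFT_DELETE_WINDOW


def days_left_to_undelete(delete_requested_at: datetime, now: datetime) -> int:
    """Whole days remaining in the soft-delete window (0 once it has closed)."""
    remaining = purge_deadline(delete_requested_at) - now
    return max(remaining.days, 0)


# Hypothetical deletion timestamp, for illustration only
requested = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
checked = datetime(2024, 3, 21, 12, 0, tzinfo=timezone.utc)

print(purge_deadline(requested).date())            # 2024-03-31
print(days_left_to_undelete(requested, checked))   # 10
```

Clamping the result at zero keeps the helper safe to call after the window has already closed.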
Restore from deletion:
```bash
# List projects pending deletion
gcloud projects list --filter="lifecycleState:DELETE_REQUESTED"
# Restore a deleted project
gcloud projects undelete my-project
# Optionally move the restored project to a different folder
# (a separate step after undelete; the move command may require the beta component)
gcloud beta projects move my-project \
  --folder=NEW_FOLDER_ID
```

Project Quota for Deletion
Deleted projects still count against quota during the grace period:
Organization quota: 10 projects
Scenario:
- Projects 1-9: Active
- Project 10: Active
- Quota: 10/10 (at limit)
Delete Project 10:
- T+0 to T+30d: Project 10 in soft-delete (still counts)
- Cannot create Project 11 → would be 11/10
- Must wait 30 days for permanent deletion, or request a quota increase
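The scenario can be reproduced with a few lines of Python. This is a sketch under the assumption that project records look like the output of `gcloud projects list --format=json` (a `lifecycleState` field per project); `quota_used` and `can_create_project` are hypothetical helper names, not part of any Google library.

```python
# Lifecycle states that still occupy a quota slot, per the scenario in this section
COUNTED_STATES = {"ACTIVE", "DELETE_REQUESTED"}


def quota_used(projects):
    """Count projects that still consume quota, including soft-deleted ones."""
    return sum(1 for p in projects if p["lifecycleState"] in COUNTED_STATES)


def can_create_project(projects, quota_limit):
    """True if one more project fits under the quota limit."""
    return quota_used(projects) < quota_limit


# Projects 1-9 active, project 10 soft-deleted
projects = [
    {"projectId": f"project-{i}", "lifecycleState": "ACTIVE"} for i in range(1, 10)
]
projects.append({"projectId": "project-10", "lifecycleState": "DELETE_REQUESTED"})

print(quota_used(projects))                          # 10
print(can_create_project(projects, quota_limit=10))  # False
```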
Solution: Account for the 30-day window when planning quota. A deleted project frees its quota slot only after permanent deletion.

Resource-Level Protection
Different resources have different protection mechanisms:
Cloud Storage Buckets
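```bash
# Versioning: keeps noncurrent object versions so overwritten or deleted objects can be restored
gsutil versioning set on gs://my-bucket
# Retention policy: objects cannot be deleted or overwritten until they are 7 days old,
# which also blocks deleting the bucket while retained objects remain.
# gsutil accepts s, d, m, and y suffixes (no "w"), so one week is written as 7d.
# Check the current docs before combining this with versioning: historically,
# Object Versioning could not be enabled on a bucket that has a retention policy.
gsutil retention set 7d gs://my-bucket
# Lifecycle policy: auto-delete old versions according to the rules in policy.json
gsutil lifecycle set policy.json gs://my-bucket
```

Persistent Disks (Compute Engine)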
```bash
# Create a snapshot before deleting a disk (the zone shown is an example)
gcloud compute disks snapshot my-disk \
  --zone=us-central1-a \
  --snapshot-names=my-disk-backup
# The snapshot can be used to recreate the disk if needed (data recovery)
```

Cloud SQL Databases
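```bash
# Automated backups: set the backup window and how many backups to keep
# (not necessarily enabled by default for instances created with gcloud or the API)
gcloud sql instances patch my-instance \
  --backup-start-time=03:00 \
  --retained-backups-count=7
# Restore from a backup: overwrites the target instance with the backup contents
gcloud sql backups restore BACKUP_ID \
  --restore-instance=my-instance
# Point-in-time recovery is a separate feature that must be enabled; it clones to a new instance:
#   gcloud sql instances clone my-instance my-instance-restored --point-in-time=TIMESTAMP
# Important: by default the 7 most recent automated backups are retained.
# An instance deleted before its first scheduled backup runs may have no backup at all.
```

BigQuery Datasets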
```bash
# Default table expiration: new tables are deleted automatically after this many seconds,
# so a forgotten default can itself cause data loss
bq update \
  --default_table_expiration=7776000 \
  project_id:dataset_id
# 7776000 seconds = 90 days; the default can be overridden per table
# Recommendation: remove the default expiration (set it to 0) for critical datasets
```

IAM-Based Protection
Prevent deletions via IAM
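```bash
# Deny deletion of production-tagged resources for all principals.
# Deny policies are defined in a JSON file and attached to a project, folder, or organization.
# Permissions use the SERVICE_FQDN/RESOURCE.ACTION form; confirm that each one
# is supported in deny policies before relying on it.
cat > deny-deletions.json <<'EOF'
{
  "displayName": "Deny deletion of production resources",
  "rules": [{
    "denyRule": {
      "deniedPrincipals": ["principalSet://goog/public:all"],
      "deniedPermissions": [
        "compute.googleapis.com/instances.delete",
        "compute.googleapis.com/disks.delete",
        "storage.googleapis.com/buckets.delete",
        "cloudsql.googleapis.com/instances.delete"
      ],
      "denialCondition": {
        "expression": "resource.matchTag('ORG_ID/environment', 'production')"
      }
    }
  }]
}
EOF
gcloud iam policies create deny-deletions \
  --attachment-point=cloudresourcemanager.googleapis.com%2Fprojects%2FPROJECT_ID \
  --kind=denypolicies \
  --policy-file=deny-deletions.json
```

Role-based access control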
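A custom role that is meant to exclude deletion can be checked mechanically before it is created. The sketch below assumes IAM permission names follow the usual `service.resource.verb` pattern; `find_destructive_permissions` is a hypothetical helper written for illustration, not an IAM API.

```python
def find_destructive_permissions(permissions):
    """Return the permissions whose verb is 'delete', sorted for stable output."""
    return sorted(p for p in permissions if p.endswith(".delete"))


# Permission list for a manage-but-not-delete role
role_permissions = [
    "compute.instances.create",
    "compute.instances.get",
    "compute.instances.setMetadata",
]

print(find_destructive_permissions(role_permissions))
# []
print(find_destructive_permissions(role_permissions + ["compute.instances.delete"]))
# ['compute.instances.delete']
```

Running a check like this in code review or CI catches a delete permission that slips into a role definition later.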
```bash
# Restrict deletion to specific roles
# Create a custom role that can manage instances but not delete them
gcloud iam roles create resourceManager \
  --project=PROJECT_ID \
  --title="Resource Manager" \
  --description="Can create/update but not delete resources" \
  --permissions=compute.instances.create,compute.instances.get,compute.instances.setMetadata
# Grant the custom role instead of Owner/Editor
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=group:developers@company.com \
  --role=projects/PROJECT_ID/roles/resourceManager
```

Audit Logging for Deletions
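Deletion events are found in audit logs by filtering on the method name, optionally narrowed to one resource. Writing such filters by hand is error-prone, so a small builder can help. This is a sketch: `build_delete_filter` is a hypothetical helper, and it assumes the Cloud Logging query language's `:` (has) operator so that version-prefixed method names such as `v1.compute.instances.delete` also match.

```python
def build_delete_filter(method_suffix, resource_name=None, since=None):
    """Build a Cloud Logging filter string for delete operations in audit logs."""
    for value in (method_suffix, resource_name, since):
        if value is not None and '"' in value:
            raise ValueError("filter values must not contain double quotes")
    clauses = [
        'logName:"cloudaudit.googleapis.com"',         # audit logs only
        f'protoPayload.methodName:"{method_suffix}"',  # substring match on the method
    ]
    if resource_name:
        clauses.append(f'protoPayload.resourceName="{resource_name}"')
    if since:
        clauses.append(f'timestamp>="{since}"')        # RFC 3339 timestamp
    return " AND ".join(clauses)


print(build_delete_filter("compute.instances.delete"))
# logName:"cloudaudit.googleapis.com" AND protoPayload.methodName:"compute.instances.delete"
```

The resulting string can be passed as the filter argument to `gcloud logging read`.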
Track who deleted what
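```bash
# Query audit logs for instance deletions.
# Method names carry an API version prefix (for example v1.compute.instances.delete),
# so match with the ":" (has) operator. No severity filter is used, because
# these entries are typically not logged at WARNING.
gcloud logging read \
  'protoPayload.methodName:"compute.instances.delete"' \
  --limit=20 \
  --format=json
# Filter by resource and time window (last 7 days)
gcloud logging read \
  'protoPayload.resourceName="projects/my-project/zones/us-central1-a/instances/my-vm" AND
   protoPayload.methodName:"compute.instances.delete"' \
  --freshness=7d \
  --format=json
```

Alert on deletions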
```python
import datetime

from google.cloud import monitoring_v3


def create_deletion_alert(project_id, notification_channels):
    """Alert when production resources are deleted.

    Assumes a user-defined log-based metric named "resource_deletion" already
    counts delete operations on production resources. notification_channels
    is a list of existing notification channel resource names.
    """
    client = monitoring_v3.AlertPolicyServiceClient()
    # Condition: any delete operation counted by the log-based metric
    condition = monitoring_v3.AlertPolicy.Condition(
        display_name="Production resource deleted",
        condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
            filter=(
                'resource.type="gce_instance" AND '
                'metric.type="logging.googleapis.com/user/resource_deletion"'
            ),
            comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
            threshold_value=0,
            duration=datetime.timedelta(seconds=60),
        ),
    )
    # Create alert (a combiner is required even with a single condition)
    policy = monitoring_v3.AlertPolicy(
        display_name="Production resource deletion alert",
        combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
        conditions=[condition],
        notification_channels=notification_channels,
    )
    return client.create_alert_policy(
        name=f"projects/{project_id}",
        alert_policy=policy,
    )
```

Backup Strategies
3-2-1 Backup Rule
For critical data:
- 3 copies: Original + 2 backups
- 2 different media types: Disk + Cloud Storage
- 1 offsite: Different region/project

```python
def backup_critical_data(source_disk):
    """Implement 3-2-1 backup for critical data.

    create_snapshot, create_snapshot_in_region, and export_snapshot_to_storage
    are placeholder helpers to be implemented against the Compute Engine and
    Cloud Storage APIs.
    """
    # Backup 1: snapshot (disk format)
    snapshot1 = create_snapshot(source_disk, name="backup-snapshot-1")
    # Backup 2: snapshot stored in a different region
    snapshot2 = create_snapshot_in_region(
        source_disk,
        name="backup-snapshot-2",
        region="us-east1",  # Different region
    )
    # Backup 3: export to Cloud Storage (second media type, offsite)
    export_snapshot_to_storage(
        snapshot1,
        bucket="gs://backups-project",
        path="backups/critical-data/",
    )
    return {
        "snapshots": [snapshot1.name, snapshot2.name],
        "storage_export": "gs://backups-project/backups/critical-data/",
    }
```

Immutable Backups
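```bash
# Cloud Storage bucket with a locked retention policy (Bucket Lock)
# Set a retention period first (30d is an example value), then lock it
gsutil retention set 30d gs://backups-immutable
gsutil retention lock gs://backups-immutable
# Locking is irreversible: the policy can no longer be removed or shortened,
# and objects cannot be deleted or overwritten until their retention period expires,
# even by an admin
```

Disaster Recovery Testing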
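```python
import time


def dr_test():
    """Periodic DR test: create resources from backup.

    create_project, restore_from_snapshot, validate_data_integrity, and
    schedule_project_deletion are placeholder helpers.
    """
    # Create an isolated test project
    test_project = create_project(f"dr-test-{int(time.time())}")
    try:
        # Restore from backup
        restore_from_snapshot(
            snapshot="snapshots/critical-data-backup",
            destination_project=test_project,
        )
        # Validate restored data (explicit check, since assert is skipped under python -O)
        if not validate_data_integrity(test_project):
            raise RuntimeError("DR test failed: restored data did not validate")
        print("✓ DR test passed")
    finally:
        # Clean up test project (after retention period)
        schedule_project_deletion(test_project, delay_days=7)
```

Terraform Destruction Protection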
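Safeguards inside the configuration can be complemented by a check on the plan itself: a CI step can refuse to apply any plan that deletes resources. The sketch below assumes the JSON plan representation produced by `terraform show -json`, where each entry in `resource_changes` carries a `change.actions` list. `planned_deletions` is a hypothetical helper and the inline plan is made-up sample data.

```python
import json


def planned_deletions(plan):
    """Return addresses of resources the plan would delete (replacements included)."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc.get("change", {}).get("actions", [])
    ]


# Made-up sample in the shape of `terraform show -json tfplan`
plan = json.loads("""
{
  "resource_changes": [
    {"address": "google_compute_instance.production", "change": {"actions": ["delete"]}},
    {"address": "google_compute_disk.data", "change": {"actions": ["update"]}},
    {"address": "google_sql_database_instance.main", "change": {"actions": ["delete", "create"]}}
  ]
}
""")

print(planned_deletions(plan))
# ['google_compute_instance.production', 'google_sql_database_instance.main']
```

In a pipeline, the plan JSON would come from `terraform plan -out=tfplan` followed by `terraform show -json tfplan`, and a non-empty result would fail the job or require manual approval.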
```hcl
# Prevent accidental terraform destroy
resource "google_compute_instance" "production" {
  name = "production-vm"
  # machine_type, zone, boot_disk, and network_interface omitted for brevity

  # Lifecycle rule: any plan that would destroy this resource fails
  lifecycle {
    prevent_destroy = true
  }
}

# Alternative: a gate resource whose destroy-time provisioner always fails
resource "null_resource" "approval_gate" {
  triggers = {
    production_vm = google_compute_instance.production.id
  }
  provisioner "local-exec" {
    when    = destroy
    command = "echo 'Require manual approval before destroy'; exit 1"
  }
}

# With prevent_destroy = true, `terraform destroy -auto-approve` fails.
# To actually destroy the resource, remove prevent_destroy first.
```

Recovery Runbook
```bash
#!/bin/bash
# recover-deleted-project.sh
set -euo pipefail
PROJECT_ID="${1:?Usage: $0 PROJECT_ID}"
# Step 1: Check whether the project is in soft-delete
STATUS=$(gcloud projects describe "$PROJECT_ID" --format='value(lifecycleState)')
if [ "$STATUS" = "DELETE_REQUESTED" ]; then
  echo "✓ Project in soft-delete state"
  # Step 2: Undelete project
  gcloud projects undelete "$PROJECT_ID"
  echo "✓ Project undeleted"
  # Step 3: Make sure the Compute Engine API is enabled before listing instances
  gcloud services enable compute.googleapis.com \
    --project="$PROJECT_ID"
  # Step 4: Verify resources
  RESOURCE_COUNT=$(gcloud compute instances list \
    --project="$PROJECT_ID" --format=json | jq 'length')
  echo "✓ Found $RESOURCE_COUNT instances"
  echo "✓ Recovery complete"
else
  echo "✗ Project not in soft-delete (cannot recover)"
  exit 1
fi
```

Anti-Patterns to Avoid
| Anti-pattern | Problem | Solution |
|---|---|---|
| No backups | Data loss is permanent | Implement 3-2-1 backups |
| Over-permissioned service account | Can delete anything | Restrict IAM to least privilege |
| No audit logging | Cannot trace deletions | Enable audit logging |
| Terraform destroy without safeguards | Accidental destruction | Add lifecycle protection |
| No alert on deletions | Silent failures | Monitor deletion audit logs |
| Backups in same project | Backup deleted with project | Cross-project backups |
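
Two of the rows above, the 3-2-1 rule and cross-project backups, can be checked against a backup inventory. The following is a minimal sketch: the inventory format (dicts with `medium`, `project`, and `region` keys) and the `check_backup_plan` helper are assumptions made for illustration, and "offsite" is read as a different region, as in the 3-2-1 section.

```python
def check_backup_plan(source, backups):
    """Return a list of 3-2-1 and cross-project problems (empty if none found)."""
    problems = []
    # 3 copies: the original plus at least two backups
    if 1 + len(backups) < 3:
        problems.append("fewer than 3 copies")
    # 2 media types, counting the original's medium
    media = {source["medium"]} | {b["medium"] for b in backups}
    if len(media) < 2:
        problems.append("only one media type")
    # 1 offsite copy: at least one backup outside the source region
    if all(b["region"] == source["region"] for b in backups):
        problems.append("no copy outside the source region")
    # Backups must survive deletion of the source project
    if all(b["project"] == source["project"] for b in backups):
        problems.append("all backups live in the source project")
    return problems


source = {"medium": "disk", "project": "prod", "region": "us-central1"}
backups = [
    {"medium": "snapshot", "project": "prod", "region": "us-central1"},
    {"medium": "gcs", "project": "prod", "region": "us-east1"},
]

print(check_backup_plan(source, backups))
# ['all backups live in the source project']
```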