Private DNS Zones: VPC Binding & Zone Discovery
Why This Matters
Private DNS zones are foundational for service discovery in GCP. Understanding the binding mechanism is critical to avoid:
- Services that cannot be resolved from other VPCs
- Security violations (the wrong VPC can read records)
- Operational confusion (which zone, which VPC?)
Private Zone Mechanics
VPC Binding Model
A private zone is resolvable only from the VPCs it is bound to:
Zone "db.internal.example.com"
Bound to: VPC A, VPC B
VPC A resources: ✓ Can resolve db.internal.example.com
VPC B resources: ✓ Can resolve db.internal.example.com
VPC C resources: ✗ Cannot resolve (not bound)
Resolver Behavior
VPC A VM (10.0.1.5) queries: service.db.internal.example.com
Step 1: VPC resolver (169.254.169.254) receives the query
Step 2: Check all private zones bound to VPC A
Step 3: Match found: db.internal.example.com zone
Step 4: Return records from zone (10.0.3.50 for service)
Step 5: Result: 10.0.3.50
VPC C VM (172.16.1.5) queries: service.db.internal.example.com
Step 1: VPC resolver receives the query
Step 2: Check all private zones bound to VPC C
Step 3: No match (zone not bound to VPC C)
Step 4: No private zone applies, so the name is resolved as a public DNS name
Step 5: Result: NXDOMAIN (assuming the name has no public record)
Implementation
Single VPC Setup
# Create private zone bound to VPC
gcloud dns managed-zones create db-internal \
--description="Private zone for database hosts" \
--dns-name=db.internal.example.com \
--visibility=private \
--networks=projects/PROJECT/global/networks/default
# Add records
gcloud dns record-sets transaction start --zone=db-internal
gcloud dns record-sets transaction add 10.0.3.50 \
--name=primary.db.internal.example.com \
--type=A \
--ttl=300 \
--zone=db-internal
gcloud dns record-sets transaction add 10.0.3.51 \
--name=secondary.db.internal.example.com \
--type=A \
--ttl=300 \
--zone=db-internal
gcloud dns record-sets transaction execute --zone=db-internal
Multi-VPC Setup
# Create zone bound to multiple VPCs
gcloud dns managed-zones create shared-internal \
--description="Private zone shared across VPCs" \
--dns-name=shared.internal.example.com \
--visibility=private \
--networks=projects/PROJECT/global/networks/vpc-a,projects/PROJECT/global/networks/vpc-b,projects/PROJECT/global/networks/vpc-c
# Add more VPCs later; --networks replaces the whole list, so repeat the existing VPCs
gcloud dns managed-zones update shared-internal \
--networks=projects/PROJECT/global/networks/vpc-a,projects/PROJECT/global/networks/vpc-b,projects/PROJECT/global/networks/vpc-c,projects/PROJECT/global/networks/vpc-d
Terraform
resource "google_compute_network" "vpc_a" {
  name                    = "vpc-a"
  auto_create_subnetworks = false
}
resource "google_compute_network" "vpc_b" {
  name                    = "vpc-b"
  auto_create_subnetworks = false
}
resource "google_dns_managed_zone" "shared" {
  name       = "shared-zone"
  dns_name   = "shared.internal.example.com."
  visibility = "private"
  private_visibility_config {
    networks {
      network_url = google_compute_network.vpc_a.id
    }
    networks {
      network_url = google_compute_network.vpc_b.id
    }
  }
}
resource "google_dns_record_set" "service" {
  name         = "api.shared.internal.example.com."
  type         = "A"
  ttl          = 300
  managed_zone = google_dns_managed_zone.shared.name
  rrdatas      = ["10.0.1.100", "10.0.2.100"] # Two backends (e.g. one per region); clients receive both addresses
}
Zone Discovery
How Zone Discovery Works
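When a VM queries a name, the VPC resolver selects the most specific private zone bound to that VPC (longest suffix match). Only if no bound zone matches does the query go to public DNS:
VM in VPC A queries: api.example.com
Resolver checks bound zones:
1. api.example.com zone? → Not bound to VPC A
2. example.com zone? → Bound to VPC A → selected as the most specific match
3. Look up api.example.com in that zone → no api record
→ Result: NXDOMAIN (the bound zone is authoritative, so the query does not fall through to public DNS)
Zone Naming Hierarchy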
A clear naming hierarchy makes it predictable which zone answers a query:
Root zone: example.com
└── Bound to VPC A
Records: example.com A 10.0.1.5
Sub-zone: api.example.com
└── Bound to VPC A
Records: api.example.com A 10.0.2.5
Query: api.example.com
Resolver checks api.example.com zone first
→ Found, returns 10.0.2.5
Query: service.api.example.com
No zone named service.api.example.com exists
→ Most specific bound zone: api.example.com
→ Record found? Return it
→ Not found? Return NXDOMAIN (the example.com zone is not consulted)
Implication: Zone names should form a hierarchy in which each zone owns one subtree:
example.com (root)
├── api.example.com (API services)
├── db.example.com (Database services)
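└── internal.example.com (Internal tools)
When a query arrives, the resolver picks the deepest matching zone, so each subtree is answered by exactly one zone. A child zone therefore shadows its parent: records under api.example.com that live in the example.com zone are never returned.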
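Cloud DNS answers from the most specific (longest-suffix) private zone bound to the querying VPC. That selection rule can be modeled in a few lines of shell. This is a sketch, not Cloud DNS's implementation: the `zone:vpc1,vpc2` input format and the function names `name_in_zone` and `pick_zone` are hypothetical, and the model ignores forwarding zones, peering zones, and the other steps of the real resolution order.

```sh
#!/bin/sh
# Sketch of private-zone selection (hypothetical helpers, not a Cloud DNS API).

# Succeed if query name $1 equals zone $2 or sits under it at a label boundary.
name_in_zone() {
  local name="${1%.}" zone="${2%.}"
  case "$name" in
    "$zone" | *".$zone") return 0 ;;
  esac
  return 1
}

# Print the most specific zone bound to VPC $1 that contains query $2.
# Remaining arguments are "zone:vpc1,vpc2" entries. Prints PUBLIC when no
# bound zone matches, i.e. the query would go on to public DNS.
pick_zone() {
  local vpc="$1" query="$2" best="" entry zone
  shift 2
  for entry in "$@"; do
    zone="${entry%%:*}"
    zone="${zone%.}"
    # Skip zones that are not bound to this VPC.
    case ",${entry#*:}," in
      *",$vpc,"*) ;;
      *) continue ;;
    esac
    # Skip zones whose suffix does not cover the query.
    name_in_zone "$query" "$zone" || continue
    # Among matching suffixes, the longest one is the most specific zone.
    if [ "${#zone}" -gt "${#best}" ]; then
      best="$zone"
    fi
  done
  echo "${best:-PUBLIC}"
}
```

For example, with `Z="example.com:vpc-a api.example.com:vpc-a db.internal.example.com:vpc-a,vpc-b"`, the call `pick_zone vpc-a service.api.example.com $Z` prints `api.example.com`, while `pick_zone vpc-c service.db.internal.example.com $Z` prints `PUBLIC` because no zone is bound to vpc-c.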
Production Patterns
Pattern 1: Environment-Based Zones
prod.internal.example.com (bound to prod-vpc)
└── services in prod-vpc resolve here
staging.internal.example.com (bound to staging-vpc)
└── services in staging-vpc resolve here
dev.internal.example.com (bound to dev-vpc)
└── services in dev-vpc resolve here
Benefit: Complete isolation; a service cannot resolve another environment's names by mistake.
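Environment isolation only holds as long as nobody binds a zone to the wrong VPC. A sketch of an audit check, assuming the hypothetical naming rules that zones are called `<env>.internal.example.com` and networks `<env>-vpc`; input entries use a made-up `zone:vpc1,vpc2` format that you would have to build yourself (for example from `gcloud dns managed-zones describe` output).

```sh
#!/bin/sh
# Hypothetical audit: flag bindings that cross environments.
# Assumed naming: zone "<env>.internal.example.com" must bind only "<env>-vpc".
# Arguments: "zone:vpc1,vpc2" entries. Returns 1 if any violation is found.
check_env_isolation() {
  local entry zone env vpc rc=0
  for entry in "$@"; do
    zone="${entry%%:*}"
    env="${zone%%.*}"   # first label of the zone name, e.g. "prod"
    for vpc in $(echo "${entry#*:}" | tr ',' ' '); do
      if [ "$vpc" != "${env}-vpc" ]; then
        echo "cross-environment binding: $zone -> $vpc"
        rc=1
      fi
    done
  done
  return "$rc"
}
```

Running it in CI before applying DNS changes turns an accidental `prod.internal.example.com` binding to `dev-vpc` into a failed check instead of a silent leak.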
Pattern 2: Shared VPC with Centralized Zones
Host Project (central DNS):
└── Shared VPC
└── Private Zone "internal.example.com"
Bound to: Shared VPC
Service Projects (use shared VPC):
└── Resources in Shared VPC
└── Can resolve internal.example.com
Benefit: One zone, managed centrally, accessible by all teams.
Pattern 3: Service-Specific Zones
auth.internal.example.com (auth team zone)
└── Bound to: auth-vpc
backend.internal.example.com (backend team zone)
└── Bound to: backend-vpc
data.internal.example.com (data team zone)
└── Bound to: data-vpc
Benefit: Per-team ownership, independent zones.
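Per-team ownership can break if one team's zone ends up nested under another team's zone and both are bound to the same VPC: with longest-suffix matching, the deeper zone takes over every name beneath it. A sketch of a check for that situation over a plain list of zone names; the function name is hypothetical.

```sh
#!/bin/sh
# Hypothetical check: report any zone nested under another zone in the list.
# Prints one line per nested pair; returns 1 if any nesting is found.
find_nested_zones() {
  local outer inner rc=0
  for outer in "$@"; do
    for inner in "$@"; do
      [ "$outer" = "$inner" ] && continue
      # "inner" is nested if it ends with ".<outer>" (trailing dots ignored).
      case "${inner%.}" in
        *".${outer%.}")
          echo "$inner is nested under $outer"
          rc=1
          ;;
      esac
    done
  done
  return "$rc"
}
```

Nesting is only a problem when the zones share a VPC, so in practice you would run the check per VPC over the zones bound to it.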
Multi-VPC Challenges
Challenge 1: Zone Binding Delays
After you bind a zone to a new VPC, the change can take roughly 60 seconds to propagate:
T+0: Update zone, add VPC binding
T+0-60s: Resources in the new VPC may still fail to resolve (cache/propagation delay)
T+60s: Zone binding active
Implication: Don't expect resolution to work immediately after a binding update; retry before declaring failure.
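Adding a VPC safely involves two steps: the `--networks` flag of `gcloud dns managed-zones update` replaces the existing bindings rather than appending to them, and the new binding needs time to propagate. Two hypothetical helpers cover both: `merge_networks` rebuilds the full comma-separated list, and `wait_until` retries a check during the propagation window. The timeout used below is an assumption chosen with some margin over the ~60 seconds quoted above.

```sh
#!/bin/sh
# Hypothetical helper: print existing bindings ($1, comma-separated) plus a new
# network ($2), skipping the new one if it is already present.
merge_networks() {
  local existing="$1" new="$2"
  case ",$existing," in
    *",$new,"*) echo "$existing" ;;
    ",,") echo "$new" ;;
    *) echo "$existing,$new" ;;
  esac
}

# Hypothetical helper: run a command every $2 seconds until it succeeds,
# giving up (return 1) once $1 seconds of waiting have elapsed.
wait_until() {
  local timeout="$1" interval="$2" waited=0
  shift 2
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

Possible usage: pass `"$(merge_networks "$current" projects/PROJECT/global/networks/vpc-d)"` to `--networks` on the update command, where `$current` holds the zone's existing bindings (for instance the `privateVisibilityConfig.networks[].networkUrl` value with `;` translated to `,`). Then, from a VM in vpc-d, run `wait_until 120 10 getent hosts api.shared.internal.example.com`. Existing bindings come back as full URLs, so deduplication only works if the new network is written in the same form.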
Challenge 2: Cross-Project VPC
A private zone can only be bound to VPCs in its own project:
Project A: VPC A
└── Private Zone "a.internal" (bound to VPC A)
Project B: VPC B
└── Cannot bind directly to Project A's zone (different project)
Solution: When cross-project resolution is needed, use DNS peering (covered as a separate pattern).
Challenge 3: VPC Quota
A private zone can be bound to at most ~100 VPCs
(the exact limit depends on your quota)
If you have more than 100 VPCs:
→ A single zone cannot cover them all
→ Solution: Replicate the zone or use DNS peering
GKE Integration
GKE Automatic Service DNS
Kubernetes gives every Service a DNS name:
Service "api" in namespace "default":
Kubernetes DNS name: api.default.svc.cluster.local
When Cloud DNS for GKE is enabled, GKE manages a private zone for these names (otherwise kube-dns running in the cluster answers these queries):
cluster.local zone (private, managed by GKE)
Records:
- api.default.svc.cluster.local A 10.4.0.50 (service IP)
Result: Pods can resolve Service names without additional configuration.
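The naming rule is mechanical: `<service>.<namespace>.svc.<cluster-domain>`, where `cluster.local` is the default cluster domain. A tiny helper (the function name is hypothetical) that builds the name:

```sh
#!/bin/sh
# Build a Service's in-cluster DNS name: <service>.<namespace>.svc.<domain>.
# Namespace defaults to "default" and domain to "cluster.local".
k8s_service_fqdn() {
  local service="$1" namespace="${2:-default}" domain="${3:-cluster.local}"
  echo "${service}.${namespace}.svc.${domain}"
}
```

From a Pod, `getent hosts "$(k8s_service_fqdn api)"` should return the Service IP. Inside the same namespace the short name `api` also works, because Pods get `<namespace>.svc.cluster.local` in their DNS search path.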
Exposing GKE Services
To make a GKE Service resolvable from outside the cluster:
# Create a private zone for names used outside the cluster
gcloud dns managed-zones create gke-services \
--description="Private zone for GKE service names" \
--dns-name=k8s.internal.example.com \
--visibility=private \
--networks=projects/PROJECT/global/networks/default
# Add service record
gcloud dns record-sets transaction start --zone=gke-services
gcloud dns record-sets transaction add 10.4.0.50 \
--name=api.k8s.internal.example.com \
--type=A \
--ttl=60 \
--zone=gke-services
gcloud dns record-sets transaction execute --zone=gke-services
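# Now workloads outside the cluster (same VPC) can resolve
# api.k8s.internal.example.com → 10.4.0.50
# A ClusterIP is normally reachable only from inside the cluster, so for
# traffic to flow, point the record at an internal LoadBalancer Service IP
Troubleshooting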
Issue 1: Cannot Resolve Private Zone Record
# From VM in VPC:
nslookup api.internal.example.com
# Result: NXDOMAIN
Debug:
1. Check zone exists and is private
gcloud dns managed-zones describe api-zone
2. Check VPC is bound
gcloud dns managed-zones describe api-zone \
--format="value(privateVisibilityConfig.networks[].networkUrl)"
Should show your VPC
3. Check record exists
gcloud dns record-sets list --zone=api-zone \
--filter="name:api.internal.example.com"
4. Check VM's VPC
gcloud compute instances describe VM_NAME --zone=ZONE \
--format="value(networkInterfaces[0].network)"
Should match bound VPC
5. Test from VM
If /etc/resolv.conf has been customized, the VM might not use the VPC resolver
grep nameserver /etc/resolv.conf
Should list 169.254.169.254 (or a local stub such as systemd-resolved's 127.0.0.53 that forwards to it)
Issue 2: Zone Exists but Cannot Resolve from Some VPCs
Zone bound to: VPC A, VPC B
VM in VPC C cannot resolve
Debug:
Confirm the VM really is in VPC C (not in a different VPC)
If it is, VPC C must be added to the zone's bindings
Solution (--networks replaces the full list, so include the existing VPCs):
gcloud dns managed-zones update zone-name \
--networks=VPC_A,VPC_B,VPC_C
Issue 3: Stale Cached Answers (Records Deleted but Still Resolving)
Zone record: api.internal.example.com A 10.0.1.5
TTL: 300
Delete record at T+0
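Query at T+100 (still within TTL):
An answer cached just before T+0 (in the resolver or a VM-local cache) is still valid
→ Returns 10.0.1.5 (outdated)
Query at T+301 (after TTL):
Cache has expired
→ New query to Cloud DNS
→ Returns the updated answer (NXDOMAIN for a deleted record)
Mitigation: Lower the TTL before planned record changes, at least one old TTL period ahead, so caches pick up the shorter TTL first.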
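The caching timeline reduces to two comparisons. A sketch with hypothetical function names, working in whole seconds relative to an arbitrary start: `still_cached` says whether an answer cached at time C with TTL T is still served at time N, and `earliest_change` gives the first safe moment to change a record after lowering its TTL, since caches may keep the old TTL for one full old-TTL period.

```sh
#!/bin/sh
# Succeed if an answer cached at second $1 with TTL $2 is still served at second $3.
still_cached() {
  [ $(( $3 - $1 )) -lt "$2" ]
}

# After lowering a TTL at second $1 from an old TTL of $2 seconds, print the
# earliest second at which no cache can still hold the old, longer TTL.
earliest_change() {
  echo $(( $1 + $2 ))
}
```

With the numbers above, `still_cached 0 300 100` succeeds and `still_cached 0 300 301` fails. Lowering the TTL from 300 to 30 at second 1000 means changing the record no earlier than `earliest_change 1000 300`, i.e. second 1300; after that, stale answers last at most 30 seconds.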
Monitoring
# Count VPC bindings per private zone
for zone in $(gcloud dns managed-zones list --filter="visibility=private" --format="value(name)"); do
echo "Zone: $zone"
# value() joins list items with ";" on one line, so split before counting
gcloud dns managed-zones describe "$zone" \
--format="value(privateVisibilityConfig.networks[].networkUrl)" | tr ';' '\n' | grep -c .
done
# Show failed lookups (requires query logging enabled through a DNS server policy)
gcloud logging read 'resource.type="dns_query" AND jsonPayload.responseCode!="NOERROR"' --limit=50
# Alert if a zone's binding count drops (indicates a failed or accidental update)
Best Practices
- Use zone hierarchy (example.com → api.example.com)
- Clear naming convention (prod.internal.example.com)
- Document which VPCs are bound to each zone
- Test cross-VPC resolution before production
- Plan for growth (what if you add 50 more VPCs?)
- Monitor zone health (are queries failing?)
- Back up zone configs (e.g., as Terraform IaC)
- Tune TTLs to match how often records change
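For the documentation and backup items, a snapshot script can complement Terraform. This sketch (the function name is hypothetical) saves each private zone's settings, including its bound networks, as YAML and its records as a BIND-format zone file via `gcloud dns record-sets export`. Pass a fresh directory on each run.

```sh
#!/bin/sh
# Snapshot all private zones into directory $1 (use a new directory per run).
#   <zone>.yaml : zone settings, including privateVisibilityConfig networks
#   <zone>.zone : record sets in BIND zone-file format
backup_private_zones() {
  local dir="$1" zone
  mkdir -p "$dir" || return 1
  for zone in $(gcloud dns managed-zones list \
      --filter="visibility=private" --format="value(name)"); do
    gcloud dns managed-zones describe "$zone" --format=yaml > "$dir/$zone.yaml" || return 1
    gcloud dns record-sets export "$dir/$zone.zone" --zone="$zone" --zone-file-format || return 1
  done
}
```

For example, `backup_private_zones "dns-backup-$(date +%Y%m%d)"` once a day; diffing two snapshots shows which bindings or records changed in between.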