Private DNS Zones: VPC Binding & Zone Discovery

Why This Matters

Private DNS zones are foundational for service discovery in GCP. Understanding the binding mechanism is critical to avoid:

  • Services that fail to resolve from other VPCs
  • Security violations (wrong VPC can access records)
  • Operational confusion (which zone, which VPC?)

Private Zone Mechanics

VPC Binding Model

A private zone resolves only within its authorized VPCs:

Zone "db.internal.example.com"
  Bound to: VPC A, VPC B
  
VPC A resources: ✓ Can resolve db.internal.example.com
VPC B resources: ✓ Can resolve db.internal.example.com
VPC C resources: ✗ Cannot resolve (not bound)
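
The binding model above can be sketched as a simple lookup. This is only an illustration: the `BINDINGS` table and the `zone_bound_to_vpc` helper are hypothetical names, not part of Cloud DNS or gcloud.

```bash
# Hypothetical binding table: one "zone vpc" pair per line.
BINDINGS='db.internal.example.com vpc-a
db.internal.example.com vpc-b'

# Succeeds (exit 0) only if the zone is bound to the given VPC.
zone_bound_to_vpc() {
  zone=$1
  vpc=$2
  printf '%s\n' "$BINDINGS" | grep -qxF "$zone $vpc"
}

for vpc in vpc-a vpc-b vpc-c; do
  if zone_bound_to_vpc db.internal.example.com "$vpc"; then
    echo "$vpc: can resolve"
  else
    echo "$vpc: cannot resolve (not bound)"
  fi
done
```

Resolution is decided per VPC, so the same query name can succeed in one network and fail in another.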

Resolver Behavior

VPC A VM (10.0.1.5) queries: service.db.internal.example.com

Step 1: VPC resolver (169.254.169.254) intercepts query
Step 2: Check all private zones bound to VPC A
Step 3: Match found: db.internal.example.com zone
Step 4: Return records from zone (10.0.3.50 for service)
Step 5: Result: 10.0.3.50

VPC C VM (172.16.1.5) queries: service.db.internal.example.com

Step 1: VPC resolver intercepts query
Step 2: Check all private zones bound to VPC C
Step 3: No match (zone not bound to VPC C)
Step 4: Fall through to public DNS resolution
Step 5: Result: NXDOMAIN (unless a public record with that name exists)

Implementation

Single VPC Setup

bash
# Create private zone bound to VPC (--description is a required flag)
gcloud dns managed-zones create db-internal \
  --description="Private zone for database services" \
  --dns-name=db.internal.example.com. \
  --visibility=private \
  --networks=projects/PROJECT/global/networks/default

# Add records (record names are fully qualified, with a trailing dot)
gcloud dns record-sets transaction start --zone=db-internal
gcloud dns record-sets transaction add 10.0.3.50 \
  --name=primary.db.internal.example.com. \
  --type=A \
  --ttl=300 \
  --zone=db-internal
gcloud dns record-sets transaction add 10.0.3.51 \
  --name=secondary.db.internal.example.com. \
  --type=A \
  --ttl=300 \
  --zone=db-internal
gcloud dns record-sets transaction execute --zone=db-internal

Multi-VPC Setup

bash
# Create zone bound to multiple VPCs (--description is a required flag)
gcloud dns managed-zones create shared-internal \
  --description="Shared private zone" \
  --dns-name=shared.internal.example.com. \
  --visibility=private \
  --networks=projects/PROJECT/global/networks/vpc-a,projects/PROJECT/global/networks/vpc-b,projects/PROJECT/global/networks/vpc-c

# Add more VPCs later. --networks replaces the whole list,
# so repeat the existing networks along with the new one.
gcloud dns managed-zones update shared-internal \
  --networks=projects/PROJECT/global/networks/vpc-a,projects/PROJECT/global/networks/vpc-b,projects/PROJECT/global/networks/vpc-c,projects/PROJECT/global/networks/vpc-d
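
Because the update takes the complete network list, it can help to build that list without duplicates before calling gcloud. A minimal sketch, assuming a hypothetical helper named `add_network` that works on a comma-separated string:

```bash
# Append a network to a comma-separated list unless it is already present.
add_network() {
  current=$1
  new=$2
  case ",$current," in
    *,"$new",*) printf '%s\n' "$current" ;;        # already in the list
    ,,)         printf '%s\n' "$new" ;;            # list was empty
    *)          printf '%s\n' "$current,$new" ;;   # append
  esac
}

add_network "vpc-a,vpc-b,vpc-c" "vpc-d"
```

The resulting string could then be passed to the `--networks` flag of the update command shown earlier.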

Terraform

hcl
resource "google_compute_network" "vpc_a" {
  name = "vpc-a"
  auto_create_subnetworks = false
}

resource "google_compute_network" "vpc_b" {
  name = "vpc-b"
  auto_create_subnetworks = false
}

resource "google_dns_managed_zone" "shared" {
  name        = "shared-zone"
  dns_name    = "shared.internal.example.com."
  visibility  = "private"
  
  # The provider's block name is "networks", one block per bound VPC
  private_visibility_config {
    networks {
      network_url = google_compute_network.vpc_a.id
    }
    networks {
      network_url = google_compute_network.vpc_b.id
    }
  }
}

resource "google_dns_record_set" "service" {
  name            = "api.shared.internal.example.com."
  type            = "A"
  ttl             = 300
  managed_zone    = google_dns_managed_zone.shared.name
  rrdatas         = ["10.0.1.100", "10.0.2.100"]  # Dual-region
}

Zone Discovery

How Zone Discovery Works

When a VM in a VPC queries a domain, the resolver picks the most specific (longest-suffix) zone bound to that VPC and answers only from that zone:

VM in VPC A queries: api.example.com

Resolver checks bound zones:
  1. api.example.com zone? → Not bound to VPC A
  2. example.com zone? → Bound to VPC A, so it is authoritative for this query
  → No api record in that zone
  → Result: NXDOMAIN (no fall-through to public DNS)

Only when no bound private zone matches the name does the query go on to public DNS.

Zone Naming Hierarchy

Correct naming enables efficient zone discovery:

Root zone: example.com
  └── Bound to VPC A
      Records: example.com A 10.0.1.5

Sub-zone: api.example.com
  └── Bound to VPC A
      Records: api.example.com A 10.0.2.5

Query: api.example.com
  Resolver selects the api.example.com zone (longest match)
  → Found, returns 10.0.2.5

Query: service.api.example.com
  Resolver looks for the longest matching zone
  → No zone for service.api.example.com
  → api.example.com zone is the longest match
  → Record found? Then it is returned
  → Not found? NXDOMAIN (the example.com zone is not consulted)

Implication: Zone names should form a hierarchy, and each record should live in the most specific zone that covers its name:

example.com (root)
├── api.example.com (API services)
├── db.example.com (Database services)
└── internal.example.com (Internal tools)

With this layout, every query maps to exactly one zone by longest-suffix match.
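
Zone selection by most specific match can be sketched as follows. This is an illustration of longest-suffix matching, not Cloud DNS's actual implementation; `select_zone` is a hypothetical helper, and names are written without trailing dots for simplicity.

```bash
# Print the most specific zone (longest matching suffix) for a query name.
# Prints an empty line if none of the given zones covers the name.
# Usage: select_zone QUERY_NAME ZONE [ZONE...]
select_zone() {
  qname=$1
  shift
  best=""
  for zone in "$@"; do
    case "$qname" in
      "$zone"|*."$zone")
        # Keep the longest zone name seen so far.
        if [ "${#zone}" -gt "${#best}" ]; then
          best=$zone
        fi
        ;;
    esac
  done
  printf '%s\n' "$best"
}

select_zone service.api.example.com example.com api.example.com db.example.com
```

The match is on whole labels, so a name such as notexample.com does not fall under the example.com zone.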

Production Patterns

Pattern 1: Environment-Based Zones

prod.internal.example.com (bound to prod-vpc)
  └── services in prod-vpc resolve here

staging.internal.example.com (bound to staging-vpc)
  └── services in staging-vpc resolve here

dev.internal.example.com (bound to dev-vpc)
  └── services in dev-vpc resolve here

Benefit: Complete isolation; a service cannot resolve names from another environment by mistake.
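
A consistent naming convention is easier to keep when zone resource names are derived from the DNS names. A minimal sketch, assuming a hypothetical convention (dots become hyphens) and a hypothetical helper `zone_id_from_dns_name`; Cloud DNS does not require this mapping.

```bash
# Derive a zone resource name from a DNS name: drop the trailing dot,
# then replace the remaining dots with hyphens.
zone_id_from_dns_name() {
  printf '%s\n' "$1" | sed 's/\.$//' | tr '.' '-'
}

for env in prod staging dev; do
  echo "$env: $(zone_id_from_dns_name "$env.internal.example.com.")"
done
```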

Pattern 2: Shared VPC with Centralized Zones

Host Project (central DNS):
  └── Shared VPC
      └── Private Zone "internal.example.com"
          Bound to: Shared VPC

Service Projects (use shared VPC):
  └── Resources in Shared VPC
      └── Can resolve internal.example.com

Benefit: One zone, managed centrally, accessible by all teams.

Pattern 3: Service-Specific Zones

auth.internal.example.com (auth team zone)
  └── Bound to: auth-vpc

backend.internal.example.com (backend team zone)
  └── Bound to: backend-vpc

data.internal.example.com (data team zone)
  └── Bound to: data-vpc

Benefit: Per-team ownership, independent zones.

Multi-VPC Challenges

Challenge 1: Zone Binding Delays

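When you bind a zone to a new VPC, the change is not instant; allow roughly 60 seconds for propagation (an approximate figure):

T+0: Update zone, add VPC binding
T+0-60s: New VPC resources still cannot resolve (cache/propagation delay)
T+60s: Zone binding active

Implication: Do not expect immediate resolution after a binding update; retry for a short while before treating a failure as an error.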
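
One way to absorb the propagation delay in automation is to poll until resolution succeeds. A minimal sketch, assuming a hypothetical helper `wait_until` that retries any command a fixed number of times:

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

On a VM in the newly bound VPC, something like `wait_until 12 10 getent hosts api.shared.internal.example.com` would poll for about two minutes before giving up; the attempt count and delay are arbitrary choices.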

Challenge 2: Cross-Project VPC

Private zones can be bound only to VPCs in the same project:

Project A: VPC A
  └── Private Zone "a.internal" (bound to VPC A)

Project B: VPC B
  └── Cannot bind directly to Project A's zone (different project)
      Solution: Use DNS Peering

When cross-project resolution is needed, use DNS Peering (a separate pattern).

Challenge 3: VPC Quota

A private zone can be bound to at most ~100 VPCs
(the exact limit may vary by quota)

If you have more than 100 VPCs:
  → A single zone is not enough
  → Solution: Zone replication or peering

GKE Integration

GKE Automatic Service DNS

GKE automatically creates DNS entries for Services:

Service "api" in namespace "default":
  Kubernetes DNS name: api.default.svc.cluster.local

When the cluster uses Cloud DNS for GKE as its DNS provider, GKE manages a private zone:
  cluster.local zone (private, managed by GKE)
  Records:
    - api.default.svc.cluster.local A 10.4.0.50 (service IP)

Clusters that use kube-dns serve the same names from inside the cluster instead of from a Cloud DNS zone.

Result: Pods can resolve service names automatically, without additional config.
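
Kubernetes Service names follow the pattern service.namespace.svc.cluster-domain. A minimal sketch of building such a name; `service_fqdn` is a hypothetical helper, and the defaults assume the `default` namespace and the `cluster.local` domain.

```bash
# Build the in-cluster DNS name for a Kubernetes Service.
# Usage: service_fqdn SERVICE [NAMESPACE] [CLUSTER_DOMAIN]
service_fqdn() {
  service=$1
  namespace=${2:-default}
  domain=${3:-cluster.local}
  printf '%s.%s.svc.%s\n' "$service" "$namespace" "$domain"
}

service_fqdn api
service_fqdn auth platform
```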

Exposing GKE Services

To expose a GKE service beyond the cluster:

bash
# Create private zone for access from outside the cluster
gcloud dns managed-zones create gke-services \
  --description="Private zone for GKE services" \
  --dns-name=k8s.internal.example.com. \
  --visibility=private \
  --networks=projects/PROJECT/global/networks/default

# Add service record. The IP must be reachable from outside the cluster
# (for example an internal load balancer address); a ClusterIP is not.
gcloud dns record-sets transaction start --zone=gke-services
gcloud dns record-sets transaction add 10.4.0.50 \
  --name=api.k8s.internal.example.com. \
  --type=A \
  --ttl=60 \
  --zone=gke-services
gcloud dns record-sets transaction execute --zone=gke-services

# Now clients outside the cluster (same VPC) can resolve
# api.k8s.internal.example.com → 10.4.0.50

Troubleshooting

Issue 1: Cannot Resolve Private Zone Record

bash
# From VM in VPC:
nslookup api.internal.example.com
# Result: NXDOMAIN

Debug:
1. Check zone exists and is private
   gcloud dns managed-zones describe api-zone

2. Check VPC is bound
   gcloud dns managed-zones describe api-zone \
     --format="value(privateVisibilityConfig.networks[].networkUrl)"
   Should show your VPC

3. Check record exists
   gcloud dns record-sets list --zone=api-zone \
     --filter="name:api.internal.example.com"

4. Check VM's VPC
   gcloud compute instances describe VM_NAME --zone=ZONE \
     --format="value(networkInterfaces[0].network)"
   Should match bound VPC

5. Check the VM's resolver configuration
   A custom /etc/resolv.conf might bypass the VPC resolver
   grep nameserver /etc/resolv.conf
   Should list 169.254.169.254 (or a local stub such as 127.0.0.53 that forwards to it)
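
The resolver check in step 5 can be scripted. A minimal sketch, assuming a hypothetical helper `uses_vpc_resolver`; it only detects a direct `nameserver 169.254.169.254` entry and does not follow local stub resolvers.

```bash
# Succeeds if the resolv.conf-style file lists the metadata server
# (169.254.169.254) as a nameserver. Defaults to /etc/resolv.conf.
uses_vpc_resolver() {
  file=${1:-/etc/resolv.conf}
  awk '$1 == "nameserver" && $2 == "169.254.169.254" { found = 1 }
       END { exit found ? 0 : 1 }' "$file"
}
```

Called as `uses_vpc_resolver || echo "VM is not using the VPC resolver directly"`, it gives a quick signal before digging into zone bindings.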

Issue 2: Zone Exists but Cannot Resolve from Some VPCs

Zone bound to: VPC A, VPC B
VM in VPC C cannot resolve

Debug:
  Confirm the VM really is in VPC C (not a different VPC)
  If it is in VPC C: update the zone's binding to include VPC C

Solution (--networks replaces the whole list, so keep the existing VPCs):
  gcloud dns managed-zones update zone-name \
    --networks=VPC_A,VPC_B,VPC_C

Issue 3: Stale Cached Records (Records Deleted but Still Resolving)

Zone record: api.internal.example.com A 10.0.1.5
TTL: 300

Delete record at T+0

Query at T+100 (still within TTL):
  VM cache still has old record
  → Returns 10.0.1.5 (outdated)

Query at T+300+1 (after TTL):
  Cache expires
  → New query to Cloud DNS
  → Returns the current state (NXDOMAIN, since the record was deleted)

Mitigation: Reduce the TTL before planned record changes, and wait for the old TTL to expire before making the change.
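
The timeline above is simple arithmetic: a cached answer can outlive the change by at most the TTL minus the time already elapsed. A minimal sketch with a hypothetical helper `stale_seconds_left`:

```bash
# Seconds a cached record may still be served after a change.
# Usage: stale_seconds_left TTL_SECONDS SECONDS_SINCE_CHANGE
stale_seconds_left() {
  ttl=$1
  elapsed=$2
  left=$((ttl - elapsed))
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

stale_seconds_left 300 100   # prints 200
stale_seconds_left 300 301   # prints 0
```

This is an upper bound for a cache that stored the record just before the change; caches that fetched it earlier expire sooner.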

Monitoring

bash
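# Count bound networks per private zone
for zone in $(gcloud dns managed-zones list --filter="visibility:PRIVATE" --format="value(name)"); do
  echo "Zone: $zone"
  # value() prints list items on one line joined by ';', so split before counting
  gcloud dns managed-zones describe "$zone" \
    --format="value(privateVisibilityConfig.networks[].networkUrl)" | tr ';' '\n' | grep -c .
done

# Monitor resolution errors (requires DNS query logging to be enabled for the network).
# Query logs record the DNS response code in jsonPayload.responseCode.
gcloud logging read 'resource.type="dns_query" AND jsonPayload.responseCode!="NOERROR"' --limit=50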

# Alert if zone binding count drops (indicates failure)
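
The alert on a dropping binding count can be sketched as a comparison against a known baseline. `check_binding_count` is a hypothetical helper; the expected count would come from your own records (for example the Terraform config), and the actual count from a query like the loop above.

```bash
# Compare the current number of bound networks with an expected baseline.
# Usage: check_binding_count ZONE EXPECTED ACTUAL
check_binding_count() {
  zone=$1
  expected=$2
  actual=$3
  if [ "$actual" -lt "$expected" ]; then
    echo "ALERT: $zone has $actual bound networks, expected at least $expected"
    return 1
  fi
  echo "OK: $zone has $actual bound networks"
}
```

The non-zero exit status on a drop makes the check usable from cron jobs or CI pipelines.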

Best Practices

  1. Use zone hierarchy (example.com → api.example.com)
  2. Clear naming convention (prod.internal.example.com)
  3. Document which VPCs are bound to each zone
  4. Test cross-VPC resolution before production
  5. Plan for growth (what if you add 50 more VPCs?)
  6. Monitor zone health (are queries failing?)
  7. Back up zone configs (Terraform IaC)
  8. Tune TTLs based on change frequency
