
Static Routes & Next Hops — Customizing VPC Routing

Executive Summary

GCP routes packets based on their destination IP address, using the routes in the VPC's routing table.

There are 3 types of routes:

  • Subnet routes (automatic): each subnet CIDR gets a route that delivers packets to instances in that subnet
  • Static routes (custom): defined manually, targeting a specific CIDR + next hop
  • Dynamic routes (BGP): Cloud Router learns them from on-prem via BGP

Understanding routing order and next hop types is key to debugging "packets never arrive" problems.

Route Fundamentals

Route Components

Mỗi route trong VPC chỉ định:

yaml
Destination Range: 10.0.0.0/16   # destination CIDR block
Next Hop Type:                   # where matching packets go (one per route)
  - Instance                     # specific VM (must allow IP forwarding)
  - IP Address                   # internal IP, e.g. a VM's secondary NIC
  - Internal LB                  # internal passthrough Network Load Balancer
  - VPN Tunnel                   # Cloud VPN (Classic VPN) tunnel
  - Default IGW                  # default internet gateway
  # Interconnect VLAN attachments and VPC Network Peering supply next hops
  # only for dynamic (BGP) and peering routes, not for custom static routes.
Priority: 1000                   # 0-65535; lower value = higher priority
Name: "route-to-onprem"          # route identifier
Network: "prod-vpc"              # which VPC
Description: "On-prem reachability via VPN"   # optional documentation

Route Matching Algorithm

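When a packet is headed for destination 10.20.1.5:

VPC routing table:

Priority  Destination    Next Hop        Type
0         10.0.0.0/16    local           Subnet
100       10.20.0.0/16   instance-1      Static
1000      10.0.0.0/8     vpn-gateway     Static
1000      0.0.0.0/0      default-igw     System (default route)

Packet: dst=10.20.1.5

Step 1: Collect every route whose destination contains 10.20.1.5
  10.0.0.0/16?   NO  (10.20.x.x is outside 10.0.x.x)
  10.20.0.0/16?  YES ✓
  10.0.0.0/8?    YES ✓
  0.0.0.0/0?     YES ✓ (matches everything)

Step 2: Keep the most specific (longest) prefix
  /16 beats /8 and /0 → 10.20.0.0/16
  Next hop: instance-1
  → MATCHED

Step 3: Only if several routes share that prefix does the lowest priority value win;
        routes still tied after that share traffic (ECMP)

Route 10.0.0.0/8 ignored: it is less specific (/8 vs /16), not lower priority

Rule: Most specific prefix match wins; ties are broken by priority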
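
The selection order can be sketched in Python. This is a simplified model, not GCP's implementation. It ignores network tags, next-hop health, and policy-based routes. The Route class and the select_next_hops function are hypothetical names. The sample table gives the default route priority 1000, which is the priority GCP assigns to its system-generated default route.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    destination: str  # CIDR, e.g. "10.20.0.0/16"
    priority: int     # 0-65535, lower value = preferred
    next_hop: str


def select_next_hops(routes: list[Route], dst: str) -> list[str]:
    """Return the next hop(s) chosen for dst.

    Model: longest prefix first, then lowest priority value; routes still
    tied are returned together (they would share traffic via ECMP).
    """
    addr = ipaddress.ip_address(dst)
    matching = [r for r in routes if addr in ipaddress.ip_network(r.destination)]
    if not matching:
        return []  # no matching route: packet is dropped
    longest = max(ipaddress.ip_network(r.destination).prefixlen for r in matching)
    most_specific = [r for r in matching
                     if ipaddress.ip_network(r.destination).prefixlen == longest]
    best_priority = min(r.priority for r in most_specific)
    return [r.next_hop for r in most_specific if r.priority == best_priority]


table = [
    Route("10.0.0.0/16", 0, "local"),
    Route("10.20.0.0/16", 100, "instance-1"),
    Route("10.0.0.0/8", 1000, "vpn-gateway"),
    Route("0.0.0.0/0", 1000, "default-igw"),
]

print(select_next_hops(table, "10.20.1.5"))  # ['instance-1']
print(select_next_hops(table, "10.99.0.1"))  # ['vpn-gateway']
print(select_next_hops(table, "8.8.8.8"))    # ['default-igw']
```

Routes that are still tied after the priority comparison are returned together. This mirrors how equal routes share traffic.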

Subnet Routes (Automatic)

GCP automatically creates a route for each subnet CIDR:

Subnet: prod-app (10.0.1.0/24) in us-central1

Auto-generated route:
  Destination: 10.0.1.0/24
  Next Hop: Local network
  Priority: 0 (highest)
  Type: Subnet route

Effect: Packets to 10.0.1.0/24 are delivered locally inside the VPC
        and never exit the network

Subnet Route Lifecycle

Step 1: Create subnet 10.0.2.0/24
  → Automatic route creation: 10.0.2.0/24 → local
  → Propagates to all regions (global VPC)
  → Takes ~1 second

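Step 2: Delete subnet (only possible after its instances and other resources are removed)
  → Subnet route deleted automatically
  → Packets sent to the old range from elsewhere in the VPC: fall through to the next matching route
  → May hit a broad custom route or the default 0.0.0.0/0 (internet) route unexpectedly!

Anti-pattern: Delete subnet, re-create with different CIDR, without auditing custom routes
  → Custom static routes that pointed into the old range are not removed; they linger and confuse routing behavior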
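
Before you re-create a subnet with a new CIDR, it helps to list the custom static routes that still point into the old range. This is a minimal sketch. It assumes you have already collected route names and destination ranges, for example from gcloud compute routes list --format=json. The stale_routes helper and the route names are hypothetical.

```python
import ipaddress


def stale_routes(custom_routes: dict[str, str], deleted_subnet: str) -> list[str]:
    """Names of custom routes whose destination overlaps a deleted subnet range.

    custom_routes maps route name -> destination CIDR. Such routes are not
    removed when the subnet is deleted, so they are candidates for review.
    """
    old = ipaddress.ip_network(deleted_subnet)
    return sorted(
        name for name, dest in custom_routes.items()
        if ipaddress.ip_network(dest).overlaps(old)
    )


routes = {
    "route-to-appliance": "10.0.2.0/25",  # hypothetical route into the old subnet
    "route-to-onprem": "192.168.0.0/16",
    "route-summary": "10.0.0.0/8",        # broad route: overlaps, worth a look
}
print(stale_routes(routes, "10.0.2.0/24"))  # ['route-summary', 'route-to-appliance']
```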

Subnet Route Limitations

❌ Cannot delete subnet routes (immutable)
❌ Cannot modify subnet route priority
✅ Can delete entire subnet (route deletes automatically)

bash
# This FAILS:
gcloud compute routes delete 10-0-1-0-24

# Must delete subnet instead:
gcloud compute networks subnets delete prod-app \
  --region=us-central1

Custom Static Routes

Creating Static Routes

bash
gcloud compute routes create route-to-onprem \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-tunnel-1 \
  --priority=1000

# Alternative: route through instance
gcloud compute routes create route-via-gateway-vm \
  --destination-range=172.16.0.0/12 \
  --network=prod-vpc \
  --next-hop-instance=gateway-vm \
  --next-hop-instance-zone=us-central1-a \
  --priority=2000
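
A small pre-flight check can catch invalid destination ranges before gcloud is called. The sketch below is hypothetical tooling, not part of gcloud. The route_create_args helper builds the same argument list as the commands above. Python's ipaddress module in strict mode rejects ranges with host bits set, such as 192.168.1.0/16. The helper only prints the command; running it is left to you.

```python
import ipaddress
import shlex
from typing import Optional


def route_create_args(name: str, destination: str, network: str, priority: int,
                      vpn_tunnel: Optional[str] = None,
                      instance: Optional[str] = None,
                      instance_zone: Optional[str] = None) -> list[str]:
    """Build a 'gcloud compute routes create' argument list after sanity checks."""
    # strict=True raises ValueError for ranges with host bits set (192.168.1.0/16).
    net = ipaddress.ip_network(destination, strict=True)
    if not 0 <= priority <= 65535:
        raise ValueError("priority must be 0-65535")
    if (vpn_tunnel is None) == (instance is None):
        raise ValueError("set exactly one next hop: vpn_tunnel or instance")
    args = ["gcloud", "compute", "routes", "create", name,
            f"--destination-range={net}", f"--network={network}"]
    if vpn_tunnel is not None:
        args.append(f"--next-hop-vpn-tunnel={vpn_tunnel}")
    else:
        if instance_zone is None:
            raise ValueError("instance next hop needs instance_zone")
        args += [f"--next-hop-instance={instance}",
                 f"--next-hop-instance-zone={instance_zone}"]
    args.append(f"--priority={priority}")
    return args


cmd = route_create_args("route-to-onprem", "192.168.0.0/16", "prod-vpc", 1000,
                        vpn_tunnel="vpn-tunnel-1")
print(shlex.join(cmd))
```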

Next Hop Types in Detail

Next Hop: Compute Instance

Route: 192.168.0.0/16 → instance "gateway-vm"

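Packet forwarding:
  1. VPC delivers the packet to gateway-vm only if the VM has canIpForward=true
     (otherwise GCP drops packets not addressed to the VM's own IPs)
  2. Guest Linux stack checks its routing table
  3. If the guest OS is not routing (net.ipv4.ip_forward=0)
     → Packet dropped!

Enable IP forwarding on the VM (GCP side) when creating it:
gcloud compute instances create gateway-vm \
  --zone=us-central1-a \
  --can-ip-forward

For an existing VM there is no "gcloud compute instances modify" command:
export the VM's config (gcloud compute instances export), set canIpForward: true,
and re-apply it (gcloud compute instances update-from-file).

Enable forwarding inside the guest OS:
sudo sysctl -w net.ipv4.ip_forward=1

Key: VM must have IP forwarding at both layers + iptables/routing configured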
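
A short script run inside the gateway VM can check both forwarding layers. This sketch makes simplifying assumptions. It reads the standard Linux /proc/sys/net/ipv4/ip_forward file for the guest setting. You must supply the canIpForward value yourself, for example from gcloud compute instances describe. The helper names are hypothetical.

```python
from pathlib import Path


def guest_forwarding_enabled(path: str = "/proc/sys/net/ipv4/ip_forward") -> bool:
    """True if the guest Linux kernel forwards IPv4 packets (file contains '1')."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return False  # missing (non-Linux) or unreadable: treat as not forwarding


def forwarding_outcome(can_ip_forward: bool, guest_forwarding: bool) -> str:
    """Simplified model of a packet routed to a VM next hop."""
    if not can_ip_forward:
        return "dropped by VPC: canIpForward is false"
    if not guest_forwarding:
        return "dropped by guest kernel: net.ipv4.ip_forward=0"
    return "forwarded (still subject to guest iptables and routes)"


# canIpForward comes from outside the VM, e.g. gcloud compute instances describe.
print(forwarding_outcome(can_ip_forward=True, guest_forwarding=guest_forwarding_enabled()))
```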

Next Hop: Internal Load Balancer (ILB)

Route: 10.20.0.0/16 → ilb-prod

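Use case: HA (highly available) proxy or appliance tier as the next hop, load balanced across backend VMs

Architecture:
  Packet → ILB (health-checks backends) → Backend VM

ILB (internal passthrough Network Load Balancer) requires:
  - Health check: backends must respond
  - Backend service: group of backends
  - Forwarding rule: captures packets (the route's next hop points at it)

Advantage: Redundancy (multiple backend VMs; unhealthy ones are skipped)
Disadvantage: More components to configure and operate; the ILB is passthrough,
              so it does not add a proxy hop to the data path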
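
You can model how a load-balanced next hop spreads flows by hashing each connection onto the set of healthy backends. This is an illustration only. GCP's real hashing, and its behavior when every backend is unhealthy, are internal details. SHA-256 is just a stand-in hash, and pick_backend and the backend names are hypothetical.

```python
import hashlib
from typing import Optional


def pick_backend(flow: tuple[str, str, int, int, str],
                 backends: dict[str, bool]) -> Optional[str]:
    """Choose a healthy backend for a flow (src_ip, dst_ip, src_port, dst_port, proto).

    Hashing the whole tuple keeps every packet of one connection on the same
    backend; backends marked unhealthy (False) are skipped.
    """
    healthy = sorted(name for name, ok in backends.items() if ok)
    if not healthy:
        return None  # simplification: real load balancers may fail open instead
    key = "|".join(map(str, flow)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return healthy[digest % len(healthy)]


backends = {"fw-a": True, "fw-b": False, "fw-c": True}  # fw-b failed its health check
flow = ("10.0.0.5", "10.20.1.5", 51515, 443, "tcp")
print(pick_backend(flow, backends))  # fw-a or fw-c, never fw-b
```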

Next Hop: VPN Tunnel

Route: 203.0.113.0/24 → vpn-tunnel-site-a

On-premises routing:
  GCP subnet 10.0.0.0/16 → VPN tunnel → Site A 203.0.113.0/24

VPN tunnel must be:
  - Created and active
  - Connected to Cloud VPN gateway
  - For static routes: a Classic VPN tunnel (HA VPN supports only dynamic routing via BGP)
  - BGP session established (for dynamic routes)

Traffic flow:
  Packet dst=203.0.113.5
  → VPC matches route 203.0.113.0/24 → vpn-tunnel
  → Packet encrypted, sent to on-prem
  → On-prem receives, decrypts

Next Hop: Cloud Interconnect

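Route: 192.168.0.0/16 → VLAN attachment (dynamic route learned via Cloud Router)

Custom static routes cannot use a VLAN attachment as next hop; on-prem prefixes
arrive as dynamic routes over the attachment's BGP session.

High-performance on-premises connectivity:
  - 10 Gbps or 100 Gbps dedicated connection (Dedicated Interconnect)
  - Lower latency than VPN
  - BGP session (via Cloud Router) for dynamic routing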

Next Hop: Default Internet Gateway

Route: 0.0.0.0/0 → default-igw

System-generated default route:
  Destination: 0.0.0.0/0
  Next Hop: Default internet gateway
  Priority: 1000

Effect: Unmatched destination → internet egress
        External IP required for return traffic

Use case: VMs with public IPs reaching internet
          Private VMs: dropped (no return path)

Route Priority & Conflicts

Priority Mechanism

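Routes in VPC:

Priority  Destination    Next Hop
1000      10.0.0.0/8     instance-a
1000      10.0.1.0/24    instance-b
2000      10.0.0.0/16    instance-c

Packet: dst=10.0.1.5

Which routes match?
  10.0.1.5 in 10.0.1.0/24? YES ✓ (/24, priority 1000)
  10.0.1.5 in 10.0.0.0/16? YES ✓ (/16, priority 2000)
  10.0.1.5 in 10.0.0.0/8?  YES ✓ (/8, priority 1000)

Different prefix lengths → most specific wins: 10.0.1.0/24 → instance-b
  (priority is never consulted here)

Priority only matters for routes with the SAME destination:
  1000      10.0.1.0/24    instance-b
  1000      10.0.1.0/24    instance-d

Same destination + same priority → GCP spreads flows across both next hops (ECMP)
  You don't control which flow takes which next hop

Result: Unpredictable routing if you meant primary/backup (bad!)

Solution: Use unique priorities for routes with the same destination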

Resolving Conflicts

Bad setup (same destination, same priority):
  Route A: 10.0.1.0/24 → instance-a (priority 1000)
  Route B: 10.0.1.0/24 → instance-b (priority 1000)

Good setup (unique priorities):
  Route A: 10.0.1.0/24 → instance-a (priority 2000)
  Route B: 10.0.1.0/24 → instance-b (priority 1000)

Now 10.0.1.x goes to instance-b (priority 1000 wins)
    Route A (priority 2000) takes over only if Route B is deleted
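
In GCP, routes with the same destination and the same priority are used together (ECMP). A quick lint over exported routes can flag groups you may have meant as primary/backup. This is a minimal sketch with hypothetical names. It treats each route as a (destination, priority, next hop) tuple and ignores network tags.

```python
import ipaddress
from collections import defaultdict


def ecmp_groups(routes: list[tuple[str, int, str]]) -> dict[tuple[str, int], list[str]]:
    """Return {(destination, priority): next_hops} for groups with more than one next hop."""
    groups = defaultdict(list)
    for destination, priority, next_hop in routes:
        # Normalize the CIDR text so equivalent spellings land in the same group.
        key = (str(ipaddress.ip_network(destination)), priority)
        groups[key].append(next_hop)
    return {key: hops for key, hops in groups.items() if len(hops) > 1}


bad = [("10.0.1.0/24", 1000, "instance-a"), ("10.0.1.0/24", 1000, "instance-b")]
good = [("10.0.1.0/24", 2000, "instance-a"), ("10.0.1.0/24", 1000, "instance-b")]
print(ecmp_groups(bad))   # {('10.0.1.0/24', 1000): ['instance-a', 'instance-b']}
print(ecmp_groups(good))  # {}
```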

Production Patterns

Pattern 1: Multi-Region On-Premises Connectivity

Architecture:

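GCP us-central1 (10.0.0.0/16) → Cloud VPN → Site A (192.168.0.0/16)
GCP europe-west1 (10.1.0.0/16) → Cloud VPN → Site A (192.168.0.0/16)

Routes:
gcloud compute routes create route-usa-to-site-a \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-usa \
  --next-hop-vpn-tunnel-region=us-central1 \
  --priority=1000

gcloud compute routes create route-eu-to-site-a \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-eu \
  --next-hop-vpn-tunnel-region=europe-west1 \
  --priority=1000

⚠️ Problem: Same destination and priority via two tunnels
   Static routes apply to the whole VPC (all regions), so GCP spreads flows
   across both tunnels (ECMP): us-central1 traffic may leave via vpn-eu
   On-prem chooses its own return tunnel → asymmetric paths
   (request via vpn-usa, response via vpn-eu)

Solution: Use Cloud Router with BGP instead
  → With global dynamic routing, routes learned in another region carry an
    extra inter-region cost, so each region prefers its local tunnel
  → More predictable, symmetric routing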
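
Static routes apply to the whole VPC, so a VM in us-central1 hashes its flows across both site-A tunnels instead of preferring the local one. The model below illustrates that effect. Several parts are assumptions: the VM address 10.0.5.20, the flow format, and SHA-256 as a stand-in for GCP's internal ECMP hash.

```python
import hashlib


def ecmp_next_hop(flow: str, next_hops: list[str]) -> str:
    """Map a flow deterministically onto one of several equal routes (hash model)."""
    digest = int.from_bytes(hashlib.sha256(flow.encode()).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


# Both routes: 192.168.0.0/16, priority 1000, so they are equal-cost.
tunnels = ["vpn-eu", "vpn-usa"]
# Hypothetical flows from one us-central1 VM (10.0.5.20) to an on-prem host.
flows = [f"10.0.5.20:{port}->192.168.10.5:443/tcp" for port in range(40000, 40020)]
usage = {t: sum(ecmp_next_hop(f, tunnels) == t for f in flows) for t in tunnels}
print(usage)  # counts per tunnel; some us-central1 flows typically land on vpn-eu
```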

Pattern 2: Gateway Redundancy

Setup:

Gateway VM 1 (10.0.1.10): Primary router
Gateway VM 2 (10.0.1.11): Backup router

Route to on-prem:
gcloud compute routes create route-to-onprem-primary \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-instance=gateway-vm-1 \
  --next-hop-instance-zone=us-central1-a \
  --priority=1000

gcloud compute routes create route-to-onprem-backup \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-instance=gateway-vm-2 \
  --next-hop-instance-zone=us-central1-b \
  --priority=2000

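Traffic:
  - Primary route in place: packets → gateway-vm-1 (priority 1000)
  - Primary route deleted: packets → gateway-vm-2 (priority 2000)

⚠️ Limitation: No automatic failover detection
   GCP doesn't check if the next-hop instance is healthy: if gateway-vm-1's
   routing software fails, traffic keeps flowing to it
   Must delete/re-create routes (static routes can't be edited in place),
   or use an ILB next hop (health-checked) or Cloud Router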
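
GCP does not health-check instance next hops, so failover has to come from your own automation. Here is a sketch of a hypothetical watcher. It probes the primary gateway with a TCP connection. If the probe fails, it deletes route-to-onprem-primary so the priority-2000 backup route takes over. The probe target and port are assumptions. The probe only checks that the VM accepts connections, not that it forwards traffic. The watcher defaults to a dry run, and failing back requires re-creating the primary route.

```python
import socket
import subprocess
from typing import Optional

PRIMARY_ROUTE = "route-to-onprem-primary"  # route name from the example above


def primary_is_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Hypothetical probe: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failover_command(primary_healthy: bool) -> Optional[list[str]]:
    """gcloud command that removes the primary route, or None if no action is needed."""
    if primary_healthy:
        return None
    # Deleting the priority-1000 route leaves the priority-2000 backup as the best match.
    return ["gcloud", "compute", "routes", "delete", PRIMARY_ROUTE, "--quiet"]


def check_once(host: str, port: int, dry_run: bool = True) -> None:
    """Probe once; print (dry run) or execute the failover command if needed."""
    cmd = failover_command(primary_is_healthy(host, port))
    if cmd is None:
        print("primary healthy; no change")
    elif dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)


# Example (hypothetical target): probe gateway-vm-1 on SSH port 22.
# check_once("10.0.1.10", 22)
```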

Pattern 3: Split-Horizon Routing

Use case: Production vs Staging environments
          Same on-prem CIDR, different routing

Prod VPC:
gcloud compute routes create prod-to-onprem \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-prod \
  --priority=1000

Staging VPC:
gcloud compute routes create staging-to-onprem \
  --destination-range=192.168.0.0/16 \
  --network=staging-vpc \
  --next-hop-vpn-tunnel=vpn-staging \
  --priority=1000

Effect: Same destination, different next hops per VPC
        Complete isolation

Troubleshooting Routes

Symptom: "Destination Unreachable"

Diagnosis:

1. List the VPC's routes:
   gcloud compute routes list \
     --filter="network:prod-vpc" \
     --format=table

2. Look for the route matching the destination:
   Route for destination 192.168.1.5?
   Found: 192.168.0.0/16 → vpn-tunnel (priority 1000)

3. Check tunnel status:
   gcloud compute vpn-tunnels describe vpn-tunnel-1 \
     --region=us-central1 \
     --format="value(status)"
   Expected: ESTABLISHED ✓

4. Check next-hop:
   If instance next-hop:
     gcloud compute instances describe gateway-vm \
       --zone=us-central1-a \
       --format="value(canIpForward)"

   canIpForward must be true
   Inside the VM, net.ipv4.ip_forward must be 1 (check with: sysctl net.ipv4.ip_forward)

5. Test packet path:
   - From source VM: ping -c 1 192.168.1.5
   - Check firewall rules at source (may block ICMP)
   - Check firewall rules at destination

6. Check VPC Flow Logs (flow logs must be enabled on the subnet):
   gcloud logging read \
     'logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"' \
     --limit=10
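
You can automate step 2 by running the same selection logic over JSON route output. This sketch assumes the Compute Engine route fields destRange, priority and nextHop*, as returned by gcloud compute routes list --format=json. The trimmed sample data and the winning_routes helper are hypothetical, and network tags are ignored.

```python
import ipaddress
import json

# Route fields that can carry the next hop (Compute Engine route resource).
NEXT_HOP_FIELDS = ("nextHopInstance", "nextHopIp", "nextHopIlb", "nextHopVpnTunnel",
                   "nextHopGateway", "nextHopNetwork", "nextHopPeering")


def winning_routes(routes_json: str, dst: str) -> list[tuple[str, str]]:
    """Return (route name, next hop) for the route(s) that would carry dst.

    Longest prefix first, then lowest priority; ties are all returned (ECMP).
    Network tags are ignored, so tagged routes may not apply to every VM.
    """
    addr = ipaddress.ip_address(dst)
    routes = json.loads(routes_json)
    matching = [r for r in routes if addr in ipaddress.ip_network(r["destRange"])]
    if not matching:
        return []
    longest = max(ipaddress.ip_network(r["destRange"]).prefixlen for r in matching)
    best = [r for r in matching if ipaddress.ip_network(r["destRange"]).prefixlen == longest]
    # Assumption: treat a missing priority field as 0.
    top = min(r.get("priority", 0) for r in best)
    result = []
    for r in best:
        if r.get("priority", 0) != top:
            continue
        hop = next((r[f] for f in NEXT_HOP_FIELDS if f in r), "unknown")
        result.append((r["name"], hop.rsplit("/", 1)[-1]))  # keep last URL segment
    return result


# Trimmed, hypothetical sample of 'gcloud compute routes list --format=json' output.
sample = json.dumps([
    {"name": "route-to-onprem", "destRange": "192.168.0.0/16", "priority": 1000,
     "nextHopVpnTunnel": "https://www.googleapis.com/compute/v1/projects/my-proj/regions/us-central1/vpnTunnels/vpn-tunnel-1"},
    {"name": "default-route-abc", "destRange": "0.0.0.0/0", "priority": 1000,
     "nextHopGateway": "https://www.googleapis.com/compute/v1/projects/my-proj/global/gateways/default-internet-gateway"},
])
print(winning_routes(sample, "192.168.1.5"))  # [('route-to-onprem', 'vpn-tunnel-1')]
```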

Symptom: Asymmetric Routing

Problem: A → B works, B → A fails

Causes:
1. Different routing tables (check both VPCs)
2. Firewall rule blocking one direction
3. Different next-hop for return path

Solution:
  - List all routes in both VPCs
  - Ensure symmetric next-hops
  - Check firewall egress/ingress rules both ways

Symptom: High Latency or Routing Loop

Routing loop example:

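Route A: 10.1.0.0/16 → instance-1
Route B: 10.1.0.0/16 → instance-2 (applies only to traffic leaving instance-1,
         e.g. scoped with a network tag and given a better priority)

instance-1 forwards the packet → Route B sends it to instance-2
instance-2 forwards it back into the VPC → Route A sends it to instance-1
  Packets bounce between the two VMs until their TTL runs out
  (A VM that is not configured to forward drops packets instead of looping them)

Symptoms:
  - High latency
  - ICMP "time exceeded" (TTL exceeded)
  - mtr shows the same hops repeating

Fix:
  - Give each gateway a real onward path (don't point gateways at each other)
  - Or use ILB for next-hop instead
  - Or delete conflicting routes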
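
A toy path tracer can reproduce a loop like this. It follows next hops and decrements the TTL on each hop. The node names mirror the example. The next-hop map and the trace helper are hypothetical, and real packets are routed by the VPC and the guest OS, not by a lookup table.

```python
def trace(start: str, next_hop_of: dict[str, str], ttl: int = 64) -> tuple[list[str], str]:
    """Follow next hops from start until the packet is delivered or its TTL runs out.

    next_hop_of maps each node to the node it forwards to; a node missing
    from the map is treated as the final destination.
    """
    path = [start]
    node = start
    while ttl > 0:
        if node not in next_hop_of:
            return path, "delivered"
        node = next_hop_of[node]
        ttl -= 1  # each forwarding hop decrements TTL
        path.append(node)
    return path, "ttl exceeded"


def has_loop(path: list[str]) -> bool:
    """True if any node appears more than once on the path."""
    return len(path) != len(set(path))


looping = {"client-vm": "instance-1", "instance-1": "instance-2", "instance-2": "instance-1"}
healthy = {"client-vm": "instance-1", "instance-1": "onprem-host"}

path, status = trace("client-vm", looping)
print(status, has_loop(path), path[:5])
path, status = trace("client-vm", healthy)
print(status, path)  # delivered ['client-vm', 'instance-1', 'onprem-host']
```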

Route Limits & Quotas

Per VPC:
  - Max dynamic routes (BGP): 10,000

Per project:
  - Max static routes: 500 (quota, shared by all VPCs in the project,
    not multiplied per VPC)
  - Max VPCs: 15

Check current quotas and usage:
  gcloud compute project-info describe --project=PROJECT_ID
Request increases through the Cloud Console quotas page.

If hitting limits:
  - Summarize routes (one 10.0.0.0/8 route instead of one route per subnet range)
  - Use Shared VPC to centralize routing
  - Use Cloud Router for dynamic aggregation
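
Python's ipaddress.collapse_addresses can check route summarization. It merges adjacent and overlapping CIDRs into the fewest covering networks. This is a minimal sketch, and the per-subnet ranges are illustrative. collapse_addresses never over-covers. Summarizing 10.0.0.0/24 and 10.0.2.0/24 into 10.0.0.0/8 is a deliberate choice you make, not something the function does.

```python
import ipaddress


def summarize(cidrs: list[str]) -> list[str]:
    """Collapse adjacent/overlapping CIDRs into the fewest covering networks."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


per_subnet = [f"10.0.{i}.0/24" for i in range(4)]
print(summarize(per_subnet))                      # ['10.0.0.0/22']
print(summarize(["10.0.0.0/24", "10.0.2.0/24"]))  # ['10.0.0.0/24', '10.0.2.0/24']
```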

Route Observability

bash
# List all routes in prod-vpc:
gcloud compute routes list --filter="network:prod-vpc" \
  --format=table

# Describe specific route:
gcloud compute routes describe route-to-onprem

# Review recently created routes:
gcloud compute routes list \
  --filter="network:prod-vpc AND creationTimestamp>2026-05-18"

# Find the network an instance uses:
gcloud compute instances describe vm1 \
  --zone=us-central1-a \
  --format="value(networkInterfaces[0].network)"

# Then list routes for that network
# (routes with network tags apply only to instances carrying those tags)

Best Practices

Do:

  • Use unique priorities for predictable routing
  • Document route purpose in description
  • Use Cloud Router for on-premises connectivity (automatic failover)
  • Test routes in staging first
  • Monitor route changes via audit logs

Don't:

  • Create overlapping CIDR routes with same priority
  • Use instance as next-hop without can-ip-forward
  • Assume GCP health-checks route next-hops (it doesn't)
  • Mix static routes and dynamic routes (BGP) for same destination
  • Forget firewall rules (routing + firewall = full picture)

Conclusion

Static routes provide fine-grained control but require discipline:

  • Subnet routes: automatic, immutable, best for VPC-local traffic
  • Static routes: manual, not editable in place (delete and re-create), needed for on-prem/cross-VPC
  • Next hops: choose based on redundancy needs (instance, ILB, VPN)
  • Priority: unique values prevent undefined routing

For complex scenarios (multi-region, failover), Cloud Router + BGP is recommended over static routes.