Static Routes & Next Hops — Customizing VPC Routing
Executive Summary
GCP routes packets based on the destination IP address, using the VPC routing table.
There are 3 types of routes:
- Subnet routes (automatic): each subnet CIDR creates a route to the instances in that subnet
- Static routes (custom): defined manually, targeting a specific CIDR + next hop
- Dynamic routes (BGP): learned by Cloud Router from on-prem via BGP
Understanding route selection order and next hop types is the key to debugging "packets never arrive" problems.
Route Fundamentals
Route Components
Each route in a VPC specifies:
yaml
Destination Range: # CIDR block (VPC destination)
10.0.0.0/16
Next Hop Type: # Where packets go
- Instance # Specific VM
- IP Address # Internal IP of a forwarding VM
- Internal LB # Load balancer
- VPN Gateway # Cloud VPN tunnel
- Interconnect # Cloud Interconnect VLAN attachment (dynamic routes only)
- Default IGW # Default internet gateway
- peering-vpc-name # VPC Peering
Priority (Metric): # Lower = higher priority
1000
Name: # Route identifier
"route-to-onprem"
Network: # Which VPC
"prod-vpc"
Enabled: # true/false
true
Description: # Optional documentation
"On-prem reachability via Interconnect"
Route Matching Algorithm
When a packet is sent to destination 10.20.1.5:
VPC routing table:
Priority  Destination   Next Hop     Type
100       10.20.0.0/16  instance-1   Static
1000      10.0.0.0/8    vpn-gateway  Static
1000      0.0.0.0/0     default-igw  System
0         10.0.0.0/16   local        Subnet
Packet: dst=10.20.1.5
Step 1: Collect every route whose destination contains 10.20.1.5
10.20.1.5 in 10.20.0.0/16? YES ✓
10.20.1.5 in 10.0.0.0/8? YES ✓
10.20.1.5 in 0.0.0.0/0? YES ✓
10.20.1.5 in 10.0.0.0/16? NO
Step 2: Keep the longest prefix: 10.20.0.0/16
Next hop: instance-1
Priority: 100
→ MATCHED, STOP
Routes 10.0.0.0/8 and 0.0.0.0/0 are ignored (less specific, regardless of priority)
Rule: Most specific prefix match wins; priority only breaks ties between routes with the same destination
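The selection rule can be sketched in a few lines of Python. This is an illustrative model only, not GCP's implementation: the `Route` class and `select_route` function are hypothetical names, and the route list simply mirrors the example destinations used here.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Route:
    dest: str       # destination CIDR, e.g. "10.20.0.0/16"
    next_hop: str   # free-form label, for illustration only
    priority: int   # lower number = preferred


def select_route(routes: list[Route], dst_ip: str) -> Optional[Route]:
    """Return the route a packet to dst_ip would use, or None if nothing matches."""
    addr = ip_address(dst_ip)
    matching = [r for r in routes if addr in ip_network(r.dest)]
    if not matching:
        return None
    # Longest prefix first; priority only breaks ties between equal prefixes.
    return min(matching, key=lambda r: (-ip_network(r.dest).prefixlen, r.priority))


ROUTES = [
    Route("10.20.0.0/16", "instance-1", 100),
    Route("10.0.0.0/8", "vpn-gateway", 1000),
    Route("0.0.0.0/0", "default-igw", 1000),
]

if __name__ == "__main__":
    for dst in ("10.20.1.5", "10.99.0.1", "8.8.8.8"):
        print(dst, "->", select_route(ROUTES, dst).next_hop)
```

Note that the sort key puts prefix length ahead of priority, so a broad route cannot win over a narrower one just by having a lower priority number.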
Subnet Routes (Automatic)
GCP automatically creates a route for each subnet CIDR:
Subnet: prod-app (10.0.1.0/24) in us-central1
Auto-generated route:
Destination: 10.0.1.0/24
Next Hop: Local network
Priority: 0 (highest)
Type: Subnet route
Effect: Packets to 10.0.1.0/24 stay inside the VPC
and do not exit the network
Subnet Route Lifecycle
Step 1: Create subnet 10.0.2.0/24
→ Automatic route creation: 10.0.2.0/24 → local
→ Propagates to all regions (global VPC)
→ Takes ~1 second
Step 2: Delete subnet
→ Subnet route is deleted with it
→ Instances must be removed from the subnet first (a subnet in use cannot be deleted)
→ Packets to the deleted range: fall through to the next matching route
→ May hit default 0.0.0.0/0 (internet) unexpectedly!
Anti-pattern: Delete subnet, re-create with different CIDR
→ Static routes written for the old range linger, confusing routing behavior
Subnet Route Limitations
❌ Cannot delete subnet routes (immutable)
❌ Cannot modify subnet route priority
✅ Can delete entire subnet (route deletes automatically)
bash
# This FAILS:
gcloud compute routes delete 10-0-1-0-24
# Must delete subnet instead:
gcloud compute networks subnets delete prod-app \
--region=us-central1
Custom Static Routes
Creating Static Routes
bash
gcloud compute routes create route-to-onprem \
--destination-range=192.168.0.0/16 \
--network=prod-vpc \
--next-hop-vpn-tunnel=vpn-tunnel-1 \
--priority=1000
# Alternative: route through instance
gcloud compute routes create route-via-gateway-vm \
--destination-range=172.16.0.0/12 \
--network=prod-vpc \
--next-hop-instance=gateway-vm \
--next-hop-instance-zone=us-central1-a \
--priority=2000
Next Hop Types in Detail
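Each next hop type discussed in this section maps to its own flag on `gcloud compute routes create`. The sketch below makes that mapping explicit by assembling the argument list for one route. The dictionary keys (`instance`, `address`, and so on) and the `route_create_argv` helper are names invented for this example; only the `--next-hop-*` flags themselves come from the gcloud CLI. Nothing is executed: the function only builds the command.

```python
import shlex

# gcloud flag for each next hop kind; the keys are labels made up for this sketch.
NEXT_HOP_FLAGS = {
    "instance": "--next-hop-instance",
    "address": "--next-hop-address",
    "ilb": "--next-hop-ilb",
    "vpn_tunnel": "--next-hop-vpn-tunnel",
    "gateway": "--next-hop-gateway",
}


def route_create_argv(name, dest_range, network, hop_kind, hop_value,
                      priority=1000, extra=None):
    """Build (but do not run) the argv for `gcloud compute routes create`."""
    if hop_kind not in NEXT_HOP_FLAGS:
        raise ValueError(f"unknown next hop kind: {hop_kind}")
    argv = [
        "gcloud", "compute", "routes", "create", name,
        f"--destination-range={dest_range}",
        f"--network={network}",
        f"{NEXT_HOP_FLAGS[hop_kind]}={hop_value}",
        f"--priority={priority}",
    ]
    # Location flags such as --next-hop-instance-zone are passed through as-is.
    argv += [f"{flag}={value}" for flag, value in (extra or {}).items()]
    return argv


if __name__ == "__main__":
    argv = route_create_argv(
        "route-via-gateway-vm", "172.16.0.0/12", "prod-vpc",
        "instance", "gateway-vm", priority=2000,
        extra={"--next-hop-instance-zone": "us-central1-a"},
    )
    print(shlex.join(argv))
```

Keeping the command as a list rather than a string makes it easy to review in a change request before anyone runs it.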
Next Hop: Compute Instance
Route: 192.168.0.0/16 → instance "gateway-vm"
Packet forwarding:
1. Packet arrives at gateway-vm
2. Linux stack checks routing table
3. If VM not routing (no forwarding enabled)
→ Packet dropped!
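Enable IP forwarding on the VM at creation:
gcloud compute instances create gateway-vm \
--can-ip-forward
Or change an existing VM (there is no "instances modify" command; export the instance, set canIpForward: true in the file, then re-apply it):
gcloud compute instances export gateway-vm \
--zone=us-central1-a --destination=gateway-vm.yaml
gcloud compute instances update-from-file gateway-vm \
--zone=us-central1-a --source=gateway-vm.yaml \
--most-disruptive-allowed-action=REFRESH
Key: VM must have IP forwarding enabled in GCP, plus forwarding and iptables/routing configured in the guest OS
Next Hop: Internal Load Balancer (ILB)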
Route: 10.20.0.0/16 → ilb-prod
Use case: HAProxy (or similar appliance) VMs as the next hop, load-balanced across backend VMs
Architecture:
Packet → ILB (health check backends) → Backend VM
ILB requires:
- Health check: backends must respond
- Backend service: group of backends
- Forwarding rule: captures packets
Advantage: Redundancy (multiple backend VMs)
Disadvantage: Extra hop (latency)
Next Hop: VPN Tunnel
Route: 203.0.113.0/24 → vpn-tunnel-site-a
On-premises routing:
GCP subnet 10.0.0.0/16 → VPN tunnel → Site A 203.0.113.0/24
VPN tunnel must be:
- Created and active
- Connected to Cloud VPN gateway
- BGP session established (only for dynamic routes; a static route needs a Classic VPN tunnel as its next hop)
Traffic flow:
Packet dst=203.0.113.5
→ VPC matches route 203.0.113.0/24 → vpn-tunnel
→ Packet encrypted, sent to on-prem
→ On-prem receives, decrypts
Next Hop: Cloud Interconnect
Route: 192.168.0.0/16 → interconnect-vlan-attachment (a dynamic route learned by Cloud Router; a VLAN attachment cannot be the next hop of a static route)
High-performance on-premises connectivity:
- 10 Gbps or 100 Gbps dedicated connection
- Lower latency than VPN
- BGP session for dynamic routing
Next Hop: Default Internet Gateway
Route: 0.0.0.0/0 → default-igw
System-generated default route:
Destination: 0.0.0.0/0
Next Hop: Default internet gateway
Priority: 1000 (default for the system-generated route)
Effect: Unmatched destination → internet egress
External IP required for return traffic
Use case: VMs with public IPs reaching internet
Private VMs: dropped (no return path) unless Cloud NAT is configured
Route Priority & Conflicts
Priority Mechanism
Routes in VPC:
Priority  Destination  Next Hop
1000      10.0.0.0/8   instance-a
1000      10.0.1.0/24  instance-b
1000      10.0.1.0/24  instance-c
Packet: dst=10.0.1.5
10.0.1.5 in 10.0.0.0/8? YES ✓ (prefix /8)
10.0.1.5 in 10.0.1.0/24? YES ✓ (prefix /24, two routes)
Longest prefix is applied first: both /24 routes beat the /8, whatever their priorities
Same destination, same priority, two routes → traffic is split between instance-b and instance-c (ECMP)
GCP picks the next hop per flow with an internal hash (no control over which one)
Result: Hard-to-predict paths (bad for stateful gateways!)
Solution: Give routes that share a destination unique priorities
Resolving Conflicts
Bad setup (same destination, same priority):
Route A: 10.0.1.0/24 → instance-b (priority 1000)
Route B: 10.0.1.0/24 → instance-c (priority 1000)
Good setup (unique priorities):
Route A: 10.0.1.0/24 → instance-b (priority 1000)
Route B: 10.0.1.0/24 → instance-c (priority 2000)
Now 10.0.1.x goes to instance-b (priority 1000 wins)
instance-c is used only if Route A becomes unavailable
Overlapping prefixes of different length (10.0.0.0/8 vs 10.0.1.0/24) do not conflict: 10.0.1.x always follows the /24, 10.0.2.x follows the /8
Production Patterns
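Before applying the patterns in this section, it can help to check a planned route set for routes that share both a destination and a priority, since those are the cases where the path is not under your control. The sketch below is one way to do that. It assumes each route is a dict carrying the `name`, `network`, `destRange` and `priority` fields found in `gcloud compute routes list --format=json` output; `find_same_priority_groups` is a hypothetical helper, not part of any Google tool.

```python
from collections import defaultdict
from ipaddress import ip_network


def find_same_priority_groups(routes):
    """Return {(network, destination, priority): [route names]} for groups with 2+ routes."""
    groups = defaultdict(list)
    for route in routes:
        # strict=False normalises a CIDR written with host bits set,
        # e.g. "192.168.1.0/16" is treated as 192.168.0.0/16.
        dest = ip_network(route["destRange"], strict=False)
        key = (route["network"], str(dest), route["priority"])
        groups[key].append(route["name"])
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    planned = [
        {"name": "route-usa-to-site-a", "network": "prod-vpc",
         "destRange": "192.168.0.0/16", "priority": 1000},
        {"name": "route-eu-to-site-a", "network": "prod-vpc",
         "destRange": "192.168.0.0/16", "priority": 1000},
        {"name": "route-to-onprem-backup", "network": "prod-vpc",
         "destRange": "192.168.0.0/16", "priority": 2000},
    ]
    for key, names in find_same_priority_groups(planned).items():
        print(key, names)
```

Routes with the same destination but different priorities are deliberately not reported, because that is the primary/backup arrangement rather than a tie.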
Pattern 1: Multi-Region On-Premises Connectivity
Architecture:
GCP us-central1 (10.0.0.0/16) → Cloud VPN → Site A (192.168.0.0/16)
GCP europe-west1 (10.1.0.0/16) → Cloud VPN → Site A (192.168.0.0/16)
Routes:
gcloud compute routes create route-usa-to-site-a \
--destination-range=192.168.0.0/16 \
--network=prod-vpc \
--next-hop-vpn-tunnel=vpn-usa \
--next-hop-vpn-tunnel-region=us-central1 \
--priority=1000
gcloud compute routes create route-eu-to-site-a \
--destination-range=192.168.0.0/16 \
--network=prod-vpc \
--next-hop-vpn-tunnel=vpn-eu \
--next-hop-vpn-tunnel-region=europe-west1 \
--priority=1000
⚠️ Problem: Same destination 192.168.0.0/16 via two tunnels at the same priority
Flows are split across both tunnels, so packets can be asymmetric (request via vpn-usa, response via vpn-eu)
Solution: Use Cloud Router with BGP instead
→ BGP learns best path per region
→ Symmetric routing
Pattern 2: Gateway Redundancy
Setup:
Gateway VM 1 (10.0.1.10): Primary router
Gateway VM 2 (10.0.1.11): Backup router
Route to on-prem:
gcloud compute routes create route-to-onprem-primary \
--destination-range=192.168.0.0/16 \
--network=prod-vpc \
--next-hop-instance=gateway-vm-1 \
--next-hop-instance-zone=us-central1-a \
--priority=1000
gcloud compute routes create route-to-onprem-backup \
--destination-range=192.168.0.0/16 \
--network=prod-vpc \
--next-hop-instance=gateway-vm-2 \
--next-hop-instance-zone=us-central1-b \
--priority=2000
Traffic:
- Gateway 1 up: packets → gateway-vm-1 (priority 1000)
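- Gateway 1 stopped or deleted: packets → gateway-vm-2 (priority 2000)
⚠️ Limitation: No health-based failover
GCP doesn't check whether the software on the next-hop VM is healthy; a running but broken gateway-vm-1 keeps receiving traffic
Must manually update routes, or use an ILB next hop or Cloud Router
Pattern 3: Split-Horizon Routing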
Use case: Production vs Staging environments
Same on-prem CIDR, different routing
Prod VPC:
gcloud compute routes create prod-to-onprem \
--destination-range=192.168.0.0/16 \
--network=prod-vpc \
--next-hop-vpn-tunnel=vpn-prod \
--priority=1000
Staging VPC:
gcloud compute routes create staging-to-onprem \
--destination-range=192.168.0.0/16 \
--network=staging-vpc \
--next-hop-vpn-tunnel=vpn-staging \
--priority=1000
Effect: Same destination, different next hops per VPC
Complete isolation
Troubleshooting Routes
Symptom: "Destination Unreachable"
Diagnosis:
1. Check the VPC routing table:
gcloud compute routes list \
--filter="network:prod-vpc" \
--format="table(name,destRange,priority,nextHopInstance,nextHopVpnTunnel)"
2. Look for matching route:
Route for destination 192.168.1.5?
Found: 192.168.0.0/16 → vpn-tunnel (priority 1000)
3. Check tunnel status:
gcloud compute vpn-tunnels describe vpn-tunnel-1 \
--region=us-central1
status: ESTABLISHED ✓
4. Check next-hop:
If instance next-hop:
gcloud compute instances describe gateway-vm \
--zone=us-central1-a \
--format="value(canIpForward)"
canIpForward must be true
5. Test packet path:
- From source VM: ping -c 1 192.168.1.5
- Check firewall rules at source (may block ICMP)
- Check firewall rules at destination
6. Check VPC Flow Logs (must be enabled on the subnet):
gcloud logging read \
'resource.type="gce_subnetwork" AND logName:"vpc_flows"' \
--limit=20
Symptom: Asymmetric Routing
Problem: A → B works, B → A fails
Causes:
1. Different routing tables (check both VPCs)
2. Firewall rule block in one direction
3. Different next-hop for return path
Solution:
- List all routes in both VPCs
- Ensure symmetric next-hops
- Check firewall egress/ingress rules both ways
Symptom: High Latency or Routing Loop
Routing loop example:
Route A: 10.1.0.0/16 → instance-1
Route B: 10.1.0.0/16 → instance-2 (which forwards back via instance-1's interface)
If the gateways' own forwarding tables send this range back toward each other:
Packets bounce between the two VMs until the TTL runs out
Symptoms:
- High latency
- ttl exceeded
- mtr shows packet loop
Fix:
- Ensure instance has can-ip-forward=true
- Or use ILB for next-hop instead
- Or delete conflicting routes
Route Limits & Quotas
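Static routes count against quota, so collapsing adjacent destination ranges that share a next hop is one way to reduce the number of routes. The sketch below uses Python's standard `ipaddress.collapse_addresses` for the merging. The `summarize` helper and the gateway labels are hypothetical. Ranges are merged only within the same next hop, and only when they combine exactly, so the summary never covers addresses the original routes did not.

```python
from collections import defaultdict
from ipaddress import collapse_addresses, ip_network


def summarize(routes):
    """routes: iterable of (destination CIDR, next hop label) pairs.

    Returns a sorted list of (CIDR, next hop) with adjacent or nested
    ranges merged per next hop.
    """
    by_hop = defaultdict(list)
    for dest, hop in routes:
        by_hop[hop].append(ip_network(dest))
    merged = []
    for hop, networks in by_hop.items():
        merged += [(str(net), hop) for net in collapse_addresses(networks)]
    return sorted(merged)


if __name__ == "__main__":
    routes = [
        ("10.0.0.0/24", "gw-a"),
        ("10.0.1.0/24", "gw-a"),
        ("10.0.2.0/23", "gw-a"),
        ("10.0.4.0/24", "gw-b"),
    ]
    for cidr, hop in summarize(routes):
        print(cidr, "->", hop)
```

In this example the three gw-a routes collapse into a single 10.0.0.0/22, while the gw-b route is left alone because it has a different next hop.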
Per VPC (illustrative values; actual quotas vary by project, so check yours):
- Max static routes: 500 (quota)
- Max dynamic routes (BGP): 10,000
Per project:
- Max VPCs: 15
- Max routes total: 500 × 15 = 7,500 static
View current quotas and usage (increases are requested through the Quotas page in the Cloud console):
gcloud compute project-info describe --project=PROJECT_ID
If hitting limits:
- Summarize routes (10.0.0.0/8 instead of individual subnets)
- Use Shared VPC to centralize routing
- Use Cloud Router for dynamic aggregation
Route Observability
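One way to watch for route changes is to save the output of `gcloud compute routes list --format=json` periodically and diff two snapshots. The sketch below assumes each snapshot is that JSON array, with the `name`, `destRange` and `priority` fields present on every route. `diff_routes` is a hypothetical helper, and the sample data is made up for the example.

```python
import json


def index_routes(snapshot_json):
    """Map route name -> (destRange, priority) for one JSON snapshot."""
    return {r["name"]: (r["destRange"], r["priority"]) for r in json.loads(snapshot_json)}


def diff_routes(old_json, new_json):
    """Compare two snapshots and report added, removed and changed route names."""
    old, new = index_routes(old_json), index_routes(new_json)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Same name but different destination or priority, e.g. a route
        # that was deleted and re-created between snapshots.
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


if __name__ == "__main__":
    before = json.dumps([
        {"name": "route-to-onprem", "destRange": "192.168.0.0/16", "priority": 1000},
        {"name": "route-via-gateway-vm", "destRange": "172.16.0.0/12", "priority": 2000},
    ])
    after = json.dumps([
        {"name": "route-to-onprem", "destRange": "192.168.0.0/16", "priority": 500},
        {"name": "route-to-site-b", "destRange": "10.50.0.0/16", "priority": 1000},
    ])
    print(diff_routes(before, after))
```

A diff like this complements audit logs: the logs show who changed a route, while the snapshot comparison shows what the routing table looks like as a result.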
bash
# List all routes:
gcloud compute routes list \
--filter="network:prod-vpc"
# Describe specific route:
gcloud compute routes describe route-to-onprem
# Find routes created after a given date:
gcloud compute routes list \
--filter="network:prod-vpc AND creationTimestamp>2026-05-18"
# Find the network an instance is attached to:
gcloud compute instances describe vm1 \
--format="value(networkInterfaces[0].network)"
# Then list routes for that network
Best Practices
✅ Do:
- Use unique priorities for predictable routing
- Document route purpose in description
- Use Cloud Router for on-premises connectivity (automatic failover)
- Test routes in staging first
- Monitor route changes via audit logs
❌ Don't:
- Create overlapping CIDR routes with same priority
- Use instance as next-hop without can-ip-forward
- Assume GCP health-checks route next-hops (it doesn't)
- Mix static routes and dynamic routes (BGP) for same destination
- Forget firewall rules (routing + firewall = full picture)
Conclusion
Static routes provide fine-grained control but require discipline:
- Subnet routes: automatic, immutable, best for VPC-local traffic
- Static routes: manual, mutable, needed for on-prem/cross-VPC
- Next hops: choose based on redundancy needs (instance, ILB, VPN)
- Priority: unique values prevent undefined routing
For complex scenarios (multi-region, failover), Cloud Router + BGP is recommended over static routes.