Static Routes & Next Hops — Customizing VPC Routing

Executive Summary

GCP routes packets based on their destination IP address, using the VPC routing table.

There are three types of routes:

Subnet routes (automatic): each subnet CIDR gets a route that delivers traffic to the instances in that subnet
Static routes (custom): defined manually, targeting a specific CIDR plus a next hop
Dynamic routes (BGP): learned by Cloud Router from on-prem via BGP

Understanding routing order and next hop types is key to debugging "packets not arriving".

Route Fundamentals

Route Components

Each route in a VPC specifies:

yaml

Destination Range:     # CIDR block the route applies to
  10.0.0.0/16

Next Hop Type:         # Where packets go
  - Instance           # Specific VM (needs IP forwarding enabled)
  - IP Address         # Internal IP of a VM interface in the VPC
  - Internal LB        # Internal passthrough load balancer
  - VPN Tunnel         # Classic VPN tunnel
  - Default IGW        # Default internet gateway
  # Not selectable on a static route:
  #   Interconnect VLAN attachments (routes are learned via Cloud Router/BGP)
  #   VPC Peering (routes are exchanged with the peer network)

Priority (Metric):     # Lower = higher priority (0-65535, default 1000)
  1000

Name:                  # Route identifier
  "route-to-onprem"

Network:               # Which VPC
  "prod-vpc"

Tags:                  # Optional network tags; limits the route to VMs with these tags
  ["gateway-clients"]

Description:          # Optional documentation
  "On-prem reachability via Interconnect"

Route Matching Algorithm

When a packet is headed to destination 10.20.1.5:

VPC routing table:

Priority  Destination    Next Hop        Type
0         10.0.0.0/16    local           Subnet
100       10.20.0.0/16   instance-1      Static
1000      10.0.0.0/8     vpn-gateway     Static
1000      0.0.0.0/0      default-igw     System

Packet: dst=10.20.1.5

Step 1: Collect every route whose destination contains 10.20.1.5
  10.20.0.0/16? YES ✓
  10.0.0.0/8?   YES ✓
  0.0.0.0/0?    YES ✓
  10.0.0.0/16?  NO

Step 2: Keep the most specific (longest) prefix
  10.20.0.0/16 (/16 beats /8 and /0)
  Next hop: instance-1
  → MATCHED, STOP

Route 10.0.0.0/8 is ignored because it is less specific, not because of its priority.
Priority is only compared between routes with the same destination.

Rule: Most specific prefix match wins; ties are broken by priority (lower value wins)
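
The rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GCP's implementation: the `Route` tuple and the `select_route` helper are hypothetical names, and the table is a trimmed version of the example routes.

```python
import ipaddress
from typing import NamedTuple, Optional


class Route(NamedTuple):
    priority: int
    destination: str
    next_hop: str


def select_route(routes: list[Route], dst: str) -> Optional[Route]:
    """Pick the route for dst: longest prefix first, then lowest priority value."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes
               if addr in ipaddress.ip_network(r.destination)]
    if not matches:
        return None
    # Negating the prefix length makes min() prefer the most specific route;
    # the priority value only breaks ties between equally specific routes.
    return min(
        matches,
        key=lambda r: (-ipaddress.ip_network(r.destination).prefixlen, r.priority),
    )


# Illustrative table (hypothetical next-hop names taken from the example).
ROUTES = [
    Route(100, "10.20.0.0/16", "instance-1"),
    Route(1000, "10.0.0.0/8", "vpn-gateway"),
    Route(1000, "0.0.0.0/0", "default-igw"),
]

print(select_route(ROUTES, "10.20.1.5").next_hop)  # instance-1
print(select_route(ROUTES, "10.99.0.1").next_hop)  # vpn-gateway
print(select_route(ROUTES, "8.8.8.8").next_hop)    # default-igw
```

Swapping the priorities of the first two routes does not change the result for 10.20.1.5, which is a quick way to confirm that prefix length is evaluated before priority.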

Subnet Routes (Automatic)

GCP automatically creates a route for each subnet CIDR:

Subnet: prod-app (10.0.1.0/24) in us-central1

Auto-generated route:
  Destination: 10.0.1.0/24
  Next Hop: Local network
  Priority: 0 (highest)
  Type: Subnet route

Effect: Packets to 10.0.1.0/24 are delivered inside the VPC
        and never leave the network

Subnet Route Lifecycle

Step 1: Create subnet 10.0.2.0/24
  → Automatic route creation: 10.0.2.0/24 → local
  → Propagates to all regions (global VPC)
  → Usually takes effect within seconds

Step 2: Delete subnet
  → Only possible once no resources (VMs, forwarding rules, ...) still use the subnet
  → The subnet route is deleted together with the subnet
  → Packets to the deleted range fall through to the next matching route
  → May hit default 0.0.0.0/0 (internet) unexpectedly!

Anti-pattern: Delete subnet, re-create with different CIDR
  → Static routes, firewall rules, and on-prem routes that still reference
    the old CIDR linger, causing confusing routing behavior

Subnet Route Limitations

❌ Cannot delete subnet routes (immutable)
❌ Cannot modify subnet route priority
✅ Can delete entire subnet (route deletes automatically)

bash

# This FAILS:
gcloud compute routes delete 10-0-1-0-24

# Must delete subnet instead:
gcloud compute networks subnets delete prod-app \
  --region=us-central1

Custom Static Routes

Creating Static Routes

bash

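# --next-hop-vpn-tunnel-region avoids an interactive region prompt
# (us-central1 is assumed here; use the tunnel's actual region)
gcloud compute routes create route-to-onprem \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-tunnel-1 \
  --next-hop-vpn-tunnel-region=us-central1 \
  --priority=1000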

# Alternative: route through instance
gcloud compute routes create route-via-gateway-vm \
  --destination-range=172.16.0.0/12 \
  --network=prod-vpc \
  --next-hop-instance=gateway-vm \
  --next-hop-instance-zone=us-central1-a \
  --priority=2000
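
Before creating routes like the ones above, it can help to validate the destination range and priority locally, since a CIDR with host bits set (for example 192.168.1.0/16) is not a valid network. The following is a minimal sketch: `StaticRoute` and `validate` are hypothetical helper names, not part of any Google Cloud library, and the field names simply mirror the gcloud flags.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticRoute:
    """Hypothetical local model of the values passed to `gcloud compute routes create`."""
    name: str
    network: str
    destination_range: str
    next_hop: str
    priority: int = 1000  # same default gcloud uses when --priority is omitted

    def __post_init__(self):
        # ip_network() raises ValueError when host bits are set (e.g. 10.0.1.0/16).
        ipaddress.ip_network(self.destination_range)
        if not 0 <= self.priority <= 65535:
            raise ValueError(f"priority {self.priority} is outside 0-65535")


def validate(**fields):
    """Return None if the route is well-formed, otherwise the error message."""
    try:
        StaticRoute(**fields)
        return None
    except ValueError as err:
        return str(err)


# A well-formed route yields None; a malformed CIDR yields an error message.
print(validate(name="route-to-onprem", network="prod-vpc",
               destination_range="192.168.0.0/16", next_hop="vpn-tunnel-1"))
print(validate(name="bad-cidr", network="prod-vpc",
               destination_range="192.168.1.0/16", next_hop="vpn-tunnel-1"))
```

This only checks syntax; whether the next hop exists and is usable still has to be verified against the project itself.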

Next Hop Types in Detail

Next Hop: Compute Instance

Route: 192.168.0.0/16 → instance "gateway-vm"

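Packet forwarding:
  1. Packet arrives at gateway-vm
  2. Guest OS (e.g. the Linux network stack) checks its own routing table
  3. If the VM is not set up to route (forwarding disabled)
     → Packet dropped!

Enable IP forwarding on the VM at creation:
gcloud compute instances create gateway-vm \
  --zone=us-central1-a \
  --can-ip-forward

Or modify an existing VM (there is no "instances modify" command; export, edit, re-import):
gcloud compute instances export gateway-vm \
  --zone=us-central1-a --destination=gateway-vm.yaml
# set "canIpForward: true" in gateway-vm.yaml, then:
gcloud compute instances update-from-file gateway-vm \
  --zone=us-central1-a --source=gateway-vm.yaml

Key: VM must have IP forwarding enabled both in GCP (canIpForward) and in the guest OS (e.g. net.ipv4.ip_forward=1), plus iptables/routing configured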

Next Hop: Internal Load Balancer (ILB)

Route: 10.20.0.0/16 → ilb-prod

Use case: HA proxy tier as next hop, load-balanced across backend VMs

Architecture:
  Packet → ILB (health check backends) → Backend VM
  
ILB requires:
  - Health check: backends must respond
  - Backend service: group of backends
  - Forwarding rule: captures packets

Advantage: Redundancy (multiple backend VMs)
Disadvantage: Extra hop (latency)

Next Hop: VPN Tunnel

Route: 203.0.113.0/24 → vpn-tunnel-site-a

On-premises routing:
  GCP subnet 10.0.0.0/16 → VPN tunnel → Site A 203.0.113.0/24

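VPN tunnel must be:
  - Created and active
  - Connected to Cloud VPN gateway
  - A Classic VPN tunnel if used as a static route next hop
  - BGP session established (only when using dynamic routes; a static route does not need BGP)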

Traffic flow:
  Packet dst=203.0.113.5
  → VPC matches route 203.0.113.0/24 → vpn-tunnel
  → Packet encrypted, sent to on-prem
  → On-prem receives, decrypts

Next Hop: Cloud Interconnect

Route: 192.168.0.0/16 → interconnect-vlan-attachment
       (learned dynamically via Cloud Router; not created as a static route)

High-performance on-premises connectivity:
  - 10 Gbps or 100 Gbps dedicated connection
  - Lower latency than VPN
  - BGP session for dynamic routing

Next Hop: Default Internet Gateway

Route: 0.0.0.0/0 → default-igw

System-generated default route:
  Destination: 0.0.0.0/0
  Next Hop: Default internet gateway
  Priority: 1000 (a static 0.0.0.0/0 route with a lower value overrides it)

Effect: Unmatched destination → internet egress
        External IP required for return traffic

Use case: VMs with public IPs reaching internet
          Private VMs: dropped (no return path) unless Cloud NAT is configured

Route Priority & Conflicts

Priority Mechanism

Routes in VPC:

Priority  Destination    Next Hop
1000      10.0.0.0/8     instance-a
1000      10.0.1.0/24    instance-b
1000      10.0.1.0/24    instance-c

Packet: dst=10.0.1.5

Which routes match?
  10.0.1.5 in 10.0.1.0/24? YES ✓ Priority 1000 (instance-b)
  10.0.1.5 in 10.0.1.0/24? YES ✓ Priority 1000 (instance-c)
  10.0.1.5 in 10.0.0.0/8?  YES ✓ Priority 1000 (less specific, loses to /24)

Same destination, same priority, two next hops → NO SINGLE WINNER
  GCP spreads flows across both next hops (ECMP); you cannot choose which one a flow uses

Result: Hard-to-predict routing unless the split is intentional (bad!)

Solution: Use unique priorities for routes that share a destination

Resolving Conflicts

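Bad setup (same destination, same priority):
  Route A: 10.0.1.0/24 → instance-a (priority 1000)
  Route B: 10.0.1.0/24 → instance-b (priority 1000)

Good setup (unique priorities):
  Route A: 10.0.1.0/24 → instance-a (priority 2000)
  Route B: 10.0.1.0/24 → instance-b (priority 1000)
  
Now 10.0.1.x goes to instance-b (priority 1000 wins)
    instance-a is used only if Route B is removed or its next hop becomes unusable

Note: routes with different prefix lengths (e.g. 10.0.0.0/8 and 10.0.1.0/24)
      do not conflict; the longer prefix wins regardless of priority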
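
One way to catch ambiguous pairs before they reach production is to scan a route list for entries that share both destination and priority. The sketch below assumes routes are available as simple `(name, destination, priority)` tuples, for example parsed from a route listing; `find_equal_cost_groups` is a hypothetical helper name.

```python
import ipaddress
from collections import defaultdict


def find_equal_cost_groups(routes):
    """Group route names that share the same destination network and priority."""
    groups = defaultdict(list)
    for name, destination, priority in routes:
        # strict=False normalizes host bits, e.g. 10.0.1.7/24 -> 10.0.1.0/24
        network = ipaddress.ip_network(destination, strict=False)
        groups[(network, priority)].append(name)
    # Only groups with more than one route are ambiguous.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Illustrative data; the names and ranges are made up for this example.
routes = [
    ("route-a", "10.0.0.0/8", 1000),
    ("route-b", "10.0.1.0/24", 1000),
    ("route-c", "10.0.1.0/24", 1000),
    ("route-d", "10.0.1.0/24", 2000),
]

for (network, priority), names in find_equal_cost_groups(routes).items():
    print(f"{network} priority {priority}: {', '.join(names)}")
# prints: 10.0.1.0/24 priority 1000: route-b, route-c
```

Here route-a is not flagged because its prefix differs, and route-d is not flagged because its priority differs.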

Production Patterns

Pattern 1: Multi-Region On-Premises Connectivity

Architecture:

GCP us-central1 (10.0.0.0/16) → Cloud VPN → Site A (192.168.0.0/16)
GCP europe-west1 (10.1.0.0/16) → Cloud VPN → Site A (192.168.0.0/16)

Routes:
gcloud compute routes create route-usa-to-site-a \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-usa \
  --next-hop-vpn-tunnel-region=us-central1 \
  --priority=1000

gcloud compute routes create route-eu-to-site-a \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-eu \
  --next-hop-vpn-tunnel-region=europe-west1 \
  --priority=1000

⚠️ Problem: Same destination 192.168.0.0/16 via two tunnels at the same priority
   Static routes apply VPC-wide, so VMs in either region may use either tunnel
   Packets asymmetric (request via vpn-usa, response via vpn-eu)
   
Solution: Use Cloud Router with BGP instead
  → BGP learns best path per region
  → Symmetric routing

Pattern 2: Gateway Redundancy

Setup:

Gateway VM 1 (10.0.1.10): Primary router
Gateway VM 2 (10.0.1.11): Backup router

Route to on-prem:
gcloud compute routes create route-to-onprem-primary \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-instance=gateway-vm-1 \
  --next-hop-instance-zone=us-central1-a \
  --priority=1000

gcloud compute routes create route-to-onprem-backup \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-instance=gateway-vm-2 \
  --next-hop-instance-zone=us-central1-b \
  --priority=2000

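Traffic:
  - gateway-vm-1 running: packets → gateway-vm-1 (priority 1000)
  - gateway-vm-1 stopped or deleted: its route is skipped, packets → gateway-vm-2 (priority 2000)
  - gateway-vm-1 running but not forwarding (crashed software, bad iptables):
    packets still → gateway-vm-1 and are dropped
  
⚠️ Limitation: No automatic failover detection
   GCP doesn't check if the next-hop is healthy
   Must manually update routes, use an ILB next hop with health checks, or use Cloud Router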
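
The primary/backup behavior can be modeled in a few lines. This is a simplified sketch, not GCP's algorithm: it assumes a route is skipped only when its next-hop VM is stopped or deleted, and `pick_next_hop` is a hypothetical helper name.

```python
def pick_next_hop(routes, stopped_vms=frozenset()):
    """Return the next hop of the usable route with the lowest priority value.

    Assumption: a route is skipped only when its next-hop VM is stopped or
    deleted. A running VM with broken forwarding still receives the traffic.
    """
    usable = [r for r in routes if r["next_hop"] not in stopped_vms]
    if not usable:
        return None
    return min(usable, key=lambda r: r["priority"])["next_hop"]


ROUTES = [
    {"name": "route-to-onprem-primary", "next_hop": "gateway-vm-1", "priority": 1000},
    {"name": "route-to-onprem-backup", "next_hop": "gateway-vm-2", "priority": 2000},
]

print(pick_next_hop(ROUTES))                                # gateway-vm-1
print(pick_next_hop(ROUTES, stopped_vms={"gateway-vm-1"}))  # gateway-vm-2
print(pick_next_hop(ROUTES, stopped_vms={"gateway-vm-1", "gateway-vm-2"}))  # None
```

Note that nothing in this model reacts to a gateway whose software has failed while the VM keeps running, which is exactly the gap the limitation above describes.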

Pattern 3: Split-Horizon Routing

Use case: Production vs Staging environments
          Same on-prem CIDR, different routing

Prod VPC:
gcloud compute routes create prod-to-onprem \
  --destination-range=192.168.0.0/16 \
  --network=prod-vpc \
  --next-hop-vpn-tunnel=vpn-prod \
  --priority=1000

Staging VPC:
gcloud compute routes create staging-to-onprem \
  --destination-range=192.168.0.0/16 \
  --network=staging-vpc \
  --next-hop-vpn-tunnel=vpn-staging \
  --priority=1000

Effect: Same destination, different next hops per VPC
        Complete isolation

Troubleshooting Routes

Symptom: "Destination Unreachable"

Diagnosis:

1. Check the VPC routing table:
   gcloud compute routes list \
     --filter="network:prod-vpc" \
     --format="table(name,destRange,priority,nextHopInstance,nextHopVpnTunnel,nextHopGateway)"

2. Look for matching route:
   $ Route for destination 192.168.1.5?
   $ Found: 192.168.0.0/16 → vpn-tunnel (priority 1000)

3. Check tunnel status:
   gcloud compute vpn-tunnels describe vpn-tunnel-1 \
     --region=us-central1 --format="value(status)"
   Status: ESTABLISHED ✓

4. Check next-hop:
   If instance next-hop:
     gcloud compute instances describe gateway-vm \
       --zone=us-central1-a \
       --format="value(canIpForward)"
     
   canIpForward must be true
   
5. Test packet path:
   - From source VM: ping -c 1 192.168.1.5
   - Check firewall rules at source (may block ICMP)
   - Check firewall rules at destination

6. Check VPC Flow Logs (must be enabled on the subnet):
   gcloud logging read \
     'logName:"compute.googleapis.com%2Fvpc_flows"' \
     --limit=10

Symptom: Asymmetric Routing

Problem: A → B works, B → A fails

Causes:
1. Different routing tables (check both VPCs)
2. Firewall rule block in one direction
3. Different next-hop for return path

Solution:
  - List all routes in both VPCs
  - Ensure symmetric next-hops
  - Check firewall egress/ingress rules both ways

Symptom: High Latency or Routing Loop

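Routing loop example:

Route A: 10.1.0.0/16 → instance-1
Route B: 10.1.0.0/16 → instance-2

If each instance forwards 10.1.0.0/16 traffic back into the VPC
instead of out a different interface or tunnel:
  The VPC routes apply again, and packets bounce between the instances until the TTL expires
If an instance is not configured to forward at all:
  Packets are dropped (black hole), not looped
  
Symptoms: 
  - High latency
  - TTL exceeded
  - mtr shows packet loop

Fix:
  - Correct the guest OS routing on the next-hop instances (and confirm can-ip-forward=true)
  - Or use ILB for next-hop instead
  - Or delete conflicting routes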

Route Limits & Quotas

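Per VPC (figures as cited here; defaults differ between projects, so verify yours):
  - Max static routes: 500 (quota)
  - Max dynamic routes (BGP): 10,000

Per project:
  - Max VPCs: 15
  - Max routes total: 500 × 15 = 7,500 static

Check current quota limits and usage (increases are requested through the Quotas page):
  gcloud compute project-info describe --project=PROJECT_ID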
  
If hitting limits:
  - Summarize routes (10.0.0.0/8 instead of individual subnets)
  - Use Shared VPC to centralize routing
  - Use Cloud Router for dynamic aggregation
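
Route summarization can be explored offline with Python's standard `ipaddress` module. The sketch below collapses adjacent subnets into the fewest covering prefixes; the subnet list is made up for illustration.

```python
import ipaddress

# Illustrative subnets that would otherwise each need their own static route.
subnets = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/23", "10.0.4.0/24"]

networks = [ipaddress.ip_network(s) for s in subnets]

# collapse_addresses merges adjacent or overlapping networks without
# covering any address outside the input ranges.
summary = list(ipaddress.collapse_addresses(networks))

for network in summary:
    print(network)
# prints:
# 10.0.0.0/22
# 10.0.4.0/24
```

Unlike summarizing to a broad 10.0.0.0/8, this approach never covers ranges you did not list, so it cannot accidentally attract traffic for unrelated destinations.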

Route Observability

bash

# List all routes (routes list has no --network flag; filter instead):
gcloud compute routes list \
  --filter="network:prod-vpc" \
  --format="table(name,destRange,priority,nextHopGateway,nextHopInstance,nextHopVpnTunnel)"

# Describe specific route:
gcloud compute routes describe route-to-onprem

# Monitor route changes:
gcloud compute routes list \
  --filter="network:prod-vpc AND creationTimestamp>2026-05-18"

# Check effective routes for instance:
gcloud compute instances describe vm1 \
  --zone=us-central1-a \
  --format="value(networkInterfaces[0].network)"
  
# Then list routes for that network

Best Practices

✅ Do:

Use unique priorities for predictable routing
Document route purpose in description
Use Cloud Router for on-premises connectivity (automatic failover)
Test routes in staging first
Monitor route changes via audit logs

❌ Don't:

Create routes to the same destination with the same priority (unless ECMP is intended)
Use instance as next-hop without can-ip-forward
Assume GCP health-checks route next-hops (it doesn't)
Mix static routes and dynamic routes (BGP) for same destination
Forget firewall rules (routing + firewall = full picture)

Conclusion

Static routes provide fine-grained control but require discipline:

Subnet routes: automatic, immutable, best for VPC-local traffic
Static routes: manual, mutable, needed for on-prem/cross-VPC
Next hops: choose based on redundancy needs (instance, ILB, VPN)
Priority: unique values per destination keep next-hop selection predictable

For complex scenarios (multi-region, failover), Cloud Router + BGP is recommended over static routes.
