Dynamic Routes & Cloud Router — BGP-based Routing Automation
Executive Summary
Cloud Router = managed BGP router attached to a VPC network, enabling dynamic route learning from on-premises.
Key points:
- ✅ Automatic failover (no manual route updates needed)
- ✅ Bi-directional route advertisement (GCP routes → on-prem, on-prem → GCP)
- ✅ Multi-region hub-and-spoke topologies possible
- ❌ Route changes are not instantaneous (BGP convergence plus propagation of route state inside GCP)
- ❌ Requires BGP expertise (ASN configuration, communities, filtering)
Cloud Router Architecture
Regional Scope
Cloud Router is a regional resource:
Organization
├── VPC prod-vpc (global)
│ ├── Cloud Router us-central1 (regional)
│ ├── Cloud Router europe-west1 (regional)
│ └── Cloud Router asia-southeast1 (regional)
Each region has its own BGP sessions and learns/advertises routes independently.
Implication (regional dynamic routing mode):
Route learned in us-central1 via BGP
→ NOT automatically propagated to europe-west1
Must configure a separate BGP session per region
to make the same route available there
BGP Session Components
yaml
Cloud Router (us-central1):
- ASN: 64514 (Cloud Router side, taken from the private range 64512-65534)
- Interface IP: 169.254.1.1/30 (BGP session interface)
On-premises:
- ASN: 65001 (Customer ASN)
- Interface IP: 169.254.1.2/30
BGP Session:
- Neighbors: 169.254.1.1 ↔ 169.254.1.2
- OPEN handshake
- KEEPALIVE every 20 seconds (Cloud Router default; hold timer 60 seconds)
- UPDATE messages: route advertisements
BGP Configuration
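Before creating the router and its peers, the session parameters (the two ASNs and the /30 link-local address pair) can be sanity-checked offline. The following is a minimal sketch, not part of any Google tooling: `check_bgp_link` and its messages are hypothetical names, and the rules it applies (a /30 inside 169.254.0.0/16, distinct ASNs, a private 16-bit ASN on the Cloud Router side) are taken from the example values in this document.

```python
import ipaddress
from typing import List

PRIVATE_ASN_16BIT = range(64512, 65535)            # 64512-65534 inclusive
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def check_bgp_link(local_if: str, peer_ip: str,
                   local_asn: int, peer_asn: int) -> List[str]:
    """Return a list of problems; an empty list means the pair looks sane."""
    problems = []
    iface = ipaddress.ip_interface(local_if)        # e.g. "169.254.1.1/30"
    peer = ipaddress.ip_address(peer_ip)

    if iface.network.prefixlen != 30:
        problems.append("interface is not a /30")
    if not iface.network.subnet_of(LINK_LOCAL):
        problems.append("interface is outside 169.254.0.0/16")
    # A /30 has exactly two usable hosts; the peer must be the other one.
    if peer not in list(iface.network.hosts()):
        problems.append("peer IP is not a usable host in the interface subnet")
    if peer == iface.ip:
        problems.append("peer IP equals the local interface IP")
    if local_asn == peer_asn:
        problems.append("local and peer ASN are identical")
    if local_asn not in PRIVATE_ASN_16BIT:
        problems.append("local ASN is outside the private 16-bit range")
    return problems


if __name__ == "__main__":
    print(check_bgp_link("169.254.1.1/30", "169.254.1.2", 64514, 65001))
```

Running the check against the values used in this document returns no problems; a peer address from a neighbouring /30 (for example 169.254.1.6) is reported as unusable.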
Creating Cloud Router
bash
gcloud compute routers create router-us-central1 \
--network=prod-vpc \
--region=us-central1 \
--asn=64514
# Verify:
gcloud compute routers describe router-us-central1 \
--region=us-central1
Creating BGP Peer (On-Premises)
bash
# For Cloud VPN tunnel:
gcloud compute routers add-bgp-peer router-us-central1 \
--peer-name=bgp-site-a \
--interface=vpn-interface-site-a \
--peer-asn=65001 \
--peer-ip-address=169.254.1.2 \
--region=us-central1
# Alternatively, for Cloud Interconnect VLAN attachment:
gcloud compute routers add-bgp-peer router-us-central1 \
--peer-name=bgp-interconnect \
--interface=ic-vlan-interface \
--peer-asn=65000 \
--region=us-central1
Route Advertisement
bash
# Advertise only specific ranges (instead of all VPC subnets):
gcloud compute routers update router-us-central1 \
--region=us-central1 \
--advertisement-mode=custom \
--set-advertisement-ranges=10.0.1.0/24,10.0.2.0/24
# Or advertise every VPC subnet explicitly
# (all_subnets is the advertisement group gcloud accepts; there is no all_routes group):
gcloud compute routers update router-us-central1 \
--region=us-central1 \
--advertisement-mode=custom \
--set-advertisement-groups=all_subnets
Route Learning & Propagation
BGP UPDATE Messages
On-prem BGP peer sends UPDATE:
NLRI: 192.168.0.0/24
AS_PATH: 65001
NEXT_HOP: 169.254.1.2
Cloud Router receives:
- Learns 192.168.0.0/24 reachable via AS 65001
- Converts to GCP route:
Destination: 192.168.0.0/24
Next Hop: Cloud Router
Type: Dynamic (BGP)
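Priority: 200 (example value; a lower number means higher preference)
Propagates within VPC:
- Regional dynamic routing mode: only instances in the Cloud Router's region get the route
- Global dynamic routing mode: instances in all regions can reach 192.168.0.0/24
- Return path works only if Cloud Router advertises the VPC subnets back to on-prem
Route Propagation Delays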
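Because a newly learned route takes a few seconds to become visible to VMs, clients that connect to on-prem during that window benefit from retrying. The following is a minimal sketch of retry with exponential backoff; `call_with_retry` is a hypothetical helper, and the default of five attempts starting at 0.5 seconds is an assumption chosen so the total wait (7.5 seconds) covers the 1-5 second window this section describes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(op: Callable[[], T],
                    attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on OSError wait base_delay * 2**n and try again.

    The last failure is re-raised so callers still see the real error.
    """
    for attempt in range(attempts):
        try:
            return op()
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_connect() -> str:
        # Simulates a route that becomes usable on the third attempt.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("no route to host")
        return "connected"

    print(call_with_retry(flaky_connect, sleep=lambda s: None))
```

The `sleep` parameter is injectable only so the behaviour can be exercised without real waiting; production callers would leave the default.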
Timeline:
t=0: On-prem advertises 192.168.0.0/24 via BGP
t=1: Cloud Router receives BGP UPDATE
t=1-5: Route propagates within GCP (eventual consistency)
t=5: Instances in other regions see route
Result: roughly 1-5 seconds from BGP learn to VMs seeing the route
Impact:
- Not instantaneous failover
- Connections may fail briefly during propagation
- Use TCP retry logic to tolerate the gap
Regional vs Global Modes
Regional Mode (Default)
Scenario: Separate BGP session per region
Region: us-central1
Cloud Router: learning 192.168.0.0/16
Routes: available to us-central1 VMs
Region: europe-west1
Cloud Router: NOT learning anything (no BGP peer)
Routes: NOT available to europe-west1 VMs
Solution: Set up separate BGP peer in europe-west1
Global Mode
Scenario: Global route learning
Cloud Router us-central1: learns 192.168.0.0/16 via BGP
→ Propagates to ALL regions (not just us-central1)
Cloud Router europe-west1: NOT configured
→ europe-west1 VMs still receive the routes learned in us-central1
Advantage: Single BGP session for multi-region
Disadvantage: Cross-region transit (all on-prem traffic enters and exits through us-central1)
Example packet flow:
us-central1 VM → on-prem: exits us-central1 region
on-prem → europe-west1 VM: enters at us-central1, then crosses to europe-west1
Result: Cross-region hop (latency, inter-region costs)
Production Patterns
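The patterns in this section all depend on which VMs can see a learned route, which in turn depends on the VPC's dynamic routing mode. The following toy model is a sketch for reasoning about that, not a description of how GCP implements it; `LearnedRoute` and `visible_routes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LearnedRoute:
    prefix: str       # destination learned over BGP
    learned_in: str   # region of the Cloud Router that learned it


def visible_routes(routes: List[LearnedRoute], vm_region: str,
                   routing_mode: str) -> List[str]:
    """Return the prefixes a VM in vm_region can use.

    regional: only routes learned by a router in the VM's own region.
    global:   routes learned in any region.
    """
    if routing_mode not in ("regional", "global"):
        raise ValueError("routing_mode must be 'regional' or 'global'")
    return [r.prefix for r in routes
            if routing_mode == "global" or r.learned_in == vm_region]


if __name__ == "__main__":
    learned = [LearnedRoute("192.168.0.0/16", "us-central1")]
    for mode in ("regional", "global"):
        print(mode, visible_routes(learned, "europe-west1", mode))
```

In regional mode the europe-west1 VM sees nothing, which is the gap that either a second BGP session or global mode closes.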
Pattern 1: Hub-and-Spoke with Cloud Router
Architecture:
On-Premises (Site A)
192.168.0.0/16
↑ BGP
│
VPN Tunnel
│
┌──────────────────────┴──────────────────────┐
│ prod-vpc (GCP) │
│ 10.0.0.0/16 │
│ │
│ Cloud Router us-central1 (HUB) │
│ ASN: 64514 │
│ BGP session → Site A │
│ │
│ Advertises: │
│ - 10.0.1.0/24 (us-central1 subnet) │
│ - 10.0.2.0/24 (europe-west1 subnet) │
└──────────────────────┬──────────────────────┘
Routing:
us-central1 VM:
Route: 192.168.0.0/16 → Cloud Router us-central1
Exit: VPN tunnel
europe-west1 VM:
Route: 192.168.0.0/16 → Cloud Router us-central1 (dynamic)
Transit: Cross-region through VPC backbone
Exit: VPN tunnel via us-central1
Result: All traffic → on-prem exits us-central1
europe-west1 traffic takes a cross-region hop (extra latency and cost)
Fix: Set up Cloud Router europe-west1 with its own tunnel and BGP session
(the europe-west1 route shown here already requires global dynamic routing mode)
Pattern 2: Multi-Site BGP with Failover
Architecture:
Site A (192.168.1.0/24)
↑ BGP
│ AS_PATH: 65001
↓
Cloud Router us-central1
Site B (192.168.2.0/24)
↑ BGP
│ AS_PATH: 65002
↓
Cloud Router us-central1
GCP receives two different routes:
- 192.168.1.0/24 via AS 65001 (Site A)
- 192.168.2.0/24 via AS 65002 (Site B)
If Site A fails:
- BGP session down
- Route 192.168.1.0/24 withdrawn
- Traffic to Site A drops
Automatic failover NOT possible (different CIDRs)
Better pattern: Advertise same CIDR from both sites
- Site A: advertises 192.168.0.0/16 (AS_PATH: 65001)
- Site B: advertises 192.168.0.0/16 (AS_PATH: 65002 65002, prepended to look longer)
GCP BGP path selection: shortest AS_PATH wins
- Prefers Site A (1 AS hop vs 2)
- If Site A fails, falls back to Site B
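The preference and fallback described here can be sketched as a selection over AS_PATH length. This is a deliberate simplification: real BGP best-path selection has more steps (MED, among others), and `best_path` is a hypothetical helper that models only the AS_PATH comparison this pattern relies on.

```python
from typing import Dict, List, Optional


def best_path(candidates: Dict[str, List[int]]) -> Optional[str]:
    """Pick the advertiser whose AS_PATH is shortest.

    candidates maps an advertiser name to the AS_PATH it sent.
    Ties are broken by name so the result is deterministic.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda name: (len(candidates[name]), name))


if __name__ == "__main__":
    paths = {
        "site-a": [65001],          # one AS hop
        "site-b": [65002, 65002],   # prepended, so two AS hops
    }
    print(best_path(paths))         # site-a is preferred
    del paths["site-a"]             # Site A's session drops, route withdrawn
    print(best_path(paths))         # traffic falls back to site-b
```

Prepending the site's own ASN is what makes Site B look less attractive while keeping its route valid when Site A disappears.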
Automatic failover happens!
Pattern 3: Route Filtering with Communities
Scenario: Different routing policies per region
On-premises advertises:
- 192.168.1.0/24 (critical apps) + community 65001:100
- 192.168.2.0/24 (dev apps) + community 65001:200
GCP Cloud Router can filter on import (BGP route policies matching on community):
us-central1 (production):
Import: Accept community 65001:100 only
Effect: Only critical routes learned
europe-west1 (development):
Import: Accept community 65001:200 only
Effect: Only dev routes learned
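The per-region import behaviour can be sketched as a filter over (prefix, communities) pairs. This models the policy's effect only; it is not Cloud Router's policy syntax, and `import_filter` is a hypothetical name.

```python
from typing import List, Set, Tuple

Route = Tuple[str, Set[str]]   # (prefix, BGP communities attached to it)


def import_filter(routes: List[Route], accepted: str) -> List[str]:
    """Keep only prefixes that carry the accepted community."""
    return [prefix for prefix, communities in routes if accepted in communities]


if __name__ == "__main__":
    advertised = [
        ("192.168.1.0/24", {"65001:100"}),   # critical apps
        ("192.168.2.0/24", {"65001:200"}),   # dev apps
    ]
    print("us-central1 :", import_filter(advertised, "65001:100"))
    print("europe-west1:", import_filter(advertised, "65001:200"))
```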
Result: Different routing policies per region
Pattern 4: Dynamic Failover with Multiple Tunnels
Setup:
Primary VPN tunnel: us-central1 → Site A
BGP session 1: Cloud Router us-central1 ← Site A
ASN: 65001
Backup VPN tunnel: europe-west1 → Site B
BGP session 2: Cloud Router europe-west1 ← Site B
ASN: 65002
Routes:
Route 192.168.0.0/16 via AS 65001 (primary)
Route 192.168.0.0/16 via AS 65002 (backup)
Failover:
Normal:
Packets 192.168.0.0/16 → VPN us-central1 → Site A
Site A down:
BGP session 1 times out
Route via 65001 withdrawn
Fall back to route via 65002
Packets → VPN europe-west1 → Site B
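Automatic failover (detection bounded by the BGP hold timer, 60 seconds by default on Cloud Router; a few seconds only where BFD is available)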
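How quickly a dead peer is noticed follows from the timers in use. The arithmetic can be sketched as below; the function names are hypothetical, the inputs in the example are illustrative assumptions rather than values read from a real router, and the hold-timer rule (three missed keepalives) is the conventional BGP ratio.

```python
def hold_timer_detection_s(keepalive_s: float, missed: int = 3) -> float:
    """Worst-case seconds before BGP declares the peer dead."""
    return keepalive_s * missed


def bfd_detection_s(interval_ms: int, multiplier: int) -> float:
    """Seconds before BFD declares the path dead."""
    return interval_ms * multiplier / 1000


if __name__ == "__main__":
    # Assumed example inputs: 20 s keepalive; BFD at 1000 ms with multiplier 5.
    print("BGP hold timer:", hold_timer_detection_s(20), "s")
    print("BFD           :", bfd_detection_s(1000, 5), "s")
```

With these inputs, BGP alone needs up to a minute to react, while BFD reacts in seconds, which is why sub-minute failover claims usually assume BFD.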
Advantage over static routes:
- No manual intervention needed
- Health check built-in (BGP KEEPALIVE)
- Fast failover
BGP Best Practices
BGP Communities for Policy
bash
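# Tag routes with community for filtering (Cisco IOS-style sketch;
# the prefix-list names CRITICAL and NONCRITICAL are placeholders):
On-premises:
router bgp 65001
address-family ipv4
neighbor 169.254.1.1 send-community
neighbor 169.254.1.1 route-map ADD-COMMUNITY out
route-map ADD-COMMUNITY permit 10
match ip address prefix-list CRITICAL
set community 65001:100
route-map ADD-COMMUNITY permit 20
match ip address prefix-list NONCRITICAL
set community 65001:200
GCP Cloud Router import policy:
(Configure via custom import/export policies)
Graceful Shutdown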
bash
# Before taking down a BGP session, make failure detection fast
# (BFD, where the peer type supports it):
gcloud compute routers update-bgp-peer router-us-central1 \
--peer-name=bgp-site-a \
--region=us-central1 \
--bfd-session-initialization-mode=active
# Gracefully disable the session (routes are withdrawn, peer config is kept):
gcloud compute routers update-bgp-peer router-us-central1 \
--peer-name=bgp-site-a \
--region=us-central1 \
--no-enabled
# Then remove the peer:
gcloud compute routers remove-bgp-peer router-us-central1 \
--peer-name=bgp-site-a \
--region=us-central1
Troubleshooting BGP
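The first diagnosis in this section (routes not appearing) reduces to a small decision tree over two facts about a peer: whether its session is up and how many routes it has learned. The following is a sketch of that tree; `PeerStatus` and `triage` are hypothetical names, and the caller is assumed to have filled the fields from the router's status output.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    name: str
    session_up: bool
    learned_routes: int


def triage(peer: PeerStatus) -> str:
    """Map a peer's state to the next thing worth checking."""
    if not peer.session_up:
        return ("session down: check the VPN tunnel, the BGP peer "
                "configuration, and the on-prem neighbor state")
    if peer.learned_routes == 0:
        return ("session up, no routes: check the import policy, the "
                "on-prem advertisements, and ASN/interface IP settings")
    return "session up and learning routes: check advertised routes next"


if __name__ == "__main__":
    for p in (PeerStatus("bgp-site-a", False, 0),
              PeerStatus("bgp-site-a", True, 0),
              PeerStatus("bgp-site-a", True, 5)):
        print(p.name, "->", triage(p))
```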
Symptom: Routes Not Appearing
bash
Diagnosis:
1. Check BGP session status:
gcloud compute routers get-status router-us-central1 \
--region=us-central1 \
--format=json
Output (abridged, nested under "result"):
"bgpPeerStatus": [{
"name": "bgp-site-a",
"status": "UP", # or "DOWN"
"state": "Established",
"uptimeSeconds": "3600",
"numLearnedRoutes": 5
}]
2. If status=DOWN:
- Check VPN tunnel: gcloud compute vpn-tunnels list
- Check BGP peer configuration: gcloud compute routers describe
- Check on-prem BGP neighbor state
3. If status=UP but numLearnedRoutes=0:
- Check import policy (maybe filtering routes)
- Check on-prem is advertising routes
- Check BGP ASN/interface IPs match
4. Check advertised routes:
gcloud compute routers get-status router-us-central1 \
--region=us-central1 \
--format="flattened(result.bgpPeerStatus[].advertisedRoutes[].destRange)"
Symptom: Asymmetric Routing
Problem: GCP → on-prem works, on-prem → GCP fails
Cause: GCP advertises subnets, on-prem doesn't receive
Solution:
1. Check what GCP is advertising:
gcloud compute routers get-status router-us-central1 \
--region=us-central1
2. Check on-prem BGP table:
(On-prem router) show ip bgp
3. If on-prem shows GCP routes:
→ BGP working, problem is GCP firewall/route
Check ingress firewall: gcloud compute firewall-rules list
Check destination routing tables
4. If on-prem doesn't show GCP routes:
→ BGP not advertising properly
Check export policy
gcloud compute routers describe router-us-central1 \
--region=us-central1
Symptom: BGP Session Flapping
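Flapping can be confirmed from outside the router by polling a peer's uptime: every time the value drops instead of growing, the session has restarted. The sketch below assumes the caller already collects uptime in seconds at a fixed interval; `count_restarts` is a hypothetical helper.

```python
from typing import List


def count_restarts(uptime_samples: List[int]) -> int:
    """Count how often uptime went backwards between consecutive polls."""
    return sum(1 for prev, cur in zip(uptime_samples, uptime_samples[1:])
               if cur < prev)


if __name__ == "__main__":
    # Polled every 5 s: two resets are visible (110 -> 3 and 8 -> 2).
    samples = [100, 105, 110, 3, 8, 2, 7]
    print("restarts observed:", count_restarts(samples))
```

A restart that happens and recovers entirely between two polls is only visible as a drop, so the poll interval bounds how many flaps can be counted.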
Problem: BGP session going up/down repeatedly
Symptoms:
- Routes disappear/reappear
- High latency
- Packet loss
Causes:
1. Network instability: packet loss in VPN tunnel
→ BGP hold timer expires (60 seconds by default on Cloud Router; many on-prem routers default to 180)
2. MTU mismatch: packets fragmented, BGP UPDATE dropped
Check: GCP VPN MTU (1460), on-prem MTU (must match)
3. BGP configuration mismatch: ASN/interface IP differs
Solution:
- Enable BFD for fast failure detection (where the peer type supports it)
- Check tunnel throughput/packet loss
- Verify MTU end-to-end
- Increase BGP timers if stable but slow
Route Limits
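Route summarization, one of the remedies this section suggests for staying under the limits, can be tried offline with Python's standard `ipaddress` module. The sketch below merges only prefixes that combine exactly, so it never advertises address space that was not in the input; `summarize` is a hypothetical helper name.

```python
import ipaddress
from typing import List


def summarize(prefixes: List[str]) -> List[str]:
    """Collapse adjacent or overlapping prefixes into the fewest exact covers."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    subnets = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
    print(summarize(subnets))                            # one /22 replaces four /24s
    print(summarize(["10.0.1.0/24", "10.0.2.0/24"]))     # not aligned, stays as two
```

Advertising a broader aggregate such as 10.0.0.0/8 saves more routes but also attracts traffic for addresses that do not exist, so the exact collapse is the safer starting point.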
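Default limits (verify against the current quota page before sizing):
Per Cloud Router:
- Max BGP peers: 128
- Max custom advertised ranges: 200 per BGP session
Per VPC network and region:
- Max learned routes: 250 unique destination prefixes from the same region
- Plus 250 unique destination prefixes from other regions (global mode)
- Static routes are limited separately by a per-project quota
Quota increases: Contact Google Cloud support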
If approaching limits:
- Use route summarization (advertise 10.0.0.0/8 instead of individual subnets)
- Split into multiple VPCs
- Use VPC Peering instead of routing
Monitoring BGP
bash
# Monitor route changes (router_id is the router's numeric ID, not its name):
gcloud logging read \
'resource.type="gce_router" AND resource.labels.router_id="ROUTER_NUMERIC_ID"' \
--limit=50 \
--format=json
# List BGP peer status:
gcloud compute routers get-status router-us-central1 \
--region=us-central1 \
--format=yaml
# Monitor specific peer:
watch -n 5 'gcloud compute routers get-status router-us-central1 \
--region=us-central1 \
--format="flattened(result.bgpPeerStatus[].name, result.bgpPeerStatus[].status, result.bgpPeerStatus[].uptime, result.bgpPeerStatus[].numLearnedRoutes)"'
Conclusion
Cloud Router provides enterprise-grade routing automation:
✅ Advantages:
- Automatic failover via BGP
- Bi-directional routing
- Multi-site connectivity
- Route filtering via policies
❌ Disadvantages:
- Extra complexity (BGP configuration)
- Potential for asymmetric routing (multi-region)
- BGP flapping can destabilize network
Best for: Production environments with on-premises connectivity requiring automatic failover
Not needed: Simple VPN-only scenarios (static routes sufficient)
For large-scale multi-site deployments, Cloud Router is essential.