# Anycast Routing with Global Load Balancer

## Why It Matters in Production

Anycast is a technique that lets a single IP address be served from multiple locations. GCP's Global Load Balancer uses anycast to:

- Announce the same IP from Google's edge locations (PoPs) worldwide
- Route users automatically to the nearest location (via BGP)
- Require no application changes: the application only knows about a single IP

This lets you:

- Deploy infrastructure in many locations
- Have users automatically reach the nearest endpoint (low latency)
- Fail over automatically if a region goes down
- Distribute traffic geographically without DNS tricks

## Internal Model: Anycast Mechanism

### Traditional Unicast (One IP per Location)

Application: api.example.com

DNS records (one hostname per region):
- api-us.example.com → 35.201.123.45 (US region)
- api-eu.example.com → 34.159.123.45 (EU region)
- api-ap.example.com → 34.87.123.45 (AP region)

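Application code:
```javascript
// Each region has its own hostname, so the client must map every user to one (one entry per region).
const REGION_ENDPOINTS = {
  US: 'https://api-us.example.com',
  EU: 'https://api-eu.example.com',
  AP: 'https://api-ap.example.com',
};

function callApi(user) {
  const baseUrl = REGION_ENDPOINTS[user.location];
  if (!baseUrl) throw new Error(`No endpoint for region: ${user.location}`);
  return fetch(baseUrl);
}
```

Problems:
- The application must geolocate users
- You must manage multiple hostnames
- Failover requires DNS changes (slow)
- Client-side logic becomes complex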


### Anycast (Single IP, Multiple Locations)

Application: api.example.com

DNS record:
- api.example.com → 35.201.123.45 (single anycast IP)

BGP announcements (behind the scenes):
- From PoP-US (serving us-central1): announces 35.201.123.0/24 (origin AS15169, AS path length 1)
- From PoP-EU (serving europe-west1): announces 35.201.123.0/24 (origin AS15169, AS path length 1)
- From PoP-ASIA (serving asia-southeast1): announces 35.201.123.0/24 (origin AS15169, AS path length 1)

Route selection per user:
- User in San Francisco: the best BGP route to 35.201.123.45 leads to PoP-US (us-central1)
- User in Berlin: the best BGP route leads to PoP-EU (europe-west1)
- User in Tokyo: the best BGP route leads to PoP-ASIA (asia-southeast1)

Result:
- The same IP for all users
- Automatic geo-routing via BGP
- Application: a single fetch('https://api.example.com')
- Transparency: clients don't need geo-awareness


### BGP Anycast Advertisement

The GCP Global Load Balancer announces routes:

Announcement detail:
- Prefix: 35.201.123.0/24 (contains 35.201.123.45)
- Origin AS: Google's AS (15169)
- Announced from: PoPs near us-central1, europe-west1 and asia-southeast1 (all announce)
- Path metric: equal AS path length from every PoP
  - All 3 paths tie at the AS-path level (ECMP is possible only if every later tie-breaker also ties)
- An ISP that peers with Google at several PoPs sees, for example:
  - Route 1: 35.201.123.0/24 via PoP-US (closest to this ISP)
  - Route 2: 35.201.123.0/24 via PoP-EU (same AS path length)
  - Route 3: 35.201.123.0/24 via PoP-AP (same AS path length)

BGP best path selection (for user traffic):
- The user's ISP asks: "Which path leads to 35.201.123.45?"
- Local preference is compared first. If it ties and all 3 AS paths have equal length, later tie-breakers decide.
- Usually the route via the closest PoP wins on lowest IGP cost to the exit point (hot-potato routing).
- Result: user traffic is naturally geo-routed
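The following simulation makes the tie-breaking concrete. It is a minimal JavaScript sketch of one ISP router ranking the three announcements. It models only part of BGP best-path selection: highest local preference first, then shortest AS path, then lowest IGP cost to the exit point. It skips the other steps, such as origin, MED and eBGP-over-iBGP. The PoP names, local-preference values and IGP costs are hypothetical.

```javascript
// Simplified BGP best-path selection for one prefix.
// Only three tie-breakers are modelled, in real BGP order:
// 1) highest LOCAL_PREF, 2) shortest AS_PATH, 3) lowest IGP cost to the next hop.
function bestPath(routes) {
  return [...routes].sort((a, b) =>
    b.localPref - a.localPref ||
    a.asPath.length - b.asPath.length ||
    a.igpCost - b.igpCost
  )[0];
}

// Hypothetical view from an ISP router in San Francisco: all three PoPs
// announce 35.201.123.0/24 with the same origin AS and AS path length.
const announcements = [
  { prefix: '35.201.123.0/24', via: 'PoP-US', asPath: [15169], localPref: 100, igpCost: 10 },
  { prefix: '35.201.123.0/24', via: 'PoP-EU', asPath: [15169], localPref: 100, igpCost: 140 },
  { prefix: '35.201.123.0/24', via: 'PoP-ASIA', asPath: [15169], localPref: 100, igpCost: 120 },
];

console.log(bestPath(announcements).via); // PoP-US: AS paths tie, lowest IGP cost wins
```

If you give PoP-EU a higher localPref in this sketch, it wins regardless of IGP cost. This is how an ISP's own policy can send users to a farther PoP.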


## Production Architecture Patterns

### Pattern 1: Global Multi-Region Deployment

Deployment:
- Backend in us-central1 (North America)
- Backend in europe-west1 (Europe)
- Backend in asia-southeast1 (Southeast Asia)
- Global anycast IP: 35.201.123.45, announced from Google's edge PoPs in all 3 areas

Traffic flow:
- An Internet user (any location) queries DNS for api.example.com and gets 35.201.123.45
- BGP delivers each user's packets to the nearest PoP: users in the USA reach PoP-US, users in the EU reach PoP-EU, users in Asia reach PoP-ASIA
- At the PoP, the Global Load Balancer decides which backend serves the request
- Backends in all 3 regions can receive traffic

Result:
- User in the US → routed to us-central1 (via PoP-US)
- User in the EU → routed to europe-west1 (via PoP-EU)
- User in Asia → routed to asia-southeast1 (via PoP-ASIA)
- Failover: if one region is down, users are rerouted to the next nearest healthy region
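You can approximate the load balancer's choice in Pattern 1 as "the closest healthy region wins". The sketch below makes two assumptions: a hypothetical table of round-trip times from each PoP to each region (using the GCP region name europe-west1), and a simple healthy/unhealthy flag per region. Google's actual balancing also weighs backend capacity and utilization, which this ignores.

```javascript
// Hypothetical RTTs (ms) from each PoP to each backend region.
const rttMs = {
  'PoP-US':   { 'us-central1': 20,  'europe-west1': 100, 'asia-southeast1': 180 },
  'PoP-EU':   { 'us-central1': 100, 'europe-west1': 15,  'asia-southeast1': 160 },
  'PoP-ASIA': { 'us-central1': 170, 'europe-west1': 160, 'asia-southeast1': 25 },
};

// Pick the lowest-RTT region that is currently healthy; null if none are.
function pickBackend(pop, healthy) {
  const candidates = Object.entries(rttMs[pop])
    .filter(([region]) => healthy[region])
    .sort(([, a], [, b]) => a - b);
  return candidates.length > 0 ? candidates[0][0] : null;
}

const allUp = { 'us-central1': true, 'europe-west1': true, 'asia-southeast1': true };
console.log(pickBackend('PoP-EU', allUp));                               // europe-west1
console.log(pickBackend('PoP-EU', { ...allUp, 'europe-west1': false })); // us-central1 (next nearest)
```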


### Pattern 2: Active-Active with Asymmetric Routing

Deployment:
- Primary: us-central1 (60% of traffic)
- Secondary: europe-west1 (40% of traffic)
- Single anycast IP with a Traffic Director policy
  - Incoming: the GLB routes each user to the nearest backend
  - Outgoing: responses can exit from a different region (asymmetric)

Example flow:
- A user in London sends a request
  - Inbound: PoP-EU routes it to europe-west1 (nearest backend)
  - The backend in europe-west1 processes it and sends the response
  - Outbound: the response exits via PoP-EU (same path back)
- PoP-EU failure scenario:
  - Inbound: another European PoP (PoP-UK) takes the request and forwards it to us-central1
  - The backend in us-central1 processes it
  - Outbound: the response exits via PoP-US (asymmetric return!)
  - The network must tolerate the asymmetric path, which is a problem for stateful devices


### Pattern 3: Gradual Traffic Migration (Canary Deployment)

Initial state: 100% of traffic goes to us-central1
- Anycast IP 35.201.123.45 is announced from us-central1 only
- europe-west1 is prepared but not announced

Migration:
- Step 1: Announce from europe-west1 with low preference
  - BGP metric: the europe-west1 path has a higher cost
  - Traffic: ~5% reaches europe-west1
- Step 2: Gradually increase europe-west1's preference
  - Step 2a: 10% of traffic to europe-west1
  - Step 2b: 25% of traffic to europe-west1
  - Step 2c: 50% of traffic to europe-west1
- Step 3: Complete the migration (if there are no issues)
  - Withdraw the us-central1 announcement; 100% goes to europe-west1

Monitoring:
- At each step: monitor error rates, latency and health
- Rollback: re-announce from us-central1 and revert traffic
- Result: zero-downtime migration to the new region

This pattern assumes you control the BGP announcements yourself. With a GCP global load balancer, Google announces the IP from its edge. The equivalent lever there is shifting traffic between backends, for example with weighted backend services in the URL map.
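You can sketch the step-and-check loop of this migration as a small controller. The traffic steps mirror the plan (5, 10, 25, 50, then 100 percent). The error-rate and p95-latency thresholds and the observe callback are hypothetical stand-ins for your own monitoring. How each step is actually applied (BGP preference or load balancer weights) is left out.

```javascript
// Traffic share (percent) sent to the new region at each step.
const STEPS = [5, 10, 25, 50, 100];

// Hypothetical guardrails; tune them to your SLOs.
const MAX_ERROR_RATE = 0.01; // 1% of requests
const MAX_P95_LATENCY_MS = 300;

// observe(percent) returns { errorRate, p95LatencyMs } measured at that step.
// Returns the final share: 100 on success, 0 after a rollback.
function runMigration(observe) {
  for (const percent of STEPS) {
    const { errorRate, p95LatencyMs } = observe(percent);
    if (errorRate > MAX_ERROR_RATE || p95LatencyMs > MAX_P95_LATENCY_MS) {
      console.log(`step ${percent}%: unhealthy, rolling back`);
      return 0; // revert all traffic to the original region
    }
    console.log(`step ${percent}%: healthy`);
  }
  return 100;
}

// Example: the new region starts failing once it carries 50% of traffic.
const result = runMigration((p) => ({ errorRate: p >= 50 ? 0.05 : 0.001, p95LatencyMs: 120 }));
console.log(result); // 0
```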


## Real-world Failure Scenarios

### Scenario 1: PoP Failure (Anycast Automatic Failover)

Setup: 3-region anycast deployment
- us-central1, europe-west1, asia-southeast1
- All announcing the same anycast IP

Failure: PoP-EU becomes unreachable
- Symptom: users in Europe see a latency spike
- BGP convergence: ISPs detect that PoP-EU's routes are gone
- New routes: users are rerouted to PoP-US or PoP-AP

Recovery timeline:
- T+0s: PoP-EU fails
- T+10-30s: ISPs detect the failure (faster with an explicit withdrawal; a silent failure waits for the BGP hold timer)
- T+30-60s: user traffic reroutes to an alternate PoP
- T+60-180s: latency normalizes (users have settled on the alternate path)
- T+5-10min: PoP-EU is restored and routes converge back

Anycast benefit:
- Automatic failover: no manual intervention
- Transparent: users don't care which backend serves them
- Graceful: traffic shifts over as routes converge instead of failing all at once (connections in flight through PoP-EU may still break)


### Scenario 2: Region Down (Anycast Masking Failure)

Failure: the us-central1 datacenter goes down (power issue)

Normal unicast (multiple IPs):
- Users trying to reach 35.201.123.45 (the us-central1 IP) → timeout
- Needed: manual DNS failover or client code changes
- Result: 10-60 second outage per user

Anycast (single IP):
- Users still reach 35.201.123.45 (the anycast IP) at their nearest PoP
- Health checks mark the us-central1 backends unhealthy; the PoPs keep announcing the IP
- PoP-US automatically forwards requests to europe-west1 or asia-southeast1
- Latency: higher (now cross-region), but the service stays reachable
- Result: automatic failover; the outage lasts only as long as health checks take to detect the failure (seconds)

Anycast advantage:
- The failure is masked automatically; no app changes are needed


### Scenario 3: BGP Hijack / Route Leak (Security)

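Threat: an attacker announces 35.201.123.0/24 from AS64000 (a rogue AS)

Normal scenario:
- Google announces 35.201.123.0/24 from AS15169 (AS path length 1)
- The rogue AS announces 35.201.123.0/24 from AS64000 (AS path length 1)
- BGP tie-break: networks prefer the shorter or better AS path
- Where the rogue path is shorter, traffic is hijacked

GCP mitigation:
- RPKI (Resource Public Key Infrastructure): cryptographic validation of route origins
- Google publishes a signed ROA (Route Origin Authorization): "AS15169 may announce 35.201.123.0/24"
- ISPs that perform route origin validation reject announcements whose origin AS or prefix length does not match the ROA
- Result: rogue announcements are blocked by networks that enforce validation (RPKI checks only the origin AS, not the full AS path)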
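The check an ISP performs here is route origin validation, defined in RFC 6811. An announcement is:

- Valid if a ROA covers its prefix with the same origin AS and a maxLength at least as long as the announced prefix.
- Invalid if ROAs cover the prefix but none match.
- NotFound if no ROA covers it.

Below is a minimal IPv4-only JavaScript sketch. The ROA is the hypothetical one from this scenario. Note that maxLength also makes a more-specific hijack (for example a /25) Invalid.

```javascript
// Parse "a.b.c.d/len" into a 32-bit unsigned integer and a prefix length.
function parsePrefix(cidr) {
  const [addr, len] = cidr.split('/');
  const ip = addr.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
  return { ip, len: Number(len) };
}

// True if `inner` lies inside `outer` (outer is equal or less specific).
function covers(outer, inner) {
  if (outer.len > inner.len) return false;
  const size = 2 ** (32 - outer.len); // number of addresses in the outer block
  return Math.floor(outer.ip / size) === Math.floor(inner.ip / size);
}

// Route origin validation as described in RFC 6811.
function validateOrigin(route, roas) {
  const p = parsePrefix(route.prefix);
  let covered = false;
  for (const roa of roas) {
    if (!covers(parsePrefix(roa.prefix), p)) continue;
    covered = true;
    if (roa.asn === route.originAs && p.len <= roa.maxLength) return 'Valid';
  }
  return covered ? 'Invalid' : 'NotFound';
}

// Hypothetical ROA: only AS15169 may originate 35.201.123.0/24, nothing more specific.
const roas = [{ prefix: '35.201.123.0/24', asn: 15169, maxLength: 24 }];

console.log(validateOrigin({ prefix: '35.201.123.0/24', originAs: 15169 }, roas)); // Valid
console.log(validateOrigin({ prefix: '35.201.123.0/24', originAs: 64000 }, roas)); // Invalid
console.log(validateOrigin({ prefix: '35.201.123.0/25', originAs: 15169 }, roas)); // Invalid (exceeds maxLength)
console.log(validateOrigin({ prefix: '198.51.100.0/24', originAs: 64000 }, roas)); // NotFound
```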


## Common Mistakes & Anti-Patterns

### Mistake 1: Expecting Same Performance via Anycast

❌ **Wrong thinking**:

"Anycast means all users get identical latency"


✅ **Correct understanding**:
- Anycast routes to the nearest PoP in BGP terms, which usually, but not always, matches geography
- Nearest in BGP terms ≠ lowest latency (BGP does not measure latency)
- Different PoPs have different latencies
- If a backend is down, traffic might be routed to a distant location

**Prevention**: Monitor per-region latencies. Set alerts for region failures.
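Per-region monitoring comes down to computing a latency percentile for each region and comparing it with a threshold. This minimal sketch uses the nearest-rank p95. The sample values and the 80 ms thresholds are made up; in practice the samples would come from your load balancer logs or metrics.

```javascript
// Nearest-rank percentile of a list of numbers (p in 0..100).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Return the regions whose p95 latency exceeds their threshold.
function latencyAlerts(samplesByRegion, thresholdsMs) {
  return Object.entries(samplesByRegion)
    .map(([region, samples]) => ({ region, p95: percentile(samples, 95) }))
    .filter(({ region, p95 }) => p95 > thresholdsMs[region]);
}

// Hypothetical latency samples (ms) for requests served by each region.
const samples = {
  'us-central1': [22, 25, 30, 28, 35, 24, 26, 29, 31, 27],
  'europe-west1': [140, 150, 160, 155, 148, 152, 158, 149, 151, 162],
};
const thresholds = { 'us-central1': 80, 'europe-west1': 80 };

console.log(latencyAlerts(samples, thresholds)); // [ { region: 'europe-west1', p95: 162 } ]
```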

### Mistake 2: Not Handling Asymmetric Routes

❌ **Wrong thinking**:

"Request and response always take same path"


✅ **Correct understanding**:
- Anycast can cause asymmetric routing
- Inbound: user→PoP-A→backend-A
- Outbound: backend-A→PoP-B (different!)
- The network must handle asymmetric paths (stateful firewalls make this harder)

**Prevention**: Test asymmetric routing scenarios. Understand implications for TCP/UDP connections.
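Stateful devices are the crux, because each one only knows about the connections it has seen. The sketch below models two hypothetical per-PoP stateful firewalls with independent connection tables. A reply that leaves through a different PoP than the request entered is therefore dropped. Real firewalls also track TCP state and timeouts, which are omitted here.

```javascript
// A minimal stateful firewall: outbound replies are allowed only for
// connections it saw arriving inbound.
class StatefulFirewall {
  constructor(name) {
    this.name = name;
    this.connections = new Set();
  }
  key(client, server) {
    return `${client}->${server}`;
  }
  inbound(client, server) {
    this.connections.add(this.key(client, server));
    return true;
  }
  outbound(server, client) {
    return this.connections.has(this.key(client, server));
  }
}

const fwA = new StatefulFirewall('PoP-A');
const fwB = new StatefulFirewall('PoP-B');

// The request enters through PoP-A and reaches backend-A.
fwA.inbound('198.51.100.7:51515', '35.201.123.45:443');

// A symmetric return through PoP-A is allowed; an asymmetric return through PoP-B is not.
console.log(fwA.outbound('35.201.123.45:443', '198.51.100.7:51515')); // true
console.log(fwB.outbound('35.201.123.45:443', '198.51.100.7:51515')); // false
```

There are three usual fixes:

- Keep return traffic symmetric.
- Synchronize state between the devices.
- Move stateful inspection to a point that both directions are guaranteed to cross.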

### Mistake 3: Over-Relying on Anycast for Failover

❌ **Wrong thinking**:

"Anycast handles all failover automatically, no need for health checks"


✅ **Correct understanding**:
- Anycast failover: Depends on BGP convergence (~10-30 seconds)
- Health checks: Can detect and shift traffic faster (seconds)
- Combination: Use both for fastest failover

**Prevention**: Enable health checks on backends. Monitor failover times.
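A rough comparison of detection times shows where these numbers come from. If the failed PoP withdraws its routes cleanly, convergence is driven by route propagation (the tens of seconds above). If it dies silently, neighbors notice only when the BGP hold timer expires. The sketch assumes a 90 s hold time (vendor defaults vary; 90 s and 180 s are common). It also assumes GCP's default health-check settings: a 5 s interval, a 5 s timeout and an unhealthy threshold of 2. Verify both against your own configuration.

```javascript
// Rough worst-case time (s) for a health checker to mark a backend unhealthy:
// up to one interval until the first failing probe starts, then
// (threshold - 1) more intervals until the last one starts, plus that probe's timeout.
// This is an approximation only.
function healthCheckDetectionSec({ checkIntervalSec, timeoutSec, unhealthyThreshold }) {
  return unhealthyThreshold * checkIntervalSec + timeoutSec;
}

// A silently failed BGP session is detected when the hold timer expires.
function bgpDetectionSec({ holdTimeSec }) {
  return holdTimeSec;
}

const hc = { checkIntervalSec: 5, timeoutSec: 5, unhealthyThreshold: 2 }; // assumed defaults
console.log(healthCheckDetectionSec(hc));          // 15
console.log(bgpDetectionSec({ holdTimeSec: 90 })); // 90
```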

## GCP-native Implementation Guidance

### Setting Up Global Load Balancer with Anycast

```bash
# Reserve a global external IP (Premium Tier; global addresses are anycast)
gcloud compute addresses create global-static-ip \
  --global \
  --ip-version=IPV4 \
  --network-tier=PREMIUM

# Create a health check
gcloud compute health-checks create tcp tcp-health-check \
  --port=443

# Create ONE global backend service. The load balancer sends each request to
# the closest region with healthy capacity, so every region belongs here.
gcloud compute backend-services create backend-global \
  --global \
  --protocol=HTTPS \
  --health-checks=tcp-health-check

# Add one instance group per region to the same backend service
gcloud compute backend-services add-backends backend-global \
  --global \
  --instance-group=ig-us-central1 \
  --instance-group-zone=us-central1-a

gcloud compute backend-services add-backends backend-global \
  --global \
  --instance-group=ig-eu-west1 \
  --instance-group-zone=europe-west1-b

# Create a URL map that sends all traffic to the global backend service
gcloud compute url-maps create my-url-map \
  --default-service=backend-global

# Create the HTTPS proxy (assumes an SSL certificate named my-cert already exists)
gcloud compute target-https-proxies create my-https-proxy \
  --url-map=my-url-map \
  --ssl-certificates=my-cert

# Create the forwarding rule (Google announces the IP from its edge automatically)
gcloud compute forwarding-rules create my-forwarding-rule \
  --global \
  --target-https-proxy=my-https-proxy \
  --address=global-static-ip \
  --ports=443

# Result: a single IP, announced via anycast from Google's edge locations
```

### Monitoring Anycast Health

```bash
# Check backend health in every region of the backend service
gcloud compute backend-services get-health backend-global --global

# Count recent requests per serving region.
# Assumes the log entries carry the serving region in jsonPayload.backend_region;
# adjust the field name to match your load balancer's log schema.
gcloud logging read \
  'resource.type="http_load_balancer" AND jsonPayload.backend_region:*' \
  --freshness=1h --limit=1000 --format=json | \
  jq -r '.[].jsonPayload.backend_region' | \
  sort | uniq -c

# Confirm which IP the forwarding rule serves. The anycast announcement itself is
# managed by Google and does not appear in VPC routes (gcloud compute routes list);
# check it from outside with a public BGP looking glass or route collector.
gcloud compute forwarding-rules describe my-forwarding-rule \
  --global --format='value(IPAddress)'
```


