Latency SLA & Fiber Path Engineering
Why This Matters in Production
Google publishes a latency SLA for each region pair. For example:
- us-central1 → us-east1: <5ms p50
- us-central1 → eu-west1: <80ms p50
- us-central1 → asia-southeast1: <120ms p50
These numbers are not arbitrary: they are the result of careful fiber path engineering. Understanding how Google achieves them helps you:
- Design a realistic SLA for your application (do not promise 10ms latency from Europe)
- Predict latency in multi-region deployments
- Debug latency issues when real-world latency is worse than the SLA
- Optimize topology to meet latency requirements
Internal Model: Fiber Path Engineering
Physical Fiber Infrastructure
Google owns fiber and partners with fiber providers to build:
Transcontinental Fiber (Google backbone):
├─ US Tier: Multiple paths across US
│ ├─ Northern Route: Seattle → Chicago → New York
│ ├─ Central Route: Denver → Kansas City → DC
│ └─ Southern Route: Los Angeles → Texas → Miami
│
├─ Transatlantic: Multiple cable systems
│ ├─ North Atlantic: Cable A, Cable B (diverse)
│ └─ South Atlantic: Alternative if North fails
│
├─ Trans-Pacific: Multiple cable systems
│ ├─ US→Asia: Via Japan, Southeast Asia (multiple routes)
│ └─ Intra-Asia: Singapore → Tokyo → Australia
│
└─ Intra-Regional: Dense fiber meshes
  ├─ Connect all zones within region
  └─ Sub-millisecond latency
Latency Components
Total latency = fiber propagation + switch/router processing + congestion
Example: us-central1 (Iowa) → eu-west1 (Belgium)
1. Fiber propagation (speed of light)
   ├─ Distance: ~6,500 km
   ├─ Speed in fiber: ~200,000 km/s (2/3 speed of light)
   └─ Minimum (one-way): 6,500 / 200,000 = 0.0325 s = 32.5ms
2. Router/switch hops (processing)
   ├─ Each hop adds: 0.1-1ms
   ├─ Typical: 10-15 hops per path
   └─ Processing: 1-15ms
3. Queuing (congestion)
   ├─ Light load: <1ms
   ├─ Heavy load: 5-20ms
   └─ Burst: 50-100ms
Total estimate:
├─ Minimum (ideal): 32.5ms
├─ Normal (light load): 45-65ms
├─ High load: 65-100ms
└─ Peak: >100ms (congestion control kicks in)
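The arithmetic above can be scripted so the same estimate is easy to repeat for other region pairs. The following is a minimal sketch, not a Google formula: the function names `propagation_ms` and `budget_ms` are made up for illustration, the inputs are the example values from this section, and the result is treated as a one-way delay.

```bash
#!/usr/bin/env bash
# Rough one-way latency budget for a long-haul path.
# All inputs are illustrative values from the Iowa -> Belgium example.

FIBER_KM_PER_S=200000   # ~2/3 of the speed of light in vacuum

# propagation_ms DISTANCE_KM -> one-way propagation delay in ms
propagation_ms() {
  awk -v d="$1" -v v="$FIBER_KM_PER_S" 'BEGIN { printf "%.1f\n", d / v * 1000 }'
}

# budget_ms DISTANCE_KM HOPS PER_HOP_MS QUEUE_MS -> total estimate in ms
budget_ms() {
  awk -v d="$1" -v h="$2" -v p="$3" -v q="$4" -v v="$FIBER_KM_PER_S" \
    'BEGIN { printf "%.1f\n", d / v * 1000 + h * p + q }'
}

propagation_ms 6500      # ideal floor: prints 32.5
budget_ms 6500 12 1 1    # 12 hops at 1 ms each, 1 ms queuing: prints 45.5
```

Changing the hop count or queuing term shows quickly how much of the budget is fixed by distance and how much depends on load.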
Google SLA: <80ms p50 (covers normal + some congestion)
Multi-Path Routing (Redundancy + Performance)
us-central1 → eu-west1 paths:
Path A (Primary):
└─ Iowa → Chicago → New York → Transatlantic Cable → London → Belgium
└─ 10 hops, ~80ms latency
Path B (Backup):
└─ Iowa → Denver → Los Angeles → Transpacific Cable (US-Asia) → Singapore → Suez Canal → Europe
└─ 15 hops, ~150ms latency (longer)
Path C (Secondary):
└─ Iowa → Chicago → Miami → Transatlantic South Cable → Europe
└─ 12 hops, ~85ms latency
Routing decision:
├─ Normal: Use Path A (shortest)
├─ Path A degraded: Switch to Path C
├─ Major event: Use Path B (long but available)
└─ Result: <80ms p50 SLA maintained while Path A is healthy; Paths C and B keep the link up at higher latency
Production Architecture Patterns
Pattern 1: Database Replication with Latency Guarantees
Primary: us-central1 (Iowa)
└─ Main application tier
Sync replica: us-east1 (South Carolina)
├─ Distance: ~1,400 km
├─ Fiber path: via Chicago
├─ Latency SLA: <10ms p99
└─ Strategy: Synchronous replication (waits for confirmation)
Async replica: eu-west1 (Belgium)
├─ Distance: ~6,500 km
├─ Fiber path: transatlantic cable
├─ Latency: ~80ms p50 (ok for async)
└─ Strategy: Asynchronous, eventual consistency
Design decision:
├─ Sync replica in a nearby region (<10ms) → strong consistency
├─ Async replica far away (~80ms) → high availability but eventual consistency
└─ Result: Meets both low-latency and high-availability requirements
Pattern 2: Global Content Distribution (CDN Strategy)
Users worldwide requesting content:
├─ User in Tokyo
│ ├─ Query: asia-southeast1 (Singapore) cache
│ │ └─ Latency: ~30ms (intra-Asia fiber)
│ └─ Miss: Fetch from us-central1
│   └─ Latency: ~120ms (trans-Pacific)
│
├─ User in São Paulo
│ ├─ Query: regional cache (future)
│ └─ Miss: Fetch from us-central1
│   └─ Latency: ~80ms (South America to US subsea route)
│
└─ User in London
  ├─ Query: eu-west1 cache
  │ └─ Latency: ~5ms (intra-Europe fiber)
  └─ Miss: Fetch from us-central1
    └─ Latency: ~80ms (transatlantic cable)
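To see how much the cache hit ratio matters, the per-user figures above can be folded into an expected latency. This is a rough sketch under stated assumptions: `expected_latency_ms` is a hypothetical helper, the 95% and 50% hit ratios are invented for illustration, and a miss is assumed to cost the edge lookup plus the origin fetch.

```bash
#!/usr/bin/env bash
# expected_latency_ms HIT_RATIO EDGE_MS ORIGIN_MS
# Hit: pay only the edge latency. Miss: pay edge lookup plus origin fetch.
expected_latency_ms() {
  awk -v r="$1" -v e="$2" -v o="$3" \
    'BEGIN { printf "%.1f\n", r * e + (1 - r) * (e + o) }'
}

# Tokyo user: ~30 ms to the Singapore cache, ~120 ms to us-central1
expected_latency_ms 0.95 30 120   # warm cache: prints 36.0
expected_latency_ms 0.50 30 120   # cold cache: prints 90.0
```

Even a modest drop in hit ratio moves the average well away from the edge latency, which is why the hit ratio belongs in any latency target for cached content.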
Result: On a cache hit, users near a regional cache see roughly 30ms or less
Pattern 3: Anycast Traffic Engineering
Global anycast IP: 35.201.123.45 announced from all regions
User BGP routing:
├─ User in Tokyo sees: "35.201.123.45 via asia-southeast1 (AS path 2)"
├─ User in London sees: "35.201.123.45 via eu-west1 (AS path 2)"
└─ User in São Paulo sees: "35.201.123.45 via us-central1 (AS path 3)"
Result:
├─ User routed to nearest edge
├─ Latency: usually low, because BGP prefers the shortest AS path, which tends to lead to a nearby edge
├─ No geographic awareness needed in application
└─ SLA typically met without application changes
Real-world Failure Scenarios
Scenario 1: Undersea Cable Cut (Transatlantic)
Event: Ship anchor cuts transatlantic fiber
Symptoms:
├─ All us-central1 ↔ eu-west1 traffic latency increases
├─ Latency: 80ms → 250ms (rerouting via Pacific)
├─ Packet loss: Minimal (reroute within milliseconds)
├─ Duration: ~24 hours (cable repair ship arrives)
Impact:
├─ SLA violated (80ms → 250ms > SLA)
├─ Application timeouts possible
├─ Users report slowness
Mitigation:
├─ Multi-path routing: Primary via Atlantic, secondary via Pacific
├─ Automatic failover: Routing reconverges when the primary path goes down and shifts traffic to the secondary
└─ Result: Latency increases, but connectivity is preserved
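The failover logic can be pictured as picking the lowest-latency path that is still healthy. The sketch below only illustrates that idea and is not how Google's traffic engineering is implemented; `pick_path` is a hypothetical helper, and the latencies are the Path A/B/C figures from the multi-path routing example.

```bash
#!/usr/bin/env bash
# pick_path reads "NAME LATENCY_MS STATUS" lines on stdin and prints the
# healthy (STATUS == up) path with the lowest latency, or "none".
pick_path() {
  awk '$3 == "up" && (best == "" || $2 + 0 < min) { best = $1; min = $2 + 0 }
       END {
         if (best == "") { print "none"; exit 1 }
         print best, min
       }'
}

# Atlantic cable cut: Path A is down, so Path C wins over Path B.
printf '%s\n' "A 80 down" "C 85 up" "B 150 up" | pick_path   # prints: C 85
```

A real network would also weigh capacity and cost, so treat the single latency key here as a simplification.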
Recovery:
├─ Repair crew: Can take 24-48 hours to reach and repair
├─ Temporary: Use satellite backup (expensive, rarely deployed)
└─ Long-term: Install redundant cable
Scenario 2: Software Bug Causes Path Loop (BGP Misconfiguration)
Symptom:
├─ Specific region pairs see 10x latency increase
├─ Example: us-central1 → asia-northeast1 (Tokyo)
├─ Normal: 100ms → Observed: 1000ms+
Root cause:
└─ BGP misconfiguration: routing loop
  ├─ Packet: us-central1 → intermediate1 → intermediate2 → back to us-central1
  └─ Packets loop until TTL expires (initial TTL is commonly 64 or 128)
Investigation:
├─ traceroute shows excessive hops (>20)
├─ tcpdump reveals packet circling
├─ BGP logs show duplicate AS in path
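One way to spot the loop from the traceroute step is to look for the same hop address appearing at more than one hop number. This is a minimal sketch, assuming `traceroute -n` style output with one address per hop; `find_loop` and the sample addresses are hypothetical.

```bash
#!/usr/bin/env bash
# find_loop reads `traceroute -n` output on stdin and prints each hop
# address that shows up at more than one hop number (printed once).
find_loop() {
  awk '$1 ~ /^[0-9]+$/ && $2 != "*" {
         if (($2 in seen) && !($2 in reported)) { print $2; reported[$2] = 1 }
         seen[$2] = 1
       }'
}

# Hypothetical capture: hops 2-3 repeat at hops 4-5.
sample='traceroute to 203.0.113.9 (203.0.113.9), 30 hops max, 60 byte packets
 1  10.0.0.1  0.4 ms  0.4 ms  0.3 ms
 2  10.0.1.1  1.1 ms  1.0 ms  1.2 ms
 3  10.0.2.1  5.2 ms  5.1 ms  5.3 ms
 4  10.0.1.1  9.0 ms  9.1 ms  9.2 ms
 5  10.0.2.1  13.1 ms  13.0 ms  13.2 ms'
printf '%s\n' "$sample" | find_loop   # prints 10.0.1.1 then 10.0.2.1
```

Only the first address on each hop line is checked, so hops that answer from several addresses (for example under load balancing) may need a closer look by hand.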
Fix:
├─ Roll back BGP configuration
├─ Apply fix: filter out self-referencing routes
├─ Recovery: Within 30 minutes
└─ SLA: Violated during incident
Scenario 3: Congestion During DDoS (Peak Latency)
Event: DDoS attack hitting eu-west1 region
Symptoms:
├─ eu-west1 local latency: Normal
├─ All traffic TO eu-west1: Latency increases
├─ us-central1 → eu-west1: 80ms → 200ms+
Root cause:
└─ DDoS traffic saturating transatlantic cable
└─ Legitimate traffic delayed behind attack traffic
└─ No packet loss (traffic shaped, not dropped at GCP edge)
Impact:
├─ Users from Americas accessing eu-west1: Slow
├─ Users from Europe accessing eu-west1: Normal
├─ SLA violated for cross-Atlantic connections
Mitigation:
├─ DDoS scrubbing at PoP: Filters attack traffic early
├─ Traffic Engineering: Reroute via alternative paths
├─ Burst control: Limited to prevent cascade
Resolution:
├─ DDoS mitigation: Reduce attack traffic
├─ Capacity increase: Add more backbone capacity
└─ Recovery: Within hours as attack diminishes
Common Mistakes & Anti-Patterns
Mistake 1: Assuming Same Latency Between All Region Pairs
❌ Wrong thinking:
"All Google regions have similar latency to each other"
✅ Correct understanding:
- Latency depends on distance and fiber routing
- us-east1 ↔ us-west1: ~40ms
- us-central1 ↔ eu-west1: ~80ms
- us-central1 ↔ asia-northeast1: ~100ms+
- Must check actual SLA for your regions
Prevention: Always consult Google's latency dashboard. Test actual latency before deploying.
Mistake 2: Not Planning for Latency Tail (p99)
❌ Wrong thinking:
"80ms p50 latency means worst case is 100ms"
✅ Correct understanding:
- p50: 80ms median
- p99: 150-200ms (roughly 2-2.5x median)
- p99.9: 300-500ms (during peaks)
- Must design for p99, not p50
Prevention: Set timeout values based on p99, not median.
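A timeout can be derived from measured samples rather than from the median. The sketch below assumes a file with one latency value (in ms) per line; `percentile` uses the nearest-rank method, and `timeout_ms` with its 1.5x margin is an illustrative choice, not a recommendation from Google.

```bash
#!/usr/bin/env bash
# percentile P FILE -> nearest-rank Pth percentile (P is an integer 1-100)
# of a file holding one numeric sample per line.
percentile() {
  sort -n "$2" | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      n = p * NR
      rank = int(n / 100)
      if (rank * 100 < n) rank++      # round up to the next rank
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# timeout_ms P99_MS MARGIN -> p99 multiplied by a safety margin, truncated to ms
timeout_ms() {
  awk -v p="$1" -v m="$2" 'BEGIN { printf "%d\n", p * m }'
}

samples="$(mktemp)"
seq 1 100 > "$samples"               # stand-in for measured latencies in ms
p99="$(percentile 99 "$samples")"
echo "p50=$(percentile 50 "$samples") p99=$p99 timeout=$(timeout_ms "$p99" 1.5)"
rm -f "$samples"
```

With real measurements in place of the `seq` stand-in, the gap between p50 and p99 shows directly how far off a median-based timeout would be.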
Mistake 3: Ignoring Congestion During Peak
❌ Wrong thinking:
"Latency SLA guarantees consistent latency all day"
✅ Correct understanding:
- SLA is for normal conditions
- Peak traffic can cause congestion
- Latency during Black Friday / New Year: Can spike 2-3x
- Need to design with burst allowance
Prevention: Load test during peak hours. Plan for 2x normal latency.
GCP-native Implementation Guidance
Measuring Real Latency
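bash
# SSH into a VM in the source region
gcloud compute ssh vm-us-central1 --zone=us-central1-a
# Inside the VM: send 1000 ICMP probes to the target VM in eu-west1
# (TARGET_IP is a placeholder for that VM's external IP)
ping -c 1000 -i 0.2 "$TARGET_IP" | tee ping-results.txt
# Extract the RTT values, sort them, and read p50/p99 by rank
grep -o 'time=[0-9.]*' ping-results.txt | cut -d= -f2 | sort -n > rtts.txt
n=$(wc -l < rtts.txt)
echo "p50: $(sed -n "$(( (n + 1) / 2 ))p" rtts.txt) ms"
echo "p99: $(sed -n "$(( (n * 99 + 99) / 100 ))p" rtts.txt) ms"
# TCP-level alternative: time the SYN -> SYN-ACK exchange with hping3 (needs root)
sudo hping3 -S -p 443 -c 100 --fast "$TARGET_IP"
# This measures the kernel TCP handshake RTT; it does not include application processing
Multi-Region Setup for Latency Guarantees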
bash
# Primary region (low latency to users)
gcloud compute instances create app-us-central1 \
  --zone=us-central1-a \
  --machine-type=e2-medium
# Secondary region (high availability, acceptable latency)
gcloud compute instances create app-eu-west1 \
  --zone=europe-west1-b \
  --machine-type=e2-medium
# Health check referenced by the backend service below
gcloud compute health-checks create tcp tcp-health-check --port=443
# Global backend service: the load balancer sends each request to the
# closest healthy backend that has capacity
gcloud compute backend-services create global-backend \
  --global \
  --protocol=HTTPS \
  --health-checks=tcp-health-check
# Add backends from both regions. The instance groups ig-us-central1 and
# ig-eu-west1 are assumed to exist already and to contain the VMs above.
gcloud compute backend-services add-backends global-backend \
  --global \
  --instance-group=ig-us-central1 \
  --instance-group-zone=us-central1-a \
  --balancing-mode=RATE \
  --max-rate-per-instance=1000
gcloud compute backend-services add-backends global-backend \
  --global \
  --instance-group=ig-eu-west1 \
  --instance-group-zone=europe-west1-b \
  --balancing-mode=RATE \
  --max-rate-per-instance=1000
# Result: traffic is routed to the closest healthy region with capacity.
# A URL map, target proxy, and global forwarding rule are still required
# to expose this backend service.
References
- Inter-region Latency Report — Real-time latency measurements
- GCP Network Performance Metrics — Official latency specifications
- Global Load Balancing Best Practices — Multi-region architecture
- Cloud Interconnect for Guaranteed Bandwidth — For mission-critical latency