Latency SLA & Fiber Path Engineering

Why it matters in production

Each region pair on Google's network has a latency target, referred to here as its latency SLA. For example:

  • us-central1 → us-east1: <10ms p50
  • us-central1 → europe-west1: <80ms p50
  • us-central1 → asia-southeast1: <120ms p50

These numbers are not arbitrary: they result from careful fiber path engineering. Understanding how Google achieves them helps you:

  • Design realistic SLAs for your application (do not promise 10ms latency from Europe)
  • Predict latency in multi-region deployments
  • Debug latency issues when observed latency is worse than the SLA
  • Optimize topology to meet latency requirements

Internal Model: Fiber Path Engineering

Physical Fiber Infrastructure

Google owns fiber and partners with fiber providers to build:

Transcontinental Fiber (Google backbone):
├─ US Tier: Multiple paths across US
│  ├─ Northern Route: Seattle → Chicago → New York
│  ├─ Central Route: Denver → Kansas City → DC
│  └─ Southern Route: Los Angeles → Texas → Miami

├─ Transatlantic: Multiple cable systems
│  ├─ North Atlantic: Cable A, Cable B (diverse)
│  └─ South Atlantic: Alternative if North fails

├─ Trans-Pacific: Multiple cable systems
│  ├─ US→Asia: Via Japan, Southeast Asia (multiple routes)
│  └─ Intra-Asia: Singapore → Tokyo → Australia

└─ Intra-Regional: Dense fiber meshes
   ├─ Connect all zones within region
   └─ Sub-millisecond latency

Latency Components

Total latency = fiber propagation + switch/router processing + congestion

Example: us-central1 (Iowa) → europe-west1 (Belgium)

1. Fiber propagation (speed of light)
   └─ Distance: ~6,500 km
   └─ Speed in fiber: ~200,000 km/s (2/3 speed of light)
   └─ Minimum: 6,500 / 200,000 = 32.5ms

2. Router/switch hops (processing)
   └─ Each hop adds: 0.1-1ms
   └─ Typical: 10-15 hops per path
   └─ Processing: 1-15ms

3. Queuing (congestion)
   └─ Light load: <1ms
   └─ Heavy load: 5-20ms
   └─ Burst: 50-100ms

Total estimate:
└─ Minimum (ideal): 32.5ms
└─ Normal (light load): 45-65ms
└─ High load: 65-100ms
└─ Peak: >100ms (congestion control kicks in)

Google SLA: <80ms p50 (covers normal + some congestion)
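
The estimate above can be reproduced with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the distance, hop, and queuing figures are the ones quoted in this section rather than measurements. It treats the result as one-way delay, since distance divided by speed covers a single direction.

```python
FIBER_SPEED_KM_PER_S = 200_000  # roughly 2/3 of the speed of light in vacuum


def propagation_ms(distance_km: float) -> float:
    """One-way propagation delay over fiber, in milliseconds."""
    return distance_km / FIBER_SPEED_KM_PER_S * 1000


def latency_range_ms(distance_km, hops, per_hop_ms, queuing_ms):
    """Return (low, high) one-way latency built from the three components.

    hops, per_hop_ms, and queuing_ms are (min, max) tuples.
    """
    base = propagation_ms(distance_km)
    low = base + hops[0] * per_hop_ms[0] + queuing_ms[0]
    high = base + hops[1] * per_hop_ms[1] + queuing_ms[1]
    return low, high


floor = propagation_ms(6_500)
low, high = latency_range_ms(
    6_500, hops=(10, 15), per_hop_ms=(0.1, 1.0), queuing_ms=(0.0, 1.0)
)
print(f"propagation floor: {floor:.1f} ms")
print(f"light-load range:  {low:.1f}-{high:.1f} ms")
```

Summing the quoted components gives a light-load range below the 45-65ms "normal" band listed above. A likely explanation, stated here as an assumption, is that real fiber routes are longer than the quoted distance.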

Multi-Path Routing (Redundancy + Performance)

us-central1 → europe-west1 paths:

Path A (Primary):
└─ Iowa → Chicago → New York → Transatlantic Cable → London → Belgium
   └─ 10 hops, ~80ms latency

Path B (Backup):
└─ Iowa → Denver → Los Angeles → Transpacific Cable (US-Asia) 
   → Singapore → Suez Canal → Europe
   └─ 15 hops, ~150ms latency (longer)

Path C (Secondary):
└─ Iowa → Chicago → Miami → Transatlantic South Cable → Europe
   └─ 12 hops, ~85ms latency

Routing decision:
├─ Normal: Use Path A (shortest)
├─ Path A degraded: Switch to Path C
├─ Major event: Use Path B (long but available)
└─ Result: <80ms p50 SLA holds on Path A; Paths C (~85ms) and B (~150ms) keep traffic flowing but exceed the target
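
The routing decision can be sketched as "pick the lowest-latency path that is still healthy". The Python below is a simplified model under that assumption, not Google's actual traffic engineering logic; the `Path` class and `pick_path` function are hypothetical names, and the hop counts and latencies are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    hops: int
    latency_ms: float
    healthy: bool = True


def pick_path(paths):
    """Return the healthy path with the lowest latency, or None if all are down."""
    candidates = [p for p in paths if p.healthy]
    return min(candidates, key=lambda p: p.latency_ms, default=None)


paths = [
    Path("A (North Atlantic)", hops=10, latency_ms=80),
    Path("B (via Pacific)", hops=15, latency_ms=150),
    Path("C (South Atlantic)", hops=12, latency_ms=85),
]

print(pick_path(paths).name)  # Path A while every path is healthy

paths[0].healthy = False      # simulate Path A degrading
print(pick_path(paths).name)  # falls back to Path C
```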

Production Architecture Patterns

Pattern 1: Database Replication with Latency Guarantees

Primary: us-central1 (Iowa)
└─ Main application tier

Sync replica: us-east1 (South Carolina)
├─ Distance: ~1,400 km
├─ Fiber path: via Chicago
├─ Latency SLA: <10ms p99
├─ Strategy: Synchronous replication (waits for confirmation)

Async replica: europe-west1 (Belgium)
├─ Distance: ~6,500 km
├─ Fiber path: transatlantic cable
├─ Latency: ~80ms p50 (ok for async)
├─ Strategy: Asynchronous, eventual consistency

Design decision:
├─ Sync replica in a nearby region (<10ms) → strong consistency
├─ Async replica far away (~80ms) → high availability but eventual consistency
└─ Result: Meets both low-latency and high-availability requirements
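
The design decision amounts to comparing each replica's latency against how long a write is allowed to wait. A minimal Python sketch, assuming a hypothetical 20ms write budget (not a figure from this guide) and the replica latencies from the pattern above:

```python
def replication_mode(replica_latency_ms: float, write_budget_ms: float) -> str:
    """Choose 'sync' when waiting for the replica fits the write budget."""
    return "sync" if replica_latency_ms <= write_budget_ms else "async"


# Latencies quoted in the pattern above; the budget is an assumed example value.
REPLICA_LATENCY_MS = {"us-east1": 10.0, "europe-west1": 80.0}
WRITE_BUDGET_MS = 20.0

plan = {
    region: replication_mode(latency, WRITE_BUDGET_MS)
    for region, latency in REPLICA_LATENCY_MS.items()
}
print(plan)
```

A synchronous commit waits at least one round trip to the replica, so a real budget should be compared against measured round-trip time rather than a one-way figure.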

Pattern 2: Global Content Distribution (CDN Strategy)

Users worldwide requesting content:
├─ User in Tokyo
│  └─ Query: asia-southeast1 (Singapore) cache
│  └─ Latency: ~30ms (intra-Asia fiber)
│  └─ Miss: Fetch from us-central1
│     └─ Latency: ~120ms (trans-Pacific)

├─ User in São Paulo
│  └─ Query: regional cache (future)
│  └─ Miss: Fetch from us-central1
│     └─ Latency: ~80ms (north-south path through the Americas)

└─ User in London
   └─ Query: europe-west1 cache
   └─ Latency: ~5ms (intra-Europe fiber)
   └─ Miss: Fetch from us-central1
      └─ Latency: ~80ms (transatlantic cable)

Result: On a cache hit, users near a regional cache get roughly 30ms or better; São Paulo pays the full ~80ms until a regional cache exists
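
How much the cache helps depends on the hit ratio. The sketch below computes the average latency a user sees, assuming the "miss" figures above are the total latency of a request that misses the cache (the source does not say whether they are totals or added cost). The function name and the 90% hit ratio are hypothetical.

```python
def expected_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average latency for a given cache hit ratio (0.0 to 1.0)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Tokyo user: ~30ms on a hit, ~120ms on a miss (figures from the pattern above).
tokyo_avg = expected_latency_ms(0.9, hit_ms=30, miss_ms=120)
print(f"Tokyo average at 90% hit ratio: {tokyo_avg:.1f} ms")
```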

Pattern 3: Anycast Traffic Engineering

Global anycast IP: 35.201.123.45 announced from all regions

BGP routing as seen from each user's ISP:
├─ User in Tokyo: "35.201.123.45 via asia-southeast1 (AS path length 2)"
├─ User in London: "35.201.123.45 via europe-west1 (AS path length 2)"
└─ User in São Paulo: "35.201.123.45 via us-central1 (AS path length 3)"

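Result:
├─ User routed to the topologically nearest edge (shortest AS path)
├─ Latency: usually low, because a short AS path tends to track proximity, though BGP does not measure latency
├─ No geographic awareness needed in application
└─ SLA typically met without application changes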
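
The selection step can be illustrated with a heavily simplified model of BGP best-path selection that looks only at AS path length. Real BGP evaluates several attributes before path length (local preference, for example), so treat this as a sketch. The intermediate AS numbers are from the range reserved for documentation, and the route list is made up for the example.

```python
GOOGLE_ASN = 15169  # Google's public AS number


def best_route(routes):
    """Pick the route with the shortest AS path (region name breaks ties)."""
    return min(routes, key=lambda r: (len(r["as_path"]), r["region"]))


# Hypothetical announcements of the same anycast prefix seen from São Paulo.
routes_seen_from_sao_paulo = [
    {"region": "us-central1", "as_path": [64500, 64501, GOOGLE_ASN]},
    {"region": "europe-west1", "as_path": [64500, 64502, 64503, GOOGLE_ASN]},
]

chosen = best_route(routes_seen_from_sao_paulo)
print(chosen["region"], "AS path length", len(chosen["as_path"]))
```

Note that nothing in the selection looks at latency: a shorter AS path usually means a closer edge, but not always.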

Real-world Failure Scenarios

Scenario 1: Undersea Cable Cut (Transatlantic)

Event: Ship anchor cuts transatlantic fiber

Symptoms:
├─ All us-central1 ↔ europe-west1 traffic latency increases
├─ Latency: 80ms → 250ms (rerouting via Pacific)
├─ Packet loss: Minimal (reroute within milliseconds)
├─ Duration: 24-48 hours or longer (until a cable repair ship reaches and fixes the break)

Impact:
├─ SLA violated (80ms → 250ms > SLA)
├─ Application timeouts possible
├─ Users report slowness

Mitigation:
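├─ Multi-path routing: Primary via Atlantic, secondary via Pacific
├─ Automatic failover: Routing protocol detects the primary path is down and shifts traffic to the secondary (ECMP only balances across equal-cost paths; it does not detect failures)
├─ Result: Connectivity is preserved; latency rises above the SLA until the primary path returns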

Recovery:
├─ Repair crew: Can take 24-48 hours or longer to reach and repair
├─ Temporary: Use satellite backup (expensive, rarely deployed)
└─ Long-term: Install redundant cable

Scenario 2: Software Bug Causes Path Loop (BGP Misconfiguration)

Symptom:
├─ Specific region pairs see 10x latency increase
├─ Example: us-central1 → asia-northeast1 (Tokyo)
├─ Normal: 100ms → Observed: 1000ms+

Root cause:
└─ BGP misconfiguration: routing loop
   └─ Packet: us-central1 → intermediate1 → intermediate2 → back to us-central1
   └─ Loop until TTL expires and the packet is dropped (initial TTL is typically 64, 128, or 255, depending on the sender's OS)

Investigation:
├─ traceroute shows excessive hops (>20) with the same routers repeating
├─ tcpdump shows ICMP Time Exceeded replies for packets caught in the loop
├─ BGP logs show duplicate AS in path

Fix:
├─ Roll back BGP configuration
├─ Apply fix: filter out self-referencing routes
├─ Recovery: Within 30 minutes
└─ SLA: Violated during incident
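
The traceroute check from the investigation step can be automated: a loop shows up as the same router address appearing more than once. A minimal sketch, assuming the hop list has already been extracted from traceroute output; `find_loop` is a hypothetical helper and the addresses are documentation-range examples.

```python
def find_loop(hops):
    """Return the first hop address seen twice in a traceroute, else None.

    Unanswered hops (shown as "*") are ignored.
    """
    seen = set()
    for hop in hops:
        if hop == "*":
            continue
        if hop in seen:
            return hop
        seen.add(hop)
    return None


looping_trace = ["192.0.2.1", "192.0.2.9", "192.0.2.17", "*", "192.0.2.9", "192.0.2.17"]
clean_trace = ["192.0.2.1", "192.0.2.9", "198.51.100.4"]

print(find_loop(looping_trace))  # first repeated router
print(find_loop(clean_trace))    # None when no hop repeats
```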

Scenario 3: Congestion During DDoS (Peak Latency)

Event: DDoS attack hitting europe-west1 region

Symptoms:
├─ europe-west1 local latency: Normal
├─ Cross-Atlantic traffic TO europe-west1: Latency increases
├─ us-central1 → europe-west1: 80ms → 200ms+

Root cause:
└─ DDoS traffic saturating transatlantic cable
   └─ Legitimate traffic delayed behind attack traffic
   └─ Little packet loss at first (traffic is shaped and queued rather than dropped at GCP edge)

Impact:
├─ Users from Americas accessing europe-west1: Slow
├─ Users from Europe accessing europe-west1: Normal
├─ SLA violated for cross-Atlantic connections

Mitigation:
├─ DDoS scrubbing at PoP: Filters attack traffic early
├─ Traffic Engineering: Reroute via alternative paths
├─ Burst control: Rate-limit bursts to prevent cascading congestion

Resolution:
├─ DDoS mitigation: Reduce attack traffic
├─ Capacity increase: Add more backbone capacity
└─ Recovery: Within hours as attack diminishes

Common Mistakes & Anti-Patterns

Mistake 1: Assuming Same Latency Between All Region Pairs

Wrong thinking:

"All Google regions have similar latency to each other"

Correct understanding:

  • Latency depends on distance and fiber routing
  • us-east1 ↔ us-west1: ~40ms
  • us-central1 ↔ europe-west1: ~80ms
  • us-central1 ↔ asia-northeast1: ~100ms+
  • Must check actual SLA for your regions

Prevention: Always consult Google's latency dashboard (the Performance Dashboard in Network Intelligence Center). Test actual latency before deploying.

Mistake 2: Not Planning for Latency Tail (p99)

Wrong thinking:

"80ms p50 latency means worst case is 100ms"

Correct understanding:

  • p50: 80ms median
  • p99: 150-200ms (roughly 2-2.5x median)
  • p99.9: 300-500ms (during peaks)
  • Must design for p99, not p50

Prevention: Set timeout values based on p99, not median.
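
A timeout derived from p99 can be computed directly from latency samples. The sketch below uses the nearest-rank percentile definition and a hypothetical 1.5x safety margin; the sample data is synthetic, shaped to resemble the p50/p99 figures above, and is not a measurement.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def timeout_ms(samples, margin=1.5):
    """Timeout based on p99 plus a safety margin (margin is an assumed value)."""
    return percentile(samples, 99) * margin


# Synthetic samples: mostly 80ms, a slow tail at 180ms, one 400ms outlier.
samples = [80] * 95 + [180] * 4 + [400]

print("p50:", percentile(samples, 50), "ms")
print("p99:", percentile(samples, 99), "ms")
print("timeout:", timeout_ms(samples), "ms")
```

A timeout set from the median here would cut off the slowest 5% of requests, which is the failure mode this mistake describes.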

Mistake 3: Ignoring Congestion During Peak

Wrong thinking:

"Latency SLA guarantees consistent latency all day"

Correct understanding:

  • SLA is for normal conditions
  • Peak traffic can cause congestion
  • Latency during Black Friday / New Year: Can spike 2-3x
  • Need to design with burst allowance

Prevention: Load test during peak hours. Plan for 2-3x normal latency.

GCP-native Implementation Guidance

Measuring Real Latency

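bash
# Between VMs in different regions: open a shell on the source VM
gcloud compute ssh vm-us-central1 --zone=us-central1-a

# Inside the VM: ping the target VM in europe-west1 (reports round-trip time).
# vm-eu-west1-external-ip and target-ip are placeholders for the target's address.
ping -c 10 vm-eu-west1-external-ip

# Measure p50/p99 over 1000 samples in one ping run
ping -c 1000 -i 0.2 target-ip | tee ping-results.txt
# Parse: extract RTTs, sort them, pick nearest-rank percentiles
grep -oE 'time=[0-9.]+' ping-results.txt | cut -d= -f2 | sort -n |
  awk '{ a[NR] = $1 }
       END { if (NR) print "p50=" a[int((NR*50+99)/100)] " ms", "p99=" a[int((NR*99+99)/100)] " ms" }'

# Better: use hping3 (needs root) for TCP handshake latency
sudo hping3 -S -p 443 -c 100 --fast target-ip
# Shows SYN to SYN-ACK round-trip time, answered by the target's kernel TCP stack
# (it does not exercise the application itself)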
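
For repeated measurements it can be easier to parse the saved ping output in a script. A minimal Python sketch, assuming Linux-style ping lines containing `time=<value> ms`; the sample text is fabricated for illustration and `parse_rtts` is a hypothetical helper.

```python
import re
import statistics

RTT_PATTERN = re.compile(r"time=([0-9.]+) ms")


def parse_rtts(ping_output: str) -> list:
    """Extract round-trip times in milliseconds from ping output."""
    return [float(m.group(1)) for m in RTT_PATTERN.finditer(ping_output)]


# Example lines in the format Linux ping prints (values are made up).
sample_output = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=56 time=81.2 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=56 time=79.8 ms
64 bytes from 192.0.2.10: icmp_seq=3 ttl=56 time=140.5 ms
"""

rtts = parse_rtts(sample_output)
print("samples:", len(rtts))
print("median:", statistics.median(rtts), "ms")
print("max:", max(rtts), "ms")
```

In practice the input would come from the saved `ping-results.txt` file, read with `open(...).read()`.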

Multi-Region Setup for Latency Guarantees

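bash
# Primary region (low latency to users)
gcloud compute instances create app-us-central1 \
  --zone=us-central1-a \
  --machine-type=e2-medium

# Secondary region (high availability, acceptable latency)
gcloud compute instances create app-eu-west1 \
  --zone=europe-west1-b \
  --machine-type=e2-medium

# Backends must be instance groups: put each VM in an unmanaged group
gcloud compute instance-groups unmanaged create ig-us-central1 --zone=us-central1-a
gcloud compute instance-groups unmanaged add-instances ig-us-central1 \
  --zone=us-central1-a --instances=app-us-central1
gcloud compute instance-groups unmanaged create ig-eu-west1 --zone=europe-west1-b
gcloud compute instance-groups unmanaged add-instances ig-eu-west1 \
  --zone=europe-west1-b --instances=app-eu-west1

# Health check referenced by the backend service (must exist first)
gcloud compute health-checks create tcp tcp-health-check --port=443

# Create the backend service for the global load balancer
# (routes each request to the closest healthy backend with capacity)
gcloud compute backend-services create global-backend \
  --global \
  --protocol=HTTPS \
  --health-checks=tcp-health-check

# Add backends from both regions
gcloud compute backend-services add-backends global-backend \
  --global \
  --instance-group=ig-us-central1 \
  --instance-group-zone=us-central1-a \
  --balancing-mode=RATE \
  --max-rate-per-instance=1000

gcloud compute backend-services add-backends global-backend \
  --global \
  --instance-group=ig-eu-west1 \
  --instance-group-zone=europe-west1-b \
  --balancing-mode=RATE \
  --max-rate-per-instance=1000

# Result: once a URL map, target proxy, and forwarding rule front this backend
# service, traffic is routed to the closest healthy region with capacity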

Next: Anycast Routing with Global Load Balancer — How traffic automatically reaches nearest region