Cold Potato vs Hot Potato Routing
Why It Matters in Production
When your packet travels from origin to destination, there are two ways to route it:
Cold Potato (GCP default):
- Keep traffic inside GCP network as long as possible
- Exit close to destination (egress point near destination)
- Lower latency, more control, but more expensive (internal backbone usage)
Hot Potato (alternative):
- Exit GCP network as soon as possible
- Let public internet route to destination
- Lower cost (less backbone usage), but higher latency + less control
Understanding this trade-off helps you make better networking decisions: which strategy should you use, and when?
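As a rough illustration of the two strategies, the sketch below prints the hop sequence each one implies. The function name `route_path` and the PoP labels are hypothetical; in practice the egress point is chosen by Google's network, not by anything you call.

```bash
# Hypothetical model of where traffic leaves the provider network.
# Usage: route_path STRATEGY SRC_REGION POP_NEAR_SRC POP_NEAR_DST
route_path() {
  case "$1" in
    cold) # stay on the backbone, exit at the PoP nearest the destination
      echo "$2 -> backbone -> $4 -> internet -> destination" ;;
    hot)  # exit at the PoP nearest the source, the internet carries the rest
      echo "$2 -> $3 -> internet -> destination" ;;
    *)
      echo "unknown strategy: $1" >&2
      return 1 ;;
  esac
}

route_path cold us-east1 PoP-VA PoP-LA
route_path hot  us-east1 PoP-VA PoP-LA
```

The only difference between the two branches is which PoP hands the packet to the public internet, which is the whole distinction between the strategies.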
Internal Model: Egress Point Selection
Cold Potato Routing
┌──────────────────┐
│ Source (us-east1)│
│ IP: 203.0.113.1  │
└────────┬─────────┘
         │
         │ (Keep on GCP backbone)
         │
    ┌────▼─────────────────────────┐
    │ GCP Private Backbone Network │
    │ (Premium Tier network)       │
    └────┬─────────────────────────┘
         │
         │ (Route toward destination)
         │
    ┌────▼─────────────────────┐
    │ PoP close to destination │
    │ (e.g., PoP-LA for user in│
    │ Southern California)     │
    └────┬─────────────────────┘
         │ (Exit to public internet)
         │
    ┌────▼──────────┐
    │ Destination   │
    │ (user ISP)    │
    └───────────────┘
Characteristics:
├─ Path: Origin → GCP backbone → PoP close to dest → public internet
├─ Latency: Good (backbone faster than internet)
├─ Cost: Higher (backbone usage charged)
├─ Control: High (stays on GCP network most of the way)
└─ Strategy: Prioritize latency, don't worry about egress costs
Hot Potato Routing
┌──────────────────┐
│ Source (us-east1)│
│ IP: 203.0.113.1  │
└────────┬─────────┘
         │
         │ (Quick exit)
         │
    ┌────▼──────────┐
    │ Local PoP     │
    │ (PoP-VA in    │
    │ us-east1)     │
    └────┬──────────┘
         │ (Exit to public internet)
         │
    ┌────▼──────────────────────────────────────┐
    │ Public Internet                           │
    │ (ISP routing, multiple hops, longer path) │
    └────┬──────────────────────────────────────┘
         │
    ┌────▼──────────┐
    │ Destination   │
    │ (user ISP)    │
    └───────────────┘
Characteristics:
├─ Path: Origin → Local PoP → exit immediately → internet routing
├─ Latency: Worse (internet slower than backbone)
├─ Cost: Lower (less backbone usage)
├─ Control: Low (subject to internet routing, ISP behavior)
└─ Strategy: Optimize for cost, accept latency trade-off
Production Architecture Patterns
Pattern 1: Video Streaming (Cold Potato Recommended)
Content origin: us-central1
User: Tokyo, ISP: NTT (200.0.0.0/24)
Cold Potato:
├─ Stream: us-central1 → GCP backbone → PoP-Tokyo → user
├─ Latency: ~120ms (via backbone optimized path)
├─ Bandwidth usage: High (backbone traversal charged)
├─ Quality: Consistent, predictable
Hot Potato (bad choice):
├─ Stream: us-central1 → PoP-Virginia → internet → Tokyo ISP
├─ Latency: ~200ms (internet routing longer)
├─ Cost: Lower
├─ Quality: Buffering, variable, poor UX
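One reason the higher-latency path buffers more is that a single TCP connection cannot move more than one window of data per round trip. The sketch below works that bound out for the two latency figures above. It assumes those figures are round-trip times and uses a hypothetical 1 MiB window; `max_mbps` is an illustrative helper, not a GCP tool.

```bash
# Upper bound for one TCP connection: throughput <= window / RTT.
# Usage: max_mbps WINDOW_BYTES RTT_MS   (result in Mbit/s)
max_mbps() {
  awk -v w="$1" -v rtt="$2" \
    'BEGIN { printf "%.1f\n", (w * 8) / (rtt / 1000) / 1000000 }'
}

# 1 MiB window (assumed), RTTs taken from the two options above
echo "Cold potato, 120 ms: $(max_mbps 1048576 120) Mbit/s"
echo "Hot potato,  200 ms: $(max_mbps 1048576 200) Mbit/s"
```

Under these assumptions the per-connection ceiling drops from about 69.9 Mbit/s to about 41.9 Mbit/s, before any packet loss on the public internet is taken into account.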
Decision: Cold Potato (latency + UX critical)
Pattern 2: Batch Data Export (Hot Potato Acceptable)
Data origin: us-central1 (Analytics export)
Destination: Customer datacenter in Frankfurt
Cold Potato:
├─ Transfer: 500GB data via backbone → PoP-EU → customer
├─ Time: 30 minutes (500GB ÷ high bandwidth)
├─ Cost: $0.12/GB (Premium backbone) = $60
├─ Total cost: $60 + compute + storage
Hot Potato:
├─ Transfer: 500GB via PoP-Virginia → internet → Frankfurt
├─ Time: 45 minutes (internet slower, more congested)
├─ Cost: $0.04/GB (Standard/internet) = $20
├─ Total cost: $20 + compute + storage
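The arithmetic behind the two options can be reproduced with a few lines of shell. This is a minimal sketch that uses the example rates and transfer times quoted above (not current list prices). The helpers `egress_cost` and `avg_gbps` are hypothetical names, and 1 GB is taken as 10^9 bytes.

```bash
# Usage: egress_cost GB RATE_PER_GB   (result in dollars)
egress_cost() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# Usage: avg_gbps GB MINUTES   (average throughput implied by the transfer time)
avg_gbps() {
  awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.2f\n", gb * 8 / (min * 60) }'
}

premium=$(egress_cost 500 0.12)
standard=$(egress_cost 500 0.04)
savings=$(awk -v p="$premium" -v s="$standard" 'BEGIN { printf "%.2f\n", p - s }')

echo "Cold potato: \$$premium at $(avg_gbps 500 30) Gbit/s average"
echo "Hot potato:  \$$standard at $(avg_gbps 500 45) Gbit/s average"
echo "Savings:     \$$savings per 500 GB export"
```

For a one-off export the $40 difference is small, but the same calculation run over a nightly job makes the case for the cheaper tier much clearer.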
Decision: Hot Potato (cost savings outweigh the 15-minute longer transfer time)
Pattern 3: Database Replication (Cold Potato)
Primary: us-central1
Replica: eu-west1 (sync replication, must be fast)
Cold Potato:
├─ Replication: us-central1 → backbone → eu-west1
├─ Latency: ~80ms one-way (backbone SLA)
├─ Throughput: High (backbone dedicated)
├─ Consistency: Strong (fast sync possible)
Hot Potato (unacceptable):
├─ Replication: us-central1 → PoP → internet → eu-west1
├─ Latency: ~150-200ms (internet variable)
├─ Throughput: Limited (ISP congestion possible)
├─ Consistency: At risk (slow sync times out)
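To see why latency dominates here, consider a deliberately simplified model in which every synchronous commit waits one full round trip to the replica and commits on a connection are serialized. Real databases batch and pipeline, so treat this as a pessimistic sketch; `commits_per_sec` is a hypothetical helper.

```bash
# Usage: commits_per_sec ONE_WAY_MS
# Upper bound on serialized synchronous commits per second on one connection.
commits_per_sec() {
  awk -v one_way="$1" 'BEGIN { rtt = 2 * one_way; printf "%.2f\n", 1000 / rtt }'
}

# One-way latencies taken from the two options above
echo "Cold potato,  80 ms one-way: $(commits_per_sec 80) commits/s"
echo "Hot potato,  150 ms one-way: $(commits_per_sec 150) commits/s"
echo "Hot potato,  200 ms one-way: $(commits_per_sec 200) commits/s"
```

In this model the hot potato path roughly halves the achievable commit rate, and its variability makes replication timeouts more likely.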
Decision: Cold Potato (required for strong consistency)
GCP Egress Point Strategy
GCP implements cold potato by default for Premium Tier:
Example egress PoPs, keyed by the region nearest the destination (illustrative):
├─ us-central1: Egress primarily via PoP-US (Ashburn, Virginia)
├─ eu-west1: Egress primarily via PoP-EU (London or Belgium)
├─ asia-southeast1: Egress primarily via PoP-Asia (Singapore)
│
└─ Routing logic:
   └─ Destination IP: Where should packet exit?
      └─ Look up: Routing table says "exit via PoP closest to dest"
         └─ Result: Cold Potato (take longer backbone path, exit near dest)
Standard Tier uses a different, cost-optimized strategy:
Standard Tier egress:
├─ Prefer: Egress from region where traffic originates
├─ Example: Traffic from us-central1 → exits us-central1
├─ Even if destination in eu-west1: Still exit from us-central1
├─ Result: Hot Potato (exit early, public internet takes it to EU)
└─ Why: Saves backbone usage, reduces costs
Real-world Failure Scenarios
Scenario 1: Backbone Congestion (Cold Potato Bottleneck)
Symptom: All Premium Tier traffic from us-central1 to eu-west1 slow
├─ Latency: 80ms → 300ms
├─ Packet loss: <1%
Root cause:
└─ Cold Potato: All traffic routed via backbone
   └─ Backbone capacity: Oversubscribed during peak
      └─ Bottleneck: PoP-EU uplink
Impact:
├─ All data transfers: Slowed
├─ Database replication: Delayed
├─ Users: Perceive slowness
Options:
├─ Upgrade backbone capacity (long-term, expensive)
├─ Switch to Hot Potato temporarily (lower SLA)
├─ Shift traffic to less congested times (if possible)
└─ Use Standard Tier for non-critical data (lower cost)
Scenario 2: ISP Route Flap (Hot Potato Vulnerability)
Symptom: Users from US ISP can't reach data in eu-west1
├─ Latency: Starts at 150ms (ok for internet)
├─ Then timeouts (3-5x)
├─ Pattern: Intermittent (every 5-10 seconds)
Root cause:
└─ Hot Potato: Exiting via US PoP
   └─ ISP routing: BGP flap (route advertised/withdrawn repeatedly)
      └─ Packet loss: Routes unstable
         └─ Hops: 200-300% over expected due to routing churn
Impact:
├─ TCP retransmits: Frequent
├─ Throughput: 50% of normal
├─ Affects only this ISP pair
Resolution:
├─ ISP fixes routing (their responsibility)
└─ Switch to Cold Potato: Better latency stability
Common Mistakes & Anti-Patterns
Mistake 1: Assuming Cold Potato Always Better
❌ Wrong thinking:
"Cold Potato has better latency, so always use it"
✅ Correct understanding:
- Cold Potato: Better for latency-critical, high-bandwidth flows
- Hot Potato: Better for cost optimization on non-critical flows
- Mix: Use both strategically per use case
Prevention: Analyze each data flow. Document routing strategy.
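One lightweight way to document the strategy per flow is a small table that maps each flow to a tier. The sketch below is a hypothetical example of that idea: `pick_tier` and the flow names are illustrative, and a real decision would weigh more than a single yes/no flag.

```bash
# Usage: pick_tier LATENCY_CRITICAL   (yes or no)
pick_tier() {
  case "$1" in
    yes) echo PREMIUM ;;   # cold potato: latency matters more than egress cost
    no)  echo STANDARD ;;  # hot potato: cost matters more than latency
    *)   echo "expected yes or no, got: $1" >&2; return 1 ;;
  esac
}

# Columns: flow name, latency-critical? (the three patterns discussed earlier)
while read -r flow critical; do
  printf '%-22s %s\n' "$flow" "$(pick_tier "$critical")"
done <<'EOF'
video-streaming yes
batch-export no
db-replication yes
EOF
```

Keeping the table in version control gives reviewers a single place to challenge each choice.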
Mistake 2: Not Monitoring Egress Points
❌ Wrong thinking:
"Traffic routing is automatic, no need to monitor"
✅ Correct understanding:
- Routing changes: Can cause unexpected behavior
- Egress point shift: Might increase latency
- Need: Monitor to detect anomalies
Prevention: Set up monitoring on egress latency per region pair.
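A minimal sketch of such a check is shown below. It assumes you already collect one line per region pair with a baseline and an observed latency; the input format, the `flag_slow_pairs` name, and the 2x threshold are all assumptions for illustration. The first sample reuses the 80 ms to 300 ms jump from Scenario 1; the second is made up.

```bash
# Usage: flag_slow_pairs FACTOR
# Reads "pair baseline_ms observed_ms" lines on stdin and prints the pairs
# whose observed latency exceeds FACTOR times their baseline.
flag_slow_pairs() {
  awk -v factor="$1" \
    '$3 > $2 * factor { print $1, $3 "ms", "(baseline " $2 "ms)" }'
}

flag_slow_pairs 2 <<'EOF'
us-central1:eu-west1 80 300
us-central1:asia-southeast1 180 190
EOF
```

Only the first pair is reported, since 300 ms is more than twice its 80 ms baseline while 190 ms is within normal range for the second.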
Mistake 3: Oversizing to Compensate for Hot Potato
❌ Wrong thinking:
"Cold Potato is too expensive, use Hot Potato + bigger instances"
✅ Correct understanding:
- Latency issue: Can't fix with more compute
- Better: Choose right tier for workload
- If latency critical: Cold Potato (Premium Tier) required
- If cost critical: Hot Potato (Standard Tier) acceptable
Prevention: Profile before optimizing. Make conscious trade-off decision.
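A quick back-of-the-envelope model shows why bigger instances do not help. The sketch assumes a request costs a fixed network time plus compute time that scales down with instance size. The 50 ms compute figure is hypothetical; the 200 ms and 120 ms network figures are the hot and cold potato latencies from Pattern 1. `response_ms` is an illustrative helper.

```bash
# Usage: response_ms COMPUTE_MS SPEEDUP NETWORK_MS   (total time for one request)
response_ms() {
  awk -v c="$1" -v s="$2" -v n="$3" 'BEGIN { printf "%.0f\n", c / s + n }'
}

echo "Hot potato, 1x instance:  $(response_ms 50 1 200) ms"
echo "Hot potato, 2x instance:  $(response_ms 50 2 200) ms"
echo "Cold potato, 1x instance: $(response_ms 50 1 120) ms"
```

Doubling compute only takes the request from 250 ms to 225 ms, while changing the network path takes it to 170 ms.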
GCP-native Implementation Guidance
Monitoring Egress Points
bash
# Enable VPC Flow Logs on the subnet to record per-flow endpoints and byte counts
gcloud compute networks subnets update my-subnet \
    --enable-flow-logs \
    --region=us-central1
# Query high-bandwidth flows (VPC Flow Logs are written under the gce_subnetwork resource)
gcloud logging read \
    'resource.type="gce_subnetwork" AND log_id("compute.googleapis.com/vpc_flows") AND jsonPayload.bytes_sent>1000000' \
    --format='table(jsonPayload.connection.src_ip, jsonPayload.connection.dest_ip, jsonPayload.bytes_sent)' \
    --limit=20
# Check network tiers (Premium vs Standard): global addresses first, then all addresses
gcloud compute addresses list --global --format='table(name, networkTier)'
gcloud compute addresses list --format='table(name, region, networkTier)'
Forcing Specific Tier for Control
bash
# Create Premium Tier IP (cold potato); global addresses are always Premium Tier
gcloud compute addresses create premium-ip \
    --global \
    --network-tier=PREMIUM
# Create Standard Tier IP (hot potato); Standard Tier addresses are regional
gcloud compute addresses create standard-ip \
    --region=us-central1 \
    --network-tier=STANDARD
# Use in forwarding rules based on workload type
gcloud compute forwarding-rules create latency-critical-rule \
    --global \
    --target-https-proxy=my-proxy \
    --address=premium-ip \
    --ports=443
# The forwarding rule's tier must match the tier of its address
gcloud compute forwarding-rules create cost-optimized-rule \
    --region=us-central1 \
    --network-tier=STANDARD \
    --target-http-proxy=regional-proxy \
    --address=standard-ip \
    --ports=80
References
- Network Service Tiers: Cold Potato vs Hot Potato — Official routing strategies
- VPC Flow Logs for Monitoring — Observe egress patterns
- Optimizing Egress Costs — Cost optimization techniques
- Network Latency Dashboard — Monitor real routing latency