Cold Potato vs Hot Potato Routing

Why It Matters in Production

When a packet travels from origin to destination, there are two ways to route it:

Cold Potato (GCP default):

  • Keep traffic inside GCP network as long as possible
  • Exit close to destination (egress point near destination)
  • Lower latency, more control, but more expensive (internal backbone usage)

Hot Potato (alternative):

  • Exit GCP network as soon as possible
  • Let public internet route to destination
  • Lower cost (less backbone usage), but higher latency + less control

Understanding this trade-off helps you make better networking decisions: which strategy fits which workload?

Internal Model: Egress Point Selection

Cold Potato Routing

┌──────────────────┐
│ Source (us-east1)│
│ IP: 203.0.113.1  │
└────────┬─────────┘
         │ (Keep on GCP backbone)
    ┌────▼─────────────────────────┐
    │ GCP Private Backbone Network │
    │ (Premium Tier network)       │
    └────┬─────────────────────────┘
         │ (Route toward destination)
    ┌────▼─────────────────────────┐
    │ PoP close to destination     │
    │ (e.g., PoP-LA for a user in  │
    │ Southern California)         │
    └────┬─────────────────────────┘
         │ (Exit to public internet)
    ┌────▼──────────┐
    │ Destination   │
    │ (user ISP)    │
    └───────────────┘

Characteristics:
├─ Path: Origin → GCP backbone → PoP close to dest → public internet
├─ Latency: Generally lower (the backbone avoids extra ISP hops)
├─ Cost: Higher (backbone usage charged)
├─ Control: High (stays on GCP network most of the way)
└─ Strategy: Prioritize latency and accept higher egress cost

Hot Potato Routing

┌──────────────────┐
│ Source (us-east1)│
│ IP: 203.0.113.1  │
└────────┬─────────┘
         │ (Quick exit)
    ┌────▼──────────┐
    │ Local PoP     │
    │ (nearest to   │
    │ us-east1)     │
    └────┬──────────┘
         │ (Exit to public internet)
    ┌────▼──────────────────────────────────────┐
    │ Public Internet                           │
    │ (ISP routing, multiple hops, longer path) │
    └────┬──────────────────────────────────────┘
         │
    ┌────▼──────────┐
    │ Destination   │
    │ (user ISP)    │
    └───────────────┘

Characteristics:
├─ Path: Origin → Local PoP → exit immediately → internet routing
├─ Latency: Generally higher and more variable (depends on ISP paths)
├─ Cost: Lower (less backbone usage)
├─ Control: Low (subject to internet routing, ISP behavior)
└─ Strategy: Optimize for cost, accept latency trade-off

Production Architecture Patterns

Pattern 1: Content Streaming (Cold Potato)

Content origin: us-central1
User: Tokyo, ISP: NTT (200.0.0.0/24)

Cold Potato:
├─ Stream: us-central1 → GCP backbone → PoP-Tokyo → user
├─ Latency: ~120ms (via backbone optimized path)
├─ Bandwidth usage: High (backbone traversal charged)
├─ Quality: Consistent, predictable

Hot Potato (bad choice):
├─ Stream: us-central1 → PoP near us-central1 → internet → Tokyo ISP
├─ Latency: ~200ms (internet routing takes a longer path)
├─ Cost: Lower
└─ Quality: Buffering, variable, poor UX

Decision: Cold Potato (latency + UX critical)

Pattern 2: Batch Data Export (Hot Potato Acceptable)

Data origin: us-central1 (Analytics export)
Destination: Customer datacenter in Frankfurt

Cold Potato:
├─ Transfer: 500GB data via backbone → PoP-EU → customer
├─ Time: 30 minutes (about 2.2 Gbit/s sustained)
├─ Cost: 500GB × $0.12/GB (illustrative Premium rate) = $60
└─ Total cost: $60 + compute + storage

Hot Potato:
├─ Transfer: 500GB via PoP near us-central1 → internet → Frankfurt
├─ Time: 45 minutes (about 1.5 Gbit/s; internet path slower, more congested)
├─ Cost: 500GB × $0.04/GB (illustrative Standard rate) = $20
└─ Total cost: $20 + compute + storage

Decision: Hot Potato (the $40 saving outweighs a 15-minute longer transfer)
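
The arithmetic behind this decision can be reproduced with a few lines of shell. This is a minimal sketch: the function names are made up for illustration, and the per-GB rates are the example figures from this pattern, not a current price list.

```bash
# transfer_cost_cents SIZE_GB RATE_CENTS_PER_GB -> total cost in cents
transfer_cost_cents() {
  echo $(( $1 * $2 ))
}

# required_mbps SIZE_GB MINUTES -> average throughput in Mbit/s, rounded down
# (decimal units: 1 GB = 8000 Mbit)
required_mbps() {
  echo $(( $1 * 8000 / ($2 * 60) ))
}

premium=$(transfer_cost_cents 500 12)    # 500 GB at $0.12/GB
standard=$(transfer_cost_cents 500 4)    # 500 GB at $0.04/GB

echo "Premium:  \$$(( premium / 100 ))"
echo "Standard: \$$(( standard / 100 ))"
echo "Savings:  \$$(( (premium - standard) / 100 ))"
echo "Throughput needed: $(required_mbps 500 30) Mbit/s (30 min) vs $(required_mbps 500 45) Mbit/s (45 min)"
```

Working in integer cents keeps the script free of floating-point tools; swap in your own sizes and rates to rerun the comparison for a different export.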

Pattern 3: Database Replication (Cold Potato)

Primary: us-central1
Replica: eu-west1 (sync replication, must be fast)

Cold Potato:
├─ Replication: us-central1 → backbone → eu-west1
├─ Latency: ~80ms one-way (typical backbone path, not a guaranteed figure)
├─ Throughput: High (backbone dedicated)
├─ Consistency: Strong (fast sync possible)

Hot Potato (unacceptable; relevant only when replication runs over external IPs):
├─ Replication: us-central1 → PoP → internet → eu-west1
├─ Latency: ~150-200ms (internet variable)
├─ Throughput: Limited (ISP congestion possible)
└─ Consistency: At risk (slow sync can time out)

Decision: Cold Potato (required for strong consistency)
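
To see why one-way latency matters so much for synchronous replication, here is a rough back-of-the-envelope sketch. It assumes each synchronous commit on a single connection waits for one full round trip (twice the one-way latency) and ignores processing time, so the numbers are upper bounds. The function name is hypothetical.

```bash
# max_sync_commits_per_sec ONE_WAY_MS
# Upper bound on sequential synchronous commits per second on one connection,
# assuming one round trip (2 x one-way latency) per commit and nothing else.
max_sync_commits_per_sec() {
  rtt_ms=$(( $1 * 2 ))
  echo $(( 1000 / rtt_ms ))
}

for one_way in 80 150 200; do
  echo "${one_way} ms one-way -> at most $(max_sync_commits_per_sec "$one_way") commits/s"
done
```

Under these assumptions the backbone figure (80ms) allows roughly twice as many sequential commits per second as the internet figures (150-200ms), before any jitter or timeouts are considered.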

GCP Egress Point Strategy

GCP implements cold potato by default for Premium Tier:

Per-region egress points (illustrative, for destinations in the same geography):
├─ us-central1: Egress primarily via PoP-US (Ashburn, Virginia)
├─ eu-west1: Egress primarily via PoP-EU (London, Belgium)
└─ asia-southeast1: Egress primarily via PoP-Asia (Singapore)

Routing logic:
├─ Destination IP: Where should the packet exit?
├─ Lookup: Routing table says "exit via the PoP closest to the destination"
└─ Result: Cold Potato (take the longer backbone path, exit near the destination)

Standard Tier uses a different, cost-optimized strategy:

Standard Tier egress:
├─ Prefer: Egress from the region where traffic originates
├─ Example: Traffic from us-central1 → exits near us-central1
├─ Even if the destination is near eu-west1: Still exits near us-central1
├─ Result: Hot Potato (exit early, public internet carries it to the EU)
└─ Why: Saves backbone usage, reduces costs
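
The difference between the two tiers can be summarised as a toy model. This is only a sketch of the selection rule described above: the PoP labels are invented for illustration, and real egress selection happens inside Google's network and cannot be set per packet.

```bash
# egress_pop TIER SOURCE_REGION DEST_AREA -> label of the PoP where traffic exits
# Toy model only: "pop-near-..." labels are hypothetical, not real PoP names.
egress_pop() {
  tier=$1; source_region=$2; dest_area=$3
  case "$tier" in
    PREMIUM)  echo "pop-near-${dest_area}" ;;       # cold potato: exit near destination
    STANDARD) echo "pop-near-${source_region}" ;;   # hot potato: exit near origin
    *)        echo "unknown tier: $tier" >&2; return 1 ;;
  esac
}

egress_pop PREMIUM  us-central1 frankfurt
egress_pop STANDARD us-central1 frankfurt
```

The same source and destination produce different exit points purely because of the tier, which is the whole trade-off in one line.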

Real-world Failure Scenarios

Scenario 1: Backbone Congestion (Cold Potato Bottleneck)

Symptom: All Premium Tier traffic from us-central1 to eu-west1 slow
├─ Latency: 80ms → 300ms
├─ Packet loss: <1%

Root cause:
└─ Cold Potato: All traffic routed via backbone
   └─ Backbone capacity: Oversubscribed during peak
   └─ Bottleneck: PoP-EU uplink

Impact:
├─ All data transfers: Slowed
├─ Database replication: Delayed
├─ Users: Perceive slowness

Options:
├─ Escalate to Google Cloud support (backbone capacity is managed by Google; long-term)
├─ Switch to Hot Potato temporarily (lower SLA)
├─ Shift traffic to less congested times (if possible)
└─ Use Standard Tier for non-critical data (lower cost)

Scenario 2: ISP Route Flap (Hot Potato Vulnerability)

Symptom: Users from US ISP can't reach data in eu-west1
├─ Latency: Starts at 150ms (ok for internet)
├─ Then timeouts (3-5x)
├─ Pattern: Intermittent (every 5-10 seconds)

Root cause:
└─ Hot Potato: Exiting via US PoP
   └─ ISP routing: BGP flap (route advertised/withdrawn repeatedly)
   └─ Packet loss: Routes unstable
   └─ Hops: 200-300% over expected due to routing churn

Impact:
├─ TCP retransmits: Frequent
├─ Throughput: 50% of normal
├─ Affects only this ISP pair

Resolution:
├─ ISP fixes routing (their responsibility)
└─ Switch to Cold Potato (Premium Tier): less exposure to internet routing, more stable latency
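
An intermittent pattern like this one can be quantified by counting how often a probe series changes state. The sketch below assumes you already collect probe results as a list of `ok` / `timeout` values (for example from a periodic ping); the function name and input format are hypothetical.

```bash
# count_flaps RESULT...
# Counts ok<->timeout transitions in an ordered series of probe results.
count_flaps() {
  flaps=0
  prev=""
  for result in "$@"; do
    if [ -n "$prev" ] && [ "$result" != "$prev" ]; then
      flaps=$(( flaps + 1 ))
    fi
    prev=$result
  done
  echo "$flaps"
}

count_flaps ok ok timeout ok timeout timeout ok   # prints 4
```

A steady outage gives one transition; a route flap gives many. A high count over a short window, limited to one ISP pair, points at routing churn rather than a capacity problem.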

Common Mistakes & Anti-Patterns

Mistake 1: Assuming Cold Potato Always Better

Wrong thinking:

"Cold Potato has better latency, so always use it"

Correct understanding:

  • Cold Potato: Better for latency-critical, high-bandwidth flows
  • Hot Potato: Better for cost optimization on non-critical flows
  • Mix: Use both strategically per use case

Prevention: Analyze each data flow. Document routing strategy.
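
One lightweight way to document the routing strategy is to keep a flow inventory and derive the tier from it. The sketch below is an assumption about how such an inventory might look; the flow names, the `name:yes|no` format, and the function name are all hypothetical.

```bash
# choose_tier LATENCY_CRITICAL -> PREMIUM (cold potato) or STANDARD (hot potato)
choose_tier() {
  case "$1" in
    yes) echo PREMIUM ;;
    no)  echo STANDARD ;;
    *)   echo "expected yes or no, got: $1" >&2; return 1 ;;
  esac
}

# Hypothetical inventory entries: "<flow name>:<latency critical?>"
for flow in content-stream:yes batch-export:no db-replication:yes; do
  name=${flow%%:*}        # text before the colon
  critical=${flow##*:}    # text after the colon
  echo "$name -> $(choose_tier "$critical")"
done
```

Real decisions usually weigh more than one yes/no flag (bandwidth, budget, destination), but even a single recorded attribute per flow makes the choice explicit and reviewable.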

Mistake 2: Not Monitoring Egress Points

Wrong thinking:

"Traffic routing is automatic, so there is no need to monitor it"

Correct understanding:

  • Routing changes: Can cause unexpected behavior
  • Egress point shift: Might increase latency
  • Need: Monitor to detect anomalies

Prevention: Set up monitoring on egress latency per region pair.
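
A simple threshold check is often enough to start with. The sketch below assumes you already have a baseline and an observed latency in milliseconds for a region pair; the function name and the default factor of 2 are illustrative choices, not a recommendation from GCP.

```bash
# latency_status BASELINE_MS OBSERVED_MS [FACTOR]
# Prints ALERT when observed latency exceeds FACTOR x baseline (default factor: 2).
latency_status() {
  baseline=$1
  observed=$2
  factor=${3:-2}
  if [ "$observed" -gt $(( baseline * factor )) ]; then
    echo ALERT
  else
    echo OK
  fi
}

latency_status 80 95     # prints OK
latency_status 80 300    # prints ALERT
```

With the numbers from Scenario 1 (80ms baseline rising to 300ms), this check would fire well before users report slowness.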

Mistake 3: Oversizing to Compensate for Hot Potato

Wrong thinking:

"Cold Potato is too expensive, so use Hot Potato plus bigger instances"

Correct understanding:

  • Latency issue: Can't fix with more compute
  • Better: Choose right tier for workload
  • If latency critical: Cold Potato (premium tier) required
  • If cost critical: Hot Potato (standard tier) acceptable

Prevention: Profile before optimizing. Make conscious trade-off decision.

GCP-native Implementation Guidance

Monitoring Egress Points

bash
# Enable VPC Flow Logs on a subnet. Flow logs record per-flow metadata
# (5-tuple, bytes, RTT, remote location); they do not name the egress PoP.
gcloud compute networks subnets update my-subnet \
  --enable-flow-logs \
  --region=us-central1

# Query high-volume flows (flow log entries use the gce_subnetwork resource type)
gcloud logging read \
  'resource.type="gce_subnetwork" AND log_id("compute.googleapis.com/vpc_flows") AND jsonPayload.bytes_sent>1000000' \
  --format='table(jsonPayload.connection.src_ip, jsonPayload.connection.dest_ip, jsonPayload.bytes_sent, jsonPayload.rtt_msec)' \
  --limit=20

# Check network tiers (Premium vs Standard)
gcloud compute addresses list --global --format='table(name, networkTier)'
gcloud compute addresses list --format='table(name, region.basename(), networkTier)'

Forcing Specific Tier for Control

bash
# Create Premium tier IP (cold potato)
gcloud compute addresses create premium-ip \
  --global \
  --network-tier=PREMIUM

# Create Standard tier IP (hot potato); Standard tier addresses are regional
gcloud compute addresses create standard-ip \
  --region=us-central1 \
  --network-tier=STANDARD

# Use in forwarding rules based on workload type
gcloud compute forwarding-rules create latency-critical-rule \
  --global \
  --target-https-proxy=my-proxy \
  --address=premium-ip \
  --ports=443

# The forwarding rule's tier must match the address's tier (default is PREMIUM)
gcloud compute forwarding-rules create cost-optimized-rule \
  --region=us-central1 \
  --network-tier=STANDARD \
  --target-http-proxy=regional-proxy \
  --address=standard-ip \
  --ports=80
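
After creating the rules, it is worth confirming which tier each one actually uses. The helper functions below are a hedged sketch: the function names are hypothetical wrappers, and they assume an authenticated `gcloud` plus the resource names used above.

```bash
# Print the network tier of a regional forwarding rule.
# Arguments: RULE_NAME REGION (your own resource names).
forwarding_rule_tier() {
  gcloud compute forwarding-rules describe "$1" \
    --region="$2" \
    --format='value(networkTier)'
}

# Set the project-wide default tier, used when a resource does not specify one.
# Argument: PREMIUM or STANDARD.
set_default_tier() {
  gcloud compute project-info update --default-network-tier="$1"
}

# Usage (needs real resources, so it is left commented out here):
#   forwarding_rule_tier cost-optimized-rule us-central1
#   set_default_tier STANDARD
```

Setting a project default is optional; specifying the tier on each address and forwarding rule, as in the commands above, keeps the choice visible next to the workload it applies to.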
