
GCP Edge Network & Point of Presence (PoP)

Why it matters in production

An internet user in Hanoi connects to your application on GCP. The question: how does that user's traffic get into a GCP datacenter?

The answer: through a Point of Presence (PoP), one of the edge nodes that Google maintains around the world to:

  1. Terminate SSL/TLS close to the user (latency savings)
  2. Accept incoming traffic from user ISPs
  3. Route traffic into the core GCP network
  4. Cache responses (for Cloud CDN customers)
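The latency saving from terminating TLS at the edge can be estimated with simple round-trip arithmetic. The sketch below is a rough model, not a measurement: it assumes one round trip each for the TCP handshake, a TLS 1.3 handshake, and the HTTP request, a warm connection from the PoP to the backend, and zero server processing time. The function name and the RTT values are hypothetical.

```python
def time_to_first_byte_ms(rtt_user_pop: float, rtt_pop_backend: float,
                          terminate_at_pop: bool) -> float:
    """Estimate time to first byte for a new HTTPS connection.

    Assumes 1 RTT for TCP, 1 RTT for TLS 1.3, 1 RTT for the HTTP
    request/response, and no server think time.
    """
    full_rtt = rtt_user_pop + rtt_pop_backend
    if terminate_at_pop:
        # Both handshakes only travel to the PoP; the PoP is assumed to
        # reuse a warm connection to the backend, so only the request
        # itself crosses the full path.
        return 2 * rtt_user_pop + full_rtt
    # Without edge termination every round trip crosses the full path.
    return 3 * full_rtt


# Hypothetical numbers: 10 ms user->PoP, 180 ms PoP->backend.
edge = time_to_first_byte_ms(10.0, 180.0, terminate_at_pop=True)
direct = time_to_first_byte_ms(10.0, 180.0, terminate_at_pop=False)
print(f"edge termination: {edge:.0f} ms, end-to-end handshake: {direct:.0f} ms")
```

Under these assumptions the handshakes cost two short round trips instead of two long ones, which is where most of the saving comes from.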

Understanding PoP strategy helps you:

  • Predict latency from a specific geography
  • Optimize global load balancing
  • Debug "users from region X are slow" issues
  • Design disaster recovery (alternative PoPs)

Internal Model: PoP Architecture

PoP Hierarchy

┌──────────────────────────────────────┐
│         Internet Backbone            │
│    (ISPs, Transit Providers)         │
└──────────────────┬───────────────────┘
                   │
         ┌─────────┴──────────┐
         │                    │
    ┌────▼─────┐         ┌────▼─────┐
    │ PoP      │         │ PoP      │
    │ (US-East)│         │ (EU-West)│
    └────┬─────┘         └────┬─────┘
         │                    │
         └────────┬───────────┘
                  │
         ┌────────▼─────────┐
         │ GCP Global       │
         │ Backbone Network │
         │ (Private Fiber)  │
         └────────┬─────────┘
                  │
    ┌─────────────┼─────────────┐
    │             │             │
┌───▼───┐     ┌───▼───┐     ┌───▼───┐
│ Region│     │ Region│     │ Region│
│US-C1  │     │EU-W1  │     │AP-SE1 │
└───────┘     └───────┘     └───────┘

PoP as Traffic Aggregation Point

A single PoP may aggregate traffic from multiple ISPs and transit providers:

PoP: Ashburn (Northern Virginia)
├─ Upstream ISP #1: Verizon, Level 3
├─ Upstream ISP #2: AT&T, CenturyLink
├─ Upstream ISP #3: Cogent, Hurricane Electric

└─ Peering: Direct connections with major content providers

Why multiple upstreams?

  • Redundancy: If one ISP fails, traffic still reaches PoP
  • Capacity: Each ISP can contribute bandwidth
  • Cost: Direct peering with major providers = preferred routes
  • Performance: Diverse paths to the internet help keep latency low
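The redundancy argument can be made concrete with a small availability calculation. This is a simplified sketch that assumes upstream failures are independent, which real networks only approximate (shared conduits and shared facilities correlate failures). The 99% per-upstream availability is a hypothetical input, not a published figure.

```python
from math import prod


def ingress_availability(upstream_availabilities: list[float]) -> float:
    """Probability that at least one upstream path is up.

    Assumes failures are independent across upstreams.
    """
    all_down = prod(1.0 - a for a in upstream_availabilities)
    return 1.0 - all_down


single = ingress_availability([0.99])
triple = ingress_availability([0.99, 0.99, 0.99])
print(f"one upstream: {single:.6f}, three upstreams: {triple:.6f}")
```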

Traffic Path: Internet User → PoP → GCP

1. User Browser (Hong Kong, 223.26.0.0/16 network)
   └─ DNS query: "Where is myapp.com?"
   └─ Response: "Anycast IP 35.201.123.45" (Google's public IP)

2. User → Nearest PoP (Anycast routing via BGP)
   ├─ ISP routing table (learned via BGP; the same prefix is announced from many PoPs):
   │  └─ 35.201.123.0/24 → via PoP-Hong-Kong (best path, selected; prefix is illustrative)
   │  └─ 35.201.123.0/24 → via PoP-Singapore (longer AS path, kept as backup)
   │  └─ 0.0.0.0/0 → via default upstream
   ├─ User packet reaches PoP-Hong-Kong (shortest AS path)

3. PoP receives incoming traffic
   ├─ Google Front End (GFE) terminates SSL/TLS
   ├─ Decrypt HTTP/2 stream
   └─ Forward to application via GCP backbone

4. GCP Backbone carries traffic to region
   └─ Fiber optic links from PoP to datacenter region

5. Datacenter (e.g., us-central1)
   ├─ Entry router receives
   ├─ Route to application (VM/LB/App Engine)
   └─ Return response via same path (or different egress PoP)

Key insight: Traffic from user to PoP uses public internet (ISP networks), but traffic from PoP to datacenter uses Google's private fiber.
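Step 2 of the path depends on how the user's ISP chooses among several announcements of the same anycast prefix. The sketch below is a heavily simplified version of BGP best-path selection: it compares only local preference and AS-path length and leaves out the remaining tie-breakers (origin, MED, eBGP over iBGP, router ID). The prefix, PoP labels, and the documentation-range AS numbers 64500 and 64501 are illustrative; 15169 is Google's AS number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    via: str                  # hypothetical label for the PoP behind this path
    as_path: tuple[int, ...]  # AS numbers traversed to reach the origin
    local_pref: int = 100


def best_path(routes: list[Route]) -> Route:
    """Pick one route: highest local-pref first, then shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))


candidates = [
    Route("35.201.123.0/24", "PoP-Singapore", (64500, 64501, 15169)),
    Route("35.201.123.0/24", "PoP-Hong-Kong", (64500, 15169)),
]
chosen = best_path(candidates)
print(chosen.via)
```

Because local preference is compared first, an ISP's routing policy can override the shorter AS path, which is one reason the "nearest" PoP is not always the geographically closest one.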

PoP Distribution Strategy

Google places PoPs beyond the major ISP hubs, in roughly three tiers:

Major PoPs (Tier 1):
├─ Ashburn, USA (major US hub)
├─ London, UK (major EU hub)
├─ Singapore (major Asia hub)
├─ Tokyo, Japan (major Asia hub)
├─ Sydney, Australia

Regional PoPs (Tier 2):
├─ Bangkok, Manila, Jakarta (SE Asia coverage)
├─ Dubai (Middle East, Africa)
├─ Johannesburg (Africa)
├─ São Paulo (South America)

Edge PoPs (Tier 3):
├─ Deployed via partnerships with ISPs/CDNs
├─ Shared infrastructure with Cloud CDN nodes
├─ Prioritize geographic coverage over capacity

Strategy: A user's traffic should reach a PoP in under 10ms; the backbone then carries it to the destination region.
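A 10ms target translates into a distance limit, because light in optical fiber covers roughly 200 km per millisecond (about two thirds of its speed in vacuum). The sketch below gives a lower bound only: it ignores queuing, last-mile access delay, and router processing. The `route_stretch` parameter is an assumed factor for fiber routes being longer than the straight-line distance.

```python
SPEED_IN_FIBER_KM_PER_MS = 200.0  # approximate propagation speed in fiber


def min_rtt_ms(distance_km: float, route_stretch: float = 1.0) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    one_way_ms = distance_km * route_stretch / SPEED_IN_FIBER_KM_PER_MS
    return 2 * one_way_ms


def max_distance_km(rtt_budget_ms: float, route_stretch: float = 1.0) -> float:
    """Farthest a PoP can be while still meeting the RTT budget."""
    return rtt_budget_ms / 2 * SPEED_IN_FIBER_KM_PER_MS / route_stretch


print(min_rtt_ms(1000.0))           # propagation-only RTT for 1000 km
print(max_distance_km(10.0))        # distance limit for a 10 ms budget
print(max_distance_km(10.0, 1.5))   # same budget with 50% route stretch
```

With realistic route stretch the usable radius shrinks well below 1000 km, which is consistent with the need for regional and edge PoPs in addition to the major hubs.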

Production Architecture Patterns

Pattern 1: Global Anycast Load Balancing

Application deployment:
├─ Deployment A: us-central1 (app backend)
├─ Deployment B: europe-west1 (app backend)
├─ Deployment C: asia-southeast1 (app backend)

└─ Global anycast IP: 35.201.123.45 (one IP, announced from Google's edge PoPs worldwide)

User traffic routing:
├─ User in USA → BGP delivers traffic for 35.201.123.45 to the nearest PoP
│  └─ PoP-Ashburn → load balancer picks us-central1 → GCP backbone → Datacenter US-C1

├─ User in Europe → BGP delivers traffic for 35.201.123.45 to the nearest PoP
│  └─ PoP-London → load balancer picks europe-west1 → GCP backbone → Datacenter EU-W1

└─ User in Asia → BGP delivers traffic for 35.201.123.45 to the nearest PoP
   └─ PoP-Singapore → load balancer picks asia-southeast1 → GCP backbone → Datacenter AP-SE1

Result: Traffic automatically geo-routed to nearest Google presence
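Once traffic has entered at a PoP, something still has to choose a backend region. A minimal sketch of that choice is below, assuming the selection rule is "closest healthy region by RTT". The real load balancer also weighs capacity and configuration, and the RTT table here is entirely hypothetical.

```python
def pick_backend(entry_pop: str,
                 rtt_ms: dict[str, dict[str, float]],
                 healthy: set[str]) -> str:
    """Return the healthy region with the lowest RTT from the entry PoP."""
    candidates = {
        region: rtt
        for region, rtt in rtt_ms[entry_pop].items()
        if region in healthy
    }
    if not candidates:
        raise RuntimeError("no healthy backend region")
    return min(candidates, key=candidates.get)


# Hypothetical backbone RTTs in milliseconds.
RTT_MS = {
    "PoP-Ashburn": {"us-central1": 25, "europe-west1": 90, "asia-southeast1": 230},
    "PoP-London": {"us-central1": 100, "europe-west1": 8, "asia-southeast1": 170},
    "PoP-Singapore": {"us-central1": 200, "europe-west1": 170, "asia-southeast1": 5},
}

all_regions = {"us-central1", "europe-west1", "asia-southeast1"}
normal = pick_backend("PoP-Singapore", RTT_MS, all_regions)
degraded = pick_backend("PoP-Singapore", RTT_MS, all_regions - {"asia-southeast1"})
print(normal, degraded)
```

The second call shows the failover behavior: when the local region is unhealthy, the same anycast IP keeps working and the request is carried to the next-closest region.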

Pattern 2: Premium vs Standard Tier PoP Strategy

Premium Tier (High Performance):
├─ Inbound: Traffic enters at PoP closest to user
├─ PoPs: Deployed in many locations (200+)
├─ Routing: Optimal, shortest path to destination
└─ Example: User in Bangkok
   └─ Might enter at PoP-Bangkok or PoP-Singapore (closest)

Standard Tier (Cost Optimized):
├─ Inbound: Traffic enters at PoP closest to destination region
├─ PoPs: Fewer, only near major regions
├─ Routing: Direct to region where backend located
└─ Example: User in Bangkok accessing us-central1
   └─ Traffic enters PoP-Ashburn (near destination)
   └─ Takes public internet from Bangkok to Ashburn
   └─ Then private GCP backbone to us-central1
   
Result: Standard Tier has higher latency but costs less, because more of the path runs over the public internet instead of Google's backbone.
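The difference between the two tiers comes down to where traffic enters Google's network. The sketch below encodes only that rule; the function and the PoP labels are hypothetical and taken from the Bangkok example above.

```python
def plan_path(tier: str, pop_near_user: str, pop_near_region: str,
              region: str) -> list[tuple[str, str, str]]:
    """Return the path as (from, to, network) segments for a given tier."""
    if tier == "premium":
        ingress = pop_near_user      # enter Google's network close to the user
    elif tier == "standard":
        ingress = pop_near_region    # enter close to the destination region
    else:
        raise ValueError(f"unknown tier: {tier}")
    return [
        ("user", ingress, "public internet"),
        (ingress, region, "Google backbone"),
    ]


premium = plan_path("premium", "PoP-Bangkok", "PoP-Ashburn", "us-central1")
standard = plan_path("standard", "PoP-Bangkok", "PoP-Ashburn", "us-central1")
for name, path in (("premium", premium), ("standard", standard)):
    print(name, " | ".join(f"{a} -> {b} ({net})" for a, b, net in path))
```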

Pattern 3: Failover via Alternative PoP

Normal: User → PoP-A → Datacenter-1
┌─────────────────────────┐
│ User (ISP network)      │
└──────────┬──────────────┘
           │
    ┌──────▼──────┐
    │ PoP-A       │
    │ (primary)   │
    └──────┬──────┘
           │
    ┌──────▼──────────────┐
    │ GCP Backbone        │
    └──────┬──────────────┘
           │
    ┌──────▼──────────┐
    │ Datacenter-1    │
    │ (handling)      │
    └─────────────────┘

Failure: PoP-A down or saturated
┌─────────────────────────┐
│ User (ISP network)      │
└──────────┬──────────────┘
           │ (BGP reconvergence)
    ┌──────▼──────┐
    │ PoP-B       │
    │ (backup)    │
    └──────┬──────┘
           │
    ┌──────▼──────────────┐
    │ GCP Backbone        │
    └──────┬──────────────┘
           │
    ┌──────▼──────────┐
    │ Datacenter-1    │
    │ (handling)      │
    └─────────────────┘

Recovery: Automatic via BGP reconvergence (typically seconds, though some ISP networks take longer to converge)

Pattern 4: DDoS Mitigation at PoP

Attack: 100Gbps traffic targeting application
┌──────────────────────────┐
│ Attacker (Bot network)   │
└──────────┬───────────────┘
           │ (100Gbps attack traffic)
    ┌──────▼───────────┐
    │ PoP-X            │
    │ Scrubbing        │
    │ (DDoS Mitigation)│
    └──────┬───────────┘
           │ (After filtering: 5Gbps legitimate traffic)
    ┌──────▼──────────────┐
    │ GCP Backbone        │
    │ (protected)         │
    └──────┬──────────────┘
           │
    ┌──────▼──────────┐
    │ Datacenter      │
    │ (unaffected)    │
    └─────────────────┘

Result: Google's DDoS protection at PoP prevents backend saturation

Real-world Failure Scenarios

Scenario 1: PoP Packet Loss (Upstream ISP Congestion)

Symptoms:
├─ Users from a specific region report packet loss (3-5%)
├─ Latency: Normal outside of retransmits
├─ Regional pattern: Only users in South America

Root cause:
└─ PoP-São Paulo seeing upstream ISP congestion
   └─ During peak hours, the ISP network drops 3-5% of packets
   └─ Affected users retransmit, which appears as lag
   └─ The Google backbone itself is fine

Investigation:
├─ Check CDN/LB logs: See geographic pattern
├─ High RTT variance in PoP-São Paulo
├─ Work with Google Network team to increase PoP capacity
│  (add more fiber to São Paulo PoP, add another PoP in South America)

Prevention:
└─ Multi-PoP strategy: Route traffic to backup PoP if primary congested
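The "see geographic pattern" step of the investigation amounts to aggregating loss by client region and flagging outliers. A minimal sketch follows, assuming you already have per-sample counts of packets sent and lost tagged with a region. The record layout, the region names, and the 2% threshold are hypothetical and not tied to any specific log format.

```python
from collections import defaultdict


def loss_by_region(samples, threshold=0.02):
    """Aggregate (region, sent, lost) samples and flag regions above threshold."""
    sent = defaultdict(int)
    lost = defaultdict(int)
    for region, packets_sent, packets_lost in samples:
        sent[region] += packets_sent
        lost[region] += packets_lost
    rates = {r: lost[r] / sent[r] for r in sent if sent[r] > 0}
    flagged = sorted(r for r, rate in rates.items() if rate >= threshold)
    return rates, flagged


# Hypothetical samples: (client region, packets sent, packets lost).
samples = [
    ("south-america", 10_000, 400),
    ("south-america", 5_000, 200),
    ("north-america", 20_000, 20),
    ("europe", 15_000, 15),
]
rates, flagged = loss_by_region(samples)
for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.2%}")
print("flagged:", flagged)
```

Aggregating counts before dividing, instead of averaging per-sample rates, keeps small samples from skewing the regional figure.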

Scenario 2: PoP Fiber Cut (Backbone Connectivity Loss)

Symptoms:
├─ Users across Southeast Asia: Errors (connection timeouts)
├─ Latency: Very high, or connections time out
├─ All users in the region affected simultaneously

Root cause:
└─ Fiber cut between PoP-Singapore and Datacenter
   └─ Primary path broken
   └─ Failover to secondary path (via US backbone)
   └─ Latency: 200ms → 400ms (too slow for sync workloads)

Investigation:
├─ Network team: "Fiber cut detected at UTC+8 timestamp"
├─ Check backup paths active
├─ Query which users affected (all from Southeast Asia)

Recovery:
├─ Emergency: Traffic to US backend, high latency but available
├─ Repair: Fiber rerouted (can take hours)
├─ Deployment: Add local failover (app replica in Asia)

Prevention:
└─ Diverse backbone paths between PoPs and datacenters
   └─ Dual/Triple fiber redundancy
   └─ Multi-region deployment (don't rely on single path)

Common Mistakes & Anti-Patterns

Mistake 1: Assuming Traffic Always Takes Shortest Path

Wrong thinking:

"User in Bangkok accessing US region: Traffic via closest PoP (Singapore)"

Correct understanding:

  • Traffic routing depends on BGP advertisements
  • Premium Tier: Optimized path (likely Singapore)
  • Standard Tier: Path follows ISP routing over the public internet and enters Google near the destination region (likely a US PoP in this example)
  • BGP is dynamic: Can change based on upstream peering agreements

Prevention: Test actual paths using traceroute from different geographies. Monitor latency trends.
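When comparing traceroutes from several geographies, it helps to reduce each run to "which hop added the most latency". The sketch below parses the common Linux `traceroute` text layout and reports the largest RTT increase between answered hops. The sample output is made up and uses documentation-range IP addresses; real output varies by platform and options, so the regular expressions are an assumption about the format.

```python
import re
from typing import Optional

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")


def parse_traceroute(output: str) -> list[tuple[int, Optional[float]]]:
    """Return (hop number, best RTT in ms or None if the hop did not answer)."""
    hops = []
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header or blank line
        rtts = [float(v) for v in RTT_RE.findall(match.group(2))]
        hops.append((int(match.group(1)), min(rtts) if rtts else None))
    return hops


def largest_jump(hops):
    """Return (hop number, increase in ms) for the biggest RTT increase."""
    answered = [(n, rtt) for n, rtt in hops if rtt is not None]
    best = (None, 0.0)
    for (_, prev), (n, cur) in zip(answered, answered[1:]):
        if cur - prev > best[1]:
            best = (n, cur - prev)
    return best


SAMPLE = """
traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1 (192.0.2.1)  1.210 ms  1.105 ms  1.320 ms
 2  198.51.100.7 (198.51.100.7)  8.402 ms  8.120 ms  8.655 ms
 3  * * *
 4  203.0.113.10 (203.0.113.10)  42.900 ms  43.100 ms  42.750 ms
"""

hops = parse_traceroute(SAMPLE)
jump = largest_jump(hops)
print(hops)
print(f"largest increase at hop {jump[0]}: {jump[1]:.2f} ms")
```

A large jump often marks a long-haul segment, but per-hop RTTs can be misleading because routers deprioritize ICMP replies, so treat the result as a hint to investigate and not as proof.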

Mistake 2: Not Planning for PoP Capacity

Wrong thinking:

"PoP has unlimited capacity for inbound traffic"

Correct understanding:

  • Each PoP has finite uplink capacity (on the order of 100-400Gbps in this example; actual per-PoP figures vary)
  • Peak traffic can exhaust capacity
  • Overflow requires secondary PoP failover
  • Capacity planning: estimate peak region-pair traffic

Prevention: Ask Google (for example through your account team) about capacity before large launches, and size peak traffic per region instead of assuming unlimited ingress.
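A back-of-the-envelope peak estimate is usually enough to tell whether capacity deserves a conversation. The sketch below multiplies concurrent users by per-user bandwidth and a burst factor, then compares the result with an assumed capacity. Every input is hypothetical, including the 400 Gbps capacity, which simply reuses the upper figure quoted above.

```python
def peak_ingress_gbps(concurrent_users: int, mbps_per_user: float,
                      burst_factor: float = 1.0) -> float:
    """Estimated peak traffic in Gbps for one region."""
    return concurrent_users * mbps_per_user * burst_factor / 1000.0


def headroom(peak_gbps: float, capacity_gbps: float) -> float:
    """Fraction of capacity left at peak (negative means over capacity)."""
    return 1.0 - peak_gbps / capacity_gbps


peak = peak_ingress_gbps(concurrent_users=200_000, mbps_per_user=0.5, burst_factor=2.0)
spare = headroom(peak, capacity_gbps=400.0)
print(f"peak: {peak:.0f} Gbps, headroom: {spare:.0%}")
```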

Mistake 3: Ignoring PoP Latency in SLA

Wrong thinking:

"GCP latency SLA only matters inside datacenters, not PoP-to-PoP"

Correct understanding:

  • End-to-end latency includes: user→PoP + PoP→datacenter
  • Network legs (user→PoP plus PoP→datacenter): 10-50ms depending on distance
  • In-datacenter latency: <1ms
  • Total: The network legs usually dominate end-to-end latency

Prevention: Break down latency: measure user→PoP separately using RUM (Real User Monitoring).
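One way to do that breakdown from RUM data is sketched below. It assumes the TCP connect time approximates one user→PoP round trip (the handshake terminates at the PoP), that request-to-first-byte time covers one full round trip plus server processing, and that server processing time is available from backend logs. The field names and sample values are hypothetical.

```python
def latency_breakdown(connect_ms: float, request_to_first_byte_ms: float,
                      server_ms: float) -> dict[str, float]:
    """Split a request's latency into user->PoP, PoP->backend, and server time."""
    user_to_pop = connect_ms
    # Whatever is left after the edge round trip and server time is
    # attributed to the PoP->backend leg; clamp at zero for noisy samples.
    pop_to_backend = max(request_to_first_byte_ms - connect_ms - server_ms, 0.0)
    network = user_to_pop + pop_to_backend
    total = network + server_ms
    return {
        "user_to_pop_ms": user_to_pop,
        "pop_to_backend_ms": pop_to_backend,
        "server_ms": server_ms,
        "network_share": network / total if total > 0 else 0.0,
    }


result = latency_breakdown(connect_ms=35.0, request_to_first_byte_ms=95.0, server_ms=5.0)
print(result)
```

In this example the two network legs account for about 95% of the request latency, which illustrates why tuning the backend alone rarely fixes a regional slowness report.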

GCP-native Implementation Guidance

Verifying PoP Ingress

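bash
# A global external Application Load Balancer gets an anycast IP served from Google's edge PoPs.
# Assumes the health check, SSL certificate, and reserved global address already exist.
gcloud compute backend-services create my-backend \
  --global \
  --protocol=HTTPS \
  --health-checks=http-health-check

# Route all requests to the backend service by default
gcloud compute url-maps create my-url-map \
  --default-service=my-backend

# TLS is terminated at the proxy, which runs at the edge
gcloud compute target-https-proxies create my-https-proxy \
  --url-map=my-url-map \
  --ssl-certificates=my-cert

# Bind the global IP to the proxy on port 443
gcloud compute forwarding-rules create my-forwarding-rule \
  --global \
  --target-https-proxy=my-https-proxy \
  --address=my-global-static-ip \
  --ports=443

# Result: the global anycast IP is announced from Google's edge PoPs
# Users are routed to the nearest PoP automatically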

Monitoring PoP Performance

bash
# Network Intelligence Center's Performance Dashboard shows latency and packet loss
# Available at: Google Cloud Console → Network Intelligence → Performance Dashboard

# From a client machine, time the TCP and TLS handshakes (these terminate at the PoP):
curl -s -o /dev/null \
  -w 'connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s\n' \
  https://myapp.com/

# Check the path from a VM in a specific zone using traceroute:
# SSH to VM in asia-southeast1:
gcloud compute ssh vm-in-asia --zone=asia-southeast1-a

# Inside VM (replace my-backend-ip with the load balancer IP):
traceroute -m 30 my-backend-ip
# Look for hops going through private GCP network

Implementing Premium vs Standard Tier

bash
# Reserve a regional IP (regional addresses can use Standard Tier)
gcloud compute addresses create regional-ip \
  --region=us-central1 \
  --network-tier=STANDARD

# Reserve a global IP (global addresses are always Premium Tier)
gcloud compute addresses create global-ip \
  --global \
  --network-tier=PREMIUM

# Standard Tier limits you to regional load balancing
gcloud compute forwarding-rules create standard-forwarding-rule \
  --region=us-central1 \
  --network-tier=STANDARD \
  --address=regional-ip \
  --target-http-proxy=regional-proxy \
  --ports=80

# Premium Tier enables global load balancing
gcloud compute forwarding-rules create premium-forwarding-rule \
  --global \
  --address=global-ip \
  --target-https-proxy=global-https-proxy \
  --ports=443

Next: GCP Global Backbone: Premium vs Standard Tier — After traffic enters PoP, how does it reach your datacenter?