
Andromeda: GCP Software-Defined Networking Stack

Why it matters in production

Andromeda is the foundation of all networking in GCP. Every packet your VM sends, every connection to another service, and every load balancer decision is handled by Andromeda. Understanding how it works lets you:

  • Predict network behavior instead of relying on trial and error
  • Debug network issues when the application code is not at fault
  • Optimize network performance instead of leaving the system on defaults
  • Design a security architecture based on a clear understanding of the packet processing pipeline

A platform engineer or cloud architect does not need to implement Andromeda, but MUST know how it handles traffic, because every decision you make (firewall rules, routing, NAT) depends on this model.

Internal model: How Andromeda processes packets

Problem: Why does Google need SDN?

Before Andromeda (before ~2012), Google had to manually configure each network device (switch, router) to implement network policy. With millions of VMs and network policy changing constantly (VMs created/deleted, firewall rules updated, routes changed), manual configuration does not scale.

Solution: Software-Defined Networking, which separates the control plane (policy logic) from the data plane (packet forwarding). Google wrote a "network operating system" (Andromeda) that runs on top of the physical hardware and allows programmatic management of the entire network.

Architecture Overview

Andromeda has two main components:

┌─────────────────────────────────────────────┐
│         CONTROL PLANE (Central)             │
│  - Policy Management (Firewall, Routes)    │
│  - Service Discovery & Load Balancing      │
│  - Monitoring & Telemetry                  │
└────────────┬────────────────────────────────┘

        gRPC / Protocol Buffers

┌────────────▼────────────────────────────────┐
│   DATA PLANE (Node-Local)                   │
│  - Packet Forwarding                        │
│  - Connection Tracking                      │
│  - NAT/IP Translation                       │
│  - Firewall Enforcement                     │
└─────────────────────────────────────────────┘

This model enables:

  1. Centralized policy: a policy such as "traffic from A to B is allowed" is defined in one place, and the control plane pushes it down to all nodes
  2. Local enforcement: each node (compute host) enforces policy independently, without depending on the central system (high availability)
  3. Stateful connection tracking: each node tracks connections, which allows asymmetric routing (outgoing traffic can exit from a different point than incoming traffic entered)
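The split between centralized policy and local enforcement can be illustrated with a small model. This is a minimal sketch, not Andromeda's actual protocol: the `ControlPlane` and `NodeAgent` classes, the version counter, and the `available` flag are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeAgent:
    """Hypothetical per-host agent that keeps a local copy of the policy."""
    name: str
    policy_version: int = 0
    allowed_pairs: set = field(default_factory=set)

    def apply(self, version, allowed_pairs):
        # Ignore stale or duplicate pushes; keep only newer policy versions.
        if version > self.policy_version:
            self.policy_version = version
            self.allowed_pairs = set(allowed_pairs)

    def permits(self, src, dst):
        # Enforcement reads only local state, never the control plane.
        return (src, dst) in self.allowed_pairs


class ControlPlane:
    """Hypothetical central policy store that pushes to every node."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.version = 0
        self.allowed = set()
        self.available = True

    def allow(self, src, dst):
        if not self.available:
            raise RuntimeError("control plane unavailable")
        self.allowed.add((src, dst))
        self.version += 1
        for node in self.nodes:
            node.apply(self.version, self.allowed)


nodes = [NodeAgent("node-a"), NodeAgent("node-b")]
control = ControlPlane(nodes)
control.allow("A", "B")

# Simulate a control plane outage: nodes keep enforcing the cached policy.
control.available = False
print(all(node.permits("A", "B") for node in nodes))
```

The point of the design is that `permits` never calls back into the control plane, so an outage blocks policy changes but not forwarding decisions.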

Data Plane Execution

When a packet reaches your VM, the Andromeda data plane processes it in this conceptual order:

1. INGRESS FIREWALL RULES
   - Layer 3/4 match (source range, protocol, port)
   - Stateful: if an outbound connection was opened earlier → allow the inbound response
   - Default: DENY (implied deny rule)

2. ROUTING DECISION
   - Subnet routes (local)
   - Custom routes (static)
   - Dynamic routes (Cloud Router via BGP)

3. NAT (if configured)
   - Source NAT (SNAT) for outgoing traffic
   - Destination NAT (DNAT) for load balancers

4. FORWARDING
   - Send the packet to the egress interface (physical NIC)
   - Tunnel encapsulation (when the destination is on another host or datacenter)

5. EGRESS FIREWALL RULES
   - Layer 3/4 match (destination range, protocol, port)
   - Default: ALLOW (implied allow rule, unlike ingress)
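The stateful part of this pipeline (implied allow on egress, implied deny on ingress, replies admitted through connection tracking) can be sketched as follows. This is an illustrative model only: the `Packet` and `StatefulFirewall` types are hypothetical, and real connection tracking also handles timeouts and protocol state, which are omitted here.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str
    sport: int
    dport: int


class StatefulFirewall:
    """Toy per-VM firewall: implied allow egress, implied deny ingress."""

    def __init__(self, ingress_allow):
        # ingress_allow: list of (source CIDR, protocol, destination port)
        self.ingress_allow = [
            (ip_network(cidr), proto, port) for cidr, proto, port in ingress_allow
        ]
        self.conntrack = set()

    def egress(self, pkt):
        # Record the flow so that the reply direction is admitted later.
        self.conntrack.add((pkt.src, pkt.sport, pkt.dst, pkt.dport, pkt.proto))
        return True

    def ingress(self, pkt):
        # A reply has source and destination swapped relative to the tracked flow.
        reverse = (pkt.dst, pkt.dport, pkt.src, pkt.sport, pkt.proto)
        if reverse in self.conntrack:
            return True
        for net, proto, port in self.ingress_allow:
            if ip_address(pkt.src) in net and proto == pkt.proto and port == pkt.dport:
                return True
        return False  # implied deny


fw = StatefulFirewall(ingress_allow=[("0.0.0.0/0", "tcp", 80)])
request = Packet("10.1.0.2", "203.0.113.10", "tcp", 40000, 443)
reply = Packet("203.0.113.10", "10.1.0.2", "tcp", 443, 40000)
unsolicited = Packet("203.0.113.10", "10.1.0.2", "tcp", 443, 40001)

fw.egress(request)
print(fw.ingress(reply), fw.ingress(unsolicited))
```

The reply is admitted even though no ingress rule mentions port 40000, while the unsolicited packet on a different port falls through to the implied deny.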

Critical detail: Firewall rules are applied AT THE INSTANCE (node), not at the network edge. This means:

  • When two VMs in the same subnet communicate over internal IPs, the firewall is still enforced (being in the same subnet does not bypass it)
  • Monitoring is centralized (connection logs are collected from all nodes)
  • Every VM gets a full stateful firewall, with no need for a dedicated firewall appliance

VPC Implementation in Andromeda

A VPC network in Andromeda is a logical overlay on top of the physical network:

Physical GCP Datacenter
├── Node A (Compute Host)
│   └── VM1 (10.1.0.10/VPC-A)
│   └── VM2 (10.1.0.11/VPC-B)

├── Node B (Compute Host)  
│   └── VM3 (10.1.0.12/VPC-A)

└── Network Switch Fabric
    └── Forwards based on physical MAC/IP

Andromeda decouples the logical network from the physical network:

  • The physical network sees: packets with physical MAC addresses and physical IP addresses (within the GCP backbone)
  • VMs see: packets with VPC IP addresses and VPC MAC addresses (generated by Andromeda)

How this works:

1. VM1 sends packet: src=10.1.0.10, dst=10.1.0.12
2. Andromeda data plane intercepts
3. Lookup routing table: 10.1.0.12 is on Node B
4. Encapsulate: Add GCP backbone header (physical MAC/IP)
5. Forward via physical network
6. Node B receives, decapsulates, delivers to VM3
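The six steps above can be sketched as a lookup keyed by (VPC, virtual IP) followed by wrapping the packet in an outer header. This is a simplified illustration: the `LOCATIONS` table, the `InnerPacket`/`OuterPacket` types, and the host names are hypothetical, and the real encapsulation format is not described in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerPacket:
    vpc: str
    src: str
    dst: str
    payload: bytes


@dataclass(frozen=True)
class OuterPacket:
    src_host: str
    dst_host: str
    inner: InnerPacket


# Hypothetical mapping: (VPC, virtual IP) -> physical host running that VM.
LOCATIONS = {
    ("VPC-A", "10.1.0.10"): "node-a",
    ("VPC-B", "10.1.0.11"): "node-a",
    ("VPC-A", "10.1.0.12"): "node-b",
}


def encapsulate(pkt, local_host):
    # The lookup is scoped to the sender's VPC, which is what isolates VPCs.
    dst_host = LOCATIONS.get((pkt.vpc, pkt.dst))
    if dst_host is None:
        raise LookupError(f"{pkt.dst} is not reachable inside {pkt.vpc}")
    return OuterPacket(src_host=local_host, dst_host=dst_host, inner=pkt)


def decapsulate(outer, local_host):
    if outer.dst_host != local_host:
        raise ValueError("packet delivered to the wrong host")
    return outer.inner


vm1_packet = InnerPacket("VPC-A", "10.1.0.10", "10.1.0.12", b"hello")
on_the_wire = encapsulate(vm1_packet, local_host="node-a")
delivered = decapsulate(on_the_wire, local_host="node-b")
print(on_the_wire.dst_host, delivered.dst)
```

Because the lookup key includes the VPC, a VM in VPC-B on the same host cannot reach 10.1.0.12 in VPC-A: the lookup simply finds no entry.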

Result: the VM is unaware that it lives on a shared physical network. Each VPC is a completely isolated logical network, even when VMs run on the same physical host.

Production Architecture Patterns

Pattern 1: Single-Region VPC (Typical for Stateful Apps)

┌─────────────────────────────────────────────┐
│      Region us-central1 (Iowa)              │
│                                             │
│  ┌──────────────────────────────────────┐  │
│  │ VPC Network (10.0.0.0/8)             │  │
│  │                                      │  │
│  │  ┌────────────────┐                 │  │
│  │  │ Subnet 1       │ (10.1.0.0/24)   │  │
│  │  │ Zone us-c1-a   │                 │  │
│  │  │ - VM-1         │ (10.1.0.2)      │  │
│  │  │ - VM-2         │ (10.1.0.3)      │  │
│  │  └────────────────┘                 │  │
│  │                                      │  │
│  │  ┌────────────────┐                 │  │
│  │  │ Subnet 2       │ (10.2.0.0/24)   │  │
│  │  │ Zone us-c1-b   │                 │  │
│  │  │ - VM-3         │ (10.2.0.2)      │  │
│  │  └────────────────┘                 │  │
│  │                                      │  │
│  │ Firewall Rules:                      │  │
│  │  - allow: tcp:80,443 from 0.0.0.0/0 │  │
│  │  - allow: tcp:3306 from 10.1.0.0/24 │  │
│  │  - deny: all (implicit)              │  │
│  │                                      │  │
│  │ Routes:                              │  │
│  │  - 10.0.0.0/8 → local (subnet rts)  │  │
│  │  - 0.0.0.0/0  → internet gateway    │  │
│  └──────────────────────────────────────┘  │
│                                             │
└─────────────────────────────────────────────┘

How Andromeda enforces this:

  • Subnet routes (10.1.0.0/24, 10.2.0.0/24) are programmed into the data plane of every node
  • Firewall rules are compiled into per-host data plane rules (conceptually like iptables rules; eBPF is what GKE Dataplane V2 uses for pod-level policy)
  • Each node enforces rules independently: if the control plane goes down, traffic keeps forwarding according to cached rules
  • New VMs → control plane sends configuration → data plane starts enforcing

Pattern 2: Multi-VPC with VPC Peering

┌──────────────────┐        ┌──────────────────┐
│   VPC Network A  │        │   VPC Network B  │
│  10.1.0.0/16     │        │  10.2.0.0/16     │
│                  │        │                  │
│  ┌────────────┐  │        │  ┌────────────┐  │
│  │ VM-A1      │  │        │  │ VM-B1      │  │
│  │ 10.1.1.0   │  │        │  │ 10.2.1.0   │  │
│  └────────────┘  │        │  └────────────┘  │
│                  │        │                  │
│  Firewall Rules: │        │  Firewall Rules: │
│  allow: 10.2/16 │        │  allow: 10.1/16 │
└──────────┬───────┘        └────────┬─────────┘
           │                        │
           └────────VPC Peering─────┘
              (subnet routes exchanged)

How Andromeda implements peering:

  • Control plane: "VPC-A and VPC-B are peered" → compute routes
  • Control plane pushes to the data plane on all nodes:
    • Nodes in VPC-A: 10.2.0.0/16 → peering next-hop → reach VPC-B
    • Nodes in VPC-B: 10.1.0.0/16 → peering next-hop → reach VPC-A
  • Firewall rules: Each VPC independently controls its own ingress/egress
  • Key: No transit traffic (A→B→C not allowed, even if B is peered with both A and C)
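The non-transitive behavior falls out naturally if each VPC only imports subnet routes from its direct peers. The sketch below assumes a hypothetical third network C (10.3.0.0/16) peered with B, to show that A still has no route to C; the `SUBNETS` and `PEERINGS` tables are illustrative data, not an API.

```python
from ipaddress import ip_network

# Hypothetical networks; A and B match the diagram, C is added for illustration.
SUBNETS = {
    "A": ["10.1.0.0/16"],
    "B": ["10.2.0.0/16"],
    "C": ["10.3.0.0/16"],
}
PEERINGS = {frozenset({"A", "B"}), frozenset({"B", "C"})}


def effective_routes(vpc):
    """Own subnet routes plus routes imported from directly peered VPCs only."""
    routes = {ip_network(cidr): "local" for cidr in SUBNETS[vpc]}
    for other, cidrs in SUBNETS.items():
        if other != vpc and frozenset({vpc, other}) in PEERINGS:
            for cidr in cidrs:
                routes[ip_network(cidr)] = f"peering:{other}"
    return routes


def can_reach(src_vpc, dst_cidr):
    return ip_network(dst_cidr) in effective_routes(src_vpc)


print(can_reach("A", "10.2.0.0/16"))  # direct peer
print(can_reach("A", "10.3.0.0/16"))  # only reachable through B, so no route
```

Routes learned from a peer are never re-exported to another peer, which is why B cannot act as a transit hub between A and C.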

Pattern 3: Shared VPC (Enterprise Multi-Project)

┌──────────────────────────────────────┐
│      Organization                     │
├──────────────────────────────────────┤
│  HOST PROJECT                         │
│  └─ VPC Network (10.0.0.0/8)         │
│     └─ Subnet (10.1.0.0/24)          │
│        ├─ Firewall Rules              │
│        └─ Routes                      │
└──────────────────────────────────────┘

    Shared VPC attachment (not VPC Peering)

┌───────┴──────────┬────────────────────┐
│                  │                    │
▼                  ▼                    ▼
SERVICE-PROJECT-1  SERVICE-PROJECT-2  SERVICE-PROJECT-3
└─ Can create VMs in Shared VPC
   └─ Principals need IAM role: roles/compute.networkUser (on the host project or subnet)
   └─ Firewall rules from host project apply

Andromeda's role:

  • Host project defines firewall/routes (central policy)
  • Service projects place their VMs into the shared subnets
  • All data plane enforcement happens uniformly across projects
  • Billing: tracked per-project (data transfer), but routing is shared

Real-world Scenarios & Trade-offs

Scenario 1: E-commerce Platform (Global, Multi-Tenant)

├─ Frontend Layer (Global)
│  └─ Anycast IP (via Global Load Balancer)
│  └─ Route user→nearest region (using anycast + BGP)

├─ Region: us-east1
│  └─ VPC-us-east
│  └─ App servers (internal IPs)
│  └─ Firewall: allow from GLB only

├─ Region: eu-west1
│  └─ VPC-eu-west
│  └─ App servers (internal IPs)
│  └─ Firewall: allow from GLB only

└─ Database Layer (Regional, but replicated)
   └─ Cloud SQL (managed, runs in a Google-managed service producer network)
   └─ Private IP endpoint (uses private services access)

Andromeda decisions:

  • GLB terminates SSL at nearest PoP
  • Interior app-server traffic stays within region (no cross-region latency)
  • Database connections use the Cloud SQL private IP, reached through private services access (a peering between your VPC and the service producer network), not Private Google Access
  • Firewall rules: Allow the Google front end and health check ranges (35.191.0.0/16, 130.211.0.0/22)

Scenario 2: Microservices with Strict Egress Control

namespace: payment
├─ Pod: payment-processor (internal IP: 10.10.0.50)
│  └─ Allowed outbound: only to Stripe API (egress rule: stripe.com:443)
│  └─ Denied: internet, other services

└─ Pod: payment-audit (internal IP: 10.10.0.51)
   └─ Allowed outbound: only to Cloud Logging, BigQuery
   └─ Denied: everything else

Andromeda Data Plane:

  • Each pod network interface has stateful firewall
  • Connection-tracking: egress allows response, ingress blocks unsolicited
  • Egress rule for "stripe.com:443" → DNS resolved to IP → firewall matches
  • Denies unexpected traffic immediately at pod network interface
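The "FQDN rule → resolved IPs → firewall match" step can be sketched as building an (IP, port) allowlist from a name-based rule. This is only an illustration: the `RESOLVED` table stands in for real DNS resolution, and its addresses are placeholders from the documentation range 203.0.113.0/24, not Stripe's actual addresses.

```python
# Hypothetical resolver results; a real system would refresh these from DNS.
RESOLVED = {
    "stripe.com": {"203.0.113.10", "203.0.113.11"},
}


def build_egress_allowlist(fqdn_rules, resolved=RESOLVED):
    """Expand (fqdn, port) rules into concrete (ip, port) pairs."""
    allowlist = set()
    for fqdn, port in fqdn_rules:
        for ip in resolved.get(fqdn, ()):
            allowlist.add((ip, port))
    return allowlist


def egress_permitted(allowlist, dst_ip, dst_port):
    # Anything not explicitly expanded from a rule is denied.
    return (dst_ip, dst_port) in allowlist


payment_processor = build_egress_allowlist([("stripe.com", 443)])
print(egress_permitted(payment_processor, "203.0.113.10", 443))
print(egress_permitted(payment_processor, "198.51.100.20", 443))
```

One consequence of this design is that the allowlist is only as fresh as the last resolution: if the name starts resolving to a new address, traffic to it is denied until the rule is re-expanded.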

Scenario 3: Hybrid Network (On-Prem + Cloud)

┌─────────────┐  Cloud Interconnect   ┌──────────────┐
│On-Prem DC   │━━━━━━━━━━━━━━━━━━━━━━━│ GCP VPC      │
│172.16.0.0/12│    (Dedicated VLAN)   │10.0.0.0/8    │
└─────────────┘                       └──────────────┘
      │                                     │
      └─ On-Prem: 172.16.0.0/12             │
      └─ GCP: 10.0.0.0/8                   │
      └─ Cloud Router: BGP advertisement   │
      └─ Firewall: allow both CIDR blocks  │

Andromeda's role:

  • Cloud Router (GCP side) receives routes via BGP from on-prem
  • Routes: "172.16.0.0/12 via Cloud Interconnect attachment"
  • Data plane: packets whose destination matches 172.16.0.0/12 are sent to the interconnect attachment
  • Physical network (Jupiter) handles encapsulation/tunneling
  • Firewall: enforce rules for hybrid traffic same as intra-VPC
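Route selection in this scenario comes down to longest-prefix matching: the most specific route that contains the destination wins. The sketch below uses a hypothetical route table mirroring the diagram (next-hop names are illustrative labels); real VPC routing also considers route type and priority, which are left out.

```python
from ipaddress import ip_address, ip_network

# Hypothetical route table: (destination CIDR, next hop label).
ROUTES = [
    ("10.0.0.0/8", "vpc-local"),
    ("172.16.0.0/12", "interconnect-attachment"),
    ("0.0.0.0/0", "default-internet-gateway"),
]


def next_hop(dst, routes=ROUTES):
    """Return the next hop of the most specific matching route, or None."""
    addr = ip_address(dst)
    matches = [
        (ip_network(cidr), hop) for cidr, hop in routes if addr in ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("172.20.1.5"))
print(next_hop("10.1.0.5"))
print(next_hop("8.8.8.8"))
```

Dropping the 0.0.0.0/0 entry from the table makes `next_hop("8.8.8.8")` return `None`, which is the situation a VM is in when its network has no default route.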

Common Mistakes & Anti-Patterns

Mistake 1: Assuming Same-Subnet = Auto-Allow

Wrong thinking:

"VMs in same subnet are automatically reachable - firewall rules don't matter"

Correct understanding:

  • Firewall rules ALWAYS apply, even within same subnet
  • Default: implied deny for ingress (security by default)
  • Ingress must be explicitly allowed; egress is allowed by default unless an egress deny rule exists
  • Andromeda enforces at packet level, regardless of subnet

Impact: Dev deploys application, tries curl vm-b from vm-a, fails. "Same subnet should work!" — No, firewall rule needed.

Prevention: Always review firewall rules even for same-subnet communication. Use gcloud compute firewall-rules list to verify.
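To see why same-subnet traffic still fails without a rule, it helps to model how ingress rules are evaluated: rules are checked in priority order (lower number first, deny before allow at equal priority), and an implied deny at priority 65535 catches everything else. The `Rule` type and `evaluate_ingress` function below are a hypothetical simplification that ignores target tags and service accounts.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    priority: int
    action: str        # "allow" or "deny"
    source_range: str  # CIDR
    proto: str         # "tcp", "udp", or "all"
    port: int = 0      # ignored when proto is "all"


# The implied ingress rule: lowest precedence, matches everything, denies.
IMPLIED_DENY_INGRESS = Rule(65535, "deny", "0.0.0.0/0", "all")


def evaluate_ingress(rules, src_ip, proto, port):
    # Lower priority number wins; at equal priority, deny sorts before allow.
    ordered = sorted(
        [*rules, IMPLIED_DENY_INGRESS],
        key=lambda rule: (rule.priority, rule.action != "deny"),
    )
    for rule in ordered:
        in_range = ip_address(src_ip) in ip_network(rule.source_range)
        proto_ok = rule.proto == "all" or (rule.proto == proto and rule.port == port)
        if in_range and proto_ok:
            return rule.action
    return "deny"


# vm-a (10.1.0.2) calls vm-b on tcp:80 in the same /24, with no rules defined.
print(evaluate_ingress([], "10.1.0.2", "tcp", 80))

allow_internal = Rule(1000, "allow", "10.1.0.0/24", "tcp", 80)
print(evaluate_ingress([allow_internal], "10.1.0.2", "tcp", 80))
```

Subnet membership never appears in the evaluation; only the rule list decides, so the first call is denied and the second is allowed.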

Mistake 2: Not Understanding Stateful Connection Tracking

Wrong thinking:

"I need separate ingress firewall rules for the responses to my outbound connections"

Correct understanding:

  • Firewall is stateful: if you allow egress to TCP:443, the response packets of that connection are automatically allowed back in
  • Connection state is tracked in a connection tracking (conntrack) table
  • Packets of ESTABLISHED/RELATED connections skip rule evaluation

Impact: Unnecessary firewall rules bloat, performance confusion

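Prevention: Understand connection states (NEW, ESTABLISHED, RELATED) and write rules only for the direction that initiates the connection. Note that "Session affinity" is a load balancer setting and is unrelated to firewall connection tracking.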

Mistake 3: Forgetting Egress Internet Gateway

Wrong thinking:

"VM has external IP, can reach internet automatically"

Correct understanding:

  • External IP alone isn't enough
  • Need: no egress deny rule blocking the traffic + a default route (0.0.0.0/0 → default internet gateway)
  • New VPC networks get a system-generated default route, but it may have been deleted or overridden by a more specific or higher-priority route

Impact: VM has external IP but can't reach internet. Debugging frustration.

Prevention: In a custom VPC, verify that the default route to the internet gateway still exists and recreate it if needed. Check: gcloud compute routes list.

Mistake 4: Misunderstanding VPC Isolation

Wrong thinking:

"VPC-A and VPC-B are completely isolated - no data leakage possible"

Correct understanding:

  • VPC-to-VPC isolation depends on peering/sharing configuration
  • Firewall rules provide defense-in-depth
  • Shared VPC grants cross-project access (intentional architectural decision)

Impact: Security design flaw. Dev enables Shared VPC without understanding implications.

Prevention: Understand IAM boundaries + VPC boundaries. Shared VPC should have explicit network admin role separation.

GCP-native Implementation Guidance

Creating VPC with Andromeda Control

bash
# Create custom VPC (Andromeda will manage)
gcloud compute networks create my-vpc \
  --subnet-mode=custom \
  --bgp-routing-mode=regional

# Create subnet (Andromeda allocates in data plane)
gcloud compute networks subnets create my-subnet \
  --network=my-vpc \
  --region=us-central1 \
  --range=10.1.0.0/24 \
  --enable-flow-logs \
  --logging-aggregation-interval=interval-5-sec

# Firewall rule (Andromeda compiles into data plane rules)
gcloud compute firewall-rules create allow-http \
  --network=my-vpc \
  --direction=INGRESS \
  --priority=1000 \
  --source-ranges=0.0.0.0/0 \
  --target-tags=http-server \
  --allow=tcp:80,tcp:443

# Create VM (Andromeda assigns VPC IP, configures firewall)
gcloud compute instances create my-vm \
  --zone=us-central1-a \
  --network-interface=network=my-vpc,subnet=my-subnet \
  --tags=http-server

What Andromeda does behind the scenes:

  1. Control plane receives "create subnet" → records the CIDR allocation
  2. Push to nodes in the region: "10.1.0.0/24 is local to us-central1" (subnets are regional, not zonal)
  3. Control plane receives "create firewall rule" → compiles it into data plane rules
  4. Push to target nodes: "VMs with tag http-server allow tcp:80,443 from 0.0.0.0/0"
  5. VM starts → assigned 10.1.0.2 → Andromeda data plane configures the VM's virtual NIC → VM can send traffic

Verifying Andromeda Configuration

bash
# Check VPC network properties
gcloud compute networks describe my-vpc

# List firewall rules (control plane state)
gcloud compute firewall-rules list --filter="network=my-vpc"

# Check VM network interface (data plane state)
gcloud compute instances describe my-vm --zone=us-central1-a \
  --format='value(networkInterfaces[0])'

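# Read VPC Flow Logs (captured from the data plane); PROJECT_ID is a placeholder
gcloud logging read 'resource.type="gce_subnetwork" AND logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows" AND ip_in_net(jsonPayload.connection.src_ip, "10.1.0.0/24")' \
  --limit=10

# Trace packet path using Connectivity Tests (test name and destination IP are examples)
gcloud network-management connectivity-tests create my-test \
  --source-instance=projects/PROJECT_ID/zones/us-central1-a/instances/my-vm \
  --destination-ip-address=10.1.0.3 \
  --protocol=TCP \
  --destination-port=80
gcloud network-management connectivity-tests describe my-test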


Next: Jupiter Fabric: Spine-Leaf Topology & Oversubscription. Understand the physical infrastructure that Andromeda runs on top of.