
etcd vs Spanner Backend — GKE State Storage & Consistency Model

Why Backend Storage Matters

Kubernetes cluster state (all Pods, Services, ConfigMaps, Secrets) must be stored somewhere persistent. The storage backend affects not only availability but also API request latency, the correctness of reconciliation loops, and disaster recovery capability.

Choosing the wrong backend → propagation delays, inconsistent state, data loss.


etcd — Kubernetes Default Backend

Architecture Overview

etcd is a distributed key-value store built on the Raft consensus algorithm. Every write must be replicated to a majority (quorum) of members before it is committed.

etcd Cluster (3 nodes typical)
├─ Node 1 (Leader)
│  └─ Receives write
│     └─ Broadcasts append-entry RPC
├─ Node 2 (Follower)
│  └─ Receives append-entry
│     └─ Persists to disk
│     └─ Acks leader
└─ Node 3 (Follower)
   └─ Similar to Node 2

Once 2 of 3 nodes (the leader counts as one) have persisted the entry → the leader commits
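The 2-of-3 rule is one case of Raft's general majority arithmetic. A minimal bash sketch (the function names are illustrative, not part of any etcd tool) that computes the quorum and the number of member failures a cluster of n voting members can absorb:

```bash
#!/usr/bin/env bash
# Majority (quorum) arithmetic for a Raft cluster with n voting members.

# Smallest majority: floor(n/2) + 1
quorum_size() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# Members that can fail while a majority is still reachable
tolerated_failures() {
  local n=$1
  echo $(( n - $(quorum_size "$n") ))
}

for n in 1 3 4 5; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A 4-member cluster needs one more ack per write than a 3-member cluster yet still tolerates only one failure, which is why etcd clusters are usually sized with an odd number of members.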

Write Consistency Path

API Server:
   Write request
   ↓
   Authentication/Authorization
   ↓
   etcd.Put(key, value)
   ↓
   Raft leader receives write
   ↓
   Raft log append + broadcast to followers
   ↓
   Wait for quorum acks
   ↓
   Commit to state machine (persisted)
   ↓
   Return success to API Server
   ↓
   API Server returns response to client

Latency: typically 10-50 ms per write (depending on network and disk I/O)
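The latency range above follows from the quorum rule: a write commits as soon as a majority of members, the leader included, have persisted it, so the fastest majority sets the pace. The sketch below assumes a simplified model with no batching or pipelining, where each argument is one member's "persisted" time in milliseconds (network round trip plus fsync); commit_latency_ms is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Simplified model: an entry commits once a majority of members (leader included)
# have persisted it, so commit latency is the quorum-th smallest of the
# per-member persisted times (integer milliseconds).
commit_latency_ms() {
  local n=$# quorum
  quorum=$(( n / 2 + 1 ))
  printf '%s\n' "$@" | sort -n | sed -n "${quorum}p"
}

# Leader fsync 4 ms; follower acks (RTT + fsync) at 9 ms and 35 ms.
commit_latency_ms 4 9 35
```

In this model a single slow member in a 3-member cluster does not slow commits, but a second slow member does.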

Strong Consistency Guarantee

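etcd provides strong consistency:

  • A write is acknowledged only after a quorum has committed it
  • Linearizable reads (etcd's default) always return the latest committed data, whichever member serves them
  • Serializable reads are answered from a member's local state without a quorum check, so they might be stale (use with caution)
```bash
# GKE uses etcd with strong consistency.
# Kubernetes API reads that go to etcd are linearizable and return fresh data;
# reads answered from the API Server's cache can lag (see API Server Caching Layer).
```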

Limitations of etcd

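| Limitation | Typical limit | Example consequence |
|---|---|---|
| Request / value size | ~1.5 MiB per request by default (--max-request-bytes); Kubernetes caps ConfigMap and Secret data at 1 MiB | A large ConfigMap/Secret is rejected |
| Database size | ~2 GB default space quota (--quota-backend-bytes); ~8 GB suggested maximum | Practical cluster ceiling of roughly ~100k objects |
| Write throughput | ~1000s of writes/sec | High-churn workloads get throttled |
| Network partition handling | The minority side becomes unavailable | Prevents split-brain |
| Transaction size | 128 operations per transaction by default (--max-txn-ops) | Very large multi-key transactions are rejected |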
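The ~100k-object ceiling is easiest to see as arithmetic against the space quota. A back-of-envelope bash sketch, assuming an average object size and a number of stored MVCC revisions per object (both illustrative inputs; fragmentation and compaction timing are ignored); max_objects is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Rough upper bound on live objects that fit under an etcd space quota.
# Assumption: each object occupies keep_revs copies (current value plus
# not-yet-compacted MVCC history); fragmentation is ignored.
max_objects() {
  local quota_bytes=$1 avg_object_bytes=$2 keep_revs=$3
  echo $(( quota_bytes / (avg_object_bytes * keep_revs) ))
}

# 2 GiB quota, 8 KiB average object, ~2 stored revisions per object.
max_objects $(( 2 * 1024 * 1024 * 1024 )) 8192 2
```

With these assumed inputs the default quota tops out near 131k objects, the same order as the practical ceiling in the table; larger objects or more retained history lower it quickly.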

etcd Backup & Recovery

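GKE manages the control plane, including etcd and its internal backups, on your behalf; those etcd snapshots are not exposed for you to restore. For user-driven backup and restore of cluster state, Backup for GKE captures Kubernetes resources (and optionally persistent volume data):

```bash
# Inspect a backup produced by a Backup for GKE backup plan
gcloud container backup-restore backups describe <backup-id> \
  --location=<location> \
  --backup-plan=<backup-plan>

# Restore it through a restore plan (the plan names the target cluster)
gcloud container backup-restore restores create <restore-id> \
  --location=<location> \
  --restore-plan=<restore-plan> \
  --backup=projects/<project>/locations/<location>/backupPlans/<backup-plan>/backups/<backup-id>
```

Recovery time: ~30 minutes, depending on how many resources and volumes are restored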


Spanner Backend — Google's Distributed SQL

Architecture Overview

Spanner is a globally distributed SQL database with strong consistency (like etcd) plus additional capabilities:

Spanner Cluster (Google-managed)
├─ Regions (3+)
│  └─ Replicas
│     └─ Strong consistency via TrueTime
├─ Automatic replication
├─ Multi-region failover
└─ ACID transactions

Write Consistency Path (Spanner)

API Server:
   Write request
   ↓
   Spanner transaction begins
   ↓
   Write to Spanner (commit timestamp assigned with TrueTime-synchronized clocks)
   ↓
   Spanner replicates to a quorum of replicas across multiple regions
   ↓
   Transaction committed
   ↓
   Spanner returns success
   ↓
   API Server returns response

Latency: typically 50-200 ms (depending on where the replicas are), which is slower than etcd.
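Where the extra latency comes from can be sketched with two effects: Paxos must reach a majority of voting replicas spread across regions, and TrueTime's bounded clock uncertainty ε forces a "commit wait" of roughly 2ε, which the Spanner paper notes is typically overlapped with Paxos communication. This is a back-of-envelope bash model, not Spanner's published latency formula; the replica round-trip times and ε are made-up example values, and spanner_commit_estimate_ms is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Back-of-envelope Spanner write latency (illustrative model, not a published formula):
#  - Paxos needs a majority of voting replicas; the leader's own copy counts as 0 ms,
#    so replication cost is the quorum-th smallest round trip.
#  - Commit wait lasts about 2 * epsilon (TrueTime uncertainty) and overlaps with
#    replication, so the larger of the two dominates.
spanner_commit_estimate_ms() {
  local epsilon_ms=$1
  shift
  local voters=$(( $# + 1 ))
  local quorum=$(( voters / 2 + 1 ))
  local paxos_ms wait_ms
  paxos_ms=$( { echo 0; printf '%s\n' "$@"; } | sort -n | sed -n "${quorum}p" )
  wait_ms=$(( 2 * epsilon_ms ))
  if (( paxos_ms > wait_ms )); then echo "$paxos_ms"; else echo "$wait_ms"; fi
}

# Leader plus four remote voting replicas at 30, 45, 90, 120 ms RTT; epsilon = 4 ms.
spanner_commit_estimate_ms 4 30 45 90 120
```

With these example inputs, replication to the second-fastest remote voter (45 ms) dominates; commit wait only becomes the bottleneck when replicas sit very close together.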

Advantages of Spanner

| Advantage | Impact | Use case |
|---|---|---|
| Multi-region HA | Automatic failover | Regional failures are transparent |
| SQL queries | Audit, forensics | Query state directly |
| Larger scale | 10M+ objects | Very large clusters |
| Built-in backups | Point-in-time recovery | Regulatory requirements |
| Stronger semantics | ACID transactions | Complex state changes |

etcd vs Spanner Trade-off Matrix

| Aspect | etcd | Spanner |
|---|---|---|
| Latency | 10-50 ms | 50-200 ms |
| Throughput | ~1000s writes/sec | ~100s writes/sec (typically) |
| Scale | ~100k objects | ~10M objects |
| HA scope | Single region | Multi-region built-in |
| Backup complexity | Manual snapshots | Built-in, automatic |
| Query capability | Key-value only | Full SQL |
| Cost | Lower | Higher |
| Operational simplicity | More tools available | Managed by Google |

GKE State Storage Choices

Standard Cluster (Default)

Regional GKE Standard clusters use the etcd backend by default, in a 3-zone HA configuration (zonal clusters run a single control plane replica):

GKE Cluster (us-central1)
├─ Control Plane Zone 1
│  └─ etcd replica
├─ Control Plane Zone 2
│  └─ etcd replica
└─ Control Plane Zone 3
   └─ etcd replica

All replicas stay in sync; quorum is 2 of 3, so losing one zone does not stop writes.
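Whether a topology survives a zone outage comes down to whether the members outside the most-populated zone still form a majority. A minimal bash sketch; survives_zone_loss is a hypothetical helper whose arguments are the voting members in each zone:

```bash
#!/usr/bin/env bash
# Does an etcd cluster keep quorum if its most-populated zone goes down?
# Arguments: number of voting members in each zone.
survives_zone_loss() {
  local total=0 largest=0 z
  for z in "$@"; do
    total=$(( total + z ))
    if (( z > largest )); then largest=$z; fi
  done
  local quorum=$(( total / 2 + 1 ))
  if (( total - largest >= quorum )); then echo yes; else echo no; fi
}

survives_zone_loss 1 1 1   # three zones, one member each
survives_zone_loss 2 1     # two zones, three members
```

Three zones with one member each keep quorum after losing any single zone; packing two of the three members into one zone does not.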

Autopilot Cluster (Optional)

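Autopilot clusters can choose between etcd and Spanner at creation time:

```bash
# Autopilot clusters are regional, so they take --region rather than --zone.
# --database-backend is the selector described in this article; confirm it appears
# in `gcloud container clusters create-auto --help` before relying on it.

# etcd backend (default)
gcloud container clusters create-auto my-autopilot \
  --region us-central1 \
  --database-backend etcd

# Spanner backend (alternative)
gcloud container clusters create-auto my-autopilot \
  --region us-central1 \
  --database-backend spanner
```

Note: once a backend is chosen, it cannot be changed without recreating the cluster.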


Consistency Model Details

Strong Consistency Guarantee (Both Backends)

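Both etcd and Spanner guarantee:

  1. Write atomicity: a write either fully succeeds or fully fails
  2. Read freshness: consistent reads always see the latest committed write
  3. Ordering: writes are applied in a single, well-defined order
  4. No divergence: replicas never disagree on committed data, and an update based on a stale version is rejected instead of silently overwriting newer data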
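The "no divergence" guarantee surfaces in the Kubernetes API as optimistic concurrency: an update carries the resourceVersion the client last read, and it succeeds only if the stored object still has that version; otherwise the API Server returns HTTP 409 Conflict (with etcd, it does this through a transaction that compares the key's modification revision). A toy bash model of that compare-and-swap, with storage reduced to shell variables (all names are illustrative):

```bash
#!/usr/bin/env bash
# Toy compare-and-swap store: one object plus a global revision counter.
stored_rv=100
stored_value="replicas=3"
global_rev=100

# Apply new_value only if the caller read the current version (expected_rv).
cas_update() {
  local expected_rv=$1 new_value=$2
  if [ "$expected_rv" != "$stored_rv" ]; then
    echo "409 Conflict: object was modified (stored rv $stored_rv)"
    return 1
  fi
  global_rev=$(( global_rev + 1 ))
  stored_rv=$global_rev
  stored_value=$new_value
  echo "200 OK: rv $stored_rv"
}

# Writers A and B both read the object at rv 100.
cas_update 100 "replicas=5"                      # A wins; rv becomes 101
cas_update 100 "replicas=1" || echo "writer B must re-read the object and retry"
```

Writer B's change is not lost silently; it is refused, and B must re-read and reapply its change on top of A's.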

Watch API — Event Streaming

Both backends support a watch API for streaming changes:

```bash
# Watch changes to Pods in all namespaces
kubectl get pods --all-namespaces --watch

# Under the hood: the API Server watches etcd/Spanner for changes
```

Important: a watch delivers every change in order, as long as the client can resume from its last resourceVersion, but with some delay:

  • etcd: usually <100 ms
  • Spanner: usually <500 ms
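Watch clients stay gap-free by remembering the last resourceVersion they processed and resuming from it after a reconnect; if the backend has already compacted that history, the API Server answers 410 Gone and the client must relist. A toy bash model of that logic (resume_watch and the "resourceVersion TYPE name" event line format are illustrative assumptions):

```bash
#!/usr/bin/env bash
# Toy watch resume. stdin: event lines "resourceVersion TYPE name".
# last_rv: newest version the client already processed.
# oldest_retained_rv: oldest version the backend still keeps (older history compacted).
resume_watch() {
  local last_rv=$1 oldest_retained_rv=$2
  if (( last_rv < oldest_retained_rv )); then
    echo "410 Gone: relist required"
    return 1
  fi
  # Deliver only events newer than what the client has seen.
  awk -v rv="$last_rv" '$1 + 0 > rv + 0'
}

events='101 ADDED web-1
102 MODIFIED web-1
105 DELETED web-2'

printf '%s\n' "$events" | resume_watch 101 100
resume_watch 90 100 </dev/null || echo "client relists, then watches from the list's resourceVersion"
```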

API Server Caching Layer

Even though the backend is strongly consistent, the API Server keeps an in-memory watch cache for performance:

┌─────────────────────────────┐
│  API Server                 │
│                             │
│  Reads: may be served from  │
│  the watch cache            │
│  Writes: always go to the   │
│  backend (etcd/Spanner)     │
│                             │
│ ┌─────────────────────────┐ │
│ │ Watch cache (in-memory) │ │
│ │ fed by a watch on the   │ │
│ │ backend                 │ │
│ └────────────▲────────────┘ │
└──────────────┼──────────────┘
               │ watch events
┌──────────────┴──────────────┐
│  etcd / Spanner             │
│  (persistent state)         │
└─────────────────────────────┘

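Implication: reads served from the cache (for example, requests with resourceVersion=0) can be stale while the cache lags behind the backend, such as right after a watch connection drops.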
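Whether a read can be answered from the cache depends on the resourceVersion parameter of the request. A simplified bash classifier following the documented Kubernetes semantics; it ignores resourceVersionMatch=Exact, pagination, and version-specific behavior such as newer releases serving consistent reads from a cache confirmed to be current, and read_source is a hypothetical name:

```bash
#!/usr/bin/env bash
# Where can a GET/LIST be answered, given its resourceVersion parameter?
#   rv_param: "" (unset), "0", or a specific resourceVersion
#   cache_rv: newest resourceVersion the watch cache has applied
read_source() {
  local rv_param=$1 cache_rv=$2
  if [ -z "$rv_param" ]; then
    echo "consistent read (latest committed state)"
  elif [ "$rv_param" = "0" ]; then
    echo "watch cache (any version, may be stale)"
  elif (( cache_rv >= rv_param )); then
    echo "watch cache (not older than $rv_param)"
  else
    echo "wait for cache to reach $rv_param"
  fi
}

read_source "" 500
read_source 0 500
read_source 450 500
read_source 600 500
```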


Scaling Implications

Object Count Scaling

| Object count | etcd concern | Spanner concern |
|---|---|---|
| 100k | Default, OK | Works, probably overkill |
| 500k | Reasonable | Better fit |
| 1M+ | Problematic | Better fit |

Write Rate Scaling

| Writes/sec | etcd | Spanner | Mitigation |
|---|---|---|---|
| 100 | OK | OK | - |
| 500 | OK | OK | - |
| 1000+ | Stress point | Better | Client-side batching |
| 10000+ | Impossible | Hard | Shard the cluster |

Typical workaround for a high write rate: a multi-cluster setup with sharding.
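The two scaling tables can be folded into a rough rule of thumb. This bash sketch simply encodes this article's thresholds (they are rule-of-thumb figures, not limits published by GKE); suggest_layout is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Encodes the object-count and write-rate thresholds from the scaling tables.
suggest_layout() {
  local objects=$1 writes_per_sec=$2
  if (( writes_per_sec >= 10000 )); then
    echo "shard across multiple clusters"
  elif (( objects >= 1000000 || writes_per_sec >= 1000 )); then
    echo "spanner (or shard)"
  elif (( objects >= 500000 )); then
    echo "either; spanner is the better fit"
  else
    echo "etcd"
  fi
}

suggest_layout 80000 200
suggest_layout 2000000 300
```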


Disaster Recovery

etcd Backup Strategy

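Backups can be automated, but a production pattern looks like this:

  1. Enable automated backups (a scheduled Backup for GKE backup plan) and inspect what they produce:
```bash
gcloud container backup-restore backups describe <backup> \
  --location=<location> \
  --backup-plan=<backup-plan>
```
  2. Test recovery (critical!):
```bash
# Restore into a separate test cluster; the restore plan names it as the target
gcloud container backup-restore restores create test-restore \
  --location=<location> \
  --restore-plan=<restore-plan> \
  --backup=projects/<project>/locations/<location>/backupPlans/<backup-plan>/backups/<backup>
```
  3. Typical RPO/RTO:
  • RPO: 1 hour (backup frequency)
  • RTO: 30 minutes (restore time)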
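RPO/RTO targets only mean something when the backup schedule and a measured restore drill are checked against them. A minimal bash sketch, assuming worst-case data loss equals the backup interval (backup duration ignored) and the drill time is measured by hand; meets_targets is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Compare a backup schedule and a measured restore drill with RPO/RTO targets.
# All arguments are minutes. Worst-case loss = failure just before the next backup.
meets_targets() {
  local backup_interval_min=$1 restore_drill_min=$2 rpo_target_min=$3 rto_target_min=$4
  local ok=yes
  if (( backup_interval_min > rpo_target_min )); then
    echo "RPO miss: worst-case loss ${backup_interval_min}m > target ${rpo_target_min}m"
    ok=no
  fi
  if (( restore_drill_min > rto_target_min )); then
    echo "RTO miss: restore took ${restore_drill_min}m > target ${rto_target_min}m"
    ok=no
  fi
  echo "meets targets: $ok"
}

# Hourly backups, a 42-minute restore drill, targets of 60m RPO and 30m RTO.
meets_targets 60 42 60 30
```

Here the schedule satisfies the 1-hour RPO, but the drill misses the 30-minute RTO, which is exactly the kind of gap that only shows up when recovery is actually tested.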

Spanner Advantages for DR

Spanner offers:

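  • Point-in-time recovery: recover to a specific timestamp within the database's version retention period
  • Automatic replication: multi-region replicas survive zone and region loss (bad writes replicate too, so this does not replace backups)
  • Built-in redundancy: data loss from infrastructure failure is very unlikely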

Performance Tuning

etcd Performance Tuning (GKE)

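There is little to tune, but you can monitor:

```bash
# API server's view of etcd request latency (histogram per operation and type)
kubectl get --raw /metrics | grep etcd_request_duration_seconds

# Stored object count per resource type (replaces the deprecated etcd_object_counts)
kubectl get --raw /metrics | grep apiserver_storage_objects

# etcd's own disk metrics (e.g. etcd_disk_backend_commit_duration_seconds) are served
# by etcd itself, not by the API server's /metrics endpoint
```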
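Raw /metrics output is long, so a small filter makes the object counts readable. A bash sketch, assuming the Prometheus text format with a resource label on apiserver_storage_objects; the helper names are illustrative:

```bash
#!/usr/bin/env bash
# Read Prometheus text-format metrics on stdin.

# Print "count resource" for the N largest resource types (default 5).
top_storage_objects() {
  awk 'index($0, "apiserver_storage_objects{") == 1 && match($0, /resource="[^"]*"/) {
         print $NF, substr($0, RSTART + 10, RLENGTH - 11)
       }' | sort -k1,1nr | head -n "${1:-5}"
}

# Print the total number of stored objects across all resource types.
total_storage_objects() {
  awk 'index($0, "apiserver_storage_objects{") == 1 { sum += $NF } END { print sum + 0 }'
}

# Usage against a live cluster:
#   kubectl get --raw /metrics | top_storage_objects 10
#   kubectl get --raw /metrics | total_storage_objects
```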

Spanner Performance Tuning

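Usually managed by Google, but for a Spanner instance you can access, you can monitor:

```bash
# Lists long-running operations (such as schema updates) on the database; latency
# itself is the Cloud Monitoring metric spanner.googleapis.com/api/request_latencies
gcloud spanner operations list \
  --instance=<instance> \
  --database=<database>
```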

Production Patterns

Pattern 1: Separate Metadata vs Data

Large objects (for example, a ConfigMap holding hundreds of KB) are stored in the backend too and replicated on every write; ConfigMap data over 1 MiB is rejected outright:

```yaml
# ❌ BAD - large ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: large-config
data:
  # everything below is stored in etcd/Spanner
  data.txt: |
    [hundreds of KB of data]
---
# ✅ GOOD - store a reference only
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-ref
data:
  storage-url: gs://bucket/data.txt
```
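The inline-versus-reference choice can be made mechanically from file size. A bash sketch: the 1 MiB hard cap is the documented ConfigMap data limit, while the 256 KiB soft threshold and both helper names are illustrative choices:

```bash
#!/usr/bin/env bash
# Decide whether file content belongs inline in a ConfigMap or behind a reference.
configmap_strategy() {
  local size_bytes=$1
  local hard_limit=$(( 1024 * 1024 ))   # documented ConfigMap data cap
  local soft_limit=$(( 256 * 1024 ))    # illustrative threshold, not a Kubernetes limit
  if (( size_bytes > hard_limit )); then
    echo "reference (inline would be rejected)"
  elif (( size_bytes > soft_limit )); then
    echo "reference (recommended)"
  else
    echo "inline"
  fi
}

# Portable byte count of a file (wc pads with spaces on some systems).
file_size() { wc -c < "$1" | tr -d ' '; }

# Usage: configmap_strategy "$(file_size data.txt)"
```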

Pattern 2: Object Count Management

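Clusters with 500k+ objects usually do better if you:

  1. Shard into multiple clusters
  2. Archive old objects
  3. Implement cleanup policies
```bash
# Count stored objects per resource type
# (kubectl get all covers only a handful of resource types)
kubectl get --raw /metrics | grep apiserver_storage_objects

# Delete succeeded Jobs in all namespaces
kubectl delete jobs -A --field-selector status.successful=1

# kubectl has no --older-than flag; for age-based cleanup, set
# spec.ttlSecondsAfterFinished on Jobs (604800 seconds = 7 days)
```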
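Age-based cleanup can also be scripted on top of kubectl output. A bash sketch that keeps only Jobs completed more than N days ago; the "namespace/name epoch-seconds" input format and old_jobs are assumptions, and the commented pipeline relies on GNU date to convert completionTime to epoch seconds:

```bash
#!/usr/bin/env bash
# stdin: "namespace/name completion-epoch-seconds" per line.
# Prints Jobs that completed more than max_age_days before now_epoch.
old_jobs() {
  local now_epoch=$1 max_age_days=$2
  local cutoff=$(( now_epoch - max_age_days * 86400 ))
  awk -v cutoff="$cutoff" '$2 + 0 < cutoff + 0 { print $1 }'
}

# Usage against a live cluster (GNU date):
# kubectl get jobs -A --field-selector status.successful=1 \
#   -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.completionTime}{"\n"}{end}' |
#   while read -r job ts; do echo "$job $(date -u -d "$ts" +%s)"; done |
#   old_jobs "$(date +%s)" 7
```

The printed namespace/name pairs can then be reviewed before deleting them; for new Jobs, spec.ttlSecondsAfterFinished avoids the need for a script at all.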

Migration Between Backends

GKE does not support live migration between etcd and Spanner. Options:

  1. Recreate the cluster: simple, but requires downtime
  2. Migrate data: export/import workflow (complex)
  3. Multi-cluster: run a new cluster in parallel and migrate workloads



Summary

  • etcd: default Kubernetes backend, a good balance of latency and scale, typically <100k objects
  • Spanner: Google's distributed SQL database, multi-region HA, better for large clusters
  • Both guarantee strong consistency; they differ in scaling, DR, and operational complexity
  • The API Server caching layer makes staleness harder to reason about
  • Backup/recovery strategy is critical; test it regularly
  • Shard clusters when a single cluster reaches its state limits