etcd vs Spanner Backend — GKE State Storage & Consistency Model
Why Backend Storage Matters
Kubernetes cluster state (every Pod, Service, ConfigMap, and Secret) must be stored somewhere persistent. The storage backend affects more than availability. It also affects API request latency, the correctness of reconciliation loops, and disaster recovery capability.
Choosing the wrong backend can lead to propagation delays, inconsistent state, or data loss.
etcd — Kubernetes Default Backend
Architecture Overview
etcd is a distributed key-value store built on the Raft consensus algorithm. Every write must be replicated to a majority (quorum) of members before it is committed.
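The majority rule is easier to see with numbers. The sketch below implements the standard Raft formula, quorum = floor(n/2) + 1. The function names `quorum` and `fault_tolerance` are illustrative and are not part of any etcd tool.

```shell
#!/bin/sh
# Raft majority arithmetic for an etcd cluster of n voting members.
# quorum = floor(n/2) + 1; writes keep succeeding while at most
# n - quorum members are down.

quorum() {
  echo $(( $1 / 2 + 1 ))
}

fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for a few common cluster sizes.
for n in 1 3 4 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

With 3 members the cluster survives one failure. Going to 4 members raises the quorum to 3 but still tolerates only one failure, which is why odd cluster sizes are the norm.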
```
etcd Cluster (typically 3 nodes)
├─ Node 1 (Leader)
│  └─ Receives write
│  └─ Broadcasts AppendEntries RPC
├─ Node 2 (Follower)
│  └─ Receives AppendEntries
│  └─ Persists entry to disk (WAL)
│  └─ Acks leader
└─ Node 3 (Follower)
   └─ Same as Node 2
Once 2 of 3 nodes (leader included) have the entry → leader commits
```
Write Consistency Path
```
API Server:
Write request
↓
Authentication / Authorization / Admission
↓
etcd transaction: Put(key, value), guarded by a revision check
↓
Raft leader receives write
↓
Raft log append (fsync to WAL) + broadcast to followers
↓
Wait for quorum acks
↓
Commit and apply to state machine
↓
Return success to API Server
↓
API Server returns response to client
```
Latency: typically 10-50 ms per write, depending on network round-trips and disk fsync latency.
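The leader waits only for a majority, so commit latency is set by the fastest quorum, not by the slowest member. The sketch below is a simplified model built on a few assumptions. The leader's own fsync runs in parallel with replication. Apply time is ignored. The millisecond inputs are hypothetical measurements. `commit_latency_ms` is an illustrative name.

```shell
#!/bin/sh
# Simplified model of Raft commit latency for one write.
# $1: leader's own WAL fsync time in ms
# $2..: per-follower "round trip + follower fsync" times in ms
# The leader needs acks from (quorum - 1) followers, so it waits for the
# (quorum - 1)-th fastest follower; its own fsync runs in parallel.
commit_latency_ms() {
  leader_fsync=$1
  shift
  members=$(( $# + 1 ))          # followers + leader
  needed=$(( members / 2 ))      # follower acks needed = quorum - 1
  if [ "$needed" -eq 0 ]; then   # single-member cluster
    echo "$leader_fsync"
    return
  fi
  kth=$(printf '%s\n' "$@" | sort -n | sed -n "${needed}p")
  if [ "$leader_fsync" -gt "$kth" ]; then
    echo "$leader_fsync"
  else
    echo "$kth"
  fi
}

commit_latency_ms 2 8 40   # 3 members, one slow follower: prints 8
```

In the 3-member example, the 40 ms follower does not hold up the commit. The cluster only needs one follower ack, and the 8 ms follower provides it.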
Strong Consistency Guarantee
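etcd provides strong consistency:
- A write is acknowledged only after a quorum has committed it
- Linearizable reads (the default) always return the latest committed data, whether the leader or a follower serves them (followers confirm with the leader via ReadIndex)
- Serializable reads (etcdctl get --consistency=s) are answered from the local member and may be stale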
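GKE relies on etcd's strong consistency, but not every Kubernetes API read reaches etcd. A read with resourceVersion unset is a consistent read of the latest committed state. A read with resourceVersion="0" may be served from the API server's watch cache and can be slightly stale.
Limitations of etcd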
| Limitation | Limit (etcd defaults) | Impact / Example |
|---|---|---|
| Request / object size | ~1.5 MiB per request; Kubernetes caps ConfigMap/Secret data at 1 MiB | Large ConfigMaps/Secrets are rejected |
| Database size | 2 GiB default quota (8 GiB suggested maximum) | Practical ceiling on cluster size, ~100k objects |
| Write throughput | ~1000s of writes/sec | High-churn workloads get throttled |
| Network partition handling | Minority side cannot commit writes | Prevents split-brain |
| Transaction size | 128 operations per transaction by default | Very large multi-key transactions are rejected |
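You can catch the object size limit before a request reaches the API server. The pre-flight check below is a minimal sketch. It assumes the files will be loaded with kubectl create configmap --from-file. It sums raw file sizes against the 1 MiB ConfigMap limit and ignores keys and metadata, so treat the result as an approximation. The function name `configmap_fits` and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Approximate pre-flight check against the 1 MiB ConfigMap data limit.
# Sums raw file sizes only; keys and object metadata are ignored.
MAX_CONFIGMAP_BYTES=1048576

configmap_fits() {
  total=0
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "cannot read: $f" >&2
      return 2
    fi
    # tr strips the padding some wc implementations print
    size=$(wc -c < "$f" | tr -d ' ')
    total=$(( total + size ))
  done
  [ "$total" -le "$MAX_CONFIGMAP_BYTES" ]
}

# Usage (hypothetical file names):
#   if configmap_fits app.conf rules.yaml; then
#     kubectl create configmap app-config --from-file=app.conf --from-file=rules.yaml
#   else
#     echo "too large: keep it in object storage and store a reference" >&2
#   fi
```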
etcd Backup & Recovery
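On GKE, Google manages the control plane, including etcd and its backups, and does not expose those snapshots to customers. For customer-controlled backup and restore of cluster resources and volumes, GKE provides Backup for GKE:
```shell
# Inspect a backup created by a Backup for GKE backup plan
gcloud beta container backup-restore backups describe <backup-id> \
    --location=<location> \
    --backup-plan=<backup-plan>
# Restore it; the target cluster is defined by the restore plan
gcloud beta container backup-restore restores create <restore-name> \
    --location=<location> \
    --restore-plan=<restore-plan> \
    --backup=projects/<project>/locations/<location>/backupPlans/<backup-plan>/backups/<backup-id>
```
Recovery time: roughly 30 minutes, depending on how much state has to be restored.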
Spanner Backend — Google's Distributed SQL
Architecture Overview
Spanner is a globally distributed SQL database. Like etcd, it provides strong consistency, and it adds further capabilities:
```
Spanner Cluster (Google-managed)
├─ Regions (3+)
│  └─ Replicas
│     └─ Strong consistency via TrueTime
├─ Automatic replication
├─ Multi-region failover
└─ ACID transactions
```
Write Consistency Path (Spanner)
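```
API Server:
Write request
↓
Spanner transaction begins
↓
Leader assigns a commit timestamp using TrueTime-synchronized clocks
↓
Spanner replicates the write to a quorum of voting replicas (across regions in a multi-region configuration)
↓
Transaction committed (after commit wait)
↓
Spanner returns success
↓
API Server returns response to client
```
Latency: typically 50-200 ms, depending on where the voting replicas are located. This is noticeably slower than etcd.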
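The Spanner paper describes where TrueTime enters the write path. The leader picks a commit timestamp s no earlier than TT.now().latest. It then waits until TT.now().earliest has passed s. This "commit wait" takes about twice the clock uncertainty ε. Commit wait overlaps with Paxos replication, so the sketch below models client-visible latency as the larger of the two. It is a simplification, and `spanner_commit_ms` and its inputs are hypothetical.

```shell
#!/bin/sh
# Simplified Spanner commit-latency model (after the Spanner paper):
# commit wait lasts about 2 * epsilon, where epsilon is TrueTime's clock
# uncertainty, and it overlaps with Paxos replication to a quorum.
# $1: quorum replication time in ms, $2: epsilon in ms (both hypothetical)
spanner_commit_ms() {
  replication_ms=$1
  commit_wait_ms=$(( 2 * $2 ))
  if [ "$replication_ms" -gt "$commit_wait_ms" ]; then
    echo "$replication_ms"
  else
    echo "$commit_wait_ms"
  fi
}

spanner_commit_ms 80 4   # multi-region quorum: replication dominates, prints 80
spanner_commit_ms 3 4    # very close replicas: commit wait dominates, prints 8
```

Clock uncertainty is typically in the low milliseconds, so cross-region replication dominates in multi-region setups. This is consistent with the 50-200 ms range above.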
Advantages of Spanner
| Advantage | Impact | Use Case |
|---|---|---|
| Multi-region HA | Automatic failover | Regional failures transparent |
| SQL queries | Audit, forensics | Query state directly |
| Larger scale | 10M+ objects | Very large clusters |
| Built-in backups | Point-in-time recovery | Regulatory requirements |
| Stronger semantics | ACID transactions | Complex state changes |
etcd vs Spanner Trade-off Matrix
| Aspect | etcd | Spanner |
|---|---|---|
| Latency | 10-50 ms | 50-200 ms |
| Throughput | ~1000s writes/sec per cluster | Scales horizontally; higher aggregate throughput |
| Scale | ~100k objects | ~10M objects |
| HA scope | Single region (multi-zone) | Multi-region built-in |
| Backups | Snapshots (automated by GKE; manual when self-managed) | Built-in, automatic |
| Query capability | Key-value only | Full SQL |
| Cost | Lower | Higher |
| Operations | Mature open-source tooling (etcdctl) | Fully managed by Google |
GKE State Storage Choices
Standard Cluster (Default)
Regional GKE Standard clusters use the etcd backend by default, with one replica in each of three control-plane zones (zonal clusters run a single control-plane replica):
```
GKE Cluster (us-central1)
├─ Control Plane Zone 1
│  └─ etcd replica
├─ Control Plane Zone 2
│  └─ etcd replica
└─ Control Plane Zone 3
   └─ etcd replica
All replicas kept in sync by Raft; quorum = 2 of 3
```
Autopilot Cluster (Optional)
Autopilot clusters can choose between etcd and Spanner at creation time:
```shell
# Autopilot clusters are created with create-auto and are always regional.
# --database-backend: verify this flag in the gcloud reference for your version.

# etcd backend (default)
gcloud container clusters create-auto my-autopilot \
    --region us-central1 \
    --database-backend etcd

# Spanner backend (alternative)
gcloud container clusters create-auto my-autopilot \
    --region us-central1 \
    --database-backend spanner
```
Note: once you choose a backend, you cannot change it without recreating the cluster.
Consistency Model Details
Strong Consistency Guarantee (Both Backends)
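Both etcd and Spanner guarantee:
- Write atomicity: a write either fully succeeds or fully fails
- Read freshness: consistent reads always see the latest committed write
- Ordering: writes are applied in a single, well-defined order
- No divergence: replicas never disagree about an object's state. Concurrent updates are detected through resourceVersion and rejected with a conflict, so they never silently overwrite each other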
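At the API level, Kubernetes enforces this with optimistic concurrency. An update that carries a stale resourceVersion is rejected with HTTP 409 Conflict, and clients are expected to re-read and retry. The sketch below shows that retry loop. It assumes a hypothetical wrapper command that performs one GET + modify + PUT cycle and exits with status 3 on a conflict. `retry_on_conflict` and the status code 3 are illustrative choices. kubectl itself does not expose a dedicated exit status for conflicts.

```shell
#!/bin/sh
# Retry a read-modify-write cycle when the API server rejects it with a
# conflict (HTTP 409, stale resourceVersion).
# ASSUMPTION: the command passed in is a hypothetical wrapper that does one
# GET + modify + PUT and exits with CONFLICT_STATUS on a conflict.
CONFLICT_STATUS=3

retry_on_conflict() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    else
      rc=$?
    fi
    # Any failure other than a conflict is not worth retrying.
    if [ "$rc" -ne "$CONFLICT_STATUS" ]; then
      return "$rc"
    fi
    attempt=$(( attempt + 1 ))
  done
  return "$CONFLICT_STATUS"
}

# Usage (hypothetical wrapper script):
#   retry_on_conflict 5 ./update-replicas.sh my-deployment 3
```

Each retry must re-read the object. Reusing the old resourceVersion would just conflict again.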
Watch API — Event Streaming
Both backends support the watch API for streaming changes:
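```shell
# Watch all Pod changes
kubectl get pods --watch
# Under the hood, the API Server watches etcd/Spanner for changes
```
Important: watches deliver events in order without skipping any, but with some delay. A client that falls too far behind receives 410 Gone and must relist. Typical delays:
- etcd: usually <100 ms
- Spanner: usually <500 ms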
API Server Caching Layer
Even though the backend is strongly consistent, the API Server keeps an in-memory watch cache for performance:
```
┌─────────────────────────────┐
│ API Server                  │
│                             │
│ ┌─────────────────────────┐ │
│ │ Watch cache (in-memory) │ │
│ │ (every object of each   │ │
│ │  cached resource type)  │ │
│ └─────────────────────────┘ │
│                             │
│ Cache is kept current by a  │
│ watch on the backend.       │
│ Writes always go to the     │
│ backend (etcd/Spanner).     │
└─────────────────────────────┘
              ↓ ↑
┌─────────────────────────────┐
│ etcd / Spanner              │
│ (persistent state)          │
└─────────────────────────────┘
```
Implication: reads served from the cache (for example, a list with resourceVersion="0") can trail the backend slightly. They fall further behind if the cache's watch connection drops.
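Each request chooses between a consistent read and a cache-eligible read through the resourceVersion query parameter. The helper below only builds the raw API path for listing Pods in a namespace. `pods_list_path` and its mode names are hypothetical. The resourceVersion semantics are standard Kubernetes API behavior.

```shell
#!/bin/sh
# Build the raw API path for listing Pods in a namespace, choosing read
# semantics with the resourceVersion query parameter:
#   consistent: resourceVersion unset, must reflect the latest committed state
#   cached:     resourceVersion=0, any version is acceptable, so the API
#               server may answer from its watch cache (possibly stale)
pods_list_path() {
  ns=$1
  mode=$2
  base="/api/v1/namespaces/${ns}/pods"
  case "$mode" in
    consistent) echo "$base" ;;
    cached) echo "${base}?resourceVersion=0" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Usage against a live cluster:
#   kubectl get --raw "$(pods_list_path default consistent)"
#   kubectl get --raw "$(pods_list_path default cached)"
```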
Scaling Implications
Object Count Scaling
| Object count | etcd Concern | Spanner Concern |
|---|---|---|
| 100k | Default, OK | Works, probably overkill |
| 500k | Reasonable | Better fit |
| 1M+ | Problematic | Better fit |
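Whether a given object count is comfortable for etcd depends on bytes, not on the count alone. The back-of-the-envelope sketch below assumes etcd's default 2 GiB backend quota; GKE's managed setting may differ. It also applies a hypothetical history factor for old revisions kept until compaction. `estimate_db_percent` and all inputs are illustrative.

```shell
#!/bin/sh
# Rough share of etcd's backend quota consumed by cluster state.
QUOTA_BYTES=2147483648   # etcd default --quota-backend-bytes: 2 GiB

# $1: number of objects
# $2: average serialized object size in bytes
# $3: history factor for revisions not yet compacted (default 2, an assumption)
estimate_db_percent() {
  objects=$1
  avg_bytes=$2
  history_factor=${3:-2}
  used=$(( objects * avg_bytes * history_factor ))
  echo $(( used * 100 / QUOTA_BYTES ))
}

estimate_db_percent 100000 5000   # 100k objects of ~5 KB each: prints 46
```

At about 46% of the quota for 100k objects of roughly 5 KB, the ~100k-object guidance is really a size ceiling. Larger objects or slower compaction move it lower.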
Write Rate Scaling
| Writes/sec | etcd | Spanner | Mitigation |
|---|---|---|---|
| 100 | OK | OK | - |
| 500 | OK | OK | - |
| 1000+ | Stress point | Better | Client-side batching |
| 10000+ | Not feasible in one cluster | Hard | Shard across clusters |
The typical workaround for high write rates is a multi-cluster setup with sharding.
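Sharding requires two decisions: how many clusters to run, and which workload lands on which cluster. The sketch below assumes you have measured a sustainable per-cluster write budget and that you shard by namespace. `clusters_needed`, `shard_for`, and the namespace names are hypothetical. Names are hashed with POSIX cksum, so the mapping stays stable across runs and machines.

```shell
#!/bin/sh
# Clusters needed so each stays under its write budget (ceiling division).
# $1: expected total writes/sec, $2: sustainable writes/sec per cluster
clusters_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Map a namespace to a shard index in [0, shards) using a CRC checksum.
# $1: namespace name, $2: number of shards (clusters)
shard_for() {
  crc=$(printf '%s' "$1" | cksum | awk '{ print $1 }')
  echo $(( crc % $2 ))
}

# Usage (hypothetical numbers and names):
#   n=$(clusters_needed 12000 1000)   # 12 clusters
#   shard_for team-payments "$n"
```

Plain modulo hashing reassigns most namespaces whenever the shard count changes. Consistent hashing or an explicit namespace-to-cluster table avoids that, at the cost of more bookkeeping.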
Disaster Recovery
etcd Backup Strategy
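GKE backs up the control plane automatically, but a production-grade pattern adds your own backups and restore drills:
- Enable automated backups (a Backup for GKE backup plan with a schedule), then inspect what they contain:
```shell
gcloud beta container backup-restore backups describe <backup> \
    --location=<location> \
    --backup-plan=<backup-plan>
# Shows the backup's state and metadata
```
- Test recovery (critical!) by restoring into a test cluster:
```shell
# The target cluster (e.g. test-restore) is set on the restore plan
gcloud beta container backup-restore restores create <restore> \
    --location=<location> \
    --restore-plan=<restore-plan> \
    --backup=projects/<project>/locations/<location>/backupPlans/<backup-plan>/backups/<backup>
```
- Typical RPO/RTO:
  - RPO: 1 hour (backup frequency)
  - RTO: 30 minutes (restore time)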
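With scheduled backups, the worst-case data loss equals the time elapsed since the last successful backup. An RPO target therefore translates directly into a freshness check. The sketch below is minimal. `rpo_ok` is a hypothetical helper, and it leaves open how you obtain the last backup's completion time (console, API, or monitoring).

```shell
#!/bin/sh
# Succeeds when the newest backup is recent enough for the RPO target.
# $1: completion time of the last successful backup (Unix seconds)
# $2: RPO target in seconds (3600 for the 1-hour RPO above)
# $3: optional "now" override, mainly for testing
rpo_ok() {
  last_backup_epoch=$1
  rpo_seconds=$2
  now_epoch=${3:-$(date +%s)}
  age=$(( now_epoch - last_backup_epoch ))
  [ "$age" -le "$rpo_seconds" ]
}

# Usage (hypothetical value):
#   rpo_ok 1700000000 3600 || echo "RPO breached: last backup too old" >&2
```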
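Spanner Advantages for DR
Spanner offers:
- Point-in-time recovery: recover to a specific timestamp
- Automatic replication: multi-region replicas survive the loss of a whole region. Replication copies bad writes too; point-in-time recovery covers that case
- Built-in redundancy: data loss from hardware or zone failures is practically impossible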
Performance Tuning
etcd Performance Tuning (GKE)
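The tuning surface on GKE is limited, but you can monitor etcd through the API server's own metrics:
```shell
# etcd request latency as observed by the API server (histogram).
# etcd_disk_backend_commit_duration_seconds is exported by etcd itself,
# not by the API server's /metrics endpoint.
kubectl get --raw /metrics | grep etcd_request_duration_seconds
# Stored object count per resource type (apiserver_storage_objects replaced
# etcd_object_counts, which was deprecated in Kubernetes 1.21)
kubectl get --raw /metrics | grep apiserver_storage_objects
```
Spanner Performance Tuning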
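Google usually manages Spanner, but if you have access to the Spanner instance, you can inspect it:
```shell
# Lists long-running operations (such as schema updates) for a database;
# latency metrics live in Cloud Monitoring, not in this output
gcloud spanner operations list \
    --instance=<instance> \
    --database=<database>
```
Production Patterns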
Pattern 1: Separate Metadata vs Data
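Large objects, such as a ConfigMap holding 10 MB, would also land in the backend. In fact, this one is rejected outright, because ConfigMap data is capped at 1 MiB:
```yaml
# ❌ BAD - Large ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: large-config
data:
  # Rejected: over the 1 MiB ConfigMap limit; even smaller blobs
  # bloat etcd/Spanner
  data.txt: |
    [10 MB of data]
---
# ✅ GOOD - Store reference only
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-ref
data:
  storage-url: gs://bucket/data.txt
```
Pattern 2: Object Count Management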
Clusters with 500k+ objects usually do better if you:
- Shard into multiple clusters
- Archive old objects
- Implement cleanup policies
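Before you shard or archive, find out which resource types dominate. The sketch below ranks the apiserver_storage_objects gauge (the API server's per-resource object count) from metrics text on stdin. `top_storage_objects` is a hypothetical helper. Reading /metrics requires RBAC permission to get the /metrics non-resource URL.

```shell
#!/bin/sh
# Rank resource types by stored object count.
# Input: Prometheus text lines on stdin, e.g.
#   apiserver_storage_objects{resource="pods"} 1234
# Output: "count resource" pairs, largest first.
# awk converts values such as 1.5e+06 to numbers before sorting.
top_storage_objects() {
  limit=${1:-10}
  awk 'index($0, "apiserver_storage_objects{") == 1 {
         if (match($0, /resource="[^"]*"/)) {
           # Strip the leading resource=" (10 chars) and the closing quote.
           name = substr($0, RSTART + 10, RLENGTH - 11)
           printf "%d %s\n", $NF + 0, name
         }
       }' | sort -rn | head -n "$limit"
}

# Usage against a live cluster:
#   kubectl get --raw /metrics | top_storage_objects 5
```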
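```shell
# Rough object count (kubectl get all covers only a subset of resource types)
kubectl get all -A --no-headers | wc -l
# Delete completed Jobs. kubectl has no age filter (no --older-than flag);
# for age-based cleanup, set spec.ttlSecondsAfterFinished on the Jobs instead
kubectl delete jobs -A --field-selector status.successful=1
```
Migration Between Backends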
GKE does not support live migration between etcd and Spanner. Options:
- Recreate the cluster: simple, but requires downtime
- Migrate data: an export/import workflow (complex)
- Multi-cluster: bring up a new cluster in parallel and migrate workloads to it
Summary
- etcd: the default Kubernetes backend, with a good balance of latency and scale; typically <100k objects
- Spanner: Google's distributed SQL database, with multi-region HA; better suited to large clusters
- Both guarantee strong consistency; they differ in scaling, DR, and operational complexity
- The API Server caching layer makes staleness harder to reason about
- A backup/recovery strategy is critical; test it regularly
- Shard clusters when a single cluster reaches its state limits