Scaling a Deployment breaks things (replicas go up, but the system isn't ready)

A real case

A marketing campaign is about to email 100,000 customers. To absorb the traffic, the team scales the webapp deployment from 3 → 20 replicas.

kubectl scale deployment webapp --replicas=20

What happened:

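Every replica tries to connect to Postgres → the connection limit is hit immediately → the DB refuses new connections
Each pod rate-limits against its own Redis key → the rate limit effectively stops working (every pod enforces its own separate quota)
New pods start, but their readiness probe checks the DB → it fails → they never become Ready (a restart loop only appears if the liveness probe checks the DB too; a failing readiness probe by itself never restarts a pod)
The LoadBalancer only routes to pods that pass readiness. During the scale-out none of the new pods are ready, so the existing pods take all the extra load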

From "add pods to handle the traffic" → to "the whole system is down"

Why scaling breaks things

1. Database connection limit

Postgres default max_connections = 100

If each pod opens a connection pool of 20, then 5 pods = 100 connections (full)

Scale to 10 pods → they ask for 200 connections → the DB rejects about half of them (at 20 pods it would be 400)

Fixes:

A. Use a connection pooler

services:
  pgbouncer:
    image: edoburu/pgbouncer:1.22
    environment:
      DATABASE_URL: postgres://...
      POOL_MODE: transaction
      MAX_CLIENT_CONN: 1000
      DEFAULT_POOL_SIZE: 20

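The app connects to pgbouncer instead of Postgres. pgbouncer accepts up to MAX_CLIENT_CONN client connections and multiplexes them onto DEFAULT_POOL_SIZE real server connections per database/user pair. In transaction mode a server connection is only held for one transaction, so session-level features (SET, session advisory locks, LISTEN/NOTIFY) don't behave as they do on a direct connection.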

B. Lower the pool size per pod

// pg Pool in Node.js (node-postgres)
import { Pool } from 'pg'

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 5,           // down from 20 to 5 connections per pod
})

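At 20 pods × 5 = 100 connections, which is exactly the default max_connections, so there is no headroom. Postgres keeps 3 of those slots for superusers by default (superuser_reserved_connections), migrations, cron jobs and admin sessions need connections too, and a rolling update can add surge pods on top of the 20. Leave a margin, for example max: 4.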
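Rather than guessing `max`, you can derive it: take max_connections, subtract what Postgres reserves and what other clients need, then divide by the number of pods that can exist at the peak of a rolling update. A minimal sketch; the field names (`otherClients` and the rest) are made up for illustration, and it assumes every pod uses the same pool size.

```typescript
// Sketch: derive a safe per-pod pool size (`max` in pg's Pool).
// Field names are hypothetical; assumes every pod uses the same pool size.
interface PoolBudget {
  maxConnections: number     // Postgres max_connections
  superuserReserved: number  // superuser_reserved_connections (default 3)
  otherClients: number       // migrations, cron jobs, admin sessions, ...
  replicas: number           // Deployment replicas
  maxSurgeFraction: number   // 0.25 for maxSurge: 25%
}

function maxPoolPerPod(b: PoolBudget): number {
  // A rolling update can briefly add ceil(replicas * maxSurge) extra pods
  const peakPods = b.replicas + Math.ceil(b.replicas * b.maxSurgeFraction)
  const available = b.maxConnections - b.superuserReserved - b.otherClients
  // Round down so the total never exceeds what Postgres will accept
  return Math.max(0, Math.floor(available / peakPods))
}

const perPod = maxPoolPerPod({
  maxConnections: 100,
  superuserReserved: 3,
  otherClients: 5,
  replicas: 20,
  maxSurgeFraction: 0.25,
})
console.log(perPod) // 3
```

With the default max_connections, 20 replicas leave room for only about 3 connections each, which is one more reason to put a pooler in front.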

C. Raise max_connections

ALTER SYSTEM SET max_connections = 500;
-- max_connections is only read at server start:
-- pg_reload_conf() does not apply it, so restart Postgres afterwards

Caveat: every connection is a separate Postgres backend process, so more connections also means more memory used on the DB host

2. Cache cold start

A new pod starts with an empty in-RAM cache, so its first requests all go straight to the DB

Worst case: 20 pods × 1,000 req/s each with cold caches = 20,000 req/s hitting the DB with no cache in front

Fixes:

A. A shared cache (Redis) instead of an in-memory one

Every pod connects to the same Redis, so a freshly started pod gets cache hits right away on entries other pods have already written

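import Redis from 'ioredis'
const redis = new Redis(process.env.REDIS_URL!)

// Hypothetical DB lookup: stands in for your real query
declare function fetchUserFromDb(id: string): Promise<unknown>

async function getUser(id: string) {
  const key = `user:${id}`
  const cached = await redis.get(key)                    // shared by every pod
  if (cached) return JSON.parse(cached)
  const user = await fetchUserFromDb(id)                 // cache miss: go to the DB
  await redis.set(key, JSON.stringify(user), 'EX', 300)  // keep for 5 minutes
  return user
}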

See also: Redis for Beginners

B. Pre-warm the cache before reporting ready

readinessProbe:
  httpGet: { path: /ready, port: 3000 }
  initialDelaySeconds: 30   # optional: /ready already returns 503 until the cache is warm

In the app, the /ready endpoint:

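import express from 'express'

const app = express()
let isWarm = false

// Hypothetical helper: loads hot data (e.g. top 100 users / popular content) into the cache
declare function preloadCache(): Promise<void>

app.listen(3000, async () => {
  await preloadCache()
  isWarm = true           // only now does /ready start returning 200
})

app.get('/ready', (req, res) => {
  res.status(isWarm ? 200 : 503).end()
})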

3. Downstream services don't scale with it

The webapp scales to 20, but the API service it calls is still at 3 replicas → bottleneck

Check:

kubectl get pods -A | grep -E "webapp|api"
kubectl top pods -A

Fix: scale dependent services at the same time, or put an HPA on them so they grow with load

4. Stateful apps: you can't just scale them

Services that keep state per pod (e.g. in-memory sessions, queue workers that lock jobs locally):

After scaling, each pod's state drifts out of sync
Race conditions

Fixes:

Make the service stateless: keep state in the DB / Redis
Use a StatefulSet instead of a Deployment if pods need a stable identity
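For the queue-worker case, a common stateless pattern is to claim each job with an atomic "set if absent, with expiry" in a shared store, so only one pod wins and a crashed pod's claim eventually expires. The sketch below uses a small in-memory class (`FakeKV`, hypothetical) in place of Redis so it runs on its own; against real Redis the same step is `SET <key> <workerId> NX PX <ttl>`, which is atomic because Redis executes each command as a single operation.

```typescript
// Sketch: claim a job atomically so only one worker pod processes it.
// FakeKV is a hypothetical in-memory stand-in for Redis; with ioredis the
// equivalent call is redis.set(key, workerId, 'PX', ttlMs, 'NX'),
// which returns 'OK' on success and null if the key already exists.
class FakeKV {
  private data = new Map<string, { value: string; expiresAt: number }>()

  // Like SET key value NX PX ttl: succeeds only if the key is absent or expired
  setNX(key: string, value: string, ttlMs: number, now = Date.now()): boolean {
    const current = this.data.get(key)
    if (current && current.expiresAt > now) return false
    this.data.set(key, { value, expiresAt: now + ttlMs })
    return true
  }

  get(key: string, now = Date.now()): string | null {
    const current = this.data.get(key)
    return current && current.expiresAt > now ? current.value : null
  }
}

function tryClaimJob(kv: FakeKV, jobId: string, workerId: string, ttlMs = 30_000): boolean {
  // Only one worker wins; the TTL releases the claim if that worker crashes
  return kv.setNX(`job:${jobId}:lock`, workerId, ttlMs)
}

const kv = new FakeKV()
console.log(tryClaimJob(kv, '42', 'pod-a')) // true
console.log(tryClaimJob(kv, '42', 'pod-b')) // false: pod-a holds the claim
```

Pick a TTL longer than a job normally takes; otherwise a slow but healthy worker can lose its claim and the job runs twice.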

5. Rolling updates temporarily need extra capacity

Default RollingUpdate strategy:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 25%       # extra pods allowed during an update (up to 1.25x replicas)
    maxUnavailable: 25% # pods allowed to be unavailable during an update

Scale 10 → 20 replicas and run a rolling deploy at the same time → up to 25 pods at once, and the nodes may not have room for them
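Where the 25 comes from: Kubernetes converts percentage values into pod counts, rounding maxSurge up and maxUnavailable down. A small sketch of that arithmetic (the function name is ours, not a Kubernetes API):

```typescript
// Sketch of how Kubernetes turns percentage maxSurge / maxUnavailable into pod
// counts for a RollingUpdate: maxSurge rounds up, maxUnavailable rounds down.
function rollingUpdateBounds(replicas: number, maxSurgePct: number, maxUnavailablePct: number) {
  const surge = Math.ceil((replicas * maxSurgePct) / 100)
  let unavailable = Math.floor((replicas * maxUnavailablePct) / 100)
  // If both round to 0 the rollout could never progress; Kubernetes then uses 1
  if (surge === 0 && unavailable === 0) unavailable = 1
  return {
    maxPods: replicas + surge,            // peak pod count during the rollout
    minAvailable: replicas - unavailable, // pods that must stay available
  }
}

console.log(rollingUpdateBounds(20, 25, 25)) // { maxPods: 25, minAvailable: 15 }
console.log(rollingUpdateBounds(10, 25, 25)) // { maxPods: 13, minAvailable: 8 }
```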

Fixes:

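Set maxSurge: 0 if resources are tight (maxUnavailable must then be above 0, since the two can't both be 0)
Scale first, then roll out the new version; don't do both at once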

6. Liveness probe fails under high load

A pod takes more load than it can handle → responses slow down → the liveness probe times out → the pod gets restarted

The restart shifts its load onto the remaining pods → cascade failure

Fixes:

Give liveness a generous timeoutSeconds (5-10 seconds)
Set failureThreshold to 3-5 (not 1)
Liveness should not check the DB or any other external dependency

livenessProbe:
  httpGet:
    path: /health   # the lightest endpoint you have
    port: 3000
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
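To see why these values matter, estimate how long a pod can stay unresponsive before the kubelet restarts it. This is a rough approximation, assuming a probe fires exactly every periodSeconds and each failing probe takes the full timeoutSeconds:

```typescript
// Rough estimate of how long a hung pod keeps failing liveness before the
// kubelet restarts it. Assumes a probe every periodSeconds, each failure
// taking the full timeoutSeconds, and no successful probe in between.
interface LivenessSettings {
  periodSeconds: number
  timeoutSeconds: number
  failureThreshold: number
}

function approxSecondsUntilRestart(p: LivenessSettings): number {
  // failureThreshold consecutive failures: the last one ends
  // (failureThreshold - 1) periods after the first one starts, plus its timeout
  return (p.failureThreshold - 1) * p.periodSeconds + p.timeoutSeconds
}

console.log(approxSecondsUntilRestart({ periodSeconds: 30, timeoutSeconds: 5, failureThreshold: 3 })) // 65
console.log(approxSecondsUntilRestart({ periodSeconds: 10, timeoutSeconds: 1, failureThreshold: 1 })) // 1
```

With the settings above a pod gets roughly a minute to recover; with a strict timeoutSeconds: 1 and failureThreshold: 1, a single response slower than 1 second is enough to restart it.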

How to scale safely

1. Use an HPA: grow gradually

Instead of scaling by hand:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60      # add at most 50% more pods per minute
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60

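The HPA adds pods step by step as real load rises, which gives the DB and other dependencies time to keep up (CPU-based scaling also needs resource requests on the pods and metrics-server in the cluster)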
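The Kubernetes docs give the core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric); the scaleUp policy then limits how fast that target is reached. The simulation below is a simplified sketch of the manifest above: it assumes CPU stays at 400% of requests during the spike (in reality it falls as pods come up), takes one step per 60-second period, rounds the percent policy up, and ignores the HPA's tolerance and readiness handling.

```typescript
// Simplified HPA arithmetic for the manifest above (start at minReplicas 3,
// maxReplicas 20, target 70% CPU, scaleUp Percent policy 50% per 60 s).
// Not the real controller: no tolerance, readiness or missing-metric handling.
function desiredReplicas(current: number, currentUtil: number, targetUtil: number): number {
  // Core rule from the Kubernetes docs
  return Math.ceil(current * (currentUtil / targetUtil))
}

function nextReplicas(current: number, desired: number, maxReplicas: number, scaleUpPct: number): number {
  // Percent policy: at most current * (1 + pct/100) pods after one period
  // (rounding up is an assumption of this sketch)
  const policyLimit = Math.ceil(current * (1 + scaleUpPct / 100))
  return Math.min(desired, policyLimit, maxReplicas)
}

// Assumption: CPU stays at 400% of requests for the whole spike
let replicas = 3
const history = [replicas]
for (let period = 1; period <= 6; period++) {
  replicas = nextReplicas(replicas, desiredReplicas(replicas, 400, 70), 20, 50)
  history.push(replicas)
}
console.log(history.join(' → ')) // 3 → 5 → 8 → 12 → 18 → 20 → 20
```

Even under a sustained spike the Deployment reaches 20 over about five minutes instead of all at once, which is the breathing room the DB and downstream services need.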

2. PodDisruptionBudget: cap how many pods can be taken down at once

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: webapp
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: webapp

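During voluntary disruptions that go through the Eviction API, such as kubectl drain, K8s keeps at least 2 pods running. A PDB does not limit Deployment rolling updates; those are governed by maxSurge / maxUnavailable.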
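The PDB's status reports how many more evictions are allowed right now: the number of currently healthy pods minus minAvailable. A tiny sketch of that rule for the webapp PDB above:

```typescript
// Sketch of the PDB rule: evictions allowed right now = healthy pods - minAvailable.
// At 0, the Eviction API refuses further evictions (HTTP 429) and
// kubectl drain keeps retrying until a replacement pod is healthy.
function disruptionsAllowed(currentHealthy: number, minAvailable: number): number {
  return Math.max(0, currentHealthy - minAvailable)
}

console.log(disruptionsAllowed(3, 2)) // 1: one webapp pod can be evicted
console.log(disruptionsAllowed(2, 2)) // 0: the next eviction waits for a replacement pod
```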

3. Set resource requests/limits correctly

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

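requests = what the scheduler uses to decide which node has room; limits = the ceiling a container can't go past (over the CPU limit it gets throttled, over the memory limit it gets OOMKilled)

Requests too low → pods get scheduled but fight over resources. Requests too high → pods can't be scheduled because no node has enough room

Measure with kubectl top pod under real load, then set request = average × 1.5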
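The "average × 1.5" rule as a small helper, assuming you have collected CPU samples in millicores from kubectl top pod during a load test (the sample values here are made up):

```typescript
// Sketch of the "average × 1.5" rule, applied to CPU samples in millicores
// (e.g. values read off `kubectl top pod` during a load test).
function cpuRequestFromSamples(samplesMillicores: number[], headroom = 1.5): string {
  if (samplesMillicores.length === 0) throw new Error('need at least one sample')
  const avg = samplesMillicores.reduce((sum, v) => sum + v, 0) / samplesMillicores.length
  // Round up and format as a Kubernetes CPU quantity, e.g. "300m"
  return `${Math.ceil(avg * headroom)}m`
}

console.log(cpuRequestFromSamples([120, 180, 300])) // 300m
```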

4. Load test before production

# k6, a solid load-testing tool
k6 run --vus 1000 --duration 5m script.js

Find the breaking point before production: know how many users / RPS the system can handle

5. Progressive rollout

Use Argo Rollouts or Flagger:

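Canary 10% → watch metrics → 25% → 50% → 100%
Automatic rollback if the error rate goes up (in Argo Rollouts this needs an analysis step backed by an AnalysisTemplate; the steps below only pause)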

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: webapp
spec:
  # replicas, selector and template omitted: same as the Deployment's
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 25
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 5m }
        - setWeight: 100

Checklist before scaling up

[ ] DB connections can handle 20 pods × pool size, with headroom
[ ] Cache is shared (Redis), not in-memory
[ ] Dependent services can scale too
[ ] Readiness probe checks dependencies
[ ] Liveness probe isn't too strict
[ ] Resource requests are correct
[ ] PodDisruptionBudget is in place
[ ] Load test passed

Checklist for scaling down

[ ] preStop hook gives traffic time to drain before shutdown
[ ] terminationGracePeriodSeconds is long enough
[ ] Graceful shutdown in the app code

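spec:
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            # wait for endpoints / the load balancer to stop sending traffic;
            # the kubelet sends SIGTERM to the container after preStop returns
            command: ["sh", "-c", "sleep 15"]
  # covers preStop + app shutdown; SIGKILL is sent when it runs out
  terminationGracePeriodSeconds: 60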
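The preStop sleep only buys time; the app still has to shut down cleanly when SIGTERM arrives. A minimal sketch for a plain Node `http` server, assuming no framework and nothing else (DB pools, queue consumers) to close; the `shutdown` and `start` names are our own:

```typescript
import * as http from 'node:http'

// Sketch: graceful shutdown for a plain Node http server.
// Assumes no framework and nothing else (DB pool, queue consumer) to close.
function shutdown(server: http.Server, timeoutMs = 30_000): Promise<void> {
  return new Promise((resolve, reject) => {
    // Safety net: give up before Kubernetes sends SIGKILL
    const timer = setTimeout(() => reject(new Error('shutdown timed out')), timeoutMs)
    timer.unref()
    // Stop accepting new connections; the callback runs once in-flight ones end
    server.close((err) => {
      clearTimeout(timer)
      if (err) reject(err)
      else resolve()
    })
    // Close idle keep-alive sockets so they don't hold close() open (Node 18.2+)
    server.closeIdleConnections()
  })
}

function start(port: number): http.Server {
  const server = http.createServer((_req, res) => {
    res.end('ok')
  })
  server.listen(port)
  // The kubelet sends SIGTERM after the preStop hook finishes
  process.once('SIGTERM', () => {
    shutdown(server).then(
      () => process.exit(0),
      () => process.exit(1),
    )
  })
  return server
}
```

Call `start(3000)` at startup. Keep `timeoutMs` below terminationGracePeriodSeconds minus the preStop sleep (60 − 15 here), otherwise SIGKILL arrives before the app finishes.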

TLDR

Scaling isn't just adding replicas; you have to think about the whole stack:

Database: pooler + max_connections
Cache: shared Redis
Dependencies: scale them together
Probes: sensible liveness/readiness
Strategy: HPA + canary instead of manual scaling

Summary

Scaling up breaks because "we forgot the dependencies have to scale too"

Use HPA + load testing + monitoring (Prometheus) so the system scales itself safely

Further reading: