Live Game Backend Scaling Playbook: From 100 to 100K Players

Scaling a live multiplayer backend is not about one heroic upgrade — it is about hitting four growth thresholds without the backend becoming the part that breaks. This playbook walks through the transitions at 100, 1K, 10K, and 100K peak concurrent players, with the signals to watch and the knobs to turn at each step.

The 80/50/99.9 rule: scale when CPU sustains >80%, P95 API latency crosses 50 ms, or monthly uptime drops below 99.9%. Below those thresholds, resist the urge to rearchitect.
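The rule can be captured as a single predicate. A minimal sketch; the function name and signature are illustrative, the thresholds are the ones stated above:

```python
def should_scale(cpu_sustained_pct: float,
                 p95_latency_ms: float,
                 monthly_uptime_pct: float) -> bool:
    """True when any of the 80/50/99.9 thresholds is breached."""
    return (cpu_sustained_pct > 80.0
            or p95_latency_ms > 50.0
            or monthly_uptime_pct < 99.9)

# Healthy system: below every threshold, resist the urge to rearchitect.
print(should_scale(62.0, 31.0, 99.95))  # False
# P95 latency crossed 50 ms: time to scale.
print(should_scale(62.0, 74.0, 99.95))  # True
```

Wiring this into an alerting rule rather than a dashboard keeps the decision mechanical instead of debatable.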

Threshold 1: 100 CCU (Soft Launch)

  • Stack: one API process, one Postgres, one Redis, object storage, in-process workers.
  • Focus: correctness, not throughput. Write the schema well. Add indexes on (player_id, key) for documents and on leaderboard (lb_id, score desc).
  • Watch: error rate, not latency. Anything over 0.5% 5xx is a bug, not a capacity issue.
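The two indexes above can be sketched as DDL. This uses SQLite as a stand-in for Postgres so the snippet is self-contained; the table and column names beyond (player_id, key) and (lb_id, score) are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player_documents (
    player_id INTEGER NOT NULL,
    key       TEXT    NOT NULL,
    body      TEXT
);
CREATE TABLE leaderboard_entries (
    lb_id     INTEGER NOT NULL,
    player_id INTEGER NOT NULL,
    score     INTEGER NOT NULL
);

-- Point lookups of one document for one player.
CREATE INDEX idx_docs_player_key ON player_documents (player_id, key);
-- Top-N reads per leaderboard without a sort step.
CREATE INDEX idx_lb_score ON leaderboard_entries (lb_id, score DESC);
""")
```

The descending score index matters because the hot leaderboard query is "top N for one lb_id", which this index can serve as an ordered range scan.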

Threshold 2: 1,000 CCU (First Live Game)

  • Add: Redis leaderboard cache, per-player and per-project rate limits, structured logs, a staging environment.
  • Focus: isolate staging so balance tests don't corrupt production leaderboards.
  • Watch: top 10 slowest endpoints; budget a P95 of 50 ms on all reads.

Threshold 3: 10,000 CCU (Active Commercial Game)

  • Add: horizontal API replicas behind a load balancer, read replicas for Postgres, Redis persistence tuned, object storage with CDN in front of config bundles.
  • Focus: autoscale API replicas on CPU + queue depth, not purely CPU. A bare-metal baseline with cloud burst on top is usually the most cost-effective pattern.
  • Watch: connection count on Postgres; use PgBouncer in transaction mode if you run more than ~16 API replicas.
  • Heartbeats: if each active server heartbeats every 20s, 500 servers produce ~65M heartbeats/month (about 2.2M per day) just from that fleet.
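The "CPU + queue depth" rule can be sketched as a replica-count function: size for whichever signal demands more capacity, then clamp. All defaults here (target CPU, jobs per replica, bounds) are illustrative, not prescribed values:

```python
import math


def desired_replicas(current: int,
                     avg_cpu_pct: float,
                     queue_depth: int,
                     target_cpu_pct: float = 60.0,
                     jobs_per_replica: int = 200,
                     min_replicas: int = 2,
                     max_replicas: int = 64) -> int:
    """Scale on the max of the CPU-derived and queue-derived demands."""
    by_cpu = math.ceil(current * avg_cpu_pct / target_cpu_pct)
    by_queue = math.ceil(queue_depth / jobs_per_replica)
    want = max(by_cpu, by_queue)
    return max(min_replicas, min(max_replicas, want))
```

With 8 replicas at 90% CPU, the CPU signal alone asks for 12; a backlog of 3,000 queued jobs asks for 15, so the queue wins. A pure-CPU autoscaler would under-provision exactly when work is piling up.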

Threshold 4: 100,000+ CCU (Viral Hit)

  • Add: sharded Postgres or partitioned tables on hot keys (leaderboard entries, player documents), dedicated Redis cluster, regional API deployment for geographic proximity.
  • Focus: protect the database from cache stampedes. Use SWR (stale-while-revalidate) on server browser and active-config endpoints.
  • Watch: egress bandwidth. Config bundle downloads from 100K clients swinging at once can exceed compute cost. Cache bundles at the CDN edge.
  • Liveops: pre-warm the cache before a scheduled event; make rollbacks boring.

Growth Thresholds at a Glance

Peak CCU   Main bottleneck               Investment
100        Correctness                   Indexes + logs
1,000      Leaderboard reads             Redis cache + rate limits
10,000     DB connections + heartbeats   Replicas + PgBouncer + CDN for configs
100,000+   Egress + cache stampedes      Sharding + regional API + SWR

The Cost-Per-CCU Lens

i3D.net's "true cost per CCU" framing is useful: compute, bandwidth, and storage must be considered together. Bandwidth often dominates at scale — a game with 100K CCU and modest per-player traffic can easily land on five-figure monthly egress bills. Measure your real cost per CCU quarterly and use it to decide when to move from cloud burst to more bare metal.
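The framing reduces to simple arithmetic. A sketch; the $0.08/GB egress rate is an illustrative cloud list price, not a quote, and the function name is made up for this example:

```python
def cost_per_ccu(compute_usd: float,
                 egress_gb: float,
                 storage_usd: float,
                 peak_ccu: int,
                 egress_usd_per_gb: float = 0.08) -> float:
    """Monthly true cost per peak concurrent player."""
    total = compute_usd + storage_usd + egress_gb * egress_usd_per_gb
    return total / peak_ccu
```

At 100K CCU with 150 TB of monthly egress, $8,000 of compute, and $500 of storage, egress alone is $12,000 of the $20,500 total (about $0.21 per CCU), which is why the egress column deserves its own line in the quarterly review.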

Managed Backend as a Scale Lever

A managed backend (Supercraft GSB, PlayFab, Metaplay) absorbs thresholds 1–3 for you. You still own autoscaling on the game server side, but rate limits, replicas, and cache tiers become someone else's pager. That is usually the right trade for indie and mid-size studios.

Scaling-day discipline: during a launch, freeze config changes, pre-warm caches 30 minutes before the announcement, and keep one engineer free of responsibilities to watch graphs. Most avoidable incidents happen when everyone is busy doing something else.

Related in This Hub

Start on the Supercraft Game Server Backend page or explore the API.
