The live football board behind Futbolis peaked at around 100,000 concurrent viewers during a derby. The infrastructure bill for the realtime layer that month was about $19. This is how.
The short version: pick the cheapest transport that meets the requirements, fan out through one piece of shared state, and let the kernel do the backpressure work. The long version follows.
## The shape of the traffic
A live match has a very specific traffic shape. Most users connect once at kickoff and stay connected for ninety minutes. The server publishes maybe two hundred discrete events per match — goals, cards, substitutions, xG ticks every fifteen seconds. Each event is small (a few hundred bytes of JSON). Per user, the total downlink for a full match is on the order of 80–120 KB. At 100k concurrent, that's a peak fanout of around 12 GB transferred per match-window, spread over ninety minutes.
Three implications:
- The connection is long-lived, low-bandwidth, server-push-dominated.
- The total bytes are tiny — egress is not the cost driver.
- The cost driver is how you hold 100k idle TCP sockets.
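As a sanity check, the arithmetic behind those numbers, using mid-range assumptions (500 bytes per event, 200 events per match, both taken from the figures above):

```typescript
// Back-of-envelope check on the traffic-shape numbers.
const events = 200;             // discrete events per match
const bytesPerEvent = 500;      // "a few hundred bytes of JSON"
const concurrent = 100_000;     // peak concurrent viewers
const matchSeconds = 90 * 60;

const perUserKB = (events * bytesPerEvent) / 1000;            // ≈ 100 KB/user
const fanoutGB = (concurrent * events * bytesPerEvent) / 1e9; // ≈ 10 GB total
const avgMbps = (fanoutGB * 8e9) / matchSeconds / 1e6;        // mean egress rate

console.log({ perUserKB, fanoutGB, avgMbps: avgMbps.toFixed(1) });
```

About 15 Mbps of average egress for 100k people watching live: the bytes really are not the problem.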
## Why SSE, not WebSockets
Server-Sent Events get dismissed as "the worse WebSocket." For a one-way fanout like this they're the strictly better choice:
- SSE rides on plain HTTP/2. No upgrade dance. No sticky-session load balancer. No layer-4 LB drama. Vercel's standard edge runtime serves it without a custom config.
- A single HTTP/2 connection multiplexes streams, so a CDN that already fronts your site can also front the realtime layer with no protocol configuration.
- The browser handles reconnect, `Last-Event-ID` resume, and retry backoff for free. You write one event handler and stop thinking about the transport.
The only thing you give up is bidirectional messages from the client, and a live football board doesn’t have any.
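For concreteness, here is what the transport actually carries. An SSE stream is plain UTF-8 text: named events with `id` fields (which power `Last-Event-ID` resume), and comment lines starting with `:` that double as heartbeats. The payloads below are illustrative:

```
retry: 2000

event: goal
id: 183
data: {"minute":17,"team":"home"}

: keep-alive

event: xg-tick
id: 184
data: {"home":1.42,"away":0.87}
```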
```ts
// packages/realtime/src/sse/server.ts (excerpt)
export function sseResponse(channel: Channel, opts: SseServerOptions = {}): Response {
  const heartbeatMs = opts.heartbeatMs ?? 15_000;
  const encoder = new TextEncoder();
  let unsub: (() => void) | undefined;
  let heartbeat: ReturnType<typeof setInterval> | undefined;
  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      const send = (chunk: string) => {
        try { controller.enqueue(encoder.encode(chunk)); } catch { /* closed */ }
      };
      send(encodeRetry(opts.retryMs ?? 2_000));
      unsub = channel.subscribe((ev) => send(encodeSse(ev)));
      heartbeat = setInterval(() => send(encodeHeartbeat()), heartbeatMs);
    },
    cancel() {
      // Fires when the client disconnects. Teardown has to live here —
      // `start`'s return value is treated as a promise, not a cleanup hook —
      // or the timer and the subscription leak on every dropped connection.
      if (heartbeat !== undefined) clearInterval(heartbeat);
      unsub?.();
    },
  });
  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream; charset=utf-8',
      'Cache-Control': 'no-cache, no-transform',
      Connection: 'keep-alive',
      // Important on nginx/CloudFront — disables response buffering so
      // each event flushes the moment we enqueue it.
      'X-Accel-Buffering': 'no',
    },
  });
}
```
Three boring details in that snippet account for most of the production reliability:

- `X-Accel-Buffering: no` — without it, intermediate proxies hold onto your bytes until the buffer fills. A goal in the seventeenth minute lands in the user’s browser at the eighteenth.
- 15s heartbeats — long enough to be cheap, short enough that load balancers, mobile NAT tables, and corporate proxies don’t drop the connection mid-match.
- Encoder reused per request, not per event — allocating a `TextEncoder` per frame at 100k connections is the kind of mistake that doesn’t show up in dev and ruins your day in production.
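For reference, `encodeSse` can be as small as this. A sketch of an assumed shape, not the package's exact helper:

```typescript
// Sketch of an SSE frame encoder: one string build per event, one
// encode per send. `SseEvent` here is an assumed shape.
type SseEvent = { id?: string; event?: string; data: unknown };

function encodeSse(ev: SseEvent): string {
  let frame = '';
  if (ev.event) frame += `event: ${ev.event}\n`;
  if (ev.id) frame += `id: ${ev.id}\n`;
  // JSON.stringify never emits raw newlines, so one `data:` line suffices;
  // multi-line string payloads would each need their own `data:` line.
  frame += `data: ${JSON.stringify(ev.data)}\n`;
  return frame + '\n'; // blank line terminates the frame
}
```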
## Fanout math
A single Node web node on a Vercel function can comfortably hold around 5,000–10,000 idle SSE connections. Memory is dominated by the `ReadableStream` controller and the per-channel subscription closure. In practice we saw ~70 MB resident for ten thousand connections.
To reach 100k concurrent you need a dozen or so stateless web nodes. Each publish event has to reach every node. That’s where the fanout adapter comes in:
```ts
// packages/realtime/src/server.ts (excerpt)
export function createRealtimeServer(opts: RealtimeServerOptions = {}): RealtimeServer {
  const registry = new ChannelRegistry();
  const fanoutSubs = new Map<string, () => void | Promise<void>>();
  const channel = (name: string): Channel => {
    const ch = registry.channel(name);
    if (opts.fanout && !fanoutSubs.has(name)) {
      // First subscriber on this node bridges Redis → in-process channel.
      // Subsequent subscribers piggyback on the same fan-in.
      const unsub = opts.fanout.subscribe(name, (ev) => ch.deliver(ev));
      fanoutSubs.set(name, unsub);
    }
    return ch;
  };
  return {
    registry,
    channel,
    async publish(name, event, data) {
      const ch = channel(name);
      const ev = ch.publish(event, data);
      // Local subscribers got it from `ch.publish` already.
      // Now bridge to the other nodes via the shared backbone.
      if (opts.fanout) await opts.fanout.publish(ev);
      return ev;
    },
  };
}
```
A single Redis `PUBLISH` is one round trip and a few bytes. With twelve web nodes subscribed to the channel, one publish becomes twelve cross-AZ TCP frames and 100k browser-bound SSE frames. Redis is doing the 1 → N work that you would otherwise have to coordinate yourself with sticky sessions and intra-cluster RPC.
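The fanout adapter contract the server excerpt assumes is just publish/subscribe. Here is a minimal in-memory implementation of that contract (type and function names are assumptions, not the package's API); the production version has the same shape with two Redis connections behind it, one subscriber and one publisher:

```typescript
// Assumed adapter contract: anything with these two methods can be the
// backbone — Redis pub/sub in production, a Map in tests.
type FanoutEvent = { channel: string; event: string; data: unknown };

interface FanoutAdapter {
  publish(ev: FanoutEvent): Promise<void>;
  subscribe(channel: string, onEvent: (ev: FanoutEvent) => void): () => void;
}

function createMemoryFanout(): FanoutAdapter {
  const subs = new Map<string, Set<(ev: FanoutEvent) => void>>();
  return {
    async publish(ev) {
      // Deliver to every node-level bridge subscribed to this channel.
      for (const fn of subs.get(ev.channel) ?? []) fn(ev);
    },
    subscribe(channel, onEvent) {
      const set = subs.get(channel) ?? new Set();
      subs.set(channel, set);
      set.add(onEvent);
      return () => { set.delete(onEvent); }; // unsubscribe
    },
  };
}
```

Because the server only depends on this interface, the whole realtime stack runs in a single process during tests and swaps in Redis with one constructor argument in production.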
The cost shape:
| Component            | Tier                         | $ / month        |
| -------------------- | ---------------------------- | ---------------- |
| Vercel Pro (compute) | included w/ team             | $0 marginal      |
| Vercel egress        | first 1 TB free              | $0               |
| Fly.io Redis         | shared-cpu-1x, 256 MB        | $5               |
| DNS / certs          | Cloudflare free              | $0               |
| Marginal realtime    |                              | $5               |
| Fly app for the WS   | shared-cpu-1x, 256 MB (idle) | $0–5             |
| Vercel team seat     | already paying               | $19 attributable |
The $19 number in the title is the Vercel seat we were already paying for. The marginal cost of adding realtime to a stack that already had Vercel Pro was the $5 Fly Redis box.
## The ten lines that mattered most
Most cost engineering is removing things. The handful of code changes that moved the needle:
### 1. Drop frames on slow consumers
A single hung mobile connection can wedge a Node process if the writes queue up. The fix is to detect controller backpressure and drop non-critical frames rather than buffer them:
```ts
const send = (chunk: string) => {
  // controller.desiredSize <= 0 means the consumer is already behind.
  // For ticker-style data, dropping the frame is correct — the next
  // event has the same shape and supersedes this one.
  if ((controller.desiredSize ?? 0) <= 0 && isLossyEvent(chunk)) return;
  try { controller.enqueue(encoder.encode(chunk)); } catch { /* closed */ }
};
```
For goals you don’t drop. For xG-tick updates you absolutely do.
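`isLossyEvent` can be as simple as a name check on the encoded frame. A sketch; the event-name set is illustrative, not the package's:

```typescript
// Events that are safe to drop under backpressure: the next frame
// supersedes this one. Goals, cards, and subs are never in this set.
const LOSSY_EVENTS = new Set(['xg-tick', 'possession', 'momentum']); // illustrative

function isLossyEvent(chunk: string): boolean {
  // SSE frames start with `event: <name>\n` when an event name is set.
  const match = /^event: (.+)$/m.exec(chunk);
  return match !== null && LOSSY_EVENTS.has(match[1]);
}
```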
### 2. Heartbeat as the only timer

Early versions ran a per-connection idle-disconnect timer and a heartbeat. At 100k connections that’s 200k timer entries in the event loop. We deleted the idle timer and let the heartbeat double as a liveness signal: if `controller.enqueue` throws on a heartbeat, the peer is gone.
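Concretely, the surviving timer does double duty. A self-contained sketch of that idea (the helper name is mine, not the package's):

```typescript
// One timer per connection: keeps NATs and proxies warm, and detects
// dead peers. `controller.enqueue` throws once the stream is closed,
// and that throw is the disconnect signal — no separate idle timer.
const HEARTBEAT_BYTES = new TextEncoder().encode(': keep-alive\n\n');

function attachHeartbeat(
  controller: ReadableStreamDefaultController<Uint8Array>,
  onDead: () => void,
  heartbeatMs = 15_000,
): () => void {
  const heartbeat = setInterval(() => {
    try {
      controller.enqueue(HEARTBEAT_BYTES);
    } catch {
      clearInterval(heartbeat); // peer is gone: stop probing, tear down
      onDead();
    }
  }, heartbeatMs);
  return () => clearInterval(heartbeat); // manual teardown path
}
```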
### 3. One Redis publish per match, not per user
The first version naively republished match state to every user’s private channel. That meant a goal triggered 100k Redis publishes. We moved to one channel per match, with the SSE handler subscribing on behalf of all of that match’s viewers. Redis CPU dropped by three orders of magnitude.
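The before/after is easy to state in code (helper names here are illustrative):

```typescript
// Per-match channel naming: every viewer of a match shares one channel.
const channelFor = (matchId: string) => `match:${matchId}`;

// Redis publishes needed per event under each scheme. With per-user
// channels the cost is O(viewers); with a per-match channel it's exactly
// one, and each node's single subscription fans it out to its local
// SSE connections.
function redisPublishesPerEvent(viewers: number, perUserChannels: boolean): number {
  return perUserChannels ? viewers : 1;
}
```

At 100k viewers that is 100,000 publishes per goal versus one.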
### 4. Heartbeat encoded once
```ts
const HEARTBEAT_BYTES = new TextEncoder().encode(': keep-alive\n\n');
const sendHeartbeat = () => {
  try { controller.enqueue(HEARTBEAT_BYTES); } catch { /* closed */ }
};
```
A pre-encoded `Uint8Array` shared across every connection. Saves allocations, and at a 15 s interval across 100k connections that’s ~6,700 fewer object allocations per second.
## What I’d do differently
If I were starting over today:
- WebTransport for unreliable channels (live position data, chat presence). Same browser support story as WebRTC but a vastly nicer API. The package leaves room for a third transport adapter.
- Cloudflare Durable Objects as an alternative fanout backend. Same $5-ish cost shape, lower operational footprint than running Redis.
- Stop parsing JSON on the wire. For high-frequency tickers, pre-encode the SSE frame at publish time and ship the bytes through the fanout layer untouched. The encoder/decoder symmetry is wasted work when the producer always knows the shape.
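The "ship the bytes untouched" idea in concrete form. A sketch (function name is mine); it assumes the fanout layer can carry raw bytes:

```typescript
// Encode the SSE frame once at publish time. Every node and every
// connection then ships the identical Uint8Array: no JSON parse on
// receive, no re-stringify, no per-connection TextEncoder work.
const frameEncoder = new TextEncoder();

function preEncodeFrame(event: string, id: string, json: string): Uint8Array {
  return frameEncoder.encode(`event: ${event}\nid: ${id}\ndata: ${json}\n\n`);
}

// Publisher side: const bytes = preEncodeFrame('xg-tick', '42', '{"xg":1.7}');
// The fanout layer carries `bytes`; each node's send loop is just
// controller.enqueue(bytes).
```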
## Takeaways
The thing that kept the bill at $19/month wasn’t any one trick. It was picking SSE so we never had to scale a WebSocket cluster, then fanning out through a single $5 Redis instance, then being careful with allocations on the hot path. None of those choices is novel. They’re just the choices that fall out of taking the cost question seriously before you write any code.
The starter that extracts this stack is `@jmnpr/realtime`. Quickstart and API reference live at realtime.jmnpr.co/docs. The package is Apache-2.0 — fork it, gut it, ship your own live board.