Cricinfo Live Score HLD: Serving One Hot Score to Millions of Readers
A live cricket score system design (Cricinfo-style): read fan-out to tens of millions, collapsing a hot key with caching and single-flight, push vs poll delivery, and freshness over consistency.
"Design Cricinfo's live score." It looks tiny — a score is a few numbers. That's the trap. The numbers change a handful of times a minute (one writer, the scorer at the ground), but during a World Cup final tens of millions of people refresh them at once. So this is the mirror image of most designs: writes are trivial, reads are a tidal wave, and they all want the same tiny object. The entire problem is read fan-out of one hot key — and the answer is almost never "scale the database."
The other half is a quiet trade: nobody needs the score to be perfectly consistent. A ball that lands in your app a second late is fine; a score that fails to load is not. Freshness and availability beat strong consistency here, and that choice unlocks the whole design.
Let's start nowhere near a computer
A packed stadium has exactly one scoreboard operator and a hundred thousand fans. The operator changes the board once per ball. How does every fan learn the score?
Not by the operator running to each seat and whispering it — that's a conversation per fan, and it doesn't survive a hundred thousand of them, let alone the millions watching from home. They learn it by glancing at the giant board. One shared surface, updated once, read by the entire crowd for free.
That giant board is a cache, and the lesson is the whole article: when one value is read by a crowd, you don't serve each reader from the source — you put the value on a shared surface (a cache, a CDN edge) that the crowd reads cheaply, and you update that surface when the writer changes it. Scale the board, not the operator.
Where this exact shape shows up
- Live sports (Cricinfo, ESPN), election results, stock tickers — one fast-changing value, an enormous synchronized audience.
- A breaking-news headline, a "users online" counter, a flash-sale stock count — any single hot value under a read storm.
- The rate limiter's shared counter and any cache-fronted read path — same hot-key mechanics, smaller crowd.
- It's the read twin of the newsletter's write fan-out: there, one event went to millions of inboxes; here, millions pull one object.
Step 1 — Functional requirements (sentences first)
- A scorer pushes ball-by-ball updates for a match — runs, wickets, overs, an event ("FOUR", "WICKET"), and a line of commentary.
- A viewer sees a match's live score and scorecard, updating in near-real-time without a manual refresh.
- A viewer browses the list of live / upcoming / finished matches.
- After the match, the full scorecard and commentary stay available as history.
The load-bearing word is "near-real-time": it implies an update path to millions, which is the design. Everything else is a cache away from trivial.
Step 2 — Non-functional requirements
For a live-score service the non-functional requirements aren't garnish — they pick the architecture.
- Massive read scalability. A marquee match is tens of millions of concurrent readers on one object. This is the dominant force.
- Freshness (low latency to the eyeball). An update should reach viewers within ~1–2 seconds. Not microseconds — seconds.
- High availability. The score must always load. A stale-by-two-seconds score beats a spinner.
- Eventual consistency is fine. Readers can briefly disagree by a ball; there's no money or correctness riding on the exact instant. This permission is what lets caching work.
- Cost efficiency. You cannot open and hold tens of millions of live connections without a very good reason — the delivery mechanism has to be cheap at that scale.
Listing them is the easy half; the design only earns them if it fulfills them:
| Requirement | How this design fulfills it |
|---|---|
| Read scalability | the hot score lives in a cache / CDN edge; millions of reads collapse to a trickle — Steps 4, 9 |
| Freshness (~1–2s) | short cache TTL (or push on each ball); the score is small and cheap to refresh — Steps 4, 6 |
| High availability | reads are served from edge/cache even if the origin or writer stalls — Step 10 |
| Eventual consistency | a per-ball version lets clients poll for "anything newer?" and tolerate small lag — Steps 4, 5 |
| Cost efficiency | poll-through-CDN absorbs the crowd without millions of held connections — Step 6 |
Every trade-off below is chosen to keep one of these.
Step 3 — The data model, and which datastore
The shapes are tiny, and they don't all want the same store.
- Current score: one small object per match (`runs`, `wickets`, `overs`, `lastEvent`, and a monotonically increasing `version`). Read by everyone, changed a few times a minute.
- Commentary / ball log: an append-only timeline per match.
- Match list: a short, cacheable list.
- History (final scorecards): write-once, read occasionally.
package dev.fiveyear.cricinfo;
/** The live state of one match — small, hot, read by millions. `version` lets a
* polling client ask "anything newer than what I have?" and skip an unchanged body. */
record MatchScore(
String matchId, int runs, int wickets, double overs, String lastEvent, long version) {}
/** The source of truth behind the cache (a durable store / Redis). */
interface ScoreStore {
MatchScore load(String matchId);
}
Which datastore, and why it isn't a default. The current score belongs in Redis (in-memory), because it's a tiny, blisteringly hot key read by the entire crowd: exactly the access pattern an in-memory store is built for, and Redis replicas plus a CDN sit naturally in front of it. The append-only commentary and history want a wide-column store (Cassandra): high write throughput for the ball log, partitioned by `matchId`, cheap to append and to range-scan in order. That is the firehose, not the hot key. A single relational primary would be the wrong reflex twice over: too slow for the read storm, and wasted on data that is never updated, only appended and read. Match the store to the access pattern: in-memory for the hot point read, wide-column for the ordered log.
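To make "partitioned by matchId, appended and range-scanned in order" concrete, here is a minimal in-memory sketch. It is not Cassandra client code: `CommentaryLog`, `append`, and `range` are hypothetical names, and a sorted map keyed by an assumed ball sequence number stands in for the ordering inside one partition.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** In-memory stand-in for a wide-column table: partition key = matchId,
 *  ordering key = ball sequence number. Hypothetical, for illustration only. */
class CommentaryLog {
    private final ConcurrentMap<String, ConcurrentNavigableMap<Long, String>> partitions =
            new ConcurrentHashMap<>();

    /** Append-only write: one row per ball; an existing row is never overwritten. */
    void append(String matchId, long ballSeq, String line) {
        partitions.computeIfAbsent(matchId, id -> new ConcurrentSkipListMap<>())
                  .putIfAbsent(ballSeq, line);
    }

    /** Ordered range scan inside one partition: balls in [fromSeq, toSeq]. */
    List<String> range(String matchId, long fromSeq, long toSeq) {
        ConcurrentNavigableMap<Long, String> partition = partitions.get(matchId);
        if (partition == null) {
            return List.of();
        }
        return List.copyOf(partition.subMap(fromSeq, true, toSeq, true).values());
    }
}
```

A paginated commentary read then becomes a range scan over a single partition, which is the access pattern the paragraph above argues for.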
The version field is the quiet hero: it makes the score cacheable with a cheap freshness check — a client says "I have version 41," and the server answers with the new score or a tiny "nothing newer."
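For the version to be useful it has to move forward on every accepted ball. The following is a minimal sketch of that write side, assuming a single in-process writer; `ScoreWriter` and `applyBall` are hypothetical names, not part of the article's code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Same shape as the article's MatchScore, repeated so this sketch compiles on its own.
record MatchScore(
        String matchId, int runs, int wickets, double overs, String lastEvent, long version) {}

/** Hypothetical update path: every accepted ball yields version + 1. */
class ScoreWriter {
    private final AtomicReference<MatchScore> current;

    ScoreWriter(MatchScore initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Applies one ball atomically; the version only ever increases. */
    MatchScore applyBall(int runsScored, boolean wicket, double oversNow, String event) {
        return current.updateAndGet(s -> new MatchScore(
                s.matchId(),
                s.runs() + runsScored,
                s.wickets() + (wicket ? 1 : 0),
                oversNow,
                event,
                s.version() + 1));
    }

    MatchScore current() {
        return current.get();
    }
}
```

Because the whole record is swapped in one atomic step, a reader never sees new runs paired with an old version.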
Step 4 — The core problem: collapsing a hot key's reads
Put the score in a cache with a short TTL and most reads never touch the origin — they're served from memory at the edge. But the hot key has a sharp edge of its own: the instant the cached value expires, if a million requests are in flight, a million of them miss at once and stampede the origin. That cache stampede can take down the very database the cache was protecting.
The cure is single-flight: when the cache misses, let exactly one request go fetch from the origin; everyone else who misses in that window waits for that one result. A million misses become one origin load.
package dev.fiveyear.cricinfo;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;
/**
* Coalesces concurrent loads of the same key into ONE call. When a hot match's
* cached score expires under millions of readers, a naive cache lets every miss
* hit the origin at once — a stampede. Single-flight elects one loader; everyone
* else waits on its result.
*/
public class SingleFlight<K, V> {
private final ConcurrentMap<K, CompletableFuture<V>> inflight = new ConcurrentHashMap<>();
public V get(K key, Supplier<V> load) {
CompletableFuture<V> mine = new CompletableFuture<>();
CompletableFuture<V> existing = inflight.putIfAbsent(key, mine);
if (existing != null) {
return existing.join(); // someone is already loading this key
}
try {
V value = load.get(); // exactly one loader runs the origin call
mine.complete(value);
return value;
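} catch (Throwable t) { // also covers Errors, so waiters never hang on a future that is never completed
mine.completeExceptionally(t);
throw t; // precise rethrow: load.get() declares no checked exceptions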
} finally {
inflight.remove(key, mine);
}
}
}
The cache itself is a short-TTL map that refreshes through single-flight, plus the version check that powers polling:
package dev.fiveyear.cricinfo;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
/**
* Serves the current score from memory with a short TTL. On expiry, single-flight
* lets exactly one request refresh from the origin while the rest coalesce onto
* it — collapsing a hot match's millions of reads into a trickle of origin loads.
*/
public class ScoreCache {
private record Entry(MatchScore score, long expiresAt) {}
private final ConcurrentMap<String, Entry> cache = new ConcurrentHashMap<>();
private final SingleFlight<String, MatchScore> flight = new SingleFlight<>();
private final ScoreStore origin;
private final long ttlMillis;
public ScoreCache(ScoreStore origin, long ttlMillis) {
this.origin = origin;
this.ttlMillis = ttlMillis;
}
public MatchScore get(String matchId, long now) {
Entry e = cache.get(matchId);
if (e != null && now < e.expiresAt()) {
return e.score(); // fresh hit — the common case at scale
}
return flight.get(matchId, () -> {
MatchScore fresh = origin.load(matchId);
cache.put(matchId, new Entry(fresh, now + ttlMillis));
return fresh;
});
}
/** A polling client sends its last seen version; return the score only if newer
* (else the API answers 304 Not Modified and ships no body). */
public MatchScore pollSince(String matchId, long sinceVersion, long now) {
MatchScore s = get(matchId, now);
return s.version() > sinceVersion ? s : null;
}
}
This is the heart of the system: with a one-second TTL and single-flight, each cache instance sends the origin roughly one read per match per second, no matter how many millions are watching. The crowd is absorbed by memory.
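The arithmetic behind that claim fits in two lines. The sketch below is a back-of-envelope estimate, not a measurement: the viewer count, poll interval, and number of cache instances are assumed figures chosen for illustration, and `OriginLoadEstimate` is a hypothetical helper.

```java
/** Back-of-envelope bounds for one hot match. All inputs are assumed figures. */
class OriginLoadEstimate {
    /** Viewers polling every pollSeconds produce this many requests per second at the edge. */
    static double edgeRequestsPerSecond(long viewers, double pollSeconds) {
        return viewers / pollSeconds;
    }

    /** With single-flight, each cache instance refreshes about once per TTL window,
     *  so origin load scales with the number of instances, not the number of viewers. */
    static double approxOriginLoadsPerSecond(int cacheInstances, double ttlSeconds) {
        return cacheInstances / ttlSeconds;
    }
}
```

With 20,000,000 viewers polling every 2 seconds, the edge sees 10,000,000 requests per second; with 500 cache instances and a 1-second TTL, the origin sees on the order of 500 loads per second. Adding viewers changes the first number and leaves the second alone.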
Step 5 — Verbs become APIs (the API design)
| Verb / endpoint | Does |
|---|---|
| `POST /matches/{id}/score` | scorer pushes a ball; bumps the score and its version |
| `GET /matches/{id}/score?v=N` | viewer poll; returns the score if version > N, else 304 |
| `GET /matches/{id}/stream` | optional SSE stream that pushes each new ball |
| `GET /matches/{id}/commentary` | paginated ball log from the wide-column store |
| `GET /matches?state=live` | the (cached) match list |
The poll endpoint is deliberately CDN-friendly: a plain GET with a short cache lifetime, so the CDN itself answers the vast majority of requests and only a handful per second reach the origin.
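As a sketch of what "CDN-friendly" might look like at the handler level, the following builds the poll response as plain data, with no web framework involved. `PollResponse`, `PollEndpoint`, and the one-second `max-age` are assumptions made for illustration; `Cache-Control` and `ETag` are standard HTTP headers, but whether the real service sets them this way is not stated in the article.

```java
import java.util.Map;

// Same shape as the article's MatchScore, repeated so this sketch compiles on its own.
record MatchScore(
        String matchId, int runs, int wickets, double overs, String lastEvent, long version) {}

/** Hypothetical HTTP-shaped result: status code, headers, optional body. */
record PollResponse(int status, Map<String, String> headers, MatchScore body) {}

class PollEndpoint {
    /** Assumed policy: shared caches may reuse a response for one second. */
    static final String CACHE_CONTROL = "public, max-age=1";

    /** GET /matches/{id}/score?v=N : the score if it is newer than N, else 304 and no body. */
    static PollResponse respond(MatchScore current, long clientVersion) {
        Map<String, String> headers = Map.of(
                "Cache-Control", CACHE_CONTROL,
                "ETag", "\"" + current.version() + "\"");
        if (current.version() > clientVersion) {
            return new PollResponse(200, headers, current);
        }
        return new PollResponse(304, headers, null);
    }
}
```

One consequence of putting the version in the query string: a cache keyed on the full URL stores one entry per `(match, N)` pair. Since nearly all viewers hold the same latest version at any moment, they still collapse onto a single cached entry.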
Step 6 — Push or pull? Delivering the update
Two ways to get a new ball to a phone, and the scale picks for you.
- Poll through a CDN: the client re-requests `GET /score?v=N` every second or two; the CDN serves a cached body and returns `304` when nothing changed. Dead simple, costs almost nothing per reader, and the CDN's edge caching is what makes tens of millions affordable. The price is up to a second or two of staleness, which the NFRs already said is fine.
- Push over SSE / WebSocket: the server streams each ball the instant it happens. Lower latency, but now you're holding tens of millions of open connections and a fan-out tier to feed them. Costly, and overkill when "within ~2 seconds" is the bar.
The pragmatic answer most live-score services land on: poll-through-CDN as the default, with push reserved for the small set of users who want true real-time. Don't pay for connections you don't need.
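On the client side, polling reduces to remembering the last version rendered and ignoring anything that is not newer. Here is a minimal sketch with the network call abstracted away: `ScorePoller`, `tick`, and the `fetchNewerThan` function are hypothetical, and a `null` result stands in for a 304.

```java
import java.util.function.LongFunction;

// Same shape as the article's MatchScore, repeated so this sketch compiles on its own.
record MatchScore(
        String matchId, int runs, int wickets, double overs, String lastEvent, long version) {}

/** Hypothetical client-side poller: tracks the last version it displayed. */
class ScorePoller {
    // Stand-in for GET /matches/{id}/score?v=N; returns null for "304, nothing newer".
    private final LongFunction<MatchScore> fetchNewerThan;
    private long lastVersion = -1;
    private MatchScore shown;

    ScorePoller(LongFunction<MatchScore> fetchNewerThan) {
        this.fetchNewerThan = fetchNewerThan;
    }

    /** One poll; returns true only if the display changed. */
    boolean tick() {
        MatchScore s = fetchNewerThan.apply(lastVersion);
        if (s == null || s.version() <= lastVersion) {
            return false; // nothing newer, or a stale edge answered: keep what is shown
        }
        shown = s;
        lastVersion = s.version();
        return true;
    }

    MatchScore shown() {
        return shown;
    }
}
```

In a real app `tick()` would run on a timer every second or two. The version guard also means a slightly behind edge node can never make the score appear to go backwards.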
Step 7 — Trade-offs (each one keeping an NFR)
| Decision | The tempting alternative | Why ours wins | Keeps |
|---|---|---|---|
| cache + CDN the hot score | read the database per request | the crowd hits the edge; the origin sees a trickle | read scalability |
| single-flight on miss | let every miss hit the origin | a cache stampede on expiry can't take down the database | availability |
| poll-through-CDN by default | push to every viewer | no tens-of-millions of held connections; the CDN absorbs the load | cost efficiency |
| short TTL / version polling | strong consistency on every read | ~1–2s staleness is invisible to a fan and makes caching legal | freshness |
| Redis hot key + Cassandra log | one relational database | in-memory for the point read, wide-column for the firehose | read scalability |
The complete implementation
The classes above are the system's core. Here's the driver that proves the hot-key behavior — 200 concurrent readers collapse to a single origin load, TTL refreshes exactly once, and version polling answers 304 correctly:
package dev.fiveyear.cricinfo;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
public class Main {
public static void main(String[] args) throws Exception {
AtomicInteger loads = new AtomicInteger();
CountDownLatch release = new CountDownLatch(1);
// origin is slow on purpose: the first loader blocks until we release it,
// so all concurrent readers pile up and must coalesce.
ScoreStore origin = matchId -> {
loads.incrementAndGet();
try { release.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } // restore the flag instead of swallowing it
return new MatchScore(matchId, 187, 3, 24.4, "FOUR", 42);
};
ScoreCache cache = new ScoreCache(origin, 1000);
int readers = 200;
CountDownLatch started = new CountDownLatch(readers);
CountDownLatch done = new CountDownLatch(readers);
MatchScore[] seen = new MatchScore[readers];
for (int i = 0; i < readers; i++) {
final int idx = i;
new Thread(() -> {
started.countDown();
seen[idx] = cache.get("ind-aus", 0); // all at the same logical "now"
done.countDown();
}).start();
}
started.await(); // all 200 threads have entered
Thread.sleep(200); // give them time to miss the cache and coalesce onto one future (a heuristic, not a guarantee)
release.countDown(); // let the single loader finish
done.await();
assertTrue(loads.get() == 1, "200 concurrent readers caused exactly 1 origin load (got " + loads.get() + ")");
for (MatchScore s : seen) assertTrue(s != null && s.runs() == 187, "every reader saw the score");
// a fresh read within TTL is a pure cache hit — no new load
cache.get("ind-aus", 500);
assertTrue(loads.get() == 1, "read within TTL hits cache, no origin load");
// past the TTL, exactly one more load refreshes it
cache.get("ind-aus", 1500);
assertTrue(loads.get() == 2, "read past TTL triggers one refresh (got " + loads.get() + ")");
// version-based polling: same version -> 304 (null); newer -> the score
assertTrue(cache.pollSince("ind-aus", 42, 1600) == null, "poll with current version -> not modified");
assertTrue(cache.pollSince("ind-aus", 41, 1600) != null, "poll with older version -> fresh score");
System.out.println("ALL CRICINFO ASSERTIONS PASSED");
}
static void assertTrue(boolean c, String m) { if (!c) throw new AssertionError(m); }
}
Step 8 — Only now, the boxes
With the hot-key read settled, the architecture is those responsibilities given homes.
The shape to notice is the asymmetry: a thin write path (scorer → ingestion → Redis, persisted to Cassandra) feeds a fat read path (Redis → CDN/edge for pollers, Redis → pub/sub → SSE tier for pushers). The score is written in one place and read from many — that's the whole game.
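The pub/sub hop on the push side can be sketched in-process. This is only a stand-in for a real message broker: `ScoreBroadcaster`, `subscribe`, and `publish` are hypothetical names, and each subscriber represents one connection server in the SSE tier.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Same shape as the article's MatchScore, repeated so this sketch compiles on its own.
record MatchScore(
        String matchId, int runs, int wickets, double overs, String lastEvent, long version) {}

/** In-process stand-in for the pub/sub hop between the score store and the SSE tier. */
class ScoreBroadcaster {
    private final List<Consumer<MatchScore>> subscribers = new CopyOnWriteArrayList<>();

    /** A connection server registers once; the returned handle unsubscribes it. */
    Runnable subscribe(Consumer<MatchScore> onScore) {
        subscribers.add(onScore);
        return () -> subscribers.remove(onScore);
    }

    /** Thin write path: one publish per ball, delivered to every current subscriber. */
    void publish(MatchScore score) {
        for (Consumer<MatchScore> subscriber : subscribers) {
            try {
                subscriber.accept(score);
            } catch (RuntimeException e) {
                // One failing subscriber must not stop delivery to the others.
            }
        }
    }
}
```

The asymmetry shows up in the call counts: `publish` runs a few times a minute, while each subscriber fans that one message out to however many connections it holds.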
Step 9 — Scaling the design, one bottleneck at a time
- Reads are the only real load → front the score with a cache, then a CDN edge so reads are served close to the viewer and never reach the origin.
- The audience is global → replicate the hot key to Redis replicas and CDN POPs per region; a reader in Mumbai and one in London both hit a nearby edge.
- The hot key can't be sharded → you can't split one match's current score across shards, so you replicate it instead. (Sharding still helps the other axis: spread different matches, and the commentary log, across shards by `matchId`.)
- Push at scale → if you offer SSE/WebSocket, that's a horizontally scaled tier of connection servers behind pub/sub, added only for the users who need sub-second latency.
The headline: for a hot read key, the lever is replication and caching, not sharding. Sharding splits load across keys; here the load is on one key.
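The difference between the two levers is visible in how a request is routed. The sketch below is a hypothetical helper, not a real client library: `shardFor` is deterministic per key, so one match always lands on one shard, while `pickReplica` can send any read to any copy.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical routing helper contrasting sharding with replication. */
class Routing {
    /** Sharding: different matches spread across shards, but a given matchId
     *  always maps to the same shard, so one hot match stays on one node. */
    static int shardFor(String matchId, int shardCount) {
        return Math.floorMod(matchId.hashCode(), shardCount);
    }

    /** Replication: every replica holds a copy of the hot key, so reads for
     *  the same match can be spread across all of them. */
    static <T> T pickReplica(List<T> replicas) {
        return replicas.get(ThreadLocalRandom.current().nextInt(replicas.size()));
    }
}
```

Adding shards does nothing for a World Cup final, because `shardFor` keeps returning the same index for that match. Adding replicas divides its read load.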
Step 10 — When a piece fails: designing for failure
- The CDN or cache layer fails → it's an optimization; requests fall back to the origin (Redis), slower and hotter, but the score still loads. Degrade, don't break.
- The source of truth (Redis/Cassandra) stalls → keep serving the last cached score from the edge, which means the edge must be allowed to hand out an expired entry when a refresh fails; reads stay up, writes pause. This is the availability-over-consistency choice made concrete: a score that is a few seconds stale is the correct failure.
- The ingestion path or scorer drops → the score simply freezes at its last value; show "updated 8s ago" and recover by replaying from the source. No reader sees an error.
- The push tier falls over → SSE clients fall back to polling the cached score. Push is the luxury path; poll-through-CDN is the floor that always works.
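"Keep serving the last cached score" needs one piece of behavior the short-TTL cache shown earlier does not have: falling back to the previous value when a refresh throws. A minimal sketch of that idea follows; `StaleOnError` is a hypothetical wrapper, not part of the article's classes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical wrapper: remembers the last good value per key and serves it
 *  when the origin fails, instead of surfacing the error to the reader. */
class StaleOnError<K, V> {
    private final ConcurrentMap<K, V> lastGood = new ConcurrentHashMap<>();
    private final Function<K, V> origin;

    StaleOnError(Function<K, V> origin) {
        this.origin = origin;
    }

    V get(K key) {
        V fresh;
        try {
            fresh = origin.apply(key);
        } catch (RuntimeException e) {
            V stale = lastGood.get(key);
            if (stale == null) {
                throw e; // nothing to fall back to yet
            }
            return stale; // degrade: a stale score instead of an error
        }
        if (fresh != null) {
            lastGood.put(key, fresh);
        }
        return fresh;
    }
}
```

A production version would presumably also track when the value was last refreshed, so the UI can show an "updated 8s ago" label as described above.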
The interview corner
- "Writes are rare and reads are enormous — what changes?" You stop scaling the database and start scaling the read surface: cache, CDN, replicas. The hot value goes on a shared board the crowd reads.
- "The cached score expires and a million requests miss at once." Single-flight: one request refreshes the origin, the rest wait on it. Otherwise the cache's expiry becomes a self-inflicted stampede.
- "Push or poll for live updates?" Poll-through-CDN by default — cheap, absorbs the crowd, ~1–2s fresh. Push (SSE) only for users who need true real-time, because tens of millions of held connections are expensive.
- "How do you shard one match read by 50M people?" You don't — you can't shard a single key. You replicate it across cache/CDN/Redis replicas. Sharding is for spreading different matches and the commentary log.
- "Consistency?" Eventual, deliberately. A ball arriving a second late is invisible; a score that won't load is not. That choice is what makes the whole caching strategy legal.
Where to go from here
- New to system design? The rookie's guide to HLD walks the method this article follows.
- This is the read twin of write fan-out — see the newsletter service for one event reaching millions of inboxes.
- The hot-key cache and its stampede live in miniature in the thread-safe LRU cache and the rate limiter.