Load Balancer HLD: The Front Door That Never Sends You to a Dead Server
A load balancer system design: balancing algorithms (round-robin, least-connections, consistent hashing), health checks, L4 vs L7, and keeping the balancer itself from being a single point of failure.
"Design a load balancer." It sounds like a one-liner — "spread requests across servers" — but it's the component every other system on this site quietly depends on, and it has three real jobs: distribute load so no server melts while others idle, detect failure so a dead backend never receives a request, and not become a single point of failure itself. The interesting parts are the algorithm for picking a server, the health checks that keep a dying one out of rotation, and the redundancy that keeps the front door open when the door's own hinge breaks.
The keystone is small and concrete: given a pool of servers and a request, which server gets it — and how do you make sure that server is alive?
Let's start nowhere near a computer
Picture the host at a busy restaurant door. Guests arrive; the host seats them across the waiters. A good host doesn't pile every table on one waiter — they spread guests evenly (round-robin), or send the next party to the waiter with the fewest tables right now (least-connections), and they skip a waiter who's on break (a health check). And if the host themselves steps away, a backup host takes the podium so the door is never unattended.
Swap the nouns: the host is the load balancer, the waiters are the backend pool, "spread evenly / send to the least busy" are balancing algorithms, "skip the waiter on break" is a health check, and "a backup host" is LB redundancy. That's the whole design — one address out front, a pool behind it that can grow, shrink, and fail without the diners noticing.
Where this exact shape shows up
- Nginx, HAProxy, AWS ELB/ALB, Envoy — every web stack has one (often several layers).
- API gateways, service meshes, database read-replica routers — same pick-a-healthy-backend job.
- It's the front of nearly every HLD here — the Cricinfo edge, the WhatsApp gateway, the Amazon API tier all sit behind one.
Step 1 — Functional requirements (sentences first)
- Accept incoming requests on one stable address and forward each to a backend in the pool.
- Distribute requests across backends by a configurable strategy.
- Health-check backends and route only to healthy ones.
- Support sticky routing when a request must reach a specific backend (session/cache affinity).
- Let the pool change — backends added or removed — with no client impact.
The load-bearing verbs are "distribute by a strategy" and "route only to healthy ones." They are the system.
Step 2 — Non-functional requirements
- Even distribution. No backend hot while others idle; the strategy must match the workload.
- Health-awareness. A failing backend leaves rotation fast; never send a request to a dead server.
- Low added latency. The LB is on every request, so its own overhead must be tiny.
- The LB is not a SPOF. It must be redundant — the front door can't have one hinge.
- Stickiness when needed. Some requests must land on the same backend (warm cache, session) — without a session database.
Listing them is the easy half; the design only earns them if it fulfills them:
| Requirement | How this design fulfills it |
|---|---|
| Even distribution | round-robin / least-connections / weighted strategies, chosen per workload — Step 3 |
| Health-awareness | periodic health checks pull failing backends out of rotation — Step 4 |
| Low added latency | the LB is near-stateless on the data path; L4 forwards without parsing — Step 5 |
| Not a SPOF | redundant LBs behind a floating VIP / anycast / DNS — Step 6 |
| Stickiness without a DB | consistent hashing maps a key to a stable backend — Step 3 |
Every trade-off below is chosen to keep one of these.
Step 3 — The strategies
The core decision — which backend gets the next request — has three classic answers, each right for a different workload.
package dev.fiveyear.lb;

/** One server in the pool. `healthy` is flipped by health checks; `inFlight`
 *  tracks current connections for least-connections balancing. */
final class Backend {
    final String id;
    final int weight;
    boolean healthy = true;
    int inFlight = 0;

    Backend(String id, int weight) { this.id = id; this.weight = weight; }
}

package dev.fiveyear.lb;
import java.util.Comparator;
import java.util.List;

/** How to pick one backend from the healthy set. */
interface Strategy {
    Backend pick(List<Backend> healthy);
}

/** Round-robin: hand requests out in a rotating cycle. Even spread, ignores load. */
class RoundRobin implements Strategy {
    private int next = 0;

    public Backend pick(List<Backend> healthy) {
        // floorMod keeps the index valid even after `next` overflows to a negative value
        Backend b = healthy.get(Math.floorMod(next, healthy.size()));
        next++;
        return b;
    }
}

/** Least-connections: send the next request to whichever backend is least busy.
 *  Self-correcting when requests have uneven cost. */
class LeastConnections implements Strategy {
    public Backend pick(List<Backend> healthy) {
        return healthy.stream().min(Comparator.comparingInt(b -> b.inFlight)).orElseThrow();
    }
}

The picks, in one line each: round-robin for uniform requests (cheap, and even by request count); least-connections when request cost varies wildly (it self-corrects: a backend stuck on a slow request keeps a higher in-flight count and stops receiving new ones); consistent hashing when you need affinity (the same user or key always lands on the same backend, keeping its cache warm). That last one is the same ring trick as the Amazon cart, used here for stickiness without a session database.
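The listings above cover round-robin and least-connections but not the third pick. Here is a minimal sketch of a consistent-hash ring, under stated assumptions: the class name `HashRing`, the use of CRC32 as the hash function, and the virtual-node count are all illustrative choices, not something the design above prescribes. Unlike the `Strategy` interface, it needs a routing key (a user id, a session id), so it is shown standalone and works on backend ids.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Sketch of a consistent-hash ring: a key is served by the first
 *  virtual node at or after its hash, wrapping around at the end. */
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // assumed: points placed on the ring per backend

    HashRing(List<String> backendIds, int virtualNodes) {
        if (virtualNodes < 1) throw new IllegalArgumentException("virtualNodes must be >= 1");
        this.virtualNodes = virtualNodes;
        for (String id : backendIds) add(id);
    }

    /** Place the backend's virtual nodes on the ring. */
    void add(String backendId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.putIfAbsent(hash(backendId + "#" + i), backendId);
        }
    }

    /** Remove only the points owned by this backend; all other points stay put. */
    void remove(String backendId) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(backendId + "#" + i), backendId);
        }
    }

    /** Map a routing key to a backend id. */
    String pick(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no healthy backend");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around the ring
    }

    // CRC32 is an assumption chosen because it ships with the JDK;
    // a production ring would likely use a better-distributed hash.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

Usage would look like `new HashRing(List.of("a", "b", "c"), 100).pick("user-42")`. The property that matters for stickiness: when a backend is removed (say, ejected by a health check), only the keys that lived on it are remapped; every other key keeps its backend, so those caches stay warm.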
Step 4 — Routing and health checks
The balancer ties it together: filter the pool to the healthy backends, let the strategy pick one, and track in-flight connections so load-aware strategies work. A separate health-check loop flips backends up and down.
package dev.fiveyear.lb;

import java.util.List;

/**
 * Routes each request to one healthy backend by a pluggable strategy. The two
 * jobs that matter: never route to an unhealthy backend, and track in-flight
 * connections so load-aware strategies work. Acquire on route, release on done.
 */
public class LoadBalancer {
    private final List<Backend> pool;
    private final Strategy strategy;

    public LoadBalancer(List<Backend> pool, Strategy strategy) {
        this.pool = pool;
        this.strategy = strategy;
    }

    public Backend route() {
        List<Backend> healthy = pool.stream().filter(b -> b.healthy).toList();
        if (healthy.isEmpty()) throw new IllegalStateException("no healthy backend");
        Backend chosen = strategy.pick(healthy);
        chosen.inFlight++;
        return chosen;
    }

    public void release(Backend b) {
        if (b.inFlight > 0) b.inFlight--;
    }

    /** A health check marks a backend up or down; routing skips the down ones. */
    public void setHealthy(String id, boolean healthy) {
        for (Backend b : pool) if (b.id.equals(id)) b.healthy = healthy;
    }
}

The most important pair of lines is filter(b -> b.healthy) followed by if (healthy.isEmpty()) throw: the LB would rather fail fast than hand a request to a dead server. Health checks come in two flavours: active (the LB probes GET /health on a timer) and passive (it watches real traffic and ejects a backend that starts erroring, often called "outlier detection"). A recovered backend is readmitted gradually (slow-start) so it isn't flooded the instant it returns.
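A single failed probe should not necessarily eject a backend, and a single success should not readmit it. One common way to decide is to count consecutive results. The sketch below assumes that approach; the class name `HealthTracker` and the threshold parameters are hypothetical, and the same `record` call could be fed either by an active probe loop or by the outcomes of real requests (the passive flavour).

```java
/** Sketch: flips a backend's health only after a run of consecutive
 *  contrary results, so one blip neither ejects nor readmits it. */
class HealthTracker {
    private final int fallThreshold; // assumed: consecutive failures before marking down
    private final int riseThreshold; // assumed: consecutive successes before marking up
    private boolean healthy = true;
    private int streak = 0;          // consecutive results that disagree with `healthy`

    HealthTracker(int fallThreshold, int riseThreshold) {
        if (fallThreshold < 1 || riseThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be >= 1");
        }
        this.fallThreshold = fallThreshold;
        this.riseThreshold = riseThreshold;
    }

    /** Feed one observation (probe result or real-request outcome);
     *  returns the backend's health after taking it into account. */
    boolean record(boolean success) {
        if (success == healthy) {
            streak = 0; // observation agrees with the current state: reset the run
        } else if (++streak >= (healthy ? fallThreshold : riseThreshold)) {
            healthy = !healthy; // enough consecutive contrary results: flip
            streak = 0;
        }
        return healthy;
    }

    boolean isHealthy() { return healthy; }
}
```

With `new HealthTracker(3, 2)`, a backend is marked down on its third consecutive failure and marked up again on its second consecutive success. The result of `record` is what a caller would pass on to `setHealthy(id, ...)` in the balancer above.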
Step 5 — L4 vs L7, and where the state lives
A balancer works at one of two layers. L4 (transport) routes by IP/port — it forwards packets/connections without reading them, so it's blazing fast and protocol-agnostic. L7 (application) reads the HTTP request, so it can route by path, host, or header, terminate TLS, and do smarter things — at the cost of parsing every request. Real stacks layer them: a fast L4 tier spreads connections to a fleet of L7 balancers that do the smart routing.
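To make the difference concrete, here is a small sketch of what each layer can see when it picks a target. Everything in it is an illustrative assumption: the record and method names are hypothetical, the L4 pick hashes a simplified four-field tuple, and the L7 pick uses a plain longest-prefix match on the path.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** What an L4 balancer sees: addresses and ports, never the payload. */
record Connection(String srcIp, int srcPort, String dstIp, int dstPort) {}

/** What an L7 balancer sees after parsing the request. */
record HttpRequest(String host, String path) {}

class LayerPick {
    /** L4: hash the connection tuple, so every packet of one connection
     *  reaches the same backend without inspecting any bytes. */
    static String pickL4(Connection c, List<String> backends) {
        int h = Objects.hash(c.srcIp(), c.srcPort(), c.dstIp(), c.dstPort());
        return backends.get(Math.floorMod(h, backends.size()));
    }

    /** L7: route by content. The longest matching path prefix wins;
     *  unmatched requests go to a default pool. */
    static String pickL7(HttpRequest r, Map<String, String> poolByPrefix, String defaultPool) {
        String best = null;
        for (String prefix : poolByPrefix.keySet()) {
            boolean matches = r.path().startsWith(prefix);
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? defaultPool : poolByPrefix.get(best);
    }
}
```

The L4 pick never touches the request body, which is why it is cheap and protocol-agnostic. The L7 pick needs a parsed path, which is the per-request cost the paragraph above refers to, and in exchange it can send `/api/orders/...` and `/static/...` to different pools.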
Which "datastore"? Almost none, and that's the point. The LB keeps no per-client or per-session state on the hot path: it forwards and forgets. The only state it holds is pool membership and health, kept in memory (and shared across LB instances via a config service or service registry), plus per-backend in-flight counts. Stickiness is deliberately done with consistent hashing, not a session store, so the LB never has to look anything up to route. Keeping the data path free of lookups is what keeps the added latency near zero.
Step 6 — The load balancer can't be a single point of failure
Put one LB in front of everything and you've just moved the single point of failure, not removed it. The fix: run at least two LBs and front them with something that can shift traffic quickly. The options are a floating virtual IP (the standby grabs the VIP if the active dies), anycast (the network routes to the nearest healthy LB), or DNS with health-checked records (the slowest of the three to react, because clients and resolvers cache a record until its TTL expires).
So the redundancy is layered: the LB makes the backends highly available, and a VIP/anycast layer makes the LB highly available. Turtles, but only two deep.
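The floating-VIP option can be sketched from the standby's point of view: it listens for heartbeats from the active LB and claims the VIP when they stop. This is a simplified model under stated assumptions, not a real protocol implementation. The class name `StandbyMonitor`, the timeout parameter, and the choice to hand the VIP back as soon as heartbeats resume are all hypothetical; real protocols such as VRRP make that last behaviour (preemption) configurable.

```java
/** Sketch of the standby LB's failover decision. Time is passed in
 *  explicitly so the logic can be exercised without a real clock. */
class StandbyMonitor {
    private final long timeoutMillis; // assumed: silence tolerated before taking over
    private long lastHeartbeatMillis;
    private boolean holdsVip = false;

    StandbyMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Called whenever a heartbeat from the active LB arrives. */
    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
        holdsVip = false; // the active LB is alive, so yield the VIP (preemptive behaviour)
    }

    /** Called on a timer; returns true while this standby should own the VIP. */
    boolean tick(long nowMillis) {
        if (nowMillis - lastHeartbeatMillis > timeoutMillis) {
            // A real system would announce the address to the network here.
            holdsVip = true;
        }
        return holdsVip;
    }
}
```

With a 3-second timeout, a standby that last heard a heartbeat at t = 1000 ms still stays passive at t = 4000 ms and claims the VIP on the first tick after that. The timeout is the trade-off knob: shorter means faster failover but more risk of a false takeover during a brief network hiccup.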
Step 7 — Trade-offs (each one keeping an NFR)
| Decision | The tempting alternative | Why ours wins | Keeps |
|---|---|---|---|
| least-connections (uneven work) | always round-robin | a backend stuck on a slow request stops getting new ones | even distribution |
| active + passive health checks | trust backends are up | a dead/erroring backend leaves rotation in seconds | health-awareness |
| consistent hashing for stickiness | a session store lookup | affinity with no per-request DB hit | latency / stickiness |
| L4 in front of L7 | one smart L7 tier | cheap connection spread, then smart routing where it's worth it | low latency |
| redundant LBs + floating VIP | a single big LB | the front door survives losing a balancer | not a SPOF |
The complete implementation
The strategies and balancer are the engine. Here's the driver that proves them — even round-robin, an unhealthy backend dropping out, least-connections picking the idlest, and a fail-fast when the whole pool is down:
package dev.fiveyear.lb;

import java.util.List;

public class Main {
    public static void main(String[] args) {
        Backend a = new Backend("a", 1), b = new Backend("b", 1), c = new Backend("c", 1);

        // round-robin cycles evenly (release immediately so load doesn't matter)
        LoadBalancer rr = new LoadBalancer(List.of(a, b, c), new RoundRobin());
        StringBuilder seq = new StringBuilder();
        for (int i = 0; i < 6; i++) { Backend x = rr.route(); seq.append(x.id); rr.release(x); }
        assertTrue(seq.toString().equals("abcabc"), "round-robin spreads evenly (got " + seq + ")");

        // an unhealthy backend drops out of rotation
        rr.setHealthy("b", false);
        StringBuilder seq2 = new StringBuilder();
        for (int i = 0; i < 4; i++) { Backend x = rr.route(); seq2.append(x.id); rr.release(x); }
        assertTrue(!seq2.toString().contains("b"), "unhealthy backend is skipped (got " + seq2 + ")");
        rr.setHealthy("b", true);

        // least-connections sends to the least busy
        Backend p = new Backend("p", 1), q = new Backend("q", 1), r = new Backend("r", 1);
        p.inFlight = 5; q.inFlight = 0; r.inFlight = 2;
        LoadBalancer lc = new LoadBalancer(List.of(p, q, r), new LeastConnections());
        Backend pick = lc.route();
        assertTrue(pick.id.equals("q"), "least-connections picks the idlest (got " + pick.id + ")");
        assertTrue(q.inFlight == 1, "routing increments the chosen backend's in-flight count");

        // every backend down -> routing fails fast, doesn't pick a dead server
        LoadBalancer dead = new LoadBalancer(List.of(new Backend("z", 1)), new RoundRobin());
        dead.setHealthy("z", false);
        boolean threw = false;
        try { dead.route(); } catch (IllegalStateException e) { threw = true; }
        assertTrue(threw, "no healthy backend -> fail fast, never route to a dead one");

        System.out.println("ALL LOADBALANCER ASSERTIONS PASSED");
    }

    static void assertTrue(boolean cond, String msg) { if (!cond) throw new AssertionError(msg); }
}

Step 8 — Scaling the design, one bottleneck at a time
- The backend pool is overloaded → the LB makes scaling it trivial: add backends, health checks admit them once they pass, and traffic rebalances. This is the LB's whole reason to exist.
- One LB tier saturates → run many LBs, fronted by DNS round-robin or anycast; the balancer layer scales horizontally too.
- Smart routing is expensive → put a cheap L4 tier in front of the L7 tier, so only requests that need application-level routing pay for it.
- Cache affinity matters → consistent hashing keeps a key on the same backend, so backend caches stay warm as the pool changes size (only a slice of keys move).
The headline: an LB scales the system behind it effortlessly, and scales itself by being replicated behind a VIP/anycast layer.
Step 9 — When a piece fails: designing for failure
Failure handling isn't a feature of a load balancer — it's the entire point.
- A backend dies → health checks eject it within seconds; requests that were in flight on it fail and are retried on another backend when they are safe to retry (idempotent ones). No single backend is special: each one is fully replaceable.
- The active LB dies → the floating VIP / anycast shifts traffic to the standby LB, which has the same pool and health view. The front door fails over; clients barely notice.
- A backend is slow, not dead (a false "healthy") → passive outlier detection ejects it on rising error/latency, and a circuit breaker stops hammering it. Don't trust a binary health bit alone.
- A recovered backend gets flooded → slow-start ramps its traffic up gradually instead of dumping the full share the instant it returns, so it doesn't immediately fall over again.
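The slow-start bullet can be sketched as a weight ramp: a freshly recovered backend starts with a fraction of its configured weight and grows back to full over a ramp window. The linear ramp, the 10% floor, and the names `SlowStart`, `effectiveWeight`, and `pickIndex` are illustrative assumptions; the random draw is passed in by the caller so the logic stays deterministic.

```java
/** Sketch of slow-start: scale a recovered backend's weight up gradually
 *  and pick backends in proportion to their effective weights. */
class SlowStart {
    /** Linear ramp from an assumed 10% floor to the full weight over rampMillis. */
    static double effectiveWeight(int fullWeight, long millisSinceRecovery, long rampMillis) {
        if (rampMillis <= 0 || millisSinceRecovery >= rampMillis) return fullWeight;
        double fraction = Math.max(0.1, (double) millisSinceRecovery / rampMillis);
        return fullWeight * fraction;
    }

    /** Weighted pick: `r` is a uniform draw in [0, 1) supplied by the caller.
     *  Returns the index whose weight interval contains r * total. */
    static int pickIndex(double[] weights, double r) {
        double total = 0;
        for (double w : weights) total += w;
        double target = r * total;
        for (int i = 0; i < weights.length; i++) {
            target -= weights[i];
            if (target < 0) return i;
        }
        return weights.length - 1; // guard against floating-point rounding
    }
}
```

For example, with a 30-second ramp a backend of weight 10 is treated as weight 1 right after recovery, weight 5 at the halfway mark, and weight 10 from 30 seconds on. Next to a steady peer of weight 10, it therefore starts with roughly 1 in 11 requests instead of half.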
The interview corner
- "Which balancing algorithm?" Round-robin for uniform requests, least-connections when costs vary, consistent hashing for affinity/stickiness. Name the workload that picks each.
- "How do you avoid sending traffic to a dead server?" Active and passive health checks; eject failing backends, fail fast if none are healthy, readmit with slow-start.
- "L4 vs L7?" L4 forwards by IP/port (fast, dumb); L7 reads HTTP and routes by path/host/header (smart, costlier). Often layered, L4 → L7.
- "Isn't the LB a single point of failure?" Not if it's redundant — two+ LBs behind a floating VIP / anycast / DNS so one can die.
- "Sticky sessions without a session store?" Consistent hashing on a key → stable backend, with minimal reshuffling when the pool changes.
Where to go from here
- The consistent-hashing ring also shards the Amazon cart; nearly every HLD here sits behind a load balancer — see the rookie's guide to HLD for the method.
- For what runs behind the LB at read scale, the cache discipline is in Cricinfo.