Reddit Comments DB Design: Modelling a Threaded Tree That Scales
A database deep dive on modelling Reddit-style threaded comments in SQL: adjacency list vs materialized path vs closure table, fetching a subtree in one query, plus scores and sharding by post.
"Design the database for Reddit's comments." It's a deceptively deep question, because comments aren't a list — they're a tree of arbitrary depth, and trees are the one shape relational databases are worst at. A reply has a parent, which has a parent, which has a parent. You need to load a whole branch sorted by score, paginate "load more replies," show a live count, and let someone delete a comment without orphaning the fifty replies underneath it — all on a table that, for a viral thread, holds millions of rows.
So the whole question is one trade-off: how do you store a tree in flat rows so that reading a subtree is one cheap query, not a hundred? Get the model right and everything else is ordinary SQL.
Let's start nowhere near a computer
Picture a company org chart in a filing cabinet, one index card per person. You're asked: "pull every card under VP Jane."
- If each card lists only "my manager" — Bob's card says "Jane" — you start at Jane, find her direct reports, then their reports, walking the cabinet level by level. Correct, but it's a trip per layer.
- If each card instead lists its full chain of command — Bob's card reads "CEO › Jane › Bob" — then "everyone under Jane" is just every card whose chain starts with "CEO › Jane ›". One pass, one matching rule, the whole branch.
That second card is a materialized path, and it's the trick that makes a tree fast to read in a flat filing cabinet — or a flat database table. The rest of this article is choosing between that and its rivals, then making it scale.
Where this exact problem shows up
- Reddit, Hacker News, YouTube, Disqus — every nested comment section is this tree.
- Slack / Teams threads, email threading — shallower, but the same parent-child model.
- Category trees, org charts, file systems, BOMs — any "things nested inside things" schema is one of the four models below. (The in-memory file system is this tree without a database.)
- The Twitter timeline HLD — comments are the read-heavy, fan-out-on-read sibling of the feed problem.
Step 1 — The access patterns (functional requirements)
For a schema, "functional requirements" means the queries it must serve. Write those down first; the model is whatever makes them cheap.
- Post a comment — a reply to the post (top-level) or to another comment.
- Read a page of top-level comments for a post, sorted (best / top / new), paginated.
- Expand a comment's replies — fetch its subtree, sorted, paginated ("load more").
- Vote on a comment and show its score.
- Count the comments on a post.
- Delete a comment, leaving its replies readable (the
[deleted]tombstone).
The two that bite are "read a subtree to any depth" and "delete without orphaning." Everything below is in service of those.
Step 2 — Non-functional requirements
What separates a toy schema from Reddit's is how well it serves those queries under real load.
- Read-heavy. Comments are read far more than written (100:1 and up). Read latency is the boss; we'll pay at write time to make reads cheap.
- Write spikes. A viral thread takes a flood of new comments and votes at once — writes must stay cheap too, no whole-tree rewrites.
- Sort flexibility. The same subtree shown by best, new, or controversial — ordering is a first-class requirement, not an afterthought.
- Scale. One AMA can hold millions of comments; the model must shard, and the natural shard key is the post.
- Read-your-writes for the author. Counts and scores can lag (eventual consistency is fine), but a user must see their own new comment immediately.
Listing them is the easy half; the schema only earns them if it fulfills them. Here's the contract:
| Requirement | How this design fulfills it |
|---|---|
| Read-heavy / low latency | a materialized path turns "fetch a subtree" into one indexed range scan — Steps 3, 5 |
| Cheap writes / spikes | a new comment is one INSERT; no ancestor rows rewritten (unlike nested sets) — Step 6 |
| Sort flexibility | a denormalized score column + per-level ORDER BY; no joins to count votes — Step 6 |
| Scale | everything keys on post_id, so the table shards by post cleanly — Step 8 |
| Read-your-writes | the author's own insert is in their session's view; counts/scores reconcile async — Step 9 |
Every trade-off below is chosen to keep one of these.
Step 3 — Three ways to model a tree
This is the whole interview. There are four classic models; three are live options and one is a trap.
- Adjacency list — each row stores its
parent_id. Trivial to write, but reading a deep subtree means one query per level, or a recursive CTE that walks the tree at read time. Fine for shallow threads; it strains as depth grows. - Materialized path — each row stores the full chain of ancestor ids as a string (
/1/4/9/). A subtree is one prefix query at any depth. Writes stay a single insert. This is the sweet spot for read-heavy comment trees. - Closure table — a separate table holds one row per ancestor→descendant pair. Maximally flexible (move a subtree cheaply), but a comment at depth d writes d rows, and the table explodes on deep, wide threads. Great for org charts that get re-parented; overkill for append-only comments.
- Nested sets (the trap) — each node stores
left/rightnumbers bracketing its subtree. Reads are elegant, but inserting one comment renumbers half the table. For a high-write comment system it's disqualifying; name it in an interview only to reject it.
Which datastore — and why it isn't a default. Stay relational (Postgres), and not from habit. The comment tree is naturally per-post and bounded to one post's rows, the access is WHERE post_id = ? + a prefix or parent match, and votes want an atomic counter — all relational strengths. Postgres even ships the ltree type, a path index purpose-built for this. The tempting alternative — a document store that embeds the whole tree in one post document — works beautifully until a thread hits tens of thousands of comments: you blow past the document size limit, and every new comment rewrites the entire document, which murders the write-spike NFR. A graph database is the right shape but overkill for a strict tree keyed by one post. Relational rows + a materialized path is the answer that scales without exotic infrastructure.
Step 4 — The schema: materialized path
The table is flat. The path column is what makes it a tree — a string of ancestor ids with a leading and trailing separator (the trailing / matters, as you'll see).
CREATE TABLE comments (
id BIGINT PRIMARY KEY,
post_id BIGINT NOT NULL,
parent_id BIGINT, -- NULL = top-level
path TEXT NOT NULL, -- materialized path, e.g. '/1/4/9/'
depth INT NOT NULL, -- 1 = top-level
author_id BIGINT NOT NULL,
body TEXT NOT NULL,
score INT NOT NULL DEFAULT 0, -- denormalized: upvotes - downvotes
deleted BOOLEAN NOT NULL DEFAULT FALSE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- the subtree query rides this: range-scan a path prefix within a post
CREATE INDEX idx_thread ON comments (post_id, path);
-- the top-level page rides this: roots of a post, best first
CREATE INDEX idx_toplevel ON comments (post_id, depth, score DESC);Two indexes, two query shapes. (post_id, path) makes a subtree a contiguous range; (post_id, depth, score DESC) makes the first page of roots a pre-sorted slice. Everything in Step 5 is one of these two reads.
Step 5 — The queries that matter (the read path)
A comment section loads in two motions: the page of top-level comments, then expand replies on demand. Both are single indexed reads.
SELECT id, author_id, body, score
FROM comments
WHERE post_id = 100 AND depth = 1 AND deleted = FALSE
ORDER BY score DESC
LIMIT 20;SELECT id, parent_id, path, body, score
FROM comments
WHERE post_id = 100 AND path LIKE '/1/%'
ORDER BY path; -- path order == tree (pre-order) traversalThat ORDER BY path is a quiet gift: sorting strings lexically yields a parent-before-children, depth-first walk — the exact order you render. And the trailing separator earns its keep here: LIKE '/1/%' matches /1/, /1/4/, /1/4/9/, /1/5/ — but not /10/, because the pattern demands the / after the 1. Drop the trailing slash from your paths and /1 silently swallows /10/, /11/, /123/ — the most common materialized-path bug there is.
If you'd rather not store a path — the adjacency-list model — the same subtree comes from a recursive CTE, walking parent links at read time:
WITH RECURSIVE subtree AS (
SELECT * FROM comments WHERE id = 1
UNION ALL
SELECT c.* FROM comments c
JOIN subtree s ON c.parent_id = s.id
)
SELECT id, body, score FROM subtree;It's correct and needs no path column, but it does the tree-walk on every read. The materialized path pays that cost once, at write time, by storing the answer — which is exactly the read-heavy trade you want.
Step 6 — Writes: post, vote, delete
A new comment is one insert; its path is the parent's path with its own id appended. No ancestor row is touched — that's what keeps writes cheap under a comment flood.
INSERT INTO comments (id, post_id, parent_id, path, depth, author_id, body)
VALUES (4, 100, 1, '/1/4/', 2, 7, 'reply to A');
-- path = parent.path || new_id || '/'; depth = parent.depth + 1A vote bumps the denormalized score so reads never join to a votes table to sort. (At Reddit scale the votes themselves go through a separate counter — buffered in Redis and folded in asynchronously — but the comment row carries the materialized total.)
UPDATE comments SET score = score + 1 WHERE id = 4;Delete is the subtle one. You can't remove the row — its replies would be orphaned and the subtree query would skip a generation. Instead, tombstone it: keep the row and its path, blank the content. The branch stays whole; the UI renders [deleted].
UPDATE comments SET deleted = TRUE, body = '[deleted]', author_id = 0
WHERE id = 1;
-- comment 1 still anchors '/1/4/', '/1/5/', '/1/4/9/' — nothing is orphanedThese six statements aren't illustrative sketches — the schema, the prefix
query (including the /10/ exclusion), the recursive CTE, the soft-delete,
and the score update were all run against a live SQLite database and produce
exactly the rows described.
Step 7 — Trade-offs (each one keeping an NFR)
The last column is the discipline: every choice keeps one of the promises from Step 2.
| Decision | The tempting alternative | Why ours wins | Keeps |
|---|---|---|---|
| materialized path | adjacency list + recursive CTE | subtree is one indexed range scan, not a per-read tree walk | read latency |
| store the tree in rows | embed the tree in one document | a huge thread blows the doc limit; every comment rewrites the whole | cheap writes |
denormalized score column | COUNT a votes table per sort | sorting needs no join; the hot read path stays a single scan | read latency |
| soft-delete (tombstone) | hard-delete the row | replies stay anchored; the subtree query never skips a generation | correctness |
leading and trailing / | bare ids in the path | the trailing / stops /1/ from matching /10/ | correctness |
shard by post_id | one global comments table | a thread's rows live together; no cross-shard subtree query | scale |
Step 8 — Scaling: denormalize, cache, shard by post
Start with one Postgres and add machinery only where load lands — and for comments, it lands on reads.
- Reads dominate first → put a cache in front: the rendered top-level page of a hot thread is the same for everyone, so cache it (and invalidate on new top-level comments). Add read replicas for the long tail of threads.
- Counts and scores are hot → never
COUNT(*)a viral thread on every page load. Keep a denormalized comment count per post and scores per comment, updated on write (or folded from a Redis counter). - The table grows past one box → shard by
post_id. Because every query is scoped to a post, a thread's entire tree lives on one shard — no cross-shard subtree scan, ever. That single fact is whypost_idis in every index. - One AMA is a hotspot → a single mega-thread can still overwhelm its shard; serve its hot pages from cache and accept that deep, cold branches are a slower, on-demand "load more."
Step 9 — When consistency slips: counts, deletes, races
A schema's "failure modes" are the ways its data goes wrong, not a server dying. Sort them the same way you'd sort component failures: an optimization that can drift, a truth that must hold, and a race to close.
- The denormalized count drifts (a crash between the insert and the counter bump). It's an optimization, not the truth — so degrade gracefully: show the slightly-stale number, and a periodic job recomputes it from the source of truth (the rows themselves). Never block a comment on its counter.
- A parent is deleted mid-render. The tombstone is the guard: the row and path survive, so the subtree query still returns a whole tree. The truth (the structure) never breaks; only the content blanks.
- Two votes race on the same comment.
UPDATE ... SET score = score + 1is atomic at the row, so concurrent votes don't clobber each other — but to stop one user voting twice, the real fix is aUNIQUE(comment_id, user_id)votes row; thescorecolumn is only the cached total. - A reply races its parent's deletion. The child insert still computes a valid path off the (now tombstoned) parent; because delete is a tombstone, not a row removal, the new reply is anchored correctly instead of dangling.
The interview corner
- "How do you fetch a deep thread in one query?" Materialized path +
WHERE path LIKE '/1/%' ORDER BY path. One indexed range scan, any depth, already in render order. - "Adjacency list vs. materialized path vs. closure table?" Adjacency = simplest write, recursive read. Path = one-query reads, simple writes — best for read-heavy comments. Closure = flexible re-parenting, write amplification. Nested sets = never, for high write.
- "Why the trailing slash in the path?" So
'/1/%'doesn't match/10/. The classic prefix-collision bug. - "How do you delete a comment with replies?" Soft-delete / tombstone — keep the row and path so the subtree stays whole; render
[deleted]. - "How do you sort by best?" A denormalized
scorecolumn so no join is needed; order siblings by it. Votes themselves are counted separately and folded in. - "How does it shard?" By
post_id— every query is post-scoped, so a thread's whole tree lives on one shard.
Where to go from here
- New to system design? The rookie's guide to HLD walks the method this article follows.
- The same tree, in memory and without a database, is the file system LLD — composite nodes instead of paths.
- For the read-vs-write fan-out decision that shapes the rest of a social app, see Twitter's timeline; for sending to that audience, the newsletter service.