LLD · intermediate · Premium

Music Recognition LLD: How Shazam Turns a Noisy Hum into a Constellation of Hashes

A low-level design walkthrough of Shazam-style audio fingerprinting: reducing a spectrogram to peak landmarks, hashing pairs of points, and matching a noisy snippet by voting on a consistent time offset.

By fiveyearsdev · June 13, 2026 · 12 min read

"Design Shazam." You hold your phone up in a loud café, ten seconds of a song you can't name fighting through clattering cups and someone's laugh — and a moment later, the title. That result should feel impossible, and the reasons it is hard are exactly what the interviewer is hunting for. The recording is noisy (the café is louder than the song). It's a fragment (ten seconds of a four-minute track, starting who-knows-where). And it's matched against a library of tens of millions of songs in under a second. A beginner reaches for "compare the audio." But you can't compare audio — your clip and…
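The noise problem above is what the subtitle's first step, reducing a spectrogram to peak landmarks, is meant to absorb. A loud café adds energy across the spectrogram, but the strongest local peaks of the song are more likely to survive than the raw samples. Below is a minimal Python sketch of that peak-picking step. It assumes the spectrogram has already been computed as a 2-D grid of magnitudes indexed by time frame and frequency bin. The function name `find_peaks`, its `radius` and `min_magnitude` parameters, and the square-neighborhood rule are hypothetical choices for illustration, not necessarily what the full article uses.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (time_frame, frequency_bin)


def find_peaks(spec: List[List[float]], radius: int = 1,
               min_magnitude: float = 0.0) -> List[Peak]:
    """Return the (t, f) cells that are strict local maxima ("stars").

    spec[t][f] is the magnitude at time frame t and frequency bin f.
    A cell is kept if it exceeds min_magnitude and is strictly greater
    than every other cell within `radius` frames and bins of it.
    """
    n_t = len(spec)
    n_f = len(spec[0]) if n_t else 0
    peaks: List[Peak] = []
    for t in range(n_t):
        for f in range(n_f):
            v = spec[t][f]
            if v <= min_magnitude:
                continue  # below the magnitude floor
            # All cells in the (clipped) square neighborhood except (t, f).
            neighbors = (
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
                if (tt, ff) != (t, f)
            )
            if all(nb < v for nb in neighbors):
                peaks.append((t, f))
    return peaks


spec = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.8],
    [0.0, 0.1, 0.2, 0.3],
]
print(find_peaks(spec, radius=1, min_magnitude=0.5))  # [(1, 1), (2, 3)]
```

Because the comparison is strict, a flat plateau yields no peak at all. A production fingerprinter would typically also limit how many peaks it keeps per region of the spectrogram so coverage stays roughly even, which this sketch omits.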

What’s inside

Let's start nowhere near a computer

Where the bright-stars trick runs

Step 1 — Functional requirements (sentences first)

Step 2 — Non-functional requirements

Step 3 — Nouns: peaks, landmarks, and an inverted index

Step 4 — Landmarks: pairing stars into nearly-unique hashes

Step 5 — Matching: let the offsets vote

Step 6 — Trade-offs (each one keeping an NFR)

The complete implementation

The interview corner

Where to go from here
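Steps 3 through 5 in the outline turn spectrogram peaks into something searchable. The following is a minimal sketch, assuming peaks arrive as `(time_frame, frequency_bin)` tuples. Each anchor peak is paired with a few later peaks, and each pair is packed into one integer from (anchor frequency, target frequency, time gap). An inverted index then maps each hash to the songs and anchor times where it occurs. Matching counts votes for `(song, song_time - query_time)`: a true match piles its votes onto one consistent offset, while chance collisions scatter. All names here (`pack`, `landmarks`, `build_index`, `match`, `fan_out`, `max_dt`, `F_BITS`, `DT_BITS`) and the 10/10/6-bit field layout are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Peak = Tuple[int, int]                      # (time_frame, frequency_bin)
Index = Dict[int, List[Tuple[str, int]]]    # hash -> [(song_id, anchor_time)]

# Assumed field widths: < 1024 frequency bins, 0 < dt < 64 frames.
F_BITS, DT_BITS = 10, 6


def pack(f1: int, f2: int, dt: int) -> int:
    """Pack (anchor freq, target freq, time gap) into one integer hash."""
    assert 0 <= f1 < (1 << F_BITS) and 0 <= f2 < (1 << F_BITS)
    assert 0 < dt < (1 << DT_BITS)
    return (f1 << (F_BITS + DT_BITS)) | (f2 << DT_BITS) | dt


def landmarks(peaks: List[Peak], fan_out: int = 3,
              max_dt: int = (1 << DT_BITS) - 1) -> List[Tuple[int, int]]:
    """Pair each anchor with up to `fan_out` later peaks -> (hash, anchor_time).

    Absolute time stays out of the hash, so a pair hashes the same
    no matter where in the song the clip happened to start.
    """
    pts = sorted(peaks)  # by time, then frequency
    out: List[Tuple[int, int]] = []
    for i, (t1, f1) in enumerate(pts):
        paired = 0
        for t2, f2 in pts[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # same frame: no usable time gap
            if dt > max_dt or paired == fan_out:
                break  # past the target zone, or fan-out reached
            out.append((pack(f1, f2, dt), t1))
            paired += 1
    return out


def build_index(songs: Dict[str, List[Peak]]) -> Index:
    """Inverted index: landmark hash -> every (song_id, anchor_time) using it."""
    index: Index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in landmarks(peaks):
            index[h].append((song_id, t))
    return index


def match(index: Index, query_peaks: List[Peak]) -> Optional[Tuple[str, int, int]]:
    """Vote on (song, song_time - query_time); return (song, offset, votes)."""
    votes: Counter = Counter()
    for h, t_query in landmarks(query_peaks):
        for song_id, t_song in index.get(h, ()):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count


songs = {
    "a": [(0, 10), (2, 40), (3, 25), (5, 60), (7, 12), (8, 33), (10, 50)],
    "b": [(0, 11), (1, 41), (4, 26), (6, 61), (9, 13)],
}
index = build_index(songs)
# Song "a" from frame 3 on, re-timed to start at 0, plus one noise peak (1, 99).
query = [(0, 25), (1, 99), (2, 60), (4, 12), (5, 33), (7, 50)]
print(match(index, query))  # ('a', 3, 8)
```

In the example, the clean fragment would earn 9 votes. The noise peak costs one by displacing a genuine pair from the first anchor's fan-out, and its own hashes match nothing. Every surviving vote still agrees on song "a" at offset 3.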

