Streams, Explained: Functional Programming That Reads Like English
A friendly tour of the Stream API and functional programming: lambdas, pipelines, laziness, collectors — then interview questions on frequency counts, grouping, duplicates and second-highest.
You've written this loop five hundred times: make an empty list, loop over another list, if something, add to the new list. Somewhere around the five-hundred-and-first time, a thought sneaks in — why am I describing HOW to walk a list again, instead of just saying WHAT I want?
That thought has a name — functional programming — and a concrete toolbox: the Stream API. By the end of this article you'll read and write pipelines fluently, know the two traps everyone falls into, and have the classic stream interview questions — frequency counts, grouping, duplicates, second-highest — solved and explained, ready to whiteboard.
Let's start nowhere near a computer
Picture a small bottling plant. Crates of mixed fruit arrive at one end of a conveyor belt. Along the belt stand workers, each with exactly one job: the first throws out bruised fruit, the second peels what's left, the third drops the peeled fruit into juice bottles.
Nobody on that belt "loops over the crate." Each worker knows one small skill, the fruit flows past, and the plant's entire logic is readable from the door: discard, peel, bottle. Want a different product? Swap one station — nobody else's job changes.
Now read this:
```java
List<String> juices = fruits.stream()
        .filter(f -> !f.bruised())   // station 1: throw out the bad ones
        .map(Fruit::pressed)         // station 2: transform each survivor
        .toList();                   // station 3: bottle the result
```
That's the same plant. A stream pipeline is a conveyor belt for data: you hire the stations, and the belt does the walking.
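Here is the same plant as a self-contained sketch, with the five-hundred-times loop next to the pipeline so you can check that they agree. The `Fruit` record, its `bruised` flag and its `pressed()` method are hypothetical stand-ins invented so the example compiles (records and `toList()` need Java 16+).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fruit type: a name plus a bruised flag.
record Fruit(String name, boolean bruised) {
    // Hypothetical "press" step: turns a fruit into a juice label.
    String pressed() {
        return name + " juice";
    }
}

class FruitPlant {
    // The familiar loop: spells out HOW to walk the list.
    static List<String> juicesWithLoop(List<Fruit> fruits) {
        List<String> juices = new ArrayList<>();
        for (Fruit f : fruits) {
            if (!f.bruised()) {
                juices.add(f.pressed());
            }
        }
        return juices;
    }

    // The conveyor belt: says WHAT you want.
    static List<String> juicesWithStream(List<Fruit> fruits) {
        return fruits.stream()
                .filter(f -> !f.bruised())   // station 1: discard bruised fruit
                .map(Fruit::pressed)         // station 2: press each survivor
                .toList();                   // station 3: bottle the result
    }

    public static void main(String[] args) {
        List<Fruit> crate = List.of(
                new Fruit("apple", false),
                new Fruit("mango", true),
                new Fruit("orange", false));
        System.out.println(juicesWithLoop(crate));   // [apple juice, orange juice]
        System.out.println(juicesWithStream(crate)); // [apple juice, orange juice]
    }
}
```

Same output, but only one of the two methods can be read aloud as "discard, press, bottle."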
You already use this everywhere
- SQL. `SELECT name FROM users WHERE age > 30` names what, never how: no loop in sight. Streams are SQL's mindset brought to your collections.
- Spreadsheets. Filter, then a formula column, then a SUM at the bottom: that's filter → map → reduce, and your accountant does it daily.
- Data pipelines. Spark, Kafka Streams, pandas — entire careers are built on this one shape.
- Code review at your own company. A six-line pipeline that reads aloud beats twenty lines of index juggling, which is why reviewers keep asking for it.
First, the atom: a lambda is behavior in a variable
Streams only click once this clicks: the arrow `->` turns a piece of behavior into a value you can pass around, exactly like passing data.
```java
import java.util.function.Function;
import java.util.function.Predicate;

Predicate<String> isLong = word -> word.length() > 8;   // a stored question
Function<String, Integer> length = String::length;       // a stored transformation

isLong.test("multithreading");   // true
length.apply("streams");         // 7
```
When you hand `filter` a lambda, you're handing a worker their one skill. That's the entire trick; the rest of the API is hiring decisions.
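A small sketch of that hand-off, assuming a made-up word list: the stored question and transformation go straight into `filter` and `map`, and `Predicate`'s built-in `and`/`negate` combine two small skills into a new one without writing another lambda.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class LambdaStations {
    static final Predicate<String> IS_LONG = word -> word.length() > 8;
    static final Predicate<String> STARTS_WITH_S = word -> word.startsWith("s");
    static final Function<String, Integer> LENGTH = String::length;

    // filter asks the stored question, map applies the stored transformation.
    static List<Integer> longWordLengths(List<String> words) {
        return words.stream()
                .filter(IS_LONG)
                .map(LENGTH)
                .toList();
    }

    // Compose skills without a new lambda: long AND NOT starting with "s".
    static List<String> longNonSWords(List<String> words) {
        return words.stream()
                .filter(IS_LONG.and(STARTS_WITH_S.negate()))
                .toList();
    }

    public static void main(String[] args) {
        List<String> words = List.of("streams", "multithreading", "lambda", "serialization");
        System.out.println(longWordLengths(words)); // [14, 13]
        System.out.println(longNonSWords(words));   // [multithreading]
    }
}
```

Because the behavior lives in named values, the pipeline reads like a list of job titles, and swapping a station means swapping a variable.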
The anatomy of every pipeline
Every stream you will ever write has the same three parts:
- A source: `list.stream()`, the crates arriving.
- Intermediate operations: `filter`, `map`, `sorted`, `distinct`… stations on the belt. Each returns a new stream, so they chain.
- One terminal operation: `toList()`, `count()`, `collect(…)`, `forEach(…)`. The bottling station; without it, nothing happens at all.
That last clause isn't poetry. Intermediate operations are lazy — they're wiring, not work:
```java
fruits.stream()
        .filter(f -> {
            System.out.println("checking " + f);   // never prints…
            return !f.bruised();
        });                                         // …because no terminal op asked
```
The belt only switches on when a terminal operation demands a result, and then each item rides the whole belt one at a time (stateful stations such as `sorted` are the exception: they must gather everything before passing anything on). Laziness is also why streams can short-circuit: `findFirst()` after a `filter` stops the belt at the first match instead of processing a million items for fun.
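To watch laziness and short-circuiting happen, this sketch logs every station visit into a local list. The log is a deliberate side effect used only to observe the belt, and the numbers are arbitrary. Without a terminal operation the log stays empty; with `findFirst()`, items 5 and 6 are never checked because the belt stops at the first even number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class LazyBelt {
    // Wires a two-station belt and, only if asked, switches it on with findFirst().
    // Each station visit is appended to a local log so we can see what actually ran.
    static List<String> trace(List<Integer> items, boolean pullResult) {
        List<String> log = new ArrayList<>();
        Stream<Integer> belt = items.stream()
                .filter(n -> { log.add("check " + n); return n % 2 == 0; })   // keep even numbers
                .map(n -> { log.add("double " + n); return n * 2; });         // double each survivor
        if (pullResult) {
            Optional<Integer> first = belt.findFirst();   // the terminal op that finally pulls items
            log.add("result " + first.orElseThrow());
        }
        return log;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 3, 4, 5, 6);
        System.out.println(trace(items, false)); // []
        System.out.println(trace(items, true));  // [check 1, check 3, check 4, double 4, result 8]
    }
}
```

Note the interleaving in the second log: item 4 is checked and immediately doubled before anything else happens, which is the "each item rides the whole belt" behavior described above.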
Reading an unfamiliar pipeline? Find the terminal operation first — it tells you what the whole expression produces. Then read the stations top-to-bottom as a story. Writers: one station per line, so the story has paragraphs.
The stations you'll actually use
| Station | Job | Sticky example |
|---|---|---|
| `filter(p)` | keep items that pass | bouncer with a guest list |
| `map(f)` | transform each item | peeler: fruit in, peeled fruit out |
| `flatMap(f)` | each item explodes into many, belt flattens | open every crate, fruit flows loose |
| `distinct()` | drop repeats | "seen it already" |
| `sorted(c)` | order the belt | needs to see everything first |
| `reduce(op)` | fold the belt to one value | snowball rolling downhill |
| `collect(c)` | gather into list / map / groups | the warehouse at the end |
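A quick sketch exercising a few of those stations on a made-up word list. Keep in mind that `reduce` and `collect` are terminal: they end the belt, so each pipeline has exactly one of them (or `toList()`) at the bottom.

```java
import java.util.List;

class StationTour {
    // distinct + sorted: drop repeats, then order the belt (sorted must see every item first).
    static List<String> uniqueSorted(List<String> words) {
        return words.stream()
                .distinct()
                .sorted()
                .toList();
    }

    // reduce: fold the belt into one value, starting from the identity 0.
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = List.of("pear", "fig", "apple", "fig", "kiwi");
        System.out.println(uniqueSorted(words)); // [apple, fig, kiwi, pear]
        System.out.println(totalLength(words));  // 19
    }
}
```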
flatMap deserves its thirty seconds, because interviews use it as a litmus test. map turns each crate into one thing; flatMap turns each crate into many things and merges the streams:
```java
List<List<Integer>> nested = List.of(List.of(1, 2), List.of(3), List.of(4, 5));
List<Integer> flat = nested.stream()
        .flatMap(List::stream)   // each inner list pours onto the main belt
        .toList();               // [1, 2, 3, 4, 5]
```
The two traps
Trap 1: side effects inside the belt. The moment a lambda mutates outside state, you've reinvented the lost-update race with nicer syntax: it may look fine on one thread and break the day someone adds `.parallel()`. Stations inspect and transform what flows past; the terminal operation produces the result.
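A minimal sketch of Trap 1 using only the JDK: the first method reaches outside the belt with `forEach(out::add)`, which happens to work on one thread but can lose items or throw once the stream is parallel, because `ArrayList` is not thread-safe. The second lets the terminal operation build the list, which stays correct, and in encounter order, either way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class SharedStateTrap {
    // Anti-pattern: the last station mutates a list that lives outside the belt.
    // Sequentially this works; with .parallel() the unsynchronized ArrayList.add calls race.
    static List<Integer> squaresBySideEffect(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.rangeClosed(1, n)
                .map(i -> i * i)
                .forEach(out::add);   // reaches outside the belt
        return out;
    }

    // Fix: let the terminal operation produce the result; toList() keeps encounter order.
    static List<Integer> squaresByCollecting(int n, boolean parallel) {
        IntStream belt = IntStream.rangeClosed(1, n);
        if (parallel) {
            belt = belt.parallel();
        }
        return belt.map(i -> i * i)
                .boxed()
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(squaresBySideEffect(5));        // [1, 4, 9, 16, 25]
        System.out.println(squaresByCollecting(5, true));  // [1, 4, 9, 16, 25]
    }
}
```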
Trap 2: a stream is single-use. It's a belt run, not a container — run it twice and you get IllegalStateException: stream has already been operated upon or closed. Need two answers? Build two pipelines from the source collection.
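One common way to "build two pipelines from the source" is to keep the recipe rather than the belt: a `Supplier<Stream<T>>` that builds a fresh stream on every `get()`. Here is a sketch with a made-up word list; the nested `Answers` record is just a convenient holder for the two results.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class SingleUseBelt {
    record Answers(long longWords, long startingWithS) {}

    // Runs one belt twice and reports what the second run does.
    static String reuseOnce(List<String> words) {
        Stream<String> belt = words.stream().filter(w -> w.length() > 3);
        belt.count();                  // first terminal op: consumes the belt
        try {
            belt.count();              // second terminal op on the same belt
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Fix: keep the recipe, not the belt. Each get() builds a new pipeline from the source list.
    static Answers twoAnswers(List<String> words) {
        Supplier<Stream<String>> longWords = () -> words.stream().filter(w -> w.length() > 3);
        long howMany = longWords.get().count();
        long startingWithS = longWords.get().filter(w -> w.startsWith("s")).count();
        return new Answers(howMany, startingWithS);
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "stream", "sorted", "peek", "toList");
        System.out.println(reuseOnce(words));   // stream has already been operated upon or closed
        System.out.println(twoAnswers(words));  // Answers[longWords=4, startingWithS=2]
    }
}
```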
And sometimes the honest answer is: write the loop. A plain `for` loop with an early `break` and a mutable accumulator carried across iterations can read better than a pipeline contorted to avoid one. Streams are a tool, not a religion; saying that out loud in an interview is itself a senior signal.
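For a concrete case, take a hypothetical budget check: walk prices in order until the next one would push the running total over the limit, then stop. The loop states that directly; a stream version would need a stateful lambda inside `takeWhile`, exactly the contortion (and Trap 1 side effect) described above.

```java
import java.util.ArrayList;
import java.util.List;

class BudgetLoop {
    // Hypothetical scenario: take items in order while the running total stays within budget,
    // and stop at the first item that would overflow it.
    static List<Integer> affordablePrefix(List<Integer> prices, int budget) {
        List<Integer> taken = new ArrayList<>();
        int spent = 0;                   // accumulator carried across iterations
        for (int price : prices) {
            if (spent + price > budget) {
                break;                   // early exit: later items are never examined
            }
            spent += price;
            taken.add(price);
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println(affordablePrefix(List.of(30, 20, 40, 10), 60)); // [30, 20]
    }
}
```

Note that the 10 would have fit, but the rule is "stop at the first overflow," and the `break` says so without ceremony.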
The interview questions, solved properly
The setup interviewers reach for, nine times out of ten:
```java
import java.util.*;                       // List, Map, Set, HashSet, Arrays, Optional, Comparator
import java.util.stream.Collectors;       // used by the Q1–Q4 snippets below

record Employee(String name, String dept, double salary) {}

List<Employee> staff = List.of(
        new Employee("meera", "Engineering", 95_000),
        new Employee("ravi", "Engineering", 72_000),
        new Employee("asha", "Engineering", 88_000),
        new Employee("dev", "Sales", 60_000),
        new Employee("priya", "Sales", 64_000),
        new Employee("kabir", "HR", 51_000));
```
Q1 — "Count how often each word appears in a sentence."
The shape to memorize: `groupingBy(key)` builds the buckets, and a downstream collector says what to do inside each bucket; here, just count:
```java
String line = "the quick brown fox jumps over the lazy dog the fox";
Map<String, Long> freq = Arrays.stream(line.split(" "))
        .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
// {the=3, fox=2, quick=1, brown=1, …}  (a HashMap, so iteration order is not guaranteed)
```
Bonus beat: "characters instead of words?" `line.chars()` gives an `IntStream`; box each code with `mapToObj(c -> (char) c)` and the collector line doesn't change (only the map's key type becomes `Character`).
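A runnable sketch of that bonus beat. Two choices here are illustrative assumptions, not part of the original question: spaces are filtered out, and `TreeMap::new` is passed as the map factory (the three-argument `groupingBy` overload) so keys print in sorted order instead of `HashMap`'s arbitrary order.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class CharFrequency {
    // Same collector shape as the word count: groupingBy(key, counting()).
    // chars() yields an IntStream of char codes; mapToObj boxes each one into a Character.
    static Map<Character, Long> charFrequency(String text) {
        return text.chars()
                .filter(c -> c != ' ')                 // skip spaces (illustrative choice)
                .mapToObj(c -> (char) c)
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(charFrequency("banana bread")); // {a=4, b=2, d=1, e=1, n=2, r=1}
    }
}
```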
Q2 — "Find the second-highest salary."
The classic. The pipeline reads exactly like the requirement, and `distinct()` is the step everyone forgets: if two employees share the top salary, the naive version reports that top salary as the second-highest.
```java
Optional<Double> secondHighest = staff.stream()
        .map(Employee::salary)
        .distinct()                          // ties would lie to you
        .sorted(Comparator.reverseOrder())
        .skip(1)                             // step over the highest
        .findFirst();                        // Optional[88000.0]
```
Returning `Optional` is part of the answer, not a decoration: with one distinct salary in the company, "there is no second-highest" is a value, not an exception.
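To see why `distinct()` earns its place, this sketch runs the naive pipeline and the de-duplicated one on a made-up list with a tie at the top, then shows the empty `Optional` when no second-highest exists. It uses a plain `List<Double>` instead of the `Employee` record so it stands alone.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class SecondHighest {
    // Naive: no distinct(), so a tie at the top reports the top salary again.
    static Optional<Double> naive(List<Double> salaries) {
        return salaries.stream()
                .sorted(Comparator.reverseOrder())
                .skip(1)
                .findFirst();
    }

    // The article's version: de-duplicate first, then step over the highest.
    static Optional<Double> secondHighest(List<Double> salaries) {
        return salaries.stream()
                .distinct()
                .sorted(Comparator.reverseOrder())
                .skip(1)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Double> tiedAtTop = List.of(95_000.0, 95_000.0, 88_000.0);
        System.out.println(naive(tiedAtTop));                          // Optional[95000.0]  (wrong)
        System.out.println(secondHighest(tiedAtTop));                  // Optional[88000.0]
        System.out.println(secondHighest(List.of(50_000.0, 50_000.0))); // Optional.empty
        // The caller decides what "no answer" means:
        System.out.println(secondHighest(List.of(50_000.0))
                .map(s -> "second: " + s)
                .orElse("no second-highest"));                         // no second-highest
    }
}
```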
Q3 — "Group employees by department — then average salary per department."
The question is testing whether you know downstream collectors, so name the concept while writing it:
```java
Map<String, List<Employee>> byDept = staff.stream()
        .collect(Collectors.groupingBy(Employee::dept));
// {Engineering=[meera, ravi, asha], Sales=[dev, priya], HR=[kabir]}  (names only; HashMap order may vary)

Map<String, Double> avgByDept = staff.stream()
        .collect(Collectors.groupingBy(Employee::dept,
                Collectors.averagingDouble(Employee::salary)));
// {Engineering=85000.0, Sales=62000.0, HR=51000.0}
```
Same buckets, different verb inside the bucket: `counting()`, `averagingDouble(…)`, `mapping(…)`, whatever the follow-up demands. One shape survives endless follow-ups.
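Here is the `mapping(…)` verb in action, in a self-contained sketch that repeats the `Employee` setup: keep only each employee's name inside the bucket, and reuse `counting()` on the same buckets. Passing `TreeMap::new` as the map factory is an added choice that makes departments come out alphabetically, so the printed result is predictable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NamesByDept {
    record Employee(String name, String dept, double salary) {}

    static final List<Employee> STAFF = List.of(
            new Employee("meera", "Engineering", 95_000),
            new Employee("ravi", "Engineering", 72_000),
            new Employee("asha", "Engineering", 88_000),
            new Employee("dev", "Sales", 60_000),
            new Employee("priya", "Sales", 64_000),
            new Employee("kabir", "HR", 51_000));

    // Same buckets, new verb: mapping(...) keeps only each employee's name inside the bucket.
    static Map<String, List<String>> namesByDept(List<Employee> staff) {
        return staff.stream()
                .collect(Collectors.groupingBy(
                        Employee::dept,
                        TreeMap::new,                                   // sorted department keys
                        Collectors.mapping(Employee::name, Collectors.toList())));
    }

    // And Q1's counting() verb, reused on the same buckets.
    static Map<String, Long> headcountByDept(List<Employee> staff) {
        return staff.stream()
                .collect(Collectors.groupingBy(Employee::dept, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(namesByDept(STAFF));     // {Engineering=[meera, ravi, asha], HR=[kabir], Sales=[dev, priya]}
        System.out.println(headcountByDept(STAFF)); // {Engineering=3, HR=1, Sales=2}
    }
}
```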
Q4 — "Find the duplicates in a list."
Two worthy answers — offer both and say the trade-off. The purely functional one reuses Q1's shape, because a duplicate is just frequency > 1:
```java
List<Integer> nums = List.of(4, 7, 2, 7, 9, 4, 4);
List<Integer> dups = nums.stream()
        .collect(Collectors.groupingBy(n -> n, Collectors.counting()))
        .entrySet().stream()                 // second belt, over the counts
        .filter(e -> e.getValue() > 1)
        .map(Map.Entry::getKey)
        .toList();                           // [4, 7]
```
The short, clever one exploits `Set.add` returning `false` on repeats:
```java
Set<Integer> seen = new HashSet<>();
List<Integer> dups2 = nums.stream()
        .filter(n -> !seen.add(n))   // mutates `seen`: Trap 1!
        .distinct()
        .toList();                   // [7, 4]
```
It works, and interviewers like seeing it, provided you point at the lambda and say "this has a side effect, so it must never meet `.parallel()`." Knowing exactly where the rule bends is worth more than never bending it.
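A likely follow-up is "in what order?" The `groupingBy` answer iterates in whatever order the default `HashMap` produces, which is an implementation detail. One possible fix, sketched here, is to pass `LinkedHashMap::new` as the map factory so duplicates come back in first-appearance order; the input lists are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedDuplicates {
    // Q4's frequency approach with one change: a LinkedHashMap remembers insertion order,
    // so the counts (and therefore the duplicates) follow first appearance in the input.
    static List<Integer> duplicatesInOrder(List<Integer> nums) {
        Map<Integer, Long> counts = nums.stream()
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > 1)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(duplicatesInOrder(List.of(9, 7, 2, 7, 9, 4)));    // [9, 7]
        System.out.println(duplicatesInOrder(List.of(4, 7, 2, 7, 9, 4, 4))); // [4, 7]
    }
}
```

Unlike the `Set.add` trick, this version has no side effects, so it is safe to parallelize if the input ever grows large enough to matter.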
Where to go from here
You can now read any pipeline like a sentence: source, stations, one terminal verb — lazy until asked, grouped by collectors. Three next stops:
- Optional, properly: the type that makes "no result" explicit; learn `map`/`orElse`/`orElseThrow` and your null-checks start to dissolve.
- `parallel()`, with respect. One method call fans the belt across cores; it shines on big, side-effect-free, CPU-heavy pipelines and quietly hurts everywhere else. (Now you know exactly what those threads get up to.)
- Collectors, the deep end (`partitioningBy`, `mapping`, `teeing`): the difference between writing pipelines and composing them.
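For a first taste of that deep end, here is a sketch on the article's staff data: `partitioningBy` always yields exactly two buckets (`true` and `false`), and `teeing` (added in Java 12) feeds one belt into two collectors and merges their answers. The `Payroll` record is a hypothetical holder introduced only for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionAndTee {
    record Employee(String name, String dept, double salary) {}

    // Hypothetical holder for two answers computed in a single pass.
    record Payroll(double total, long headcount) {}

    static final List<Employee> STAFF = List.of(
            new Employee("meera", "Engineering", 95_000),
            new Employee("ravi", "Engineering", 72_000),
            new Employee("asha", "Engineering", 88_000),
            new Employee("dev", "Sales", 60_000),
            new Employee("priya", "Sales", 64_000),
            new Employee("kabir", "HR", 51_000));

    // partitioningBy: exactly two buckets, keyed true and false; mapping keeps just the names.
    static Map<Boolean, List<String>> over70k(List<Employee> staff) {
        return staff.stream()
                .collect(Collectors.partitioningBy(
                        e -> e.salary() > 70_000,
                        Collectors.mapping(Employee::name, Collectors.toList())));
    }

    // teeing: one belt feeds two collectors, and the merger combines their results.
    static Payroll payroll(List<Employee> staff) {
        return staff.stream()
                .collect(Collectors.teeing(
                        Collectors.summingDouble(Employee::salary),
                        Collectors.counting(),
                        (total, count) -> new Payroll(total, count)));
    }

    public static void main(String[] args) {
        System.out.println(over70k(STAFF).get(true));   // [meera, ravi, asha]
        System.out.println(over70k(STAFF).get(false));  // [dev, priya, kabir]
        System.out.println(payroll(STAFF));             // Payroll[total=430000.0, headcount=6]
    }
}
```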
Next code review, when a colleague's nested loop reads like assembly instructions for furniture, you'll see the conveyor belt they could have built — discard, peel, bottle — and you'll know how to hire the workers.