MySQL has supported a native JSON data type since version 5.7.8. The native JSON type lets you store JSON documents more efficiently than the plain-text storage used in earlier versions.
MySQL stores JSON documents in an internal binary format that allows quick read access to document elements. The format is structured in a way that permits the server to look up values within the JSON document directly by key or array index, which is very fast.
The storage requirement for a JSON document is roughly the same as for LONGBLOB or LONGTEXT data.
CREATE TABLE events (
...
browser_info JSON,
...
);
Insert into a JSON column:
INSERT INTO events(browser_info)
VALUES (
  '{ "name": "Safari", "os": "Mac", "resolution": { "x": 1920, "y": 1080 } }'
);
MySQL automatically validates documents stored in JSON columns; attempting to store an invalid document produces an error.
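For example, using the events table above (the exact error number may vary slightly by server version):

```sql
-- A well-formed document is accepted
INSERT INTO events(browser_info) VALUES ('{"name": "Chrome"}');

-- A malformed document (missing closing brace) is rejected at insert time:
INSERT INTO events(browser_info) VALUES ('{"name": "Chrome"');
-- ERROR 3140 (22032): Invalid JSON text
```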
JSON_OBJECT evaluates a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs. An error occurs if any key name is NULL or the number of arguments is odd.

SELECT JSON_OBJECT('id', u.id,
                   'firstName', u.first_name,
                   'lastName', u.last_name) as jsonUser
FROM user as u;

JSON_ARRAY evaluates a (possibly empty) list of values and returns a JSON array containing those values.
SELECT JSON_ARRAY(u.id, u.first_name, u.last_name) as jsonUser FROM user as u;

JSON_OBJECTAGG
Return result set as a single JSON object
Takes two column names or expressions as arguments, the first of these being used as a key and the second as a value, and returns a JSON object containing key-value pairs. Returns NULL if the result contains no rows, or in the event of an error. An error occurs if any key name is NULL or the number of arguments is not equal to 2.
SELECT JSON_OBJECTAGG(u.id, u.first_name) as jsonData
FROM user as u;
-- output
{
  "1": "John",
  "2": "Peter"
}
JSON_ARRAYAGG: return a result set as a single JSON array
Aggregates a result set as a single JSON array whose elements consist of the rows. The order of elements in this array is undefined. The function acts on a column or an expression that evaluates to a single value. Returns NULL if the result contains no rows, or in the event of an error.
SELECT JSON_PRETTY(JSON_OBJECT('userId', u.id, 'cards', cardList)) as jsonData
FROM user as u
LEFT JOIN (SELECT c.user_id,
                  JSON_ARRAYAGG(
                      JSON_OBJECT(
                          'cardId', c.id,
                          'cardNumber', c.card_number)
                  ) as cardList
           FROM card as c
           GROUP BY c.user_id) as cards ON u.id = cards.user_id;
{
"cards": [
{
"cardId": 4,
"cardNumber": "2440531"
},
{
"cardId": 11,
"cardNumber": "4061190"
}
],
"userId": 1
}

How to emulate JSON_ARRAYAGG before MySQL 5.7.22, the version that introduced the JSON aggregate functions. Note that GROUP_CONCAT output is truncated at the group_concat_max_len limit (1024 bytes by default), so long arrays require raising that variable.
SELECT JSON_PRETTY(JSON_OBJECT('userId', u.id, 'cards', cardList)) as jsonData
FROM user as u
LEFT JOIN (SELECT c.user_id,
CONCAT('[', GROUP_CONCAT(
JSON_OBJECT(
'cardId', c.id,
'cardNumber', c.card_number)
), ']') as cardList
FROM card as c
GROUP BY c.user_id) as cards ON u.id = cards.user_id;
JSON_PRETTY provides pretty-printing of JSON values similar to that implemented in PHP and by other languages and database systems. The value supplied must be a JSON value or a valid string representation of a JSON value.
SELECT JSON_PRETTY(JSON_OBJECT('id', u.id,
                               'firstName', u.first_name,
                               'lastName', u.last_name)) as jsonUser
FROM user as u;
JSON_EXTRACT(json_doc, path[, path] ...)
Returns data from a JSON document, selected from the parts of the document matched by the path arguments. Returns NULL if any argument is NULL or no paths locate a value in the document. An error occurs if the json_doc argument is not a valid JSON document or any path argument is not a valid path expression.
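A short example against the events table defined earlier (the expected values assume the Safari row inserted above):

```sql
SELECT JSON_EXTRACT(browser_info, '$.name') AS browser,
       JSON_EXTRACT(browser_info, '$.resolution.x') AS res_x
FROM events;
-- browser: "Safari"   res_x: 1920

-- Since MySQL 5.7, the -> operator is shorthand for JSON_EXTRACT:
SELECT browser_info->'$.name' FROM events;
```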

Imagine you need to give someone quick instructions. You could write a full manual with a title page, table of contents, and chapters — or you could just hand them a sticky note: “Sort these by price, lowest first.” A lambda expression is that sticky note. It is a concise way to represent a small piece of behavior — a function — without the ceremony of defining an entire class or method.
Introduced in Java 8, lambda expressions bring functional programming capabilities to Java. Before Java 8, every piece of behavior had to live inside a class. If you wanted to pass a comparator to a sort method, you had to create an anonymous inner class with boilerplate code. Lambdas eliminate that boilerplate.
Formally defined: A lambda expression is an anonymous function — a function with no name, no access modifier, and no return type declaration. It provides a clear and concise way to implement a single abstract method of a functional interface.
What lambdas give you:
- Far less boilerplate than anonymous inner classes
- The ability to pass behavior as an argument, just like data
- A foundation for the Stream API and other functional-style APIs
Here is a before-and-after comparison to see the difference immediately:
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LambdaBeforeAfter {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Charlie", "Alice", "Bob");

        // BEFORE Java 8: Anonymous inner class
        Collections.sort(names, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });
        System.out.println("Sorted (anonymous class): " + names);
        // Output: Sorted (anonymous class): [Alice, Bob, Charlie]

        // AFTER Java 8: Lambda expression
        List<String> names2 = Arrays.asList("Charlie", "Alice", "Bob");
        Collections.sort(names2, (a, b) -> a.compareTo(b));
        System.out.println("Sorted (lambda): " + names2);
        // Output: Sorted (lambda): [Alice, Bob, Charlie]

        // EVEN SHORTER: Method reference
        List<String> names3 = Arrays.asList("Charlie", "Alice", "Bob");
        names3.sort(String::compareTo);
        System.out.println("Sorted (method reference): " + names3);
        // Output: Sorted (method reference): [Alice, Bob, Charlie]
    }
}
Five lines of anonymous class code reduced to a single expression. That is the power of lambdas.
The general syntax of a lambda expression is:
(parameters) -> expression
OR
(parameters) -> { statements; }
The arrow operator -> separates the parameter list from the body. The left side defines what goes in, the right side defines what comes out (or what happens).
Depending on the number of parameters and the complexity of the body, the syntax can be simplified in several ways:
| Variation | Syntax | Example |
|---|---|---|
| No parameters | () -> expression | () -> System.out.println("Hello") |
| Single parameter (no parens needed) | param -> expression | name -> name.toUpperCase() |
| Single parameter (with parens) | (param) -> expression | (name) -> name.toUpperCase() |
| Multiple parameters | (p1, p2) -> expression | (a, b) -> a + b |
| Expression body (implicit return) | (params) -> expression | (x) -> x * x |
| Block body (explicit return) | (params) -> { return expr; } | (x) -> { return x * x; } |
| Block body (void, no return) | (params) -> { statements; } | (msg) -> { System.out.println(msg); } |
| Explicit parameter types | (Type p1, Type p2) -> expr | (String a, String b) -> a.compareTo(b) |
import java.util.function.*;

public class LambdaSyntaxVariations {
    public static void main(String[] args) {
        // 1. No parameters
        Runnable greet = () -> System.out.println("Hello, World!");
        greet.run();
        // Output: Hello, World!

        // 2. Single parameter - parentheses optional
        Consumer<String> print = message -> System.out.println(message);
        print.accept("Lambda with one param");
        // Output: Lambda with one param

        // 3. Single parameter - with parentheses (also valid)
        Consumer<String> print2 = (message) -> System.out.println(message);
        print2.accept("Lambda with parens");
        // Output: Lambda with parens

        // 4. Multiple parameters
        BinaryOperator<Integer> add = (a, b) -> a + b;
        System.out.println("Sum: " + add.apply(3, 7));
        // Output: Sum: 10

        // 5. Expression body - implicit return
        Function<Integer, Integer> square = x -> x * x;
        System.out.println("Square of 5: " + square.apply(5));
        // Output: Square of 5: 25

        // 6. Block body - explicit return required
        Function<Integer, String> classify = x -> {
            if (x > 0) {
                return "Positive";
            } else if (x < 0) {
                return "Negative";
            } else {
                return "Zero";
            }
        };
        System.out.println("10 is: " + classify.apply(10));
        // Output: 10 is: Positive

        // 7. Explicit types (usually unnecessary due to type inference)
        BinaryOperator<String> concat = (String a, String b) -> a + " " + b;
        System.out.println(concat.apply("Hello", "Lambda"));
        // Output: Hello Lambda

        // 8. Multi-line block body with no return (void)
        Consumer<String> logger = (msg) -> {
            String timestamp = java.time.LocalDateTime.now().toString();
            System.out.println("[" + timestamp + "] " + msg);
        };
        logger.accept("Application started");
        // Output: [2024-01-15T10:30:00.123] Application started
    }
}
In most cases, the Java compiler can infer the parameter types from the context (the functional interface the lambda implements). You do not need to declare them explicitly.
The compiler looks at the target type — the functional interface type the lambda is being assigned to — and determines the parameter types from its single abstract method.
import java.util.Comparator;
import java.util.function.BiFunction;

public class TypeInference {
    public static void main(String[] args) {
        // The compiler knows this is Comparator<String>, so a and b are String
        Comparator<String> comp1 = (a, b) -> a.compareTo(b);

        // You CAN specify types explicitly -- sometimes useful for clarity
        Comparator<String> comp2 = (String a, String b) -> a.compareTo(b);

        // IMPORTANT: You cannot mix -- either all types or no types
        // Comparator<String> comp3 = (String a, b) -> a.compareTo(b); // COMPILE ERROR

        // Type inference works with generics too
        // (String.repeat() requires Java 11+; on Java 8 use a loop or StringBuilder)
        BiFunction<String, Integer, String> repeat = (text, times) -> text.repeat(times);
        System.out.println(repeat.apply("Ha", 3));
        // Output: HaHaHa
    }
}
Lambdas do not exist in a vacuum. Every lambda expression in Java is an implementation of a functional interface. Understanding functional interfaces is essential to understanding lambdas.
A functional interface is an interface that has exactly one abstract method. It can have any number of default methods, static methods, and private methods — but only one abstract method. This single abstract method (SAM) is what the lambda implements.
Key rules:
- The interface must declare exactly one abstract method
- Any number of default and static methods are allowed
- Methods inherited from Object (like toString(), equals()) do not count
- The @FunctionalInterface annotation is optional but recommended: it causes a compile error if the interface has more than one abstract method

// A functional interface - has exactly ONE abstract method
@FunctionalInterface
interface Greeting {
    void greet(String name); // single abstract method
}

// Still a functional interface - default methods don't count
@FunctionalInterface
interface MathOperation {
    double calculate(double a, double b); // single abstract method

    default void printResult(double a, double b) {
        System.out.println("Result: " + calculate(a, b));
    }
}

// NOT a functional interface - has TWO abstract methods
// @FunctionalInterface // This would cause a compile error!
interface NotFunctional {
    void methodOne();
    void methodTwo();
}

// Still a functional interface - toString() comes from Object, doesn't count
@FunctionalInterface
interface Converter<F, T> {
    T convert(F from);

    @Override
    String toString(); // From Object -- does NOT count as abstract
}
You can create your own functional interfaces for domain-specific behavior. The @FunctionalInterface annotation tells the compiler (and other developers) that this interface is intended for lambda use.
@FunctionalInterface
interface Validator<T> {
    boolean validate(T item);
}

@FunctionalInterface
interface Transformer<T, R> {
    R transform(T input);
}

@FunctionalInterface
interface TriFunction<A, B, C, R> {
    R apply(A a, B b, C c);
}

public class CustomFunctionalInterfaces {
    public static void main(String[] args) {
        // Using custom Validator
        Validator<String> emailValidator = email ->
                email != null && email.contains("@") && email.contains(".");
        System.out.println("valid@email.com: " + emailValidator.validate("valid@email.com"));
        // Output: valid@email.com: true
        System.out.println("invalid: " + emailValidator.validate("invalid"));
        // Output: invalid: false

        // Using custom Transformer
        Transformer<String, Integer> wordCounter = text -> text.split("\\s+").length;
        System.out.println("Word count: " + wordCounter.transform("Java lambdas are powerful"));
        // Output: Word count: 4

        // Using custom TriFunction (Java doesn't provide one by default)
        TriFunction<Integer, Integer, Integer, Integer> clamp =
                (value, min, max) -> Math.max(min, Math.min(max, value));
        System.out.println("Clamp 15 to [0,10]: " + clamp.apply(15, 0, 10));
        // Output: Clamp 15 to [0,10]: 10
        System.out.println("Clamp 5 to [0,10]: " + clamp.apply(5, 0, 10));
        // Output: Clamp 5 to [0,10]: 5
    }
}
Many interfaces that existed before Java 8 qualify as functional interfaces. The @FunctionalInterface annotation was added to them retroactively:
| Interface | Abstract Method | Package |
|---|---|---|
| Runnable | void run() | java.lang |
| Callable<V> | V call() | java.util.concurrent |
| Comparator<T> | int compare(T o1, T o2) | java.util |
| ActionListener | void actionPerformed(ActionEvent e) | java.awt.event |
This means you can use lambdas anywhere these interfaces are expected — no code changes needed on the caller side.
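A minimal sketch illustrating this retrofit: both Runnable (from Java 1.0) and Callable (from Java 5) accept lambdas today with no changes on the caller side.

```java
import java.util.concurrent.Callable;

public class RetrofittedInterfaces {
    public static void main(String[] args) throws Exception {
        // Runnable existed long before Java 8, yet takes a lambda directly
        Runnable task = () -> System.out.println("Running in: " + Thread.currentThread().getName());
        new Thread(task).start();

        // Callable<V> works the same way; its call() returns a value
        Callable<Integer> answer = () -> 6 * 7;
        System.out.println("Callable result: " + answer.call());
        // Output: Callable result: 42
    }
}
```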
Java 8 introduced the java.util.function package with 43 functional interfaces. You do not need to memorize all of them. Most are specializations of four core interfaces. Master these four and the rest will follow naturally.
A Predicate takes one argument and returns a boolean. Use it for filtering, validation, and condition-checking.
| Method | Description |
|---|---|
| boolean test(T t) | The abstract method; evaluates the predicate on the given argument |
| and(Predicate other) | Logical AND; both predicates must be true |
| or(Predicate other) | Logical OR; at least one predicate must be true |
| negate() | Logical NOT; inverts the predicate |
| Predicate.isEqual(target) | Static method; creates a predicate that tests equality to target |
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PredicateExamples {
    public static void main(String[] args) {
        // Basic predicate
        Predicate<Integer> isPositive = n -> n > 0;
        System.out.println("5 is positive: " + isPositive.test(5)); // true
        System.out.println("-3 is positive: " + isPositive.test(-3)); // false

        // Composing predicates with and(), or(), negate()
        Predicate<Integer> isEven = n -> n % 2 == 0;
        Predicate<Integer> isPositiveAndEven = isPositive.and(isEven);
        Predicate<Integer> isPositiveOrEven = isPositive.or(isEven);
        Predicate<Integer> isNotPositive = isPositive.negate();
        System.out.println("6 is positive AND even: " + isPositiveAndEven.test(6)); // true
        System.out.println("3 is positive AND even: " + isPositiveAndEven.test(3)); // false
        System.out.println("-4 is positive OR even: " + isPositiveOrEven.test(-4)); // true
        System.out.println("-3 is NOT positive: " + isNotPositive.test(-3)); // true

        // Practical example: filtering a list
        List<String> names = List.of("Alice", "Bob", "Charlie", "Dave", "Eve");
        Predicate<String> longerThan3 = name -> name.length() > 3;
        Predicate<String> startsWithC = name -> name.startsWith("C");
        List<String> filtered = names.stream()
                .filter(longerThan3.and(startsWithC))
                .collect(Collectors.toList());
        System.out.println("Long names starting with C: " + filtered);
        // Output: Long names starting with C: [Charlie]

        // Predicate.isEqual() - useful for null-safe equality
        Predicate<String> isAlice = Predicate.isEqual("Alice");
        System.out.println("Is Alice: " + isAlice.test("Alice")); // true
        System.out.println("Is Alice: " + isAlice.test(null)); // false
    }
}
A Function takes one argument of type T and returns a result of type R. Use it for transformations, conversions, and mappings.
| Method | Description |
|---|---|
| R apply(T t) | The abstract method; applies the function to the argument |
| andThen(Function after) | Compose: apply this function first, then apply after |
| compose(Function before) | Compose: apply before first, then apply this function |
| Function.identity() | Static method; returns a function that always returns its input |
import java.util.function.Function;

public class FunctionExamples {
    public static void main(String[] args) {
        // Basic function: String -> Integer
        Function<String, Integer> stringLength = s -> s.length();
        System.out.println("Length of 'Lambda': " + stringLength.apply("Lambda"));
        // Output: Length of 'Lambda': 6

        // Function composition with andThen()
        // Apply first function, then apply second to the result
        Function<String, String> toUpperCase = s -> s.toUpperCase();
        Function<String, String> addExclamation = s -> s + "!";
        Function<String, String> shout = toUpperCase.andThen(addExclamation);
        System.out.println(shout.apply("hello"));
        // Output: HELLO!

        // Function composition with compose()
        // Apply the argument function FIRST, then apply this function
        Function<Integer, Integer> multiplyBy2 = n -> n * 2;
        Function<Integer, Integer> add10 = n -> n + 10;

        // compose: add10 runs first, then multiplyBy2
        Function<Integer, Integer> add10ThenDouble = multiplyBy2.compose(add10);
        System.out.println("compose(5): " + add10ThenDouble.apply(5));
        // Output: compose(5): 30 (5+10=15, 15*2=30)

        // andThen: multiplyBy2 runs first, then add10
        Function<Integer, Integer> doubleThenAdd10 = multiplyBy2.andThen(add10);
        System.out.println("andThen(5): " + doubleThenAdd10.apply(5));
        // Output: andThen(5): 20 (5*2=10, 10+10=20)

        // Function.identity() - returns input unchanged
        Function<String, String> identity = Function.identity();
        System.out.println(identity.apply("unchanged"));
        // Output: unchanged

        // Practical: build a text processing pipeline
        Function<String, String> trim = String::trim;
        Function<String, String> lower = String::toLowerCase;
        Function<String, String> normalize = trim.andThen(lower).andThen(s -> s.replaceAll("\\s+", " "));
        System.out.println("'" + normalize.apply("  Hello   WORLD  ") + "'");
        // Output: 'hello world'
    }
}
A Consumer takes one argument and returns nothing (void). Use it for actions, side effects, printing, logging, or saving data.
| Method | Description |
|---|---|
| void accept(T t) | The abstract method; performs the action on the argument |
| andThen(Consumer after) | Chain: perform this action, then perform after |
import java.util.List;
import java.util.function.Consumer;

public class ConsumerExamples {
    public static void main(String[] args) {
        // Basic consumer
        Consumer<String> print = s -> System.out.println(s);
        print.accept("Hello from Consumer!");
        // Output: Hello from Consumer!

        // Chaining consumers with andThen()
        Consumer<String> toUpper = s -> System.out.println("Upper: " + s.toUpperCase());
        Consumer<String> toLower = s -> System.out.println("Lower: " + s.toLowerCase());
        Consumer<String> both = toUpper.andThen(toLower);
        both.accept("Lambda");
        // Output:
        // Upper: LAMBDA
        // Lower: lambda

        // Practical: process a list of items
        List<String> emails = List.of("alice@example.com", "bob@example.com", "charlie@example.com");
        Consumer<String> validate = email -> {
            if (!email.contains("@")) {
                System.out.println("INVALID: " + email);
            }
        };
        Consumer<String> sendWelcome = email -> System.out.println("Sending welcome email to: " + email);
        Consumer<String> logAction = email -> System.out.println("Logged: processed " + email);
        Consumer<String> processEmail = validate.andThen(sendWelcome).andThen(logAction);
        emails.forEach(processEmail);
        // Output:
        // Sending welcome email to: alice@example.com
        // Logged: processed alice@example.com
        // Sending welcome email to: bob@example.com
        // Logged: processed bob@example.com
        // Sending welcome email to: charlie@example.com
        // Logged: processed charlie@example.com
    }
}
A Supplier takes no arguments and returns a value. Use it for lazy evaluation, factory methods, and deferred computation.
| Method | Description |
|---|---|
| T get() | The abstract method; produces a result with no input |
import java.time.LocalDateTime;
import java.util.Random;
import java.util.function.Supplier;

public class SupplierExamples {
    public static void main(String[] args) {
        // Basic supplier
        Supplier<String> helloSupplier = () -> "Hello, World!";
        System.out.println(helloSupplier.get());
        // Output: Hello, World!

        // Supplier for the current timestamp
        Supplier<LocalDateTime> now = () -> LocalDateTime.now();
        System.out.println("Current time: " + now.get());
        // Output: Current time: 2024-01-15T10:30:00.123

        // Supplier as a factory
        Supplier<Random> randomFactory = () -> new Random();
        Random r1 = randomFactory.get();
        Random r2 = randomFactory.get();
        System.out.println("Same instance? " + (r1 == r2)); // false -- new object each time

        // Lazy evaluation -- the expensive computation only runs when needed
        Supplier<Double> expensiveCalculation = () -> {
            System.out.println(" ...performing expensive calculation...");
            double result = 0;
            for (int i = 0; i < 1000; i++) {
                result += Math.sqrt(i);
            }
            return result;
        };
        boolean needResult = true;
        if (needResult) {
            System.out.println("Result: " + expensiveCalculation.get());
        }
        // Output:
        // ...performing expensive calculation...
        // Result: 21065.833...

        // Supplier for default values
        String name = null;
        Supplier<String> defaultName = () -> "Anonymous";
        String displayName = (name != null) ? name : defaultName.get();
        System.out.println("Name: " + displayName);
        // Output: Name: Anonymous
    }
}
UnaryOperator is a specialization of Function where the input and output types are the same. BinaryOperator is a specialization of BiFunction. These are convenience interfaces for operations that do not change the type.
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

public class OperatorExamples {
    public static void main(String[] args) {
        // UnaryOperator: same input and output type
        UnaryOperator<String> toUpper = s -> s.toUpperCase();
        System.out.println(toUpper.apply("lambda"));
        // Output: LAMBDA

        UnaryOperator<Integer> doubleIt = n -> n * 2;
        System.out.println(doubleIt.apply(7));
        // Output: 14

        // UnaryOperator with List.replaceAll()
        List<String> names = Arrays.asList("alice", "bob", "charlie");
        names.replaceAll(String::toUpperCase);
        System.out.println(names);
        // Output: [ALICE, BOB, CHARLIE]

        // BinaryOperator: two inputs of the same type, same output type
        BinaryOperator<Integer> max = (a, b) -> a > b ? a : b;
        System.out.println("Max of 5 and 9: " + max.apply(5, 9));
        // Output: Max of 5 and 9: 9

        BinaryOperator<String> join = (a, b) -> a + ", " + b;
        System.out.println(join.apply("Hello", "World"));
        // Output: Hello, World

        // BinaryOperator with reduce()
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        int sum = numbers.stream().reduce(0, Integer::sum);
        System.out.println("Sum: " + sum);
        // Output: Sum: 15

        // BinaryOperator.minBy() and maxBy()
        BinaryOperator<String> longerString = BinaryOperator.maxBy(
                (a, b) -> Integer.compare(a.length(), b.length())
        );
        System.out.println(longerString.apply("short", "much longer"));
        // Output: much longer
    }
}
Java provides “Bi” versions of Function, Predicate, and Consumer that accept two arguments instead of one.
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

public class BiFunctionExamples {
    public static void main(String[] args) {
        // BiFunction - takes two args, returns a result
        BiFunction<String, Integer, String> repeat = (text, times) -> text.repeat(times);
        System.out.println(repeat.apply("Ha", 3));
        // Output: HaHaHa

        // BiPredicate - takes two args, returns boolean
        BiPredicate<String, Integer> isLongerThan = (str, length) -> str.length() > length;
        System.out.println("'Lambda' longer than 3? " + isLongerThan.test("Lambda", 3));
        // Output: 'Lambda' longer than 3? true
        System.out.println("'Hi' longer than 3? " + isLongerThan.test("Hi", 3));
        // Output: 'Hi' longer than 3? false

        // BiConsumer - takes two args, returns nothing
        BiConsumer<String, Integer> printEntry = (key, value) ->
                System.out.println(key + " = " + value);

        // BiConsumer is especially useful with Map.forEach()
        Map<String, Integer> scores = new HashMap<>();
        scores.put("Alice", 95);
        scores.put("Bob", 87);
        scores.put("Charlie", 92);
        System.out.println("Scores:");
        scores.forEach(printEntry);
        // Output:
        // Scores:
        // Alice = 95
        // Bob = 87
        // Charlie = 92

        // BiFunction with Map.replaceAll()
        Map<String, Integer> prices = new HashMap<>();
        prices.put("Apple", 100);
        prices.put("Banana", 50);
        prices.put("Cherry", 200);
        // Apply a 10% discount to everything
        prices.replaceAll((item, price) -> (int) (price * 0.9));
        System.out.println("Discounted: " + prices);
        // Output: Discounted: {Apple=90, Banana=45, Cherry=180}
    }
}
Here is a summary of the most commonly used functional interfaces from java.util.function:
| Interface | Abstract Method | Input | Output | Use Case |
|---|---|---|---|---|
| Predicate<T> | test(T) | T | boolean | Filtering, validation |
| BiPredicate<T, U> | test(T, U) | T, U | boolean | Two-argument conditions |
| Function<T, R> | apply(T) | T | R | Transformation, mapping |
| BiFunction<T, U, R> | apply(T, U) | T, U | R | Two-argument transformation |
| Consumer<T> | accept(T) | T | void | Printing, logging, saving |
| BiConsumer<T, U> | accept(T, U) | T, U | void | Map.forEach(), two-arg actions |
| Supplier<T> | get() | none | T | Factories, lazy evaluation |
| UnaryOperator<T> | apply(T) | T | T | Same-type transformation |
| BinaryOperator<T> | apply(T, T) | T, T | T | Reduction, combining |
There are also primitive specializations like IntPredicate, LongFunction, DoubleSupplier, IntUnaryOperator, and others that avoid autoboxing overhead. Use them when working with primitive types in performance-sensitive code.
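A small sketch of the primitive specializations in action: IntPredicate and IntUnaryOperator operate on int directly, so no Integer boxing occurs, and they plug straight into IntStream.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class PrimitiveSpecializations {
    public static void main(String[] args) {
        // These work on primitive int -- no Predicate<Integer> / Function<Integer, Integer> boxing
        IntPredicate isEven = n -> n % 2 == 0;
        IntUnaryOperator square = n -> n * n;

        // IntStream accepts the primitive specializations directly
        int sumOfEvenSquares = IntStream.rangeClosed(1, 10)
                .filter(isEven)
                .map(square)
                .sum();
        System.out.println("Sum of even squares 1..10: " + sumOfEvenSquares);
        // Output: Sum of even squares 1..10: 220 (4 + 16 + 36 + 64 + 100)
    }
}
```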
Java 8 added several methods to the Collection interfaces that accept functional interfaces — making lambdas a natural fit for everyday collection operations. These methods let you process data in place without creating streams.
import java.util.*;

public class LambdaWithCollections {
    public static void main(String[] args) {
        // ========== forEach() ==========
        // Iterable.forEach(Consumer) - perform an action on each element
        List<String> fruits = Arrays.asList("Apple", "Banana", "Cherry", "Date");
        System.out.println("--- forEach ---");
        fruits.forEach(fruit -> System.out.println("Fruit: " + fruit));
        // Output:
        // Fruit: Apple
        // Fruit: Banana
        // Fruit: Cherry
        // Fruit: Date

        // forEach on a Map
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("Alice", 30);
        ages.put("Bob", 25);
        ages.put("Charlie", 35);
        System.out.println("\n--- Map forEach ---");
        ages.forEach((name, age) -> System.out.println(name + " is " + age + " years old"));
        // Output:
        // Alice is 30 years old
        // Bob is 25 years old
        // Charlie is 35 years old

        // ========== removeIf() ==========
        // Collection.removeIf(Predicate) - remove elements that match a condition
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        numbers.removeIf(n -> n % 2 == 0); // Remove all even numbers
        System.out.println("\n--- removeIf (removed evens) ---");
        System.out.println(numbers);
        // Output: [1, 3, 5, 7, 9]

        // ========== replaceAll() ==========
        // List.replaceAll(UnaryOperator) - transform each element in place
        List<String> names = new ArrayList<>(Arrays.asList("alice", "bob", "charlie"));
        names.replaceAll(name -> name.substring(0, 1).toUpperCase() + name.substring(1));
        System.out.println("\n--- replaceAll (capitalized) ---");
        System.out.println(names);
        // Output: [Alice, Bob, Charlie]

        // ========== sort() ==========
        // List.sort(Comparator) - sort the list using a lambda comparator
        List<String> cities = new ArrayList<>(Arrays.asList("New York", "London", "Tokyo", "Paris", "Sydney"));
        // Sort alphabetically
        cities.sort((a, b) -> a.compareTo(b));
        System.out.println("\n--- sort (alphabetical) ---");
        System.out.println(cities);
        // Output: [London, New York, Paris, Sydney, Tokyo]

        // Sort by length
        cities.sort((a, b) -> Integer.compare(a.length(), b.length()));
        System.out.println("\n--- sort (by length) ---");
        System.out.println(cities);
        // Output: [Paris, Tokyo, London, Sydney, New York]

        // Using Comparator helper methods (cleaner than a raw lambda)
        cities.sort(Comparator.comparingInt(String::length).reversed());
        System.out.println("\n--- sort (by length, descending) ---");
        System.out.println(cities);
        // Output: [New York, London, Sydney, Paris, Tokyo]

        // ========== Map.computeIfAbsent() ==========
        // Compute a value only if the key is not already present
        Map<String, List<String>> groups = new HashMap<>();
        groups.computeIfAbsent("fruits", k -> new ArrayList<>()).add("Apple");
        groups.computeIfAbsent("fruits", k -> new ArrayList<>()).add("Banana");
        groups.computeIfAbsent("veggies", k -> new ArrayList<>()).add("Carrot");
        System.out.println("\n--- computeIfAbsent ---");
        System.out.println(groups);
        // Output: {veggies=[Carrot], fruits=[Apple, Banana]}

        // ========== Map.merge() ==========
        // Merge a new value with an existing value
        Map<String, Integer> wordCount = new HashMap<>();
        String[] words = {"apple", "banana", "apple", "cherry", "banana", "apple"};
        for (String word : words) {
            wordCount.merge(word, 1, (oldVal, newVal) -> oldVal + newVal);
        }
        System.out.println("\n--- merge (word count) ---");
        System.out.println(wordCount);
        // Output: {banana=2, cherry=1, apple=3}
    }
}
The Stream API is where lambdas truly shine. Streams provide a declarative pipeline for processing collections, and virtually every stream operation accepts a lambda expression. Here are the most common operations showing lambda syntax alongside method reference alternatives.
import java.util.*;
import java.util.stream.Collectors;

public class LambdaWithStreams {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve", "Alice");

        // ========== filter() -- takes a Predicate ==========
        // Lambda version
        List<String> longNames = names.stream()
                .filter(name -> name.length() > 3)
                .collect(Collectors.toList());
        System.out.println("Filter (lambda): " + longNames);
        // Output: Filter (lambda): [Alice, Charlie, David, Alice]

        // ========== map() -- takes a Function ==========
        // Lambda version
        List<Integer> nameLengths = names.stream()
                .map(name -> name.length())
                .collect(Collectors.toList());
        System.out.println("Map (lambda): " + nameLengths);
        // Output: Map (lambda): [5, 3, 7, 5, 3, 5]

        // Method reference version
        List<String> upperNames = names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        System.out.println("Map (method ref): " + upperNames);
        // Output: Map (method ref): [ALICE, BOB, CHARLIE, DAVID, EVE, ALICE]

        // ========== reduce() -- takes a BinaryOperator ==========
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        // Lambda version
        int sum = numbers.stream()
                .reduce(0, (a, b) -> a + b);
        System.out.println("Reduce (lambda): " + sum);
        // Output: Reduce (lambda): 15

        // Method reference version
        int sum2 = numbers.stream()
                .reduce(0, Integer::sum);
        System.out.println("Reduce (method ref): " + sum2);
        // Output: Reduce (method ref): 15

        // ========== collect() -- grouping with lambdas ==========
        List<String> allNames = List.of("Alice", "Anna", "Bob", "Bill", "Charlie", "Chris");
        Map<Character, List<String>> grouped = allNames.stream()
                .collect(Collectors.groupingBy(name -> name.charAt(0)));
        System.out.println("Grouped: " + grouped);
        // Output: Grouped: {A=[Alice, Anna], B=[Bob, Bill], C=[Charlie, Chris]}

        // ========== sorted() -- takes a Comparator ==========
        // Note: sorted() is stable, so equal-length names keep their source order
        List<String> sorted = allNames.stream()
                .sorted((a, b) -> Integer.compare(a.length(), b.length()))
                .collect(Collectors.toList());
        System.out.println("Sorted by length: " + sorted);
        // Output: Sorted by length: [Bob, Anna, Bill, Alice, Chris, Charlie]

        // Comparator helper (cleaner)
        List<String> sorted2 = allNames.stream()
                .sorted(Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toList());
        System.out.println("Sorted by length then alpha: " + sorted2);
        // Output: Sorted by length then alpha: [Bob, Anna, Bill, Alice, Chris, Charlie]

        // ========== forEach() -- takes a Consumer ==========
        System.out.println("forEach:");
        names.stream()
                .distinct()
                .forEach(name -> System.out.println(" - " + name));
        // Output:
        // forEach:
        //  - Alice
        //  - Bob
        //  - Charlie
        //  - David
        //  - Eve

        // ========== Combining multiple operations ==========
        String result = names.stream()
                .filter(name -> name.length() > 3)   // Predicate
                .map(String::toUpperCase)            // Function (method ref)
                .distinct()                          // Remove duplicates
                .sorted()                            // Natural order
                .collect(Collectors.joining(", ")); // Join into a string
        System.out.println("Pipeline: " + result);
        // Output: Pipeline: ALICE, CHARLIE, DAVID
    }
}
A lambda expression can access variables from its enclosing scope — this is called variable capture. However, there are strict rules about which variables can be accessed and how.
A lambda can access a local variable from its enclosing scope only if that variable is effectively final — meaning its value is never modified after initialization. You do not need to explicitly declare it final, but you cannot change it.
import java.util.List;
import java.util.function.Consumer;
public class VariableCapture {
// Instance variable - CAN be modified in lambdas
private int instanceCounter = 0;
// Static variable - CAN be modified in lambdas
private static int staticCounter = 0;
public void demonstrate() {
// ===== Local variables must be effectively final =====
// This works -- prefix is effectively final (never reassigned)
String prefix = "Hello";
Consumer<String> greeter = name -> System.out.println(prefix + ", " + name);
greeter.accept("Alice");
// Output: Hello, Alice
// This DOES NOT compile -- count is modified after the lambda captures it
// int count = 0;
// Runnable r = () -> System.out.println(count); // OK so far
// count = 1; // ERROR: Variable used in lambda must be effectively final
// This DOES NOT compile either -- you cannot modify a captured variable inside a lambda
// int total = 0;
// List.of(1, 2, 3).forEach(n -> total += n); // ERROR: Cannot modify local variable
// ===== Instance variables CAN be modified =====
List.of(1, 2, 3).forEach(n -> instanceCounter += n);
System.out.println("Instance counter: " + instanceCounter);
// Output: Instance counter: 6
// ===== Static variables CAN be modified =====
List.of(1, 2, 3).forEach(n -> staticCounter += n);
System.out.println("Static counter: " + staticCounter);
// Output: Static counter: 6
}
public static void main(String[] args) {
new VariableCapture().demonstrate();
}
}
The restriction exists because lambdas capture a copy of local variables, not a reference to them. Local variables live on the stack and disappear when the method returns, but the lambda might be executed later (e.g., in another thread). If the lambda modified its copy, changes would not reflect in the original — creating confusing bugs. Java prevents this at compile time.
Instance and static variables are different — they live on the heap and are reached through a reference (the captured `this` or the class), so lambdas can read and modify them. Note that this says nothing about thread safety; concurrent modification still needs synchronization.
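The copy semantics are easy to observe directly: a lambda returned from a method keeps working after that method has returned, because it captured the value of the local variable, not the variable itself. A minimal sketch (the class and method names here are illustrative, not from the examples above):

```java
import java.util.function.Supplier;

public class CaptureCopyDemo {
    // The lambda captures a COPY of the local variable 'base'.
    // It keeps working after makeGreeter() returns, even though the
    // stack frame that held 'base' is long gone.
    static Supplier<String> makeGreeter() {
        String base = "Hello from a finished method";
        return () -> base; // value captured at creation time
    }

    public static void main(String[] args) {
        Supplier<String> greeter = makeGreeter();
        System.out.println(greeter.get());
        // Output: Hello from a finished method
    }
}
```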
When you genuinely need to accumulate or modify a value inside a lambda, use one of these approaches:
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
public class VariableCaptureWorkarounds {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 4, 5);
// Workaround 1: AtomicInteger (preferred for thread-safe counting)
AtomicInteger atomicSum = new AtomicInteger(0);
numbers.forEach(n -> atomicSum.addAndGet(n));
System.out.println("AtomicInteger sum: " + atomicSum.get());
// Output: AtomicInteger sum: 15
// Workaround 2: Single-element array (the array reference is effectively final)
int[] arraySum = {0};
numbers.forEach(n -> arraySum[0] += n);
System.out.println("Array wrapper sum: " + arraySum[0]);
// Output: Array wrapper sum: 15
// Workaround 3: Use stream reduce() instead (BEST approach -- no side effects)
int streamSum = numbers.stream().reduce(0, Integer::sum);
System.out.println("Stream reduce sum: " + streamSum);
// Output: Stream reduce sum: 15
// Workaround 4: Mutable container
List<String> results = new java.util.ArrayList<>();
numbers.forEach(n -> {
if (n % 2 == 0) {
results.add("Even: " + n);
}
});
System.out.println("Results: " + results);
// Output: Results: [Even: 2, Even: 4]
// BEST PRACTICE: Prefer stream operations over mutation
List<String> betterResults = numbers.stream()
.filter(n -> n % 2 == 0)
.map(n -> "Even: " + n)
.collect(java.util.stream.Collectors.toList());
System.out.println("Better results: " + betterResults);
// Output: Better results: [Even: 2, Even: 4]
}
}
Before lambdas, anonymous inner classes were the primary way to pass behavior as an argument. Both achieve similar goals, but they differ in important ways.
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
public class LambdaVsAnonymousClass {
private String instanceField = "I'm an instance field";
public void compare() {
List<String> names = Arrays.asList("Charlie", "Alice", "Bob");
// ========== Anonymous inner class ==========
names.sort(new Comparator<String>() {
@Override
public int compare(String a, String b) {
// 'this' refers to the anonymous Comparator instance
System.out.println("this class: " + this.getClass().getSimpleName());
return a.compareTo(b);
}
});
System.out.println("Anonymous class sort: " + names);
// ========== Lambda expression ==========
List<String> names2 = Arrays.asList("Charlie", "Alice", "Bob");
names2.sort((a, b) -> {
// 'this' refers to the enclosing LambdaVsAnonymousClass instance
System.out.println("this field: " + this.instanceField);
return a.compareTo(b);
});
System.out.println("Lambda sort: " + names2);
}
public static void main(String[] args) {
new LambdaVsAnonymousClass().compare();
// Output:
// this class:
// this class:
// Anonymous class sort: [Alice, Bob, Charlie]
// this field: I'm an instance field
// this field: I'm an instance field
// Lambda sort: [Alice, Bob, Charlie]
}
}
| Aspect | Anonymous Class | Lambda Expression |
|---|---|---|
| Syntax | Verbose — requires `new Interface() { ... }` | Concise — `(params) -> body` |
| `this` keyword | Refers to the anonymous class instance | Refers to the enclosing class instance |
| Interface requirement | Can implement any interface (including multi-method) | Can only implement a functional interface (single abstract method) |
| State | Can have its own fields and state | Cannot have fields — stateless |
| Compilation | Generates a separate `.class` file (e.g., `Outer$1.class`) | Uses `invokedynamic` — no extra class file |
| Performance | Slightly more overhead (class loading) | Slightly better (deferred binding with `invokedynamic`) |
| Readability | Harder to read for simple operations | Much cleaner for simple operations |
| Shadowing | Can shadow variables from enclosing scope | Cannot shadow — shares enclosing scope |
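The shadowing difference is the least known of these, so here is a short demonstration. An anonymous class body is a new scope, so it may declare a variable with the same name as a local in the enclosing method; a lambda body shares the enclosing scope, so the same declaration fails to compile. (`ShadowingDemo` is an illustrative class name.)

```java
public class ShadowingDemo {
    public static String demo() {
        String label = "outer";
        // Anonymous class: its body is a separate scope, so it MAY declare
        // another variable named 'label' that shadows the outer one.
        Runnable anon = new Runnable() {
            @Override
            public void run() {
                String label = "shadowed"; // legal: shadows the outer 'label'
                System.out.println("anonymous sees: " + label);
            }
        };
        // Lambda: its body shares the enclosing scope, so redeclaring
        // 'label' is a compile error:
        // Runnable lam = () -> { String label = "x"; }; // COMPILE ERROR
        anon.run();
        return label; // the outer variable is untouched: still "outer"
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // Output:
        // anonymous sees: shadowed
        // outer
    }
}
```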
Use a lambda when:

- the target is a functional interface (single abstract method)
- the behavior is short and self-explanatory
- you do not need `this` to refer to the implementation itself

Use an anonymous class when:

- the interface has more than one abstract method
- the implementation needs its own fields or state
- you need `this` to refer to the implementation instance

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class MigrationExample {
public static void main(String[] args) {
List<String> names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));
// STEP 1: Original anonymous class
Collections.sort(names, new java.util.Comparator<String>() {
@Override
public int compare(String a, String b) {
return a.compareToIgnoreCase(b);
}
});
System.out.println("Step 1 (anonymous): " + names);
// STEP 2: Replace with lambda
names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));
Collections.sort(names, (a, b) -> a.compareToIgnoreCase(b));
System.out.println("Step 2 (lambda): " + names);
// STEP 3: Use List.sort() instead of Collections.sort()
names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));
names.sort((a, b) -> a.compareToIgnoreCase(b));
System.out.println("Step 3 (List.sort): " + names);
// STEP 4: Use method reference
names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));
names.sort(String::compareToIgnoreCase);
System.out.println("Step 4 (method ref): " + names);
// All output: [Alice, Bob, Charlie, David]
}
}
A method reference is a shorthand notation for a lambda expression that simply calls an existing method. If your lambda does nothing more than call a single method, a method reference is cleaner.
There are four types of method references:
| Type | Syntax | Lambda Equivalent | Example |
|---|---|---|---|
| Static method | `Class::staticMethod` | `(args) -> Class.staticMethod(args)` | `Integer::parseInt` |
| Instance method (bound) | `object::instanceMethod` | `(args) -> object.instanceMethod(args)` | `System.out::println` |
| Instance method (unbound) | `Class::instanceMethod` | `(obj, args) -> obj.instanceMethod(args)` | `String::toUpperCase` |
| Constructor | `Class::new` | `(args) -> new Class(args)` | `ArrayList::new` |
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;
public class MethodReferenceExamples {
public static void main(String[] args) {
List<String> words = List.of("hello", "world", "java", "lambda");
// ========== 1. Static method reference ==========
// Lambda: s -> Integer.parseInt(s)
// Method ref: Integer::parseInt
List<String> numberStrings = List.of("1", "2", "3", "4", "5");
List<Integer> numbers = numberStrings.stream()
.map(Integer::parseInt) // static method reference
.collect(Collectors.toList());
System.out.println("Static: " + numbers);
// Output: Static: [1, 2, 3, 4, 5]
// ========== 2. Bound instance method reference ==========
// Lambda: s -> System.out.println(s)
// Method ref: System.out::println
System.out.println("Bound instance:");
words.forEach(System.out::println); // bound to System.out
// Output:
// hello
// world
// java
// lambda
// ========== 3. Unbound instance method reference ==========
// Lambda: s -> s.toUpperCase()
// Method ref: String::toUpperCase
List<String> upper = words.stream()
.map(String::toUpperCase) // unbound -- called on each element
.collect(Collectors.toList());
System.out.println("Unbound: " + upper);
// Output: Unbound: [HELLO, WORLD, JAVA, LAMBDA]
// Unbound with two arguments (used in Comparator)
// Lambda: (a, b) -> a.compareToIgnoreCase(b)
// Method ref: String::compareToIgnoreCase
List<String> sorted = Arrays.asList("banana", "Apple", "cherry");
sorted.sort(String::compareToIgnoreCase);
System.out.println("Sorted: " + sorted);
// Output: Sorted: [Apple, banana, cherry]
// ========== 4. Constructor reference ==========
// Lambda: () -> new ArrayList()
// Method ref: ArrayList::new
Supplier<List<String>> listFactory = java.util.ArrayList::new;
List<String> newList = listFactory.get();
newList.add("Created with constructor reference");
System.out.println("Constructor: " + newList);
// Output: Constructor: [Created with constructor reference]
// Constructor reference with parameters
Function<String, StringBuilder> sbFactory = StringBuilder::new;
StringBuilder sb = sbFactory.apply("Initial value");
System.out.println("StringBuilder: " + sb);
// Output: StringBuilder: Initial value
}
}
Rule of thumb: If your lambda is (x) -> someMethod(x) or (x) -> x.someMethod(), it can usually be replaced with a method reference. Use method references when they improve clarity; stick with lambdas when the reference would be confusing.
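The unbound two-argument form is where references most often cost clarity, because the first lambda parameter silently becomes the receiver of the call. A small sketch showing the equivalence (`MethodRefClarity` is an illustrative class name):

```java
import java.util.Comparator;

public class MethodRefClarity {
    public static void main(String[] args) {
        // Unbound two-argument form: the FIRST parameter becomes the
        // receiver of compareTo. These two comparators are equivalent.
        Comparator<String> explicit = (a, b) -> a.compareTo(b);
        Comparator<String> viaRef = String::compareTo;

        System.out.println(explicit.compare("apple", "banana") < 0); // true
        System.out.println(viaRef.compare("apple", "banana") < 0);   // true
    }
}
```

If a reader has to pause to work out which parameter is the receiver, the explicit lambda is the better choice.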
Lambdas are not just syntactic sugar — they enable cleaner implementations of well-known design patterns. Here are patterns you will use regularly.
Lambdas simplify callback-style programming. Instead of creating a class for every callback, pass behavior directly.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// A simple event system using lambdas as callbacks
class EventEmitter<T> {
    private final List<Consumer<T>> listeners = new ArrayList<>();

    public void on(Consumer<T> listener) {
        listeners.add(listener);
    }

    public void emit(T event) {
        listeners.forEach(listener -> listener.accept(event));
    }
}

public class EventHandlingPattern {
    public static void main(String[] args) {
        EventEmitter<String> emitter = new EventEmitter<>();
        // Register listeners using lambdas
        emitter.on(msg -> System.out.println("[LOG] " + msg));
        emitter.on(msg -> System.out.println("[ALERT] " + msg.toUpperCase()));
        emitter.on(msg -> {
            if (msg.contains("error")) {
                System.out.println("[ERROR HANDLER] Escalating: " + msg);
            }
        });

        emitter.emit("User logged in");
        // Output:
        // [LOG] User logged in
        // [ALERT] USER LOGGED IN

        System.out.println();
        emitter.emit("Database connection error");
        // Output:
        // [LOG] Database connection error
        // [ALERT] DATABASE CONNECTION ERROR
        // [ERROR HANDLER] Escalating: Database connection error
    }
}
The Strategy pattern defines a family of algorithms and makes them interchangeable. With lambdas, you no longer need a separate class for each strategy.
import java.util.function.BiFunction;
public class StrategyPattern {
// Before lambdas: separate classes for each strategy
interface DiscountStrategy {
double applyDiscount(double price, int quantity);
}
// With lambdas: strategies are just functions
public static void main(String[] args) {
// Define strategies as lambdas
BiFunction<Double, Integer, Double> noDiscount =
(price, qty) -> price * qty;
BiFunction<Double, Integer, Double> percentageDiscount =
(price, qty) -> price * qty * 0.9; // 10% off
BiFunction<Double, Integer, Double> bulkDiscount =
(price, qty) -> qty >= 10 ? price * qty * 0.8 : price * qty; // 20% off for 10+
BiFunction<Double, Integer, Double> buyOneGetOneFree =
(price, qty) -> price * (qty - qty / 2); // Every second item free
// Use the strategies
double price = 25.0;
System.out.println("No discount (5 items): $" + noDiscount.apply(price, 5));
// Output: No discount (5 items): $125.0
System.out.println("10% off (5 items): $" + percentageDiscount.apply(price, 5));
// Output: 10% off (5 items): $112.5
System.out.println("Bulk (15 items): $" + bulkDiscount.apply(price, 15));
// Output: Bulk (15 items): $300.0
System.out.println("BOGO (6 items): $" + buyOneGetOneFree.apply(price, 6));
// Output: BOGO (6 items): $75.0
}
}
The Decorator pattern wraps behavior around a function. With lambdas, you compose decorators by chaining Function instances.
import java.util.function.Function;
public class DecoratorPattern {
// A decorator that adds logging around any function
static <T, R> Function<T, R> withLogging(String name, Function<T, R> fn) {
return input -> {
System.out.println(" [LOG] Calling " + name + " with: " + input);
R result = fn.apply(input);
System.out.println(" [LOG] " + name + " returned: " + result);
return result;
};
}
// A decorator that adds timing around any function
static <T, R> Function<T, R> withTiming(String name, Function<T, R> fn) {
return input -> {
long start = System.nanoTime();
R result = fn.apply(input);
long elapsed = System.nanoTime() - start;
System.out.println(" [TIMING] " + name + " took " + elapsed / 1000 + " microseconds");
return result;
};
}
public static void main(String[] args) {
// Original function
Function<String, String> reverseString = s ->
new StringBuilder(s).reverse().toString();
// Decorate with logging
Function<String, String> loggedReverse = withLogging("reverse", reverseString);
// Decorate with logging AND timing
Function<String, String> fullReverse = withTiming("reverse", withLogging("reverse", reverseString));
System.out.println("--- Logged only ---");
String result = loggedReverse.apply("Lambda");
System.out.println("Result: " + result);
// Output:
// [LOG] Calling reverse with: Lambda
// [LOG] reverse returned: adbmaL
// Result: adbmaL
System.out.println("\n--- Logged and timed ---");
result = fullReverse.apply("Decorator");
System.out.println("Result: " + result);
// Output:
// [TIMING] reverse took ... microseconds
// [LOG] Calling reverse with: Decorator
// [LOG] reverse returned: rotaroceD
// Result: rotaroceD
}
}
Lambdas enable lazy evaluation — deferring computation until the result is actually needed. This can save significant resources when a value might not be used.
import java.util.function.Supplier;
public class LazyEvaluation {
// Simulates an expensive computation
static String loadConfiguration() {
System.out.println(" Loading configuration from disk...");
try { Thread.sleep(100); } catch (InterruptedException e) {}
return "DB_URL=jdbc:mysql://localhost:3306/mydb";
}
// Without lazy evaluation: always computes the value
static void logEager(boolean isDebug, String message) {
if (isDebug) {
System.out.println("[DEBUG] " + message);
}
}
// With lazy evaluation: computes only if needed
static void logLazy(boolean isDebug, Supplier<String> messageSupplier) {
if (isDebug) {
System.out.println("[DEBUG] " + messageSupplier.get());
}
}
public static void main(String[] args) {
boolean debugMode = false;
// EAGER: loadConfiguration() runs even though debugMode is false
System.out.println("--- Eager (debug=false) ---");
logEager(debugMode, "Config: " + loadConfiguration());
// Output:
// Loading configuration from disk...
// (the value was computed but never used!)
// LAZY: loadConfiguration() does NOT run because debugMode is false
System.out.println("\n--- Lazy (debug=false) ---");
logLazy(debugMode, () -> "Config: " + loadConfiguration());
// Output: (nothing -- the supplier was never called)
// LAZY with debug enabled
debugMode = true;
System.out.println("\n--- Lazy (debug=true) ---");
logLazy(debugMode, () -> "Config: " + loadConfiguration());
// Output:
// Loading configuration from disk...
// [DEBUG] Config: DB_URL=jdbc:mysql://localhost:3306/mydb
}
}
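A common companion to lazy evaluation is caching the supplier's result so the expensive computation runs at most once, no matter how many times the value is requested. A minimal, non-thread-safe sketch (the `Lazy` class is illustrative, not a JDK type):

```java
import java.util.function.Supplier;

// Minimal memoizing wrapper around a Supplier. NOT thread-safe --
// a production version would need synchronization or volatile handling.
public class Lazy<T> {
    private Supplier<T> supplier;
    private T value;
    private boolean computed = false;

    public Lazy(Supplier<T> supplier) { this.supplier = supplier; }

    public T get() {
        if (!computed) {
            value = supplier.get(); // runs at most once
            computed = true;
            supplier = null;        // let captured state be garbage-collected
        }
        return value;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Lazy<String> config = new Lazy<>(() -> {
            calls[0]++; // count how often the "expensive" work runs
            return "DB_URL=jdbc:mysql://localhost:3306/mydb";
        });
        System.out.println(config.get());
        System.out.println(config.get());
        System.out.println("Supplier ran " + calls[0] + " time(s)");
        // Output:
        // DB_URL=jdbc:mysql://localhost:3306/mydb
        // DB_URL=jdbc:mysql://localhost:3306/mydb
        // Supplier ran 1 time(s)
    }
}
```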
Even experienced developers make mistakes with lambdas. Here are the most common pitfalls and how to avoid them.
The built-in functional interfaces (Function, Consumer, Predicate, etc.) do not declare checked exceptions. If your lambda needs to throw a checked exception, it will not compile.
import java.util.List;
import java.util.function.Function;
public class CheckedExceptionMistake {
// This is a method that throws a checked exception
static String readFile(String path) throws java.io.IOException {
// Simulate reading a file
if (path.contains("missing")) {
throw new java.io.IOException("File not found: " + path);
}
return "Content of " + path;
}
// Custom functional interface that allows checked exceptions
@FunctionalInterface
interface ThrowingFunction<T, R> {
R apply(T t) throws Exception;
}
// Wrapper method to convert a throwing function into a standard Function
static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> fn) {
return t -> {
try {
return fn.apply(t);
} catch (Exception e) {
throw new RuntimeException(e);
}
};
}
public static void main(String[] args) {
List<String> paths = List.of("file1.txt", "file2.txt");
// PROBLEM: This does NOT compile!
// paths.stream()
// .map(path -> readFile(path)) // ERROR: Unhandled IOException
// .forEach(System.out::println);
// SOLUTION 1: Wrap in try-catch inside the lambda
paths.stream()
.map(path -> {
try {
return readFile(path);
} catch (java.io.IOException e) {
throw new RuntimeException(e);
}
})
.forEach(System.out::println);
// Output:
// Content of file1.txt
// Content of file2.txt
// SOLUTION 2: Use a wrapper function (cleaner)
paths.stream()
.map(unchecked(CheckedExceptionMistake::readFile))
.forEach(System.out::println);
// Output:
// Content of file1.txt
// Content of file2.txt
}
}
Lambdas used in stream operations should be side-effect-free. Modifying external state from inside a stream pipeline leads to unpredictable behavior, especially with parallel streams.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class SideEffectMistake {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie", "David");
// BAD: Modifying external list from inside map()
List<String> results = new ArrayList<>();
names.stream()
.map(String::toUpperCase)
.forEach(name -> results.add(name)); // side effect!
System.out.println("Bad (side effect): " + results);
// This might work with sequential streams, but BREAKS with parallel streams
// GOOD: Use collect() to build the result
List<String> betterResults = names.stream()
.map(String::toUpperCase)
.collect(Collectors.toList());
System.out.println("Good (collect): " + betterResults);
// Output: Good (collect): [ALICE, BOB, CHARLIE, DAVID]
// BAD: Accumulating a count with side effects
int[] count = {0};
names.stream().forEach(n -> count[0]++);
System.out.println("Bad count: " + count[0]); // works but fragile
// GOOD: Use count()
long goodCount = names.stream().count();
System.out.println("Good count: " + goodCount);
// Output: Good count: 4
}
}
If a lambda spans more than 3-4 lines, it is too complex. Extract it into a named method for readability, testability, and reuse.
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
public class ComplexLambdaMistake {
// BAD: This lambda is too complex
static List<String> filterBad(List<String> emails) {
return emails.stream()
.filter(email -> {
if (email == null || email.isBlank()) return false;
if (!email.contains("@")) return false;
String[] parts = email.split("@");
if (parts.length != 2) return false;
String domain = parts[1];
if (!domain.contains(".")) return false;
if (domain.startsWith(".") || domain.endsWith(".")) return false;
return true;
})
.collect(Collectors.toList());
}
// GOOD: Extract the logic into a named method
static boolean isValidEmail(String email) {
if (email == null || email.isBlank()) return false;
if (!email.contains("@")) return false;
String[] parts = email.split("@");
if (parts.length != 2) return false;
String domain = parts[1];
if (!domain.contains(".")) return false;
return !domain.startsWith(".") && !domain.endsWith(".");
}
static List<String> filterGood(List<String> emails) {
return emails.stream()
.filter(ComplexLambdaMistake::isValidEmail) // Clean and readable
.collect(Collectors.toList());
}
public static void main(String[] args) {
List<String> emails = List.of(
"alice@example.com",
"invalid",
"",
"bob@test.org",
"bad@.com",
"ok@domain.io"
);
System.out.println("Valid emails: " + filterGood(emails));
// Output: Valid emails: [alice@example.com, bob@test.org, ok@domain.io]
}
}
Lambdas can only be used where a functional interface is expected. You cannot use a lambda to implement an interface with multiple abstract methods, or assign a lambda to an Object variable without a cast.
public class FunctionalInterfaceRequirement {
// Interface with TWO abstract methods -- NOT functional
interface TwoMethods {
void methodA();
void methodB();
}
public static void main(String[] args) {
// ERROR: Cannot use lambda -- TwoMethods is not a functional interface
// TwoMethods t = () -> System.out.println("Hello"); // COMPILE ERROR
// ERROR: Cannot assign lambda to Object without cast
// Object obj = () -> System.out.println("Hello"); // COMPILE ERROR
// FIX: Cast to a specific functional interface
Object obj = (Runnable) () -> System.out.println("Hello");
((Runnable) obj).run();
// Output: Hello
// COMMON GOTCHA: Overloaded methods can cause ambiguity
// If a method accepts both Runnable and Callable, the compiler might not
// know which one a no-arg lambda should map to.
}
}
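The overload gotcha in the closing comment can be made concrete. With a hypothetical pair of `accept` overloads taking `Runnable` and `Callable<Boolean>`, a lambda whose body is a method call returning `boolean` fits both shapes, and the compiler rejects the call as ambiguous until a cast picks one (`OverloadAmbiguityDemo` and `accept` are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class OverloadAmbiguityDemo {
    static String accept(Runnable r) { r.run(); return "Runnable"; }
    static String accept(Callable<Boolean> c) throws Exception { c.call(); return "Callable"; }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        // AMBIGUOUS: log.add(...) returns boolean, so the body fits BOTH
        // Runnable (result discarded) and Callable<Boolean> (result used).
        // accept(() -> log.add("event"));        // COMPILE ERROR: ambiguous
        // FIX: cast the lambda to choose an overload explicitly
        System.out.println(accept((Runnable) () -> log.add("event")));
        // Output: Runnable
        System.out.println(accept((Callable<Boolean>) () -> log.add("event")));
        // Output: Callable
    }
}
```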
Lambdas are not serializable by default. If you need to serialize a lambda (e.g., for distributed computing frameworks), the target functional interface must extend Serializable.
import java.io.*;
import java.util.function.Predicate;
public class SerializationMistake {
// Regular functional interface -- NOT serializable
@FunctionalInterface
interface RegularPredicate<T> {
boolean test(T t);
}
// Serializable functional interface
@FunctionalInterface
interface SerializablePredicate<T> extends Predicate<T>, Serializable {
}
public static void main(String[] args) {
// This lambda is NOT serializable
RegularPredicate<String> notSerializable = s -> s.length() > 5;
// This lambda IS serializable
SerializablePredicate<String> serializable = s -> s.length() > 5;
// Or use an intersection cast (less clean but avoids a custom interface)
Predicate<String> alsoSerializable = (Predicate<String> & Serializable) s -> s.length() > 5;
System.out.println("Test 'Lambda': " + serializable.test("Lambda"));
// Output: Test 'Lambda': true
}
}
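The example above declares serializable lambdas but never actually serializes one. A sketch of the full round trip through a byte array shows the intersection cast doing its job (`LambdaSerializationDemo` and `roundTrip` are illustrative names):

```java
import java.io.*;
import java.util.function.Predicate;

public class LambdaSerializationDemo {
    // Serialize the predicate to bytes and read it back
    static Predicate<String> roundTrip(Predicate<String> p) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(p); // throws NotSerializableException for a plain lambda
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Predicate<String> revived = (Predicate<String>) in.readObject();
            return revived;
        }
    }

    public static void main(String[] args) throws Exception {
        // The intersection cast marks the lambda as serializable; without
        // "& Serializable" the writeObject call above would fail.
        Predicate<String> longWord =
                (Predicate<String> & Serializable) s -> s.length() > 5;
        Predicate<String> revived = roundTrip(longWord);
        System.out.println(revived.test("serialization")); // true
        System.out.println(revived.test("java"));          // false
    }
}
```

Deserialization only works when the class that defined the lambda is on the classpath, since the JVM recreates the lambda through that class.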
Follow these guidelines to write lambdas that are clean, maintainable, and efficient.
| # | Practice | Do | Don't |
|---|---|---|---|
| 1 | Keep lambdas short | 1-3 lines max | Write 10+ line lambdas |
| 2 | Use method references | `String::toUpperCase` | `s -> s.toUpperCase()` when a reference is clearer |
| 3 | Avoid side effects | `collect()` to build results | Mutate external state in `forEach()` |
| 4 | Use meaningful parameter names | `(name, age) -> ...` | `(a, b) -> ...` when context is unclear |
| 5 | Extract complex lambdas | Move to a named private method | Inline a 10-line validation lambda |
| 6 | Prefer standard interfaces | Use `Predicate`, `Function`, `Consumer` | Create a custom interface when a standard one fits |
| 7 | Use `@FunctionalInterface` | Annotate your custom interfaces | Rely on convention alone |
| 8 | Handle exceptions explicitly | Wrapper methods for checked exceptions | Swallow exceptions in catch blocks |
| 9 | Consider readability | Use an anonymous class if the lambda is confusing | Force everything into a lambda |
| 10 | Leverage type inference | `(a, b) -> a + b` | `(Integer a, Integer b) -> a + b` when types are obvious |
import java.util.*;
import java.util.function.*;
import java.util.stream.Collectors;
public class LambdaBestPractices {
// BEST PRACTICE: Extract complex logic into named methods
static boolean isEligibleForDiscount(Map<String, Object> customer) {
int age = (int) customer.get("age");
boolean isMember = (boolean) customer.get("member");
double totalSpent = (double) customer.get("totalSpent");
return (age >= 65 || isMember) && totalSpent > 100.0;
}
// BEST PRACTICE: Use standard functional interfaces with clear names
static <T> List<T> filterBy(List<T> items, Predicate<T> criteria) {
return items.stream()
.filter(criteria)
.collect(Collectors.toList());
}
// BEST PRACTICE: Compose small, focused predicates
public static void main(String[] args) {
List<String> words = List.of("Lambda", "is", "a", "powerful", "feature", "in", "Java");
// GOOD: Small, focused predicates composed together
Predicate<String> longerThan2 = word -> word.length() > 2;
Predicate<String> startsWithLower = word -> Character.isLowerCase(word.charAt(0));
List<String> result = words.stream()
.filter(longerThan2.and(startsWithLower))
.map(String::toUpperCase) // method reference (cleaner)
.sorted() // natural order
.collect(Collectors.toList());
System.out.println("Filtered: " + result);
// Output: Filtered: [FEATURE, POWERFUL]
// GOOD: Meaningful parameter names
Map<String, List<String>> grouped = words.stream()
.collect(Collectors.groupingBy(word -> word.substring(0, 1).toUpperCase()));
System.out.println("Grouped: " + grouped);
// GOOD: Use Comparator helpers instead of raw lambdas
List<String> sortedByLength = new ArrayList<>(words);
sortedByLength.sort(
Comparator.comparingInt(String::length)
.thenComparing(Comparator.naturalOrder())
);
System.out.println("Sorted: " + sortedByLength);
// Output: Sorted: [a, in, is, Java, Lambda, feature, powerful]
}
}
Let us put everything together with a real-world example. We will build a student records processing system that demonstrates lambdas for filtering, sorting, transforming, grouping, and reporting.
import java.util.*;
import java.util.function.*;
import java.util.stream.Collectors;
public class StudentDataProcessing {
// ========== Student record ==========
static class Student {
private final String name;
private final String major;
private final double gpa;
private final int age;
private final List<String> courses;
Student(String name, String major, double gpa, int age, List<String> courses) {
this.name = name;
this.major = major;
this.gpa = gpa;
this.age = age;
this.courses = courses;
}
public String getName() { return name; }
public String getMajor() { return major; }
public double getGpa() { return gpa; }
public int getAge() { return age; }
public List<String> getCourses() { return courses; }
@Override
public String toString() {
return String.format("%s (Major: %s, GPA: %.1f, Age: %d)", name, major, gpa, age);
}
}
// ========== Custom functional interface for reporting ==========
@FunctionalInterface
interface ReportGenerator {
String generate(List<Student> data);
}
// ========== Utility: generic filter + transform pipeline ==========
static <T, R> List<R> pipeline(List<T> data, Predicate<T> filter, Function<T, R> transform) {
return data.stream()
.filter(filter)
.map(transform)
.collect(Collectors.toList());
}
// ========== Main ==========
public static void main(String[] args) {
// Create sample data
List<Student> students = List.of(
new Student("Alice", "Computer Science", 3.8, 21, List.of("Java", "Algorithms", "Databases")),
new Student("Bob", "Mathematics", 3.2, 22, List.of("Calculus", "Statistics", "Algorithms")),
new Student("Charlie", "Computer Science", 3.5, 20, List.of("Java", "Networks", "AI")),
new Student("Diana", "Physics", 3.9, 23, List.of("Quantum", "Calculus", "Statistics")),
new Student("Eve", "Computer Science", 2.8, 21, List.of("Java", "Web Dev", "Databases")),
new Student("Frank", "Mathematics", 3.6, 22, List.of("Calculus", "Algorithms", "Statistics")),
new Student("Grace", "Physics", 3.1, 20, List.of("Quantum", "Mechanics", "Calculus")),
new Student("Hank", "Computer Science", 3.7, 23, List.of("Java", "AI", "Networks")),
new Student("Ivy", "Mathematics", 3.4, 21, List.of("Statistics", "Algebra", "Calculus")),
new Student("Jack", "Physics", 2.9, 22, List.of("Mechanics", "Quantum", "Statistics"))
);
System.out.println("=== STUDENT DATA PROCESSING SYSTEM ===\n");
// ===== 1. FILTERING with Predicate =====
System.out.println("--- 1. Honor Roll (GPA >= 3.5) ---");
Predicate<Student> isHonorRoll = student -> student.getGpa() >= 3.5;
students.stream()
.filter(isHonorRoll)
.forEach(s -> System.out.println(" " + s));
// Output:
// Alice (Major: Computer Science, GPA: 3.8, Age: 21)
// Charlie (Major: Computer Science, GPA: 3.5, Age: 20)
// Diana (Major: Physics, GPA: 3.9, Age: 23)
// Frank (Major: Mathematics, GPA: 3.6, Age: 22)
// Hank (Major: Computer Science, GPA: 3.7, Age: 23)
// ===== 2. COMPOSED PREDICATES =====
System.out.println("\n--- 2. CS students on Honor Roll ---");
Predicate<Student> isCS = s -> s.getMajor().equals("Computer Science");
Predicate<Student> csHonor = isCS.and(isHonorRoll);
students.stream()
.filter(csHonor)
.forEach(s -> System.out.println(" " + s));
// Output:
// Alice (Major: Computer Science, GPA: 3.8, Age: 21)
// Charlie (Major: Computer Science, GPA: 3.5, Age: 20)
// Hank (Major: Computer Science, GPA: 3.7, Age: 23)
// ===== 3. SORTING with Comparator lambdas =====
System.out.println("\n--- 3. All students sorted by GPA (descending) ---");
students.stream()
.sorted(Comparator.comparingDouble(Student::getGpa).reversed())
.forEach(s -> System.out.println(" " + s));
// Output:
// Diana (Major: Physics, GPA: 3.9, Age: 23)
// Alice (Major: Computer Science, GPA: 3.8, Age: 21)
// Hank (Major: Computer Science, GPA: 3.7, Age: 23)
// Frank (Major: Mathematics, GPA: 3.6, Age: 22)
// Charlie (Major: Computer Science, GPA: 3.5, Age: 20)
// Ivy (Major: Mathematics, GPA: 3.4, Age: 21)
// Bob (Major: Mathematics, GPA: 3.2, Age: 22)
// Grace (Major: Physics, GPA: 3.1, Age: 20)
// Jack (Major: Physics, GPA: 2.9, Age: 22)
// Eve (Major: Computer Science, GPA: 2.8, Age: 21)
// ===== 4. TRANSFORMATION with Function =====
System.out.println("\n--- 4. Student names in uppercase ---");
Function<Student, String> toNameUpper = s -> s.getName().toUpperCase();
List<String> upperNames = students.stream()
.map(toNameUpper)
.collect(Collectors.toList());
System.out.println(" " + upperNames);
// Output: [ALICE, BOB, CHARLIE, DIANA, EVE, FRANK, GRACE, HANK, IVY, JACK]
// ===== 5. GROUPING with Collectors =====
System.out.println("\n--- 5. Students grouped by major ---");
Map<String, List<Student>> byMajor = students.stream()
.collect(Collectors.groupingBy(Student::getMajor));
byMajor.forEach((major, list) -> {
System.out.println(" " + major + ":");
list.forEach(s -> System.out.println(" - " + s.getName() + " (GPA: " + s.getGpa() + ")"));
});
// Output:
// Computer Science:
// - Alice (GPA: 3.8)
// - Charlie (GPA: 3.5)
// - Eve (GPA: 2.8)
// - Hank (GPA: 3.7)
// Mathematics:
// - Bob (GPA: 3.2)
// - Frank (GPA: 3.6)
// - Ivy (GPA: 3.4)
// Physics:
// - Diana (GPA: 3.9)
// - Grace (GPA: 3.1)
// - Jack (GPA: 2.9)
// ===== 6. STATISTICS with reduce and Collectors =====
System.out.println("\n--- 6. GPA Statistics by Major ---");
Map<String, DoubleSummaryStatistics> statsByMajor = students.stream()
.collect(Collectors.groupingBy(
Student::getMajor,
Collectors.summarizingDouble(Student::getGpa)
));
statsByMajor.forEach((major, stats) ->
System.out.printf(" %s: avg=%.2f, min=%.1f, max=%.1f%n",
major, stats.getAverage(), stats.getMin(), stats.getMax())
);
// Output:
// Computer Science: avg=3.45, min=2.8, max=3.8
// Mathematics: avg=3.40, min=3.2, max=3.6
// Physics: avg=3.30, min=2.9, max=3.9
// ===== 7. PIPELINE utility with Predicate + Function =====
System.out.println("\n--- 7. Pipeline: CS student names with high GPA ---");
List<String> csHonorNames = pipeline(
students,
isCS.and(isHonorRoll), // composed Predicate
Student::getName // method reference as Function
);
System.out.println(" " + csHonorNames);
// Output: [Alice, Charlie, Hank]
// ===== 8. COURSE ANALYSIS with flatMap and lambdas =====
System.out.println("\n--- 8. Most popular courses ---");
Map<String, Long> courseCounts = students.stream()
.flatMap(s -> s.getCourses().stream())
.collect(Collectors.groupingBy(
course -> course, // grouping key
Collectors.counting() // count per group
));
courseCounts.entrySet().stream()
.sorted(Map.Entry.<String, Long>comparingByValue().reversed()) // type witness needed when chaining .reversed()
.forEach(entry -> System.out.println(" " + entry.getKey() + ": " + entry.getValue() + " students"));
// Output:
// Calculus: 4 students
// Java: 4 students
// Statistics: 4 students
// Quantum: 3 students
// Algorithms: 3 students
// ...
// ===== 9. CUSTOM REPORT with functional interface =====
System.out.println("\n--- 9. Custom Honor Roll Report ---");
ReportGenerator honorRollReport = data -> {
StringBuilder sb = new StringBuilder();
sb.append("Honor Roll Report\n");
sb.append("=================\n");
List<Student> honorStudents = data.stream()
.filter(isHonorRoll)
.sorted(Comparator.comparingDouble(Student::getGpa).reversed())
.collect(Collectors.toList());
sb.append(String.format("Total honor students: %d / %d%n", honorStudents.size(), data.size()));
sb.append(String.format("Percentage: %.0f%%%n%n",
(double) honorStudents.size() / data.size() * 100));
honorStudents.forEach(s ->
sb.append(String.format(" %-10s | %-20s | GPA: %.1f%n",
s.getName(), s.getMajor(), s.getGpa()))
);
return sb.toString();
};
System.out.println(honorRollReport.generate(students));
// ===== 10. CONSUMER chaining for notifications =====
System.out.println("--- 10. Student notifications ---");
Consumer<Student> emailNotification = s ->
System.out.println(" [EMAIL] Congratulations " + s.getName() + "! You made the honor roll.");
Consumer<Student> smsNotification = s ->
System.out.println(" [SMS] " + s.getName() + ", check your email for honor roll details.");
Consumer<Student> logNotification = s ->
System.out.println(" [LOG] Notification sent to " + s.getName());
Consumer<Student> notifyAll = emailNotification.andThen(smsNotification).andThen(logNotification);
students.stream()
.filter(isCS.and(isHonorRoll))
.forEach(notifyAll);
// Output:
// [EMAIL] Congratulations Alice! You made the honor roll.
// [SMS] Alice, check your email for honor roll details.
// [LOG] Notification sent to Alice
// [EMAIL] Congratulations Charlie! You made the honor roll.
// [SMS] Charlie, check your email for honor roll details.
// [LOG] Notification sent to Charlie
// [EMAIL] Congratulations Hank! You made the honor roll.
// [SMS] Hank, check your email for honor roll details.
// [LOG] Notification sent to Hank
// ===== Summary =====
System.out.println("\n=== LAMBDA CONCEPTS DEMONSTRATED ===");
System.out.println("1. Predicate - filtering students by GPA");
System.out.println("2. Predicate.and() - combining CS + honor roll filters");
System.out.println("3. Comparator lambda - sorting by GPA descending");
System.out.println("4. Function - transforming student to name");
System.out.println("5. Collectors.groupingBy - grouping by major");
System.out.println("6. summarizingDouble - GPA statistics per major");
System.out.println("7. Pipeline utility - generic filter + transform method");
System.out.println("8. flatMap + lambda - course frequency analysis");
System.out.println("9. Custom @FunctionalInterface - report generation");
System.out.println("10. Consumer.andThen() - chained notification actions");
}
}
| Concept | Summary | Example |
|---|---|---|
| Lambda syntax | Parameters -> body | (a, b) -> a + b |
| Functional interface | Interface with one abstract method | @FunctionalInterface |
| Predicate | T -> boolean | n -> n > 0 |
| Function | T -> R | s -> s.length() |
| Consumer | T -> void | s -> System.out.println(s) |
| Supplier | () -> T | () -> new ArrayList<>() |
| UnaryOperator | T -> T | s -> s.toUpperCase() |
| BinaryOperator | (T, T) -> T | (a, b) -> a + b |
| Method reference | Shorthand for single-method lambda | String::toUpperCase |
| Effectively final | Local vars captured by lambdas cannot be modified | Use AtomicInteger or stream reduce() |
| this keyword | In lambdas, refers to enclosing class (not the lambda) | Unlike anonymous classes |
| Checked exceptions | Standard functional interfaces don’t allow checked exceptions | Use wrapper or custom interface |
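The last row of the table deserves a concrete illustration. The `ThrowingFunction` interface and `unchecked()` helper below are not part of the JDK -- they are one common hand-rolled pattern for calling a method that throws a checked exception (here `Class.forName`) inside a stream:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ThrowingDemo {
    // Hand-rolled functional interface whose apply() may throw a checked exception
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    // Adapts a ThrowingFunction to a plain Function by rethrowing
    // checked exceptions as unchecked RuntimeExceptions
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> f) {
        return t -> {
            try {
                return f.apply(t);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    public static void main(String[] args) {
        // Class.forName throws the checked ClassNotFoundException, so it
        // cannot be passed to map() directly -- the wrapper makes it fit
        List<Class<?>> classes = List.of("java.lang.String", "java.util.List").stream()
            .map(unchecked(Class::forName))
            .collect(Collectors.toList());
        System.out.println(classes.size()); // 2
    }
}
```

Libraries such as Vavr offer ready-made equivalents of this wrapper if you prefer not to maintain your own.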
Imagine an assembly line in a factory. Raw materials enter at one end, pass through a series of workstations — each performing a specific operation like cutting, painting, or inspecting — and a finished product comes out the other end. The assembly line does not store the materials; it processes them as they flow through.
The Java Stream API, introduced in Java 8, works exactly like that assembly line. A Stream is a sequence of elements that supports a pipeline of operations to process data declaratively — you describe what you want, not how to do it step by step.
Key characteristics of Streams:
- A stream does not store elements; it carries them from a source through the pipeline.
- Intermediate operations are lazy -- nothing runs until a terminal operation is invoked.
- A stream can be consumed only once; a second terminal operation throws IllegalStateException.
- Stream operations never modify the underlying source.
Every Stream pipeline has three parts:
| Part | Description | Example |
|---|---|---|
| Source | Where the data comes from | list.stream(), Arrays.stream(arr) |
| Intermediate operations | Transform the stream (lazy, return a new Stream) | filter(), map(), sorted() |
| Terminal operation | Produces a result or side effect (triggers execution) | collect(), forEach(), count() |
import java.util.Arrays;
import java.util.List;
public class StreamIntro {
public static void main(String[] args) {
List<String> names = Arrays.asList("Alice", "Bob", "Charlie", "David", "Eve");
// Stream pipeline: source -> intermediate ops -> terminal op
long count = names.stream() // Source: create stream from list
.filter(n -> n.length() > 3) // Intermediate: keep names longer than 3 chars
.map(String::toUpperCase) // Intermediate: convert to uppercase
.count(); // Terminal: count remaining elements
System.out.println("Count: " + count);
// Output: Count: 3
// The original list is unchanged
System.out.println("Original: " + names);
// Output: Original: [Alice, Bob, Charlie, David, Eve]
}
}
Before you can process data with the Stream API, you need to create a Stream. Java provides multiple ways to do this depending on your data source.
The most common way. Every class that implements Collection (List, Set, Queue) has a stream() method.
import java.util.*;
public class StreamFromCollections {
public static void main(String[] args) {
// From a List
List<String> list = List.of("Java", "Python", "Go");
list.stream().forEach(System.out::println);
// From a Set
Set<Integer> set = Set.of(1, 2, 3, 4, 5);
set.stream().filter(n -> n % 2 == 0).forEach(System.out::println);
// From a Map (via entrySet, keySet, or values)
Map<String, Integer> map = Map.of("Alice", 90, "Bob", 85);
map.entrySet().stream()
.filter(e -> e.getValue() > 87)
.forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
// Output: Alice: 90
}
}
import java.util.Arrays;
import java.util.stream.Stream;
public class StreamFromArrays {
public static void main(String[] args) {
String[] colors = {"Red", "Green", "Blue"};
// Using Arrays.stream()
Arrays.stream(colors).forEach(System.out::println);
// Partial array: from index 1 (inclusive) to 3 (exclusive)
Arrays.stream(colors, 1, 3).forEach(System.out::println);
// Output: Green, Blue
// Using Stream.of()
Stream.of("One", "Two", "Three").forEach(System.out::println);
// From a primitive array -- returns IntStream, not Stream<Integer>
int[] numbers = {10, 20, 30};
int sum = Arrays.stream(numbers).sum();
System.out.println("Sum: " + sum); // Output: Sum: 60
}
}
import java.util.stream.Stream;
import java.util.stream.IntStream;
public class StreamFactoryMethods {
public static void main(String[] args) {
// Stream.empty() -- useful as a return value instead of null
Stream<String> empty = Stream.empty();
System.out.println("Empty count: " + empty.count()); // Output: Empty count: 0
// Stream.of() -- create from individual elements
Stream<String> languages = Stream.of("Java", "Python", "Go");
// Stream.generate() -- infinite stream from a Supplier
// MUST use limit() or it runs forever!
Stream.generate(Math::random)
.limit(3)
.forEach(n -> System.out.printf("%.2f%n", n));
// Stream.iterate() -- infinite stream with a seed and unary operator
// Java 8 style (no predicate -- must use limit)
Stream.iterate(1, n -> n * 2)
.limit(5)
.forEach(System.out::println);
// Output: 1, 2, 4, 8, 16
// Java 9+ style (with predicate -- like a for loop)
Stream.iterate(1, n -> n <= 100, n -> n * 2)
.forEach(System.out::println);
// Output: 1, 2, 4, 8, 16, 32, 64
// IntStream.range() and rangeClosed()
IntStream.range(1, 5).forEach(System.out::println); // 1, 2, 3, 4
IntStream.rangeClosed(1, 5).forEach(System.out::println); // 1, 2, 3, 4, 5
}
}
You can create a Stream of lines from a file using Files.lines(). This is memory-efficient because it reads lines lazily rather than loading the entire file into memory.
import java.nio.file.Files;
import java.nio.file.Paths;
import java.io.IOException;
import java.util.stream.Stream;
public class StreamFromFiles {
public static void main(String[] args) {
// Files.lines() returns a Stream<String> -- one element per line
// Use try-with-resources because the stream must be closed
try (Stream<String> lines = Files.lines(Paths.get("data.txt"))) {
lines.filter(line -> !line.isBlank())
.map(String::trim)
.forEach(System.out::println);
} catch (IOException e) {
System.err.println("Error reading file: " + e.getMessage());
}
}
}
| Method | Returns | Use Case |
|---|---|---|
| collection.stream() | Stream<T> | Most common — stream from any Collection |
| Arrays.stream(array) | Stream<T> or IntStream | Stream from an array |
| Stream.of(a, b, c) | Stream<T> | Stream from individual values |
| Stream.empty() | Stream<T> | Empty stream (null-safe return) |
| Stream.generate(supplier) | Stream<T> | Infinite stream from a Supplier |
| Stream.iterate(seed, op) | Stream<T> | Infinite stream with iterative computation |
| IntStream.range(a, b) | IntStream | Range of ints [a, b) |
| IntStream.rangeClosed(a, b) | IntStream | Range of ints [a, b] |
| Files.lines(path) | Stream<String> | Lazy line-by-line file reading |
Intermediate operations transform a Stream into another Stream. They are lazy — nothing happens until a terminal operation triggers the pipeline. You can chain as many intermediate operations as you need.
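This laziness can be observed directly: build a pipeline with a side-effecting filter but no terminal operation, and nothing prints until one is added. A minimal sketch:

```java
import java.util.List;
import java.util.stream.Stream;

public class LazyDemo {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3);
        // Intermediate ops only: the filter body does not run yet
        Stream<Integer> pipeline = numbers.stream()
            .filter(n -> {
                System.out.println("filtering " + n);
                return n > 1;
            });
        System.out.println("Pipeline built -- nothing printed so far");
        // The terminal operation triggers the whole pipeline
        long count = pipeline.count();
        System.out.println("Count: " + count); // Count: 2
    }
}
```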
filter(Predicate<T>) keeps only the elements that match the given condition. Think of it as a sieve — elements that pass the test go through; those that do not are discarded.
import java.util.List;
import java.util.stream.Collectors;
public class FilterExample {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
// Keep only even numbers
List<Integer> evens = numbers.stream()
.filter(n -> n % 2 == 0)
.collect(Collectors.toList());
System.out.println("Evens: " + evens);
// Output: Evens: [2, 4, 6, 8, 10]
// Chaining multiple filters (equivalent to && in the predicate)
List<Integer> result = numbers.stream()
.filter(n -> n > 3)
.filter(n -> n < 8)
.collect(Collectors.toList());
System.out.println("Between 3 and 8: " + result);
// Output: Between 3 and 8: [4, 5, 6, 7]
// Filter with objects
List<String> names = List.of("Alice", "Bob", "Charlie", "Ana", "Albert");
List<String> aNames = names.stream()
.filter(name -> name.startsWith("A"))
.collect(Collectors.toList());
System.out.println("A-names: " + aNames);
// Output: A-names: [Alice, Ana, Albert]
}
}
map(Function<T, R>) transforms each element from type T to type R. It applies the given function to every element and produces a new Stream of the results. This is one of the most frequently used operations.
import java.util.List;
import java.util.stream.Collectors;
public class MapExample {
public static void main(String[] args) {
List<String> names = List.of("alice", "bob", "charlie");
// Transform: String -> String (uppercase)
List<String> upper = names.stream()
.map(String::toUpperCase)
.collect(Collectors.toList());
System.out.println(upper);
// Output: [ALICE, BOB, CHARLIE]
// Transform: String -> Integer (get length)
List<Integer> lengths = names.stream()
.map(String::length)
.collect(Collectors.toList());
System.out.println(lengths);
// Output: [5, 3, 7]
// Transform: Integer -> String
List<Integer> numbers = List.of(1, 2, 3);
List<String> labels = numbers.stream()
.map(n -> "Item #" + n)
.collect(Collectors.toList());
System.out.println(labels);
// Output: [Item #1, Item #2, Item #3]
}
}
flatMap(Function<T, Stream<R>>) is used when each element maps to multiple elements (a stream of values). It “flattens” nested structures into a single stream. This is essential when you have lists of lists, or when a mapping function returns a collection for each element.
import java.util.List;
import java.util.stream.Collectors;
public class FlatMapExample {
public static void main(String[] args) {
// Problem: We have a list of lists and want a single flat list
List<List<String>> nested = List.of(
List.of("Java", "Kotlin"),
List.of("Python", "Ruby"),
List.of("Go", "Rust")
);
// Using map() -- gives Stream<List<String>>, NOT what we want
// Using flatMap() -- gives Stream<String>, flattened!
List<String> flat = nested.stream()
.flatMap(List::stream) // Each list becomes a stream, all merged
.collect(Collectors.toList());
System.out.println(flat);
// Output: [Java, Kotlin, Python, Ruby, Go, Rust]
// Real-world: extracting all words from sentences
List<String> sentences = List.of("Hello World", "Java Streams are powerful");
List<String> words = sentences.stream()
.flatMap(s -> List.of(s.split(" ")).stream())
.collect(Collectors.toList());
System.out.println(words);
// Output: [Hello, World, Java, Streams, are, powerful]
// Real-world: customers with multiple orders
// Each customer has a list of orders; we want all orders in one stream
// customers.stream().flatMap(c -> c.getOrders().stream())
}
}
sorted() sorts elements in natural order (for types implementing Comparable). You can also pass a custom Comparator for complex sorting.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
public class SortedExample {
public static void main(String[] args) {
// Natural order (ascending)
List<Integer> numbers = List.of(5, 3, 8, 1, 9, 2);
List<Integer> sorted = numbers.stream()
.sorted()
.collect(Collectors.toList());
System.out.println(sorted);
// Output: [1, 2, 3, 5, 8, 9]
// Reverse order
List<Integer> descending = numbers.stream()
.sorted(Comparator.reverseOrder())
.collect(Collectors.toList());
System.out.println(descending);
// Output: [9, 8, 5, 3, 2, 1]
// Sorting strings by length
List<String> names = List.of("Charlie", "Bob", "Alice", "Eve");
List<String> byLength = names.stream()
.sorted(Comparator.comparingInt(String::length))
.collect(Collectors.toList());
System.out.println(byLength);
// Output: [Bob, Eve, Alice, Charlie]
// Sorting by length, then alphabetically for ties
List<String> byLengthThenAlpha = names.stream()
.sorted(Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder()))
.collect(Collectors.toList());
System.out.println(byLengthThenAlpha);
// Output: [Bob, Eve, Alice, Charlie]
}
}
distinct() removes duplicate elements from the stream. It relies on the equals() and hashCode() methods to determine equality. For custom objects, you must override these methods for distinct() to work correctly.
import java.util.List;
import java.util.stream.Collectors;
public class DistinctExample {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 2, 4, 3, 5, 1);
List<Integer> unique = numbers.stream()
.distinct()
.collect(Collectors.toList());
System.out.println(unique);
// Output: [1, 2, 3, 4, 5]
// With strings (equals/hashCode already implemented)
List<String> words = List.of("hello", "world", "hello", "java", "world");
List<String> uniqueWords = words.stream()
.distinct()
.collect(Collectors.toList());
System.out.println(uniqueWords);
// Output: [hello, world, java]
}
}
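For a custom type the equals()/hashCode() requirement is easy to satisfy with a record (Java 16+), which generates both from its components. A minimal sketch (the Point record is made up for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;

public class DistinctRecordDemo {
    // Records auto-generate equals() and hashCode() from their components,
    // so distinct() treats structurally equal points as duplicates
    record Point(int x, int y) {}

    public static void main(String[] args) {
        List<Point> points = List.of(new Point(1, 1), new Point(2, 2), new Point(1, 1));
        List<Point> unique = points.stream()
            .distinct()
            .collect(Collectors.toList());
        System.out.println(unique); // [Point[x=1, y=1], Point[x=2, y=2]]
    }
}
```

With a plain class that does not override equals()/hashCode(), all three points would survive distinct().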
peek(Consumer<T>) allows you to perform a side effect on each element without modifying the stream. Its primary use is debugging — inspecting elements at a certain stage of the pipeline. Avoid using peek() for business logic; it may not execute if the pipeline is optimized away.
import java.util.List;
import java.util.stream.Collectors;
public class PeekExample {
public static void main(String[] args) {
List<String> result = List.of("one", "two", "three", "four")
.stream()
.filter(s -> s.length() > 3)
.peek(s -> System.out.println("After filter: " + s))
.map(String::toUpperCase)
.peek(s -> System.out.println("After map: " + s))
.collect(Collectors.toList());
// Output:
// After filter: three
// After map: THREE
// After filter: four
// After map: FOUR
System.out.println("Result: " + result);
// Output: Result: [THREE, FOUR]
}
}
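The "may not execute" caveat is real. In the sketch below, the stream is sized and nothing filters it, so on Java 9+ count() can return the size without traversing the elements -- in that case the peek() action never runs at all (the exact behavior depends on the JDK, so no count of peek visits is asserted here):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PeekPitfall {
    public static void main(String[] args) {
        AtomicInteger visits = new AtomicInteger();
        // SIZED source, no filtering: count() may skip traversal entirely
        long count = List.of("a", "b", "c").stream()
            .peek(s -> visits.incrementAndGet())
            .count();
        System.out.println("count = " + count); // count = 3
        System.out.println("peek ran " + visits.get() + " times"); // possibly 0!
    }
}
```

This is exactly why peek() belongs in debugging sessions, not in business logic.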
limit(n) truncates the stream to at most n elements. skip(n) discards the first n elements. Together, they form a powerful pagination pattern.
import java.util.List;
import java.util.stream.Collectors;
public class LimitSkipExample {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
// First 3 elements
List<Integer> firstThree = numbers.stream()
.limit(3)
.collect(Collectors.toList());
System.out.println("First 3: " + firstThree);
// Output: First 3: [1, 2, 3]
// Skip first 7 elements
List<Integer> lastThree = numbers.stream()
.skip(7)
.collect(Collectors.toList());
System.out.println("Last 3: " + lastThree);
// Output: Last 3: [8, 9, 10]
// Pagination pattern: page 2, page size 3 (items 4, 5, 6)
int pageSize = 3;
int pageNumber = 2; // 1-based
List<Integer> page = numbers.stream()
.skip((long) (pageNumber - 1) * pageSize)
.limit(pageSize)
.collect(Collectors.toList());
System.out.println("Page 2: " + page);
// Output: Page 2: [4, 5, 6]
}
}
These operations convert a Stream<T> to a primitive stream (IntStream, LongStream, DoubleStream). Primitive streams avoid autoboxing overhead and provide specialized methods like sum(), average(), and max().
import java.util.List;
public class MapToPrimitiveExample {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie", "David");
// mapToInt: get lengths as IntStream
int totalChars = names.stream()
.mapToInt(String::length)
.sum();
System.out.println("Total characters: " + totalChars);
// Output: Total characters: 20
// average returns OptionalDouble
names.stream()
.mapToInt(String::length)
.average()
.ifPresent(avg -> System.out.printf("Average length: %.1f%n", avg));
// Output: Average length: 5.0
// mapToDouble: useful for decimal calculations
List<Integer> prices = List.of(100, 200, 300);
double totalWithTax = prices.stream()
.mapToDouble(p -> p * 1.08)
.sum();
System.out.printf("Total with tax: %.2f%n", totalWithTax);
// Output: Total with tax: 648.00
}
}
| Operation | Input | Output | Purpose |
|---|---|---|---|
| filter(Predicate) | Stream<T> | Stream<T> | Keep elements matching condition |
| map(Function) | Stream<T> | Stream<R> | Transform each element |
| flatMap(Function) | Stream<T> | Stream<R> | Flatten nested streams |
| sorted() | Stream<T> | Stream<T> | Sort elements |
| distinct() | Stream<T> | Stream<T> | Remove duplicates |
| peek(Consumer) | Stream<T> | Stream<T> | Debug / inspect |
| limit(long) | Stream<T> | Stream<T> | Truncate to n elements |
| skip(long) | Stream<T> | Stream<T> | Skip first n elements |
| mapToInt(Function) | Stream<T> | IntStream | Convert to primitive int stream |
Terminal operations are the final step of a stream pipeline. They trigger the execution of all intermediate operations and produce a result (a value, a collection, or a side effect). Once a terminal operation is called, the stream is consumed and cannot be reused.
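The one-shot nature of streams is worth seeing once. A minimal sketch of what happens when a consumed stream is reused:

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamReuse {
    public static void main(String[] args) {
        Stream<String> stream = List.of("a", "b").stream();
        System.out.println(stream.count()); // first terminal op: 2
        boolean reuseFailed = false;
        try {
            stream.count(); // second terminal op on the same stream
        } catch (IllegalStateException e) {
            reuseFailed = true; // "stream has already been operated upon or closed"
        }
        System.out.println("Reuse failed: " + reuseFailed); // Reuse failed: true
    }
}
```

If you need to traverse the same data twice, create a fresh stream from the source each time.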
forEach(Consumer<T>) performs an action on each element. It is the stream equivalent of a for-each loop. Note that forEach does not guarantee order when used with parallel streams. Use forEachOrdered() if order matters.
import java.util.List;
public class ForEachExample {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie");
// Simple forEach
names.stream().forEach(System.out::println);
// Output: Alice, Bob, Charlie
// forEach with lambda
names.stream().forEach(name -> System.out.println("Hello, " + name + "!"));
// Output:
// Hello, Alice!
// Hello, Bob!
// Hello, Charlie!
// Warning: forEach on parallel stream -- order NOT guaranteed
names.parallelStream().forEach(System.out::println);
// Output: order may vary!
// Use forEachOrdered to maintain encounter order
names.parallelStream().forEachOrdered(System.out::println);
// Output: Alice, Bob, Charlie (guaranteed order)
}
}
collect() is the most versatile terminal operation. It transforms the stream elements into a collection, string, or other summary result using a Collector. The Collectors utility class provides dozens of ready-made collectors.
import java.util.*;
import java.util.stream.Collectors;
public class CollectExample {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie", "Alice", "David");
// Collect to List
List<String> list = names.stream()
.filter(n -> n.length() > 3)
.collect(Collectors.toList());
System.out.println("List: " + list);
// Output: List: [Alice, Charlie, Alice, David]
// Collect to Set (removes duplicates)
Set<String> set = names.stream()
.collect(Collectors.toSet());
System.out.println("Set: " + set);
// Output (iteration order may vary): Set: [Bob, Alice, Charlie, David]
// Collect to unmodifiable List (Java 10+)
List<String> immutable = names.stream()
.collect(Collectors.toUnmodifiableList());
// Collect to Map (name -> length)
Map<String, Integer> nameToLength = names.stream()
.distinct()
.collect(Collectors.toMap(
name -> name, // key mapper
String::length // value mapper
));
System.out.println("Map: " + nameToLength);
// Output: Map: {Alice=5, Bob=3, Charlie=7, David=5}
// Joining strings
String joined = names.stream()
.distinct()
.collect(Collectors.joining(", "));
System.out.println("Joined: " + joined);
// Output: Joined: Alice, Bob, Charlie, David
// Joining with prefix and suffix
String formatted = names.stream()
.distinct()
.collect(Collectors.joining(", ", "[", "]"));
System.out.println("Formatted: " + formatted);
// Output: Formatted: [Alice, Bob, Charlie, David]
}
}
reduce() combines all elements of a stream into a single result by repeatedly applying a binary operation. It is the building block behind sum(), max(), and count() — those are all specialized reductions.
import java.util.List;
import java.util.Optional;
public class ReduceExample {
public static void main(String[] args) {
List numbers = List.of(1, 2, 3, 4, 5);
// With identity value: returns int (never empty)
int sum = numbers.stream()
.reduce(0, Integer::sum);
System.out.println("Sum: " + sum);
// Output: Sum: 15
// Without identity: returns Optional (might be empty)
Optional<Integer> product = numbers.stream()
.reduce((a, b) -> a * b);
product.ifPresent(p -> System.out.println("Product: " + p));
// Output: Product: 120
// Finding the maximum
Optional<Integer> max = numbers.stream()
.reduce(Integer::max);
System.out.println("Max: " + max.orElse(0));
// Output: Max: 5
// String concatenation with reduce
List<String> words = List.of("Java", "Stream", "API");
String sentence = words.stream()
.reduce("", (a, b) -> a.isEmpty() ? b : a + " " + b);
System.out.println(sentence);
// Output: Java Stream API
// How reduce works step-by-step for sum:
// Step 1: identity(0) + 1 = 1
// Step 2: 1 + 2 = 3
// Step 3: 3 + 3 = 6
// Step 4: 6 + 4 = 10
// Step 5: 10 + 5 = 15
}
}
import java.util.List;
import java.util.Optional;
public class CountFindExample {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve");
// count() -- number of elements
long count = names.stream()
.filter(n -> n.length() > 3)
.count();
System.out.println("Names longer than 3: " + count);
// Output: Names longer than 3: 3
// findFirst() -- first element in encounter order, returns Optional
Optional<String> first = names.stream()
.filter(n -> n.startsWith("C"))
.findFirst();
System.out.println("First C-name: " + first.orElse("none"));
// Output: First C-name: Charlie
// findAny() -- any matching element (useful in parallel streams)
Optional<String> any = names.parallelStream()
.filter(n -> n.length() == 3)
.findAny();
System.out.println("Any 3-letter name: " + any.orElse("none"));
// Output: Any 3-letter name: Bob (or Eve in parallel)
}
}
These are short-circuiting terminal operations that return a boolean. They stop processing as soon as the answer is determined.
import java.util.List;
public class MatchExample {
public static void main(String[] args) {
List<Integer> numbers = List.of(2, 4, 6, 8, 10);
// anyMatch: is there at least one element > 7?
boolean hasLarge = numbers.stream().anyMatch(n -> n > 7);
System.out.println("Any > 7? " + hasLarge);
// Output: Any > 7? true
// allMatch: are ALL elements even?
boolean allEven = numbers.stream().allMatch(n -> n % 2 == 0);
System.out.println("All even? " + allEven);
// Output: All even? true
// noneMatch: are there NO negative numbers?
boolean noNegatives = numbers.stream().noneMatch(n -> n < 0);
System.out.println("No negatives? " + noNegatives);
// Output: No negatives? true
}
}
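One edge case worth knowing: on an empty stream, allMatch() and noneMatch() return true (vacuous truth) while anyMatch() returns false -- and the predicate is never evaluated at all. A small sketch:

```java
import java.util.List;

public class EmptyStreamMatch {
    public static void main(String[] args) {
        List<Integer> empty = List.of();
        // With no elements the predicate is never called:
        // anyMatch is false, allMatch and noneMatch are (vacuously) true
        boolean any = empty.stream().anyMatch(n -> n > 0);
        boolean all = empty.stream().allMatch(n -> n > 0);
        boolean none = empty.stream().noneMatch(n -> n > 0);
        System.out.println(any + " " + all + " " + none); // false true true
    }
}
```

Keep this in mind when an allMatch() on filtered data unexpectedly returns true.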
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
public class MinMaxToArrayExample {
public static void main(String[] args) {
List<String> names = List.of("Charlie", "Bob", "Alice", "David");
// min -- requires a Comparator
Optional<String> shortest = names.stream()
.min(Comparator.comparingInt(String::length));
System.out.println("Shortest: " + shortest.orElse("none"));
// Output: Shortest: Bob
// max
Optional<String> longest = names.stream()
.max(Comparator.comparingInt(String::length));
System.out.println("Longest: " + longest.orElse("none"));
// Output: Longest: Charlie
// toArray -- convert stream to array
String[] nameArray = names.stream()
.filter(n -> n.length() > 3)
.toArray(String[]::new);
System.out.println("Array length: " + nameArray.length);
// Output: Array length: 3
}
}
| Operation | Return Type | Purpose |
|---|---|---|
| forEach(Consumer) | void | Perform action on each element |
| collect(Collector) | R | Accumulate into a collection or summary |
| reduce(identity, BinaryOp) | T | Combine all elements into one value |
| count() | long | Count elements |
| findFirst() | Optional<T> | First element (encounter order) |
| findAny() | Optional<T> | Any element (optimized for parallel) |
| anyMatch(Predicate) | boolean | At least one matches? |
| allMatch(Predicate) | boolean | All match? |
| noneMatch(Predicate) | boolean | None match? |
| min(Comparator) | Optional<T> | Minimum element |
| max(Comparator) | Optional<T> | Maximum element |
| toArray() | Object[] or T[] | Convert to array |
The Collectors class is the powerhouse of the Stream API. Beyond basic toList() and toSet(), it provides sophisticated collectors for grouping, partitioning, summarizing, and more. Mastering these collectors will dramatically improve the expressiveness of your code.
groupingBy() groups stream elements by a classification function, producing a Map<K, List<T>>. This is the stream equivalent of SQL's GROUP BY.
import java.util.*;
import java.util.stream.Collectors;
public class GroupingByExample {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie", "Anna", "Ben", "Chris");
// Group by first letter
Map<Character, List<String>> byFirstLetter = names.stream()
.collect(Collectors.groupingBy(name -> name.charAt(0)));
System.out.println(byFirstLetter);
// Output: {A=[Alice, Anna], B=[Bob, Ben], C=[Charlie, Chris]}
// Group by string length
Map<Integer, List<String>> byLength = names.stream()
.collect(Collectors.groupingBy(String::length));
System.out.println(byLength);
// Output: {3=[Bob, Ben], 4=[Anna], 5=[Alice, Chris], 7=[Charlie]}
// groupingBy with downstream collector: count per group
Map<Character, Long> countByLetter = names.stream()
.collect(Collectors.groupingBy(
name -> name.charAt(0),
Collectors.counting()
));
System.out.println(countByLetter);
// Output: {A=2, B=2, C=2}
// groupingBy with downstream collector: join names per group
Map<Character, String> joinedByLetter = names.stream()
.collect(Collectors.groupingBy(
name -> name.charAt(0),
Collectors.joining(", ")
));
System.out.println(joinedByLetter);
// Output: {A=Alice, Anna, B=Bob, Ben, C=Charlie, Chris}
// Multi-level grouping: group by length, then by first letter
Map<Integer, Map<Character, List<String>>> multiLevel = names.stream()
.collect(Collectors.groupingBy(
String::length,
Collectors.groupingBy(name -> name.charAt(0))
));
System.out.println(multiLevel);
// Output: {3={B=[Bob, Ben]}, 4={A=[Anna]}, 5={A=[Alice], C=[Chris]}, 7={C=[Charlie]}}
}
}
partitioningBy() is a special case of groupingBy() that splits elements into exactly two groups based on a Predicate -- a true group and a false group. The result is always Map<Boolean, List<T>>.
import java.util.*;
import java.util.stream.Collectors;
public class PartitioningByExample {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
// Partition into even and odd
Map<Boolean, List<Integer>> evenOdd = numbers.stream()
.collect(Collectors.partitioningBy(n -> n % 2 == 0));
System.out.println("Even: " + evenOdd.get(true));
System.out.println("Odd: " + evenOdd.get(false));
// Output:
// Even: [2, 4, 6, 8, 10]
// Odd: [1, 3, 5, 7, 9]
// Partition with downstream collector: count each group
Map<Boolean, Long> counts = numbers.stream()
.collect(Collectors.partitioningBy(
n -> n > 5,
Collectors.counting()
));
System.out.println("Greater than 5: " + counts.get(true));
System.out.println("5 or less: " + counts.get(false));
// Output:
// Greater than 5: 5
// 5 or less: 5
}
}
When collecting to a Map, duplicate keys cause an IllegalStateException. You must provide a merge function to handle collisions.
import java.util.*;
import java.util.stream.Collectors;
public class ToMapMergeExample {
public static void main(String[] args) {
List<String> words = List.of("hello", "world", "hello", "java", "world");
// Problem: duplicate keys without merge function throws exception
// Solution: provide a merge function
Map<String, Integer> wordCount = words.stream()
.collect(Collectors.toMap(
word -> word, // key: the word itself
word -> 1, // value: count of 1
Integer::sum // merge: add counts for duplicate keys
));
System.out.println(wordCount);
// Output: {hello=2, world=2, java=1}
// Collecting to a specific Map implementation (LinkedHashMap preserves insertion order)
Map<String, Integer> orderedCount = words.stream()
.collect(Collectors.toMap(
word -> word,
word -> 1,
Integer::sum,
LinkedHashMap::new // supplier for the Map type
));
System.out.println(orderedCount);
// Output: {hello=2, world=2, java=1}
}
}
summarizingInt(), summarizingLong(), and summarizingDouble() collect comprehensive statistics in a single pass -- count, sum, min, max, and average.
import java.util.*;
import java.util.stream.Collectors;
public class SummarizingExample {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve");
IntSummaryStatistics stats = names.stream()
.collect(Collectors.summarizingInt(String::length));
System.out.println("Count: " + stats.getCount()); // 5
System.out.println("Sum: " + stats.getSum()); // 23
System.out.println("Min: " + stats.getMin()); // 3
System.out.println("Max: " + stats.getMax()); // 7
System.out.printf("Average: %.1f%n", stats.getAverage()); // 4.6
}
Parallel streams split the data into multiple chunks and process them simultaneously on different threads using the ForkJoinPool. This can significantly speed up processing of large datasets on multi-core machines -- but parallelism is not free and can hurt performance when used incorrectly.
import java.util.List;
import java.util.stream.IntStream;
public class ParallelStreamExample {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
// Method 1: parallelStream() from collection
long sum1 = numbers.parallelStream()
.mapToLong(Integer::longValue)
.sum();
// Method 2: .parallel() on an existing stream
long sum2 = numbers.stream()
.parallel()
.mapToLong(Integer::longValue)
.sum();
System.out.println("Sum1: " + sum1 + ", Sum2: " + sum2);
// Output: Sum1: 55, Sum2: 55
// Demonstrating parallel execution with thread names
System.out.println("--- Sequential ---");
IntStream.range(1, 5).forEach(i ->
System.out.println(i + " on " + Thread.currentThread().getName()));
System.out.println("--- Parallel ---");
IntStream.range(1, 5).parallel().forEach(i ->
System.out.println(i + " on " + Thread.currentThread().getName()));
// Parallel output shows different thread names (ForkJoinPool.commonPool-worker-*)
}
}
| Use Parallel When | Avoid Parallel When |
|---|---|
| Large datasets (100,000+ elements) | Small datasets (overhead > benefit) |
| CPU-intensive operations per element | I/O-bound operations (network, file) |
| Operations are independent (no shared state) | Operations depend on encounter order |
| Source is easy to split (ArrayList, arrays) | Source is hard to split (LinkedList, Stream.iterate) |
| Stateless intermediate operations | Stateful operations (sorted, distinct, limit) |
Common mistake: Using parallel streams with shared mutable state. This leads to race conditions and incorrect results.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
public class ParallelStreamDanger {
public static void main(String[] args) {
// WRONG: modifying a shared list from a parallel stream
List<Integer> unsafeList = new ArrayList<>();
IntStream.range(0, 1000)
.parallel()
.forEach(unsafeList::add); // Race condition!
System.out.println("Unsafe size: " + unsafeList.size());
// Output: might be less than 1000 or throw ArrayIndexOutOfBoundsException!
// RIGHT: use collect() instead
List<Integer> safeList = IntStream.range(0, 1000)
.parallel()
.boxed()
.collect(Collectors.toList());
System.out.println("Safe size: " + safeList.size());
// Output: Safe size: 1000
}
}
Many terminal stream operations return an Optional -- a container that may or may not hold a value. This forces you to handle the "no result" case explicitly, preventing NullPointerException.
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;
public class OptionalWithStreams {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie");
// findFirst returns Optional<String>
Optional<String> first = names.stream()
.filter(n -> n.startsWith("Z"))
.findFirst();
// Handle the Optional
String result = first.orElse("No match found");
System.out.println(result);
// Output: No match found
// ifPresent -- only execute if a value exists
names.stream()
.filter(n -> n.startsWith("C"))
.findFirst()
.ifPresent(name -> System.out.println("Found: " + name));
// Output: Found: Charlie
// map on Optional -- transform the value if present
Optional<Integer> length = names.stream()
.filter(n -> n.startsWith("A"))
.findFirst()
.map(String::length);
System.out.println("Length: " + length.orElse(0));
// Output: Length: 5
// orElseThrow -- throw exception if empty
// names.stream().filter(n -> n.startsWith("Z")).findFirst()
// .orElseThrow(() -> new IllegalArgumentException("No Z names"));
// Java 9+: Optional.stream() -- converts Optional to a 0-or-1 element stream
// Useful for flatMapping a stream of Optionals
List<Optional<String>> optionals = List.of(
Optional.of("Hello"),
Optional.empty(),
Optional.of("World")
);
List<String> values = optionals.stream()
.flatMap(Optional::stream)
.toList();
System.out.println(values);
// Output: [Hello, World]
}
}
Java provides three specialized stream types for primitives: IntStream, LongStream, and DoubleStream. These avoid the overhead of autoboxing (converting int to Integer and back) and provide specialized methods like sum(), average(), and summaryStatistics().
import java.util.List;
import java.util.OptionalDouble;
import java.util.OptionalInt;
import java.util.stream.IntStream;
public class PrimitiveStreamExample {
public static void main(String[] args) {
// IntStream creation
IntStream range = IntStream.rangeClosed(1, 10);
// sum, average, min, max
int sum = IntStream.rangeClosed(1, 10).sum();
System.out.println("Sum 1-10: " + sum);
// Output: Sum 1-10: 55
OptionalDouble avg = IntStream.of(85, 90, 78, 92, 88).average();
System.out.println("Average: " + avg.orElse(0));
// Output: Average: 86.6
OptionalInt max = IntStream.of(85, 90, 78, 92, 88).max();
System.out.println("Max: " + max.orElse(0));
// Output: Max: 92
// summaryStatistics() -- all stats in one pass
var stats = IntStream.of(85, 90, 78, 92, 88).summaryStatistics();
System.out.println("Count: " + stats.getCount());
System.out.println("Sum: " + stats.getSum());
System.out.println("Min: " + stats.getMin());
System.out.println("Max: " + stats.getMax());
System.out.printf("Avg: %.1f%n", stats.getAverage());
// boxed() -- convert IntStream to Stream<Integer>
List<Integer> boxedList = IntStream.rangeClosed(1, 5)
.boxed()
.toList();
System.out.println("Boxed: " + boxedList);
// Output: Boxed: [1, 2, 3, 4, 5]
// mapToObj -- convert each int to an object
List<String> labels = IntStream.rangeClosed(1, 3)
.mapToObj(i -> "Item " + i)
.toList();
System.out.println("Labels: " + labels);
// Output: Labels: [Item 1, Item 2, Item 3]
// Converting between Stream and primitive streams
List<String> names = List.of("Alice", "Bob", "Charlie");
IntStream lengths = names.stream().mapToInt(String::length);
System.out.println("Total chars: " + lengths.sum());
// Output: Total chars: 15
}
}
This section demonstrates practical, real-world patterns you will use repeatedly in production code. These patterns solve common data-processing problems elegantly with streams.
import java.util.*;
import java.util.stream.Collectors;
public class CommonPatterns {
record Product(String name, String category, double price) {}
public static void main(String[] args) {
List<Product> products = List.of(
new Product("Laptop", "Electronics", 999.99),
new Product("Headphones", "Electronics", 79.99),
new Product("Coffee Maker", "Kitchen", 49.99),
new Product("Blender", "Kitchen", 39.99),
new Product("Monitor", "Electronics", 349.99),
new Product("Toaster", "Kitchen", 29.99)
);
// Filter by category and sort by price
List<Product> electronics = products.stream()
.filter(p -> p.category().equals("Electronics"))
.sorted(Comparator.comparingDouble(Product::price))
.collect(Collectors.toList());
electronics.forEach(p -> System.out.println(p.name() + " $" + p.price()));
// Output:
// Headphones $79.99
// Monitor $349.99
// Laptop $999.99
// Find the top 2 most expensive products
List<String> topTwo = products.stream()
.sorted(Comparator.comparingDouble(Product::price).reversed())
.limit(2)
.map(Product::name)
.collect(Collectors.toList());
System.out.println("Top 2: " + topTwo);
// Output: Top 2: [Laptop, Monitor]
// Group by category and calculate average price per category
Map<String, Double> avgByCategory = products.stream()
.collect(Collectors.groupingBy(
Product::category,
Collectors.averagingDouble(Product::price)
));
avgByCategory.forEach((cat, avg) ->
System.out.printf("%s avg: $%.2f%n", cat, avg));
// Output:
// Electronics avg: $476.66
// Kitchen avg: $39.99
// Create a comma-separated string of product names
String productList = products.stream()
.map(Product::name)
.collect(Collectors.joining(", "));
System.out.println("Products: " + productList);
// Output: Products: Laptop, Headphones, Coffee Maker, Blender, Monitor, Toaster
// Convert to a Map: name -> price
Map<String, Double> priceMap = products.stream()
.collect(Collectors.toMap(Product::name, Product::price));
System.out.println("Laptop price: $" + priceMap.get("Laptop"));
// Output: Laptop price: $999.99
}
}
import java.util.*;
import java.util.stream.Collectors;
public class FlatteningPattern {
record Student(String name, List<String> courses) {}
public static void main(String[] args) {
List<Student> students = List.of(
new Student("Alice", List.of("Math", "Physics", "CS")),
new Student("Bob", List.of("CS", "English", "Math")),
new Student("Charlie", List.of("Biology", "Chemistry"))
);
// Get all unique courses offered
Set<String> allCourses = students.stream()
.flatMap(s -> s.courses().stream())
.collect(Collectors.toSet());
System.out.println("All courses: " + allCourses);
// Output (set iteration order may vary): [Biology, CS, Chemistry, English, Math, Physics]
// Find students taking "CS"
List<String> csStudents = students.stream()
.filter(s -> s.courses().contains("CS"))
.map(Student::name)
.collect(Collectors.toList());
System.out.println("CS students: " + csStudents);
// Output: CS students: [Alice, Bob]
}
}
Streams are not always better than loops, and loops are not always better than streams. Understanding when to use each is a sign of a mature Java developer.
| Criteria | Stream | Traditional Loop |
|---|---|---|
| Readability | Excellent for data transformations (filter, map, collect) | Better for simple iterations with side effects |
| Debugging | Harder -- stack traces are less clear, peek() helps | Easier -- set breakpoints, inspect variables |
| Performance | Slight overhead for small datasets; parallel() helps with large | Generally faster for simple operations on small data |
| Mutability | Encourages immutability (functional style) | Naturally works with mutable state |
| Short-circuiting | Built-in (findFirst, anyMatch, limit) | Manual (break, return) |
| Parallelism | Trivial -- just call parallel() | Complex -- manual thread management |
| State management | Stateless operations preferred | Stateful iteration is natural |
import java.util.*;
import java.util.stream.Collectors;
public class StreamVsLoop {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve", "Frank");
// Task: Get uppercase names that are longer than 3 characters
// --- Loop approach ---
List<String> resultLoop = new ArrayList<>();
for (String name : names) {
if (name.length() > 3) {
resultLoop.add(name.toUpperCase());
}
}
System.out.println("Loop: " + resultLoop);
// --- Stream approach ---
List<String> resultStream = names.stream()
.filter(n -> n.length() > 3)
.map(String::toUpperCase)
.collect(Collectors.toList());
System.out.println("Stream: " + resultStream);
// Both output: [ALICE, CHARLIE, DAVID, FRANK]
// Stream is more readable here -- the intent is clear at a glance
}
}
Rule of thumb: Use streams for data transformation pipelines (filter, map, collect, group). Use loops when you need to maintain complex local state, perform multiple related side effects, or when the logic is inherently imperative (like building a graph or managing indices).
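To make that rule of thumb concrete, here is a small sketch (class and variable names hypothetical) of an index-dependent task -- day-over-day price differences -- where a plain loop reads more naturally than a stream over indices:

```java
import java.util.ArrayList;
import java.util.List;
public class AdjacentDiffs {
public static void main(String[] args) {
List<Integer> prices = List.of(10, 13, 9, 15);
// Each delta depends on two adjacent indices -- natural loop territory
List<Integer> deltas = new ArrayList<>();
for (int i = 1; i < prices.size(); i++) {
deltas.add(prices.get(i) - prices.get(i - 1));
}
System.out.println(deltas);
// Output: [3, -4, 6]
}
}
```

A stream version via IntStream.range(1, prices.size()) works, but it hides the adjacency relationship behind index arithmetic inside a lambda, which is exactly the kind of inherently imperative logic where a loop wins.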
These are mistakes that even experienced developers make when working with the Stream API. Understanding them will save you hours of debugging.
A stream can only be consumed once. Attempting to reuse it throws an IllegalStateException.
import java.util.List;
import java.util.stream.Stream;
public class ReuseStreamMistake {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie");
Stream<String> stream = names.stream().filter(n -> n.length() > 3);
// First use -- works fine
long count = stream.count();
System.out.println("Count: " + count);
// Second use -- THROWS IllegalStateException!
// stream.forEach(System.out::println);
// java.lang.IllegalStateException: stream has already been operated upon or closed
// Fix: create a new stream each time
long count2 = names.stream().filter(n -> n.length() > 3).count();
names.stream().filter(n -> n.length() > 3).forEach(System.out::println);
}
}
Intermediate operations like map() and filter() should be stateless and free of side effects. Modifying external state from these operations leads to unpredictable behavior, especially with parallel streams.
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class SideEffectMistake {
public static void main(String[] args) {
List<String> names = List.of("Alice", "Bob", "Charlie");
// WRONG: modifying external state inside map()
List<String> sideEffectList = new ArrayList<>();
names.stream()
.map(n -> {
sideEffectList.add(n); // Side effect! Don't do this.
return n.toUpperCase();
})
.collect(Collectors.toList());
// RIGHT: use collect() to gather results
List<String> upper = names.stream()
.map(String::toUpperCase)
.collect(Collectors.toList());
}
}
import java.util.stream.Stream;
public class InfiniteStreamMistake {
public static void main(String[] args) {
// WRONG: this runs forever and causes OutOfMemoryError
// Stream.generate(Math::random).forEach(System.out::println);
// RIGHT: always use limit() with generate() or iterate()
Stream.generate(Math::random)
.limit(5)
.forEach(n -> System.out.printf("%.2f%n", n));
// Or use the Java 9+ iterate with a predicate
Stream.iterate(1, n -> n <= 100, n -> n * 2)
.forEach(System.out::println);
}
}
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
public class ModifySourceMistake {
public static void main(String[] args) {
List<String> names = new ArrayList<>(List.of("Alice", "Bob", "Charlie"));
// WRONG: modifying the source while streaming -- ConcurrentModificationException!
// names.stream()
// .filter(n -> n.startsWith("A"))
// .forEach(n -> names.remove(n));
// RIGHT: collect results, then modify
List<String> toRemove = names.stream()
.filter(n -> n.startsWith("A"))
.collect(Collectors.toList());
names.removeAll(toRemove);
System.out.println(names);
// Output: [Bob, Charlie]
// Or use removeIf() which is simpler
// names.removeIf(n -> n.startsWith("A"));
}
}
import java.util.List;
import java.util.stream.Collectors;
public class PerformanceTrapMistake {
public static void main(String[] args) {
List<Integer> numbers = List.of(1, 2, 3, 4, 5);
// SLOW: unnecessary boxing -- Stream<Integer> instead of IntStream
int sum1 = numbers.stream()
.map(n -> n * 2) // boxes/unboxes Integer repeatedly
.reduce(0, Integer::sum);
// FAST: use primitive stream
int sum2 = numbers.stream()
.mapToInt(n -> n * 2) // works with primitive int
.sum();
// WASTEFUL: sorting the entire stream just to find the max
// numbers.stream().sorted(Comparator.reverseOrder()).findFirst();
// EFFICIENT: use max() directly
// numbers.stream().max(Comparator.naturalOrder());
System.out.println("Sum: " + sum2);
// Output: Sum: 30
}
}
| Mistake | Symptom | Fix |
|---|---|---|
| Reusing a consumed stream | `IllegalStateException` | Create a new stream each time |
| Side effects in map/filter | Unpredictable results in parallel | Use `collect()` for results, keep lambdas pure |
| Infinite stream without limit | Program hangs or `OutOfMemoryError` | Always use `limit()` with `generate()`/`iterate()` |
| Modifying source during stream | `ConcurrentModificationException` | Collect first, then modify; or use `removeIf()` |
| Unnecessary boxing | Poor performance | Use `mapToInt()`/`mapToLong()`/`mapToDouble()` |
| Sorting just to get min/max | O(n log n) instead of O(n) | Use `min()`/`max()` directly |
Following these best practices will help you write stream code that is clean, efficient, and maintainable.
Each lambda in a stream pipeline should do one thing. If your lambda is more than 2-3 lines, extract it into a named method.
import java.util.List;
import java.util.stream.Collectors;
public class BestPractices {
// POOR: complex inline lambda
// list.stream().filter(e -> e.age() > 18 && e.salary() > 50000
// && e.department().equals("Engineering")).collect(Collectors.toList());
// BETTER: extract to a method
static boolean isSeniorEngineer(Employee e) {
return e.age() > 18
&& e.salary() > 50000
&& e.department().equals("Engineering");
}
record Employee(String name, int age, double salary, String department) {}
public static void main(String[] args) {
List<Employee> employees = List.of(
new Employee("Alice", 30, 85000, "Engineering"),
new Employee("Bob", 25, 45000, "Marketing")
);
List<Employee> seniors = employees.stream()
.filter(BestPractices::isSeniorEngineer)
.collect(Collectors.toList());
}
}
Method references are more concise and communicate intent better than equivalent lambdas.
import java.util.List;
import java.util.stream.Collectors;
public class MethodReferencePractice {
public static void main(String[] args) {
List<String> names = List.of("alice", "bob", "charlie");
// Lambda (works but verbose)
List<String> upper1 = names.stream()
.map(s -> s.toUpperCase())
.collect(Collectors.toList());
// Method reference (cleaner)
List<String> upper2 = names.stream()
.map(String::toUpperCase)
.collect(Collectors.toList());
// More examples:
// s -> System.out.println(s) -> System.out::println
// s -> s.length() -> String::length
// s -> Integer.parseInt(s) -> Integer::parseInt
// () -> new ArrayList<>() -> ArrayList::new
}
}
Each operation in a stream pipeline should be on its own line, with consistent indentation. This makes the pipeline easy to read and modify.
import java.util.List;
import java.util.stream.Collectors;
public class FormattingPractice {
record Employee(String name, String dept, double salary) {}
public static void main(String[] args) {
List<Employee> employees = List.of(
new Employee("Alice", "Engineering", 95000),
new Employee("Bob", "Marketing", 65000),
new Employee("Charlie", "Engineering", 85000)
);
// POOR: all on one line
// List<String> result = employees.stream().filter(e -> e.dept().equals("Engineering")).map(Employee::name).sorted().collect(Collectors.toList());
// GOOD: one operation per line, aligned at the dot
List<String> result = employees.stream()
.filter(e -> e.dept().equals("Engineering"))
.map(Employee::name)
.sorted()
.collect(Collectors.toList());
System.out.println(result);
// Output: [Alice, Charlie]
}
}
| # | Practice | Why |
|---|---|---|
| 1 | Keep lambdas short; extract complex logic to named methods | Readability, testability, reusability |
| 2 | Use method references (`String::toUpperCase`) | Cleaner, more concise |
| 3 | Avoid side effects in intermediate operations | Predictable behavior, safe parallelism |
| 4 | Use primitive streams for numbers (`mapToInt`) | Avoids autoboxing overhead |
| 5 | Do not over-use streams; simple loops are fine | Not everything benefits from streams |
| 6 | Format one operation per line | Readability, easy to add/remove steps |
| 7 | Prefer `collect()` over `forEach()` + mutation | Thread-safe, functional style |
| 8 | Use Optional results properly (`orElse`, `ifPresent`) | Avoid NullPointerException |
| 9 | Use parallel streams only when justified | Parallelism has overhead; profile first |
| 10 | Favor `toList()` (Java 16+) over `collect(Collectors.toList())` | Shorter and returns unmodifiable list |
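Practice #10 is worth seeing in action. A short sketch (class name hypothetical): Stream.toList() (Java 16+) is documented to return an unmodifiable list, while collect(Collectors.toList()) returns a mutable ArrayList in current JDKs -- though the Collectors.toList() spec makes no guarantee about mutability, so do not rely on it:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class ToListComparison {
public static void main(String[] args) {
// Collectors.toList() happens to return a mutable list today
List<String> collected = Stream.of("a", "b").collect(Collectors.toList());
collected.add("c"); // works in current JDKs
// Stream.toList() (Java 16+) returns an unmodifiable list
List<String> unmodifiable = Stream.of("a", "b").toList();
try {
unmodifiable.add("c");
} catch (UnsupportedOperationException e) {
System.out.println("toList() result is unmodifiable");
}
}
}
```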
Let us tie everything together with a real-world example. We will build an Employee analytics system that uses streams to answer common business questions: filtering by department, calculating salary statistics, grouping, partitioning, finding top performers, and generating a report.
import java.util.*;
import java.util.stream.Collectors;
public class EmployeeAnalytics {
// --- Employee class ---
static class Employee {
private final String name;
private final String department;
private final double salary;
private final int yearsOfExperience;
public Employee(String name, String department, double salary, int yearsOfExperience) {
this.name = name;
this.department = department;
this.salary = salary;
this.yearsOfExperience = yearsOfExperience;
}
public String getName() { return name; }
public String getDepartment() { return department; }
public double getSalary() { return salary; }
public int getYearsOfExperience() { return yearsOfExperience; }
public boolean isSenior() { return yearsOfExperience >= 5; }
@Override
public String toString() {
return String.format("%-12s | %-12s | $%,10.2f | %2d yrs",
name, department, salary, yearsOfExperience);
}
}
public static void main(String[] args) {
// --- Sample data ---
List<Employee> employees = List.of(
new Employee("Alice", "Engineering", 120000, 8),
new Employee("Bob", "Engineering", 95000, 3),
new Employee("Charlie", "Engineering", 110000, 6),
new Employee("Diana", "Marketing", 85000, 10),
new Employee("Eve", "Marketing", 72000, 2),
new Employee("Frank", "Sales", 78000, 5),
new Employee("Grace", "Sales", 82000, 7),
new Employee("Henry", "HR", 68000, 4),
new Employee("Ivy", "HR", 71000, 6),
new Employee("Jack", "Engineering", 135000, 12)
);
System.out.println("=== EMPLOYEE ANALYTICS REPORT ===\n");
// -------------------------------------------------------
// 1. FILTER: Engineers earning above $100K
// -------------------------------------------------------
System.out.println("--- 1. High-Earning Engineers (>$100K) ---");
List<Employee> highEarningEngineers = employees.stream()
.filter(e -> e.getDepartment().equals("Engineering"))
.filter(e -> e.getSalary() > 100000)
.sorted(Comparator.comparingDouble(Employee::getSalary).reversed())
.collect(Collectors.toList());
highEarningEngineers.forEach(System.out::println);
// Output:
// Jack | Engineering | $135,000.00 | 12 yrs
// Alice | Engineering | $120,000.00 | 8 yrs
// Charlie | Engineering | $110,000.00 | 6 yrs
// -------------------------------------------------------
// 2. MAP + COLLECT: Average salary per department
// -------------------------------------------------------
System.out.println("\n--- 2. Average Salary by Department ---");
Map<String, Double> avgSalaryByDept = employees.stream()
.collect(Collectors.groupingBy(
Employee::getDepartment,
Collectors.averagingDouble(Employee::getSalary)
));
avgSalaryByDept.entrySet().stream()
.sorted(Map.Entry.<String, Double>comparingByValue().reversed()) // type witness needed: inference fails when chaining reversed()
.forEach(e -> System.out.printf(" %-12s $%,.2f%n", e.getKey(), e.getValue()));
// Output:
// Engineering $115,000.00
// Sales $80,000.00
// Marketing $78,500.00
// HR $69,500.00
// -------------------------------------------------------
// 3. GROUPING: Employees grouped by department
// -------------------------------------------------------
System.out.println("\n--- 3. Employees by Department ---");
Map<String, List<String>> namesByDept = employees.stream()
.collect(Collectors.groupingBy(
Employee::getDepartment,
Collectors.mapping(Employee::getName, Collectors.toList())
));
namesByDept.forEach((dept, names) ->
System.out.printf(" %-12s %s%n", dept, names));
// Output:
// Engineering [Alice, Bob, Charlie, Jack]
// Marketing [Diana, Eve]
// Sales [Frank, Grace]
// HR [Henry, Ivy]
// -------------------------------------------------------
// 4. REDUCE: Highest-paid employee
// -------------------------------------------------------
System.out.println("\n--- 4. Highest Paid Employee ---");
employees.stream()
.max(Comparator.comparingDouble(Employee::getSalary))
.ifPresent(e -> System.out.println(" " + e));
// Output:
// Jack | Engineering | $135,000.00 | 12 yrs
// -------------------------------------------------------
// 5. PARTITIONING: Senior vs Junior (5+ years = senior)
// -------------------------------------------------------
System.out.println("\n--- 5. Senior vs Junior ---");
Map<Boolean, List<Employee>> seniorPartition = employees.stream()
.collect(Collectors.partitioningBy(Employee::isSenior));
System.out.println(" Senior (" + seniorPartition.get(true).size() + "):");
seniorPartition.get(true).forEach(e -> System.out.println(" " + e.getName()));
System.out.println(" Junior (" + seniorPartition.get(false).size() + "):");
seniorPartition.get(false).forEach(e -> System.out.println(" " + e.getName()));
// Output:
// Senior (7):
// Alice, Charlie, Diana, Frank, Grace, Ivy, Jack
// Junior (3):
// Bob, Eve, Henry
// -------------------------------------------------------
// 6. STATISTICS: Salary summary
// -------------------------------------------------------
System.out.println("\n--- 6. Salary Statistics ---");
DoubleSummaryStatistics salaryStats = employees.stream()
.mapToDouble(Employee::getSalary)
.summaryStatistics();
System.out.printf(" Count: %d%n", salaryStats.getCount());
System.out.printf(" Total: $%,.2f%n", salaryStats.getSum());
System.out.printf(" Min: $%,.2f%n", salaryStats.getMin());
System.out.printf(" Max: $%,.2f%n", salaryStats.getMax());
System.out.printf(" Average: $%,.2f%n", salaryStats.getAverage());
// Output:
// Count: 10
// Total: $916,000.00
// Min: $68,000.00
// Max: $135,000.00
// Average: $91,600.00
// -------------------------------------------------------
// 7. TOP N: Top 3 highest salaries
// -------------------------------------------------------
System.out.println("\n--- 7. Top 3 Highest Salaries ---");
employees.stream()
.sorted(Comparator.comparingDouble(Employee::getSalary).reversed())
.limit(3)
.forEach(e -> System.out.println(" " + e));
// Output:
// Jack | Engineering | $135,000.00 | 12 yrs
// Alice | Engineering | $120,000.00 | 8 yrs
// Charlie | Engineering | $110,000.00 | 6 yrs
// -------------------------------------------------------
// 8. STRING JOINING: Department roster
// -------------------------------------------------------
System.out.println("\n--- 8. Department Roster ---");
Map<String, String> rosters = employees.stream()
.collect(Collectors.groupingBy(
Employee::getDepartment,
Collectors.mapping(
Employee::getName,
Collectors.joining(", ")
)
));
rosters.forEach((dept, roster) ->
System.out.printf(" %-12s %s%n", dept, roster));
// Output:
// Engineering Alice, Bob, Charlie, Jack
// Marketing Diana, Eve
// Sales Frank, Grace
// HR Henry, Ivy
// -------------------------------------------------------
// 9. COMPLEX: Department with highest average salary
// -------------------------------------------------------
System.out.println("\n--- 9. Highest-Paying Department ---");
employees.stream()
.collect(Collectors.groupingBy(
Employee::getDepartment,
Collectors.averagingDouble(Employee::getSalary)
))
.entrySet().stream()
.max(Map.Entry.comparingByValue())
.ifPresent(e -> System.out.printf(" %s with avg $%,.2f%n", e.getKey(), e.getValue()));
// Output:
// Engineering with avg $115,000.00
// -------------------------------------------------------
// 10. BOOLEAN CHECKS: Quick analytics
// -------------------------------------------------------
System.out.println("\n--- 10. Quick Checks ---");
boolean anyOver130K = employees.stream()
.anyMatch(e -> e.getSalary() > 130000);
System.out.println(" Anyone earning >$130K? " + anyOver130K); // true
boolean allAbove50K = employees.stream()
.allMatch(e -> e.getSalary() > 50000);
System.out.println(" All earning >$50K? " + allAbove50K); // true
long totalExperience = employees.stream()
.mapToInt(Employee::getYearsOfExperience)
.sum();
System.out.println(" Total years of experience: " + totalExperience); // 63
System.out.println("\n=== END OF REPORT ===");
}
}
| # | Concept | Where Used |
|---|---|---|
| 1 | `filter()` | Section 1 -- filtering by department and salary |
| 2 | `sorted()` with Comparator | Sections 1, 2, 7 -- sorting by salary |
| 3 | `collect(Collectors.toList())` | Sections 1, 3 -- gathering results |
| 4 | `groupingBy()` | Sections 2, 3, 8, 9 -- grouping by department |
| 5 | `averagingDouble()` | Sections 2, 9 -- average salary |
| 6 | `mapping()` downstream | Sections 3, 8 -- extracting names within groups |
| 7 | `max()` with Comparator | Section 4 -- highest-paid employee |
| 8 | `partitioningBy()` | Section 5 -- senior vs junior split |
| 9 | `summaryStatistics()` | Section 6 -- comprehensive salary stats |
| 10 | `limit()` | Section 7 -- top 3 |
| 11 | `Collectors.joining()` | Section 8 -- comma-separated roster |
| 12 | Chained stream operations | Section 9 -- collect then stream the result |
| 13 | `anyMatch()`, `allMatch()` | Section 10 -- boolean checks |
| 14 | `mapToInt()` + `sum()` | Section 10 -- total experience |
| Category | Operation | Type | Returns |
|---|---|---|---|
| Create | `collection.stream()` | Source | `Stream<T>` |
| Create | `Stream.of(a, b, c)` | Source | `Stream<T>` |
| Create | `IntStream.rangeClosed(1, 10)` | Source | `IntStream` |
| Transform | `filter(Predicate)` | Intermediate | `Stream<T>` |
| Transform | `map(Function)` | Intermediate | `Stream<R>` |
| Transform | `flatMap(Function)` | Intermediate | `Stream<R>` |
| Transform | `sorted()` | Intermediate | `Stream<T>` |
| Transform | `distinct()` | Intermediate | `Stream<T>` |
| Transform | `limit(n)` / `skip(n)` | Intermediate | `Stream<T>` |
| Collect | `collect(Collectors.toList())` | Terminal | `List<T>` |
| Collect | `collect(Collectors.toSet())` | Terminal | `Set<T>` |
| Collect | `collect(Collectors.toMap(...))` | Terminal | `Map<K,V>` |
| Collect | `collect(Collectors.groupingBy(...))` | Terminal | `Map<K,List<T>>` |
| Collect | `collect(Collectors.joining(...))` | Terminal | `String` |
| Reduce | `reduce(identity, BinaryOp)` | Terminal | `T` |
| Reduce | `count()` | Terminal | `long` |
| Reduce | `min(Comparator)` / `max(Comparator)` | Terminal | `Optional<T>` |
| Search | `findFirst()` / `findAny()` | Terminal | `Optional<T>` |
| Match | `anyMatch` / `allMatch` / `noneMatch` | Terminal | `boolean` |
| Action | `forEach(Consumer)` | Terminal | `void` |
By default, search results are returned sorted by relevance, with the most relevant docs first.
The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.
A query clause generates a _score for each document. How that score is calculated depends on the type of query clause. Different query clauses are used for different purposes: a fuzzy query might determine the _score by calculating how similar the spelling of the found word is to the original search term; a terms query would incorporate the percentage of terms that were found. However, what we usually mean by relevance is the algorithm that we use to calculate how similar the contents of a full-text field are to a full-text query string.
The standard similarity algorithm used in Elasticsearch is known as term frequency/inverse document frequency, or TF/IDF, which takes factors such as term frequency into account: the more often a term appears in a field, the more relevant the document. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
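The intuition can be sketched with a toy scoring function. This is an illustration only, not Lucene's exact formula -- real scoring also applies field-length norms and boosts, and recent Elasticsearch versions default to BM25 rather than classic TF/IDF:

```java
public class TfIdfSketch {
// Toy TF-IDF: sqrt(term frequency) * (log(totalDocs / docsWithTerm) + 1)
static double score(int termFreq, int docsWithTerm, int totalDocs) {
double tf = Math.sqrt(termFreq);
double idf = Math.log((double) totalDocs / docsWithTerm) + 1.0;
return tf * idf;
}
public static void main(String[] args) {
// More mentions of the term in the field -> higher score
System.out.println(score(5, 10, 1000) > score(1, 10, 1000)); // true
// Rarer term (appears in fewer documents) -> higher score
System.out.println(score(1, 10, 1000) > score(1, 500, 1000)); // true
}
}
```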
Sorting allows you to add one or more sorts on specific fields. Each sort can be reversed (ascending or descending) as well. The sort is defined on a per-field level, with the special field name _score to sort by score and _doc to sort by index order.
The order option can have either asc or desc.
The order defaults to desc when sorting on the _score, and defaults to asc when sorting on anything else.
Note that the legacy filtered query was removed in Elasticsearch 5.0; the equivalent today is a bool query with a filter clause.
GET users/_search
{
"query" : {
"bool" : {
"filter" : { "term" : { "id" : 1 }}
}
},
"sort": { "date": { "order": "desc" }}
}
Perhaps we want to combine the _score from a query with the date, and show all matching results sorted first by date, then by relevance.
GET /_search
{
"query" : {
"bool" : {
"must": { "match": { "description": "student" }},
"filter" : { "term" : { "id" : 2 }}
}
},
"sort": [
{
"date": {"order":"desc"}
},
{
"_score": { "order": "desc" }
}
]
}
Order is important. Results are sorted by the first criterion first. Only results whose first sort value is identical will then be sorted by the second criterion, and so on. Multilevel sorting doesn’t have to involve the _score. You could sort by using several different fields, on geo-distance or on a custom value calculated in a script.
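The same multilevel semantics can be mimicked client-side with a chained Comparator (the Hit record here is a hypothetical stand-in for search hits): the primary key is compared first, and the secondary key only breaks ties, just like a multi-element sort array.

```java
import java.util.Comparator;
import java.util.List;
public class MultiLevelSort {
record Hit(String date, double score) {}
public static void main(String[] args) {
List<Hit> hits = List.of(
new Hit("2023-01-02", 0.5),
new Hit("2023-01-01", 0.9),
new Hit("2023-01-02", 0.8)
);
// Primary: date desc; ties broken by score desc
List<Hit> sorted = hits.stream()
.sorted(Comparator.comparing(Hit::date).reversed()
.thenComparing(Comparator.comparingDouble(Hit::score).reversed()))
.toList();
sorted.forEach(System.out::println);
// The two 2023-01-02 hits come first (0.8 before 0.5), then 2023-01-01
}
}
```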
Elasticsearch supports sorting by array or multi-valued fields. The mode option controls what array value is picked for sorting the document it belongs to. The mode option can have the following values.
| Mode | Description |
|---|---|
| min | Pick the lowest value. |
| max | Pick the highest value. |
| sum | Use the sum of all values as the sort value. Only applicable for number-based array fields. |
| avg | Use the average of all values as the sort value. Only applicable for number-based array fields. |
| median | Use the median of all values as the sort value. Only applicable for number-based array fields. |
The default sort mode in the ascending sort order is min — the lowest value is picked. The default sort mode in the descending order is max — the highest value is picked.
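An illustrative sketch (class and method names hypothetical, median omitted for brevity) of how a multi-valued numeric field is reduced to a single sort key depending on the mode option:

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
public class SortModeSketch {
// Reduce a multi-valued field to one sort key per mode, as Elasticsearch does
static double sortKey(List<Double> values, String mode) {
DoubleSummaryStatistics s = values.stream()
.mapToDouble(Double::doubleValue).summaryStatistics();
return switch (mode) {
case "min" -> s.getMin();
case "max" -> s.getMax();
case "sum" -> s.getSum();
case "avg" -> s.getAverage();
default -> throw new IllegalArgumentException("unknown mode: " + mode);
};
}
public static void main(String[] args) {
List<Double> prices = List.of(10.0, 40.0, 25.0);
System.out.println(sortKey(prices, "min")); // 10.0
System.out.println(sortKey(prices, "max")); // 40.0
System.out.println(sortKey(prices, "sum")); // 75.0
}
}
```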
Note that filters have no bearing on _score, and the missing-but-implied match_all query just sets the _score to a neutral value of 1 for all documents. In other words, all documents are considered to be equally relevant.
For numeric fields it is also possible to cast the values from one type to another using the numeric_type option. This option accepts the following values: ["double", "long", "date", "date_nanos"] and can be useful for searches across multiple data streams or indices where the sort field is mapped differently.
Sometimes you want to sort by how close a location is to a single point (lat/lon). You can do this in Elasticsearch.
GET elasticsearch_learning/_search
{
"sort":[{
"_geo_distance" : {
"addresses.location" : [
{
"lat" : 40.414897,
"lon" : -111.881186
}
],
"unit" : "m",
"distance_type" : "arc",
"order" : "desc",
"nested" : {
"path" : "addresses",
"filter" : {
"geo_distance" : {
"addresses.location" : [
-111.881186,
40.414897
],
"distance" : 1609.344,
"distance_type" : "arc",
"validation_method" : "STRICT",
"ignore_unmapped" : false,
"boost" : 1.0
}
}
},
"validation_method" : "STRICT",
"ignore_unmapped" : false
}
}]
}
/**
* https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-nested-query.html<br>
* https://www.elastic.co/guide/en/elasticsearch/reference/7.3/search-request-body.html#geo-sorting<br>
* Sort results based on how close locations are to a certain point.
*/
@Test
void sortQueryWithGeoLocation() {
    int pageNumber = 0;
    int pageSize = 10;
    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // fetch only a few fields
    searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth", "addresses.street", "addresses.zipcode", "addresses.city"}, new String[]{""});
    // Lehi skate park: 40.414897, -111.881186
    // get locations/addresses close to the skate park (within a radius)
    searchSourceBuilder.sort(new GeoDistanceSortBuilder("addresses.location", 40.414897, -111.881186)
            .order(SortOrder.DESC)
            .setNestedSort(new NestedSortBuilder("addresses")
                    .setFilter(QueryBuilders.geoDistanceQuery("addresses.location")
                            .point(40.414897, -111.881186)
                            .distance(1, DistanceUnit.MILES))));
    log.info("\n{\n\"sort\":{}\n}", searchSourceBuilder.sorts().toString());
    searchRequest.source(searchSourceBuilder);
    searchRequest.preference("nested-address");
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        log.info("hits={}, isTimedOut={}, totalShards={}, totalHits={}", searchResponse.getHits().getHits().length, searchResponse.isTimedOut(), searchResponse.getTotalShards(),
                searchResponse.getHits().getTotalHits().value);
        List<User> users = getResponseResult(searchResponse.getHits());
        log.info("results={}", ObjectUtils.toJson(users));
    } catch (IOException e) {
        log.warn("IOException, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    } catch (Exception e) {
        log.warn("Exception, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    }
}
Adding explain produces a lot of output for every hit, which can look overwhelming, but it is worth taking the time to understand what it all means. Don’t worry if it doesn’t all make sense now; you can refer to this section when you need it. We’ll work through the output for one hit bit by bit.
GET users/_search?explain
{
  "query": { "match": { "description": "student" }}
}
Producing the explain output is expensive. It is a debugging tool only. Don’t leave it turned on in production.
To make sorting efficient, Elasticsearch loads all the values for the field that you want to sort on into memory. This is referred to as fielddata. Elasticsearch doesn't just load the values for the documents that matched a particular query. It loads the values from every document in your index, regardless of the document type.
The reason that Elasticsearch loads all values into memory is that uninverting the index from disk is slow. Even though you may need the values for only a few docs for the current request, you will probably need access to the values for other docs on the next request, so it makes sense to load all the values into memory at once, and to keep them there.
All you need to know is what fielddata is, and to be aware that it can be memory hungry. We will talk about how to determine the amount of memory that fielddata is using, how to limit the amount of memory that is available to it, and how to preload fielddata to improve the user experience.
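Because of that memory cost, fielddata is disabled on text fields by default. If you do need to sort or aggregate on an analyzed text field, you opt in per field in the mapping (index and field names here are hypothetical):

```json
PUT my-index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "fielddata": true
    }
  }
}
```

For most sorting use cases, a keyword sub-field backed by doc values is the cheaper choice.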
If you have ever tried to process a 10 GB log file by reading it entirely into memory, you already know why generators and iterators matter. They are Python’s answer to a fundamental problem: how do you work with sequences of data without materializing everything in memory at once?
An iterator is any object that produces values one at a time through a standard protocol. A generator is a special kind of iterator that you create with a function containing yield statements. Together, they let you build lazy pipelines that process data element by element, consuming only the memory needed for a single item at a time.
This is not just an academic concept. Every for loop in Python uses the iterator protocol under the hood. When you iterate over a file, a database cursor, or a range of numbers, you are already using iterators. Understanding how they work gives you the ability to write code that scales to datasets of any size without blowing up your memory footprint.
In this tutorial, we will cover the iterator protocol from the ground up, build custom iterators and generators, chain them into processing pipelines, and explore the itertools module. By the end, you will have a complete mental model for lazy evaluation in Python.
The iterator protocol is deceptively simple. It consists of two methods:
__iter__() — Returns the iterator object itself. This is what makes an object usable in a for loop.
__next__() — Returns the next value in the sequence. When there are no more values, it raises StopIteration.
That is the entire contract. Any object that implements both methods is an iterator. Any object that implements __iter__() (even if it returns a separate iterator object) is an iterable.
The distinction matters: a list is an iterable (it has __iter__() that returns a list iterator), but it is not itself an iterator (it does not have __next__()). The iterator is a separate object that tracks the current position.
# The iterator protocol in action
numbers = [10, 20, 30]
# Get an iterator from the iterable
it = iter(numbers) # Calls numbers.__iter__()
print(next(it)) # 10 — Calls it.__next__()
print(next(it)) # 20
print(next(it)) # 30
# print(next(it)) # Raises StopIteration
# This is exactly what a for loop does internally:
# 1. Calls iter() on the iterable to get an iterator
# 2. Calls next() repeatedly until StopIteration
# 3. Catches StopIteration silently and exits the loop
for num in [10, 20, 30]:
    print(num)
# Equivalent to the manual iter()/next() calls above
Understanding StopIteration is key. It is not an error — it is the signal that tells Python the sequence is exhausted. The for loop catches it automatically, but if you call next() manually, you need to handle it yourself or pass a default value:
# Handling StopIteration manually
it = iter([1, 2])
print(next(it)) # 1
print(next(it)) # 2
print(next(it, "done")) # "done" — default value instead of StopIteration
# Without a default, you must catch the exception
it = iter([1])
try:
    print(next(it))  # 1
    print(next(it))  # StopIteration raised here
except StopIteration:
    print("Iterator exhausted")
To make your own class work with for loops, implement the iterator protocol. Here is a class that counts up from a start value to a stop value:
class CountUp:
    """An iterator that counts from start to stop (inclusive)."""
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.current = start
    def __iter__(self):
        return self
    def __next__(self):
        if self.current > self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value
# Use it in a for loop
for num in CountUp(1, 5):
    print(num, end=" ")  # 1 2 3 4 5
# Use it with list() to materialize all values
print(list(CountUp(10, 15))) # [10, 11, 12, 13, 14, 15]
# Use it with sum(), max(), any(), etc.
print(sum(CountUp(1, 100))) # 5050
Python’s built-in types are all iterable. The iter() function extracts an iterator from any iterable, and next() advances it one step.
# Lists
list_iter = iter([1, 2, 3])
print(next(list_iter)) # 1
print(next(list_iter)) # 2
# Strings (iterate character by character)
str_iter = iter("Python")
print(next(str_iter)) # 'P'
print(next(str_iter)) # 'y'
# Dictionaries (iterate over keys by default)
data = {"name": "Alice", "age": 30, "role": "engineer"}
dict_iter = iter(data)
print(next(dict_iter)) # 'name'
print(next(dict_iter)) # 'age'
# Iterate over values or key-value pairs
for value in data.values():
    print(value, end=" ")  # Alice 30 engineer
for key, value in data.items():
    print(f"{key}={value}", end=" ")  # name=Alice age=30 role=engineer
# Sets (order is not guaranteed)
set_iter = iter({3, 1, 4, 1, 5})
print(next(set_iter)) # Could be any element
# Files are iterators (they yield lines)
with open("example.txt", "w") as f:
    f.write("line 1\nline 2\nline 3\n")
with open("example.txt") as f:
    for line in f:  # f is its own iterator
        print(line.strip())
# line 1
# line 2
# line 3
Notice that files are their own iterators — calling iter(f) returns f itself. This is why you can iterate over a file directly in a for loop. It also means you can only iterate through a file once without resetting the file pointer.
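You can verify this distinction directly: a list hands out a fresh iterator object on every iter() call, while a file object returns itself. A small sketch (the temp file name is arbitrary):

```python
import os
import tempfile

# A list is an iterable, not an iterator: iter() returns a new object each call
numbers = [1, 2, 3]
print(iter(numbers) is numbers)        # False
print(iter(numbers) is iter(numbers))  # False — two independent iterators

# A file object is its own iterator: iter(f) returns f itself
path = os.path.join(tempfile.gettempdir(), "iter_demo.txt")
with open(path, "w") as f:
    f.write("a\nb\n")
with open(path) as f:
    print(iter(f) is f)  # True
os.remove(path)
```

This is why two for loops over the same list both start from the beginning, but a second loop over the same open file sees nothing.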
Let us build a few more custom iterators to solidify the pattern. Each one implements __iter__() and __next__().
class Fibonacci:
    """An iterator that produces Fibonacci numbers up to a maximum value."""
    def __init__(self, max_value):
        self.max_value = max_value
        self.a = 0
        self.b = 1
    def __iter__(self):
        return self
    def __next__(self):
        if self.a > self.max_value:
            raise StopIteration
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        return value
print(list(Fibonacci(100)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
# Works with any function that consumes an iterable
print(sum(Fibonacci(1000))) # 2583
class MyRange:
    """A simplified reimplementation of range()."""
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            self.start = 0
            self.stop = start
        else:
            self.start = start
            self.stop = stop
        self.step = step
    def __iter__(self):
        # Return a new iterator each time — this allows reuse
        current = self.start
        while (self.step > 0 and current < self.stop) or \
              (self.step < 0 and current > self.stop):
            yield current  # Using yield here makes __iter__ a generator
            current += self.step
    def __len__(self):
        return max(0, (self.stop - self.start + self.step - 1) // self.step)
    def __repr__(self):
        return f"MyRange({self.start}, {self.stop}, {self.step})"
# Forward range
print(list(MyRange(5))) # [0, 1, 2, 3, 4]
print(list(MyRange(2, 8))) # [2, 3, 4, 5, 6, 7]
print(list(MyRange(0, 10, 3))) # [0, 3, 6, 9]
# Reverse range
print(list(MyRange(10, 0, -2))) # [10, 8, 6, 4, 2]
# Reusable (unlike a plain iterator)
r = MyRange(3)
print(list(r)) # [0, 1, 2]
print(list(r)) # [0, 1, 2] — works again because __iter__ creates a new generator
Notice the MyRange trick: instead of implementing __next__() directly, the __iter__() method uses yield, which makes it a generator function. Each call to __iter__() creates a fresh generator object, so the range is reusable. This is a common and powerful pattern.
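The pattern in its smallest form: any class whose __iter__ is a generator function becomes a reusable iterable. This minimal sketch (the Evens class is illustrative, not from the text above) shows the reuse working:

```python
class Evens:
    """A reusable iterable: __iter__ is a generator function, so every
    iteration over the object gets a fresh generator."""
    def __init__(self, limit):
        self.limit = limit
    def __iter__(self):
        n = 0
        while n < self.limit:
            yield n
            n += 2

evens = Evens(10)
print(list(evens))  # [0, 2, 4, 6, 8]
print(list(evens))  # [0, 2, 4, 6, 8] — works again, unlike a plain generator
```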
Writing custom iterator classes is verbose. You need __init__, __iter__, __next__, manual state management, and StopIteration handling. Generators solve this by letting you write iterator logic as a simple function with yield statements.
When Python encounters a yield in a function body, that function becomes a generator function. Calling it does not execute the body — it returns a generator object that implements the iterator protocol automatically.
def count_up(start, stop):
    """A generator that counts from start to stop."""
    current = start
    while current <= stop:
        yield current  # Pause here, return current value
        current += 1   # Resume here on next() call
# Calling the function returns a generator object (does NOT run the body)
gen = count_up(1, 5)
print(type(gen)) # <class 'generator'>
# The generator implements the iterator protocol
print(next(gen)) # 1
print(next(gen)) # 2
print(next(gen)) # 3
# Use in a for loop
for num in count_up(1, 5):
    print(num, end=" ")  # 1 2 3 4 5
When you call next() on a generator, execution proceeds from the current position until it hits a yield statement. At that point, the yielded value is returned to the caller, and the generator's entire state (local variables, instruction pointer) is frozen. The next next() call resumes from exactly where it left off.
def demonstrate_state():
    print("Step 1: Starting")
    yield "first"
    print("Step 2: Resumed after first yield")
    yield "second"
    print("Step 3: Resumed after second yield")
    yield "third"
    print("Step 4: About to finish")
    # No more yields — StopIteration will be raised
gen = demonstrate_state()
print(next(gen))
# Step 1: Starting
# 'first'
print(next(gen))
# Step 2: Resumed after first yield
# 'second'
print(next(gen))
# Step 3: Resumed after second yield
# 'third'
# print(next(gen))
# Step 4: About to finish
# Raises StopIteration
You can inspect a generator's state using the inspect module:
import inspect
def simple_gen():
    yield 1
    yield 2
gen = simple_gen()
print(inspect.getgeneratorstate(gen)) # GEN_CREATED
next(gen)
print(inspect.getgeneratorstate(gen)) # GEN_SUSPENDED
next(gen)
print(inspect.getgeneratorstate(gen)) # GEN_SUSPENDED
try:
    next(gen)
except StopIteration:
    pass
print(inspect.getgeneratorstate(gen)) # GEN_CLOSED
A generator moves through four states: GEN_CREATED (just created, not started), GEN_RUNNING (currently executing), GEN_SUSPENDED (paused at a yield), and GEN_CLOSED (finished or closed).
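The one state the inspect snippet above cannot show from the outside is GEN_RUNNING, because getgeneratorstate() is being called while the generator is paused. You can observe it from inside the generator while its body executes (a small trick, not an API requirement — the function reads its own generator object through a global name):

```python
import inspect

def self_inspecting():
    # While this body runs, the generator itself is in GEN_RUNNING
    yield inspect.getgeneratorstate(gen)

gen = self_inspecting()
print(next(gen))                       # GEN_RUNNING
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED
```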
Compare the class-based Fibonacci iterator from earlier with the generator version:
# Generator version — drastically simpler
def fibonacci(max_value=None):
    a, b = 0, 1
    while max_value is None or a <= max_value:
        yield a
        a, b = b, a + b
# Finite sequence
print(list(fibonacci(100)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
# Infinite sequence (use itertools.islice to take a finite portion)
import itertools
print(list(itertools.islice(fibonacci(), 15)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]
The generator version is 4 lines of logic compared to 12+ lines for the class. No __init__, no __iter__, no __next__, no StopIteration — Python handles all of it.
Generator expressions are to generators what list comprehensions are to lists. They use the same syntax as list comprehensions, but with parentheses instead of square brackets. The critical difference is that a generator expression produces values lazily — one at a time — while a list comprehension builds the entire list in memory.
import sys
# List comprehension — builds entire list in memory
squares_list = [x ** 2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(squares_list):,} bytes") # ~8,448,728 bytes
# Generator expression — produces values on demand
squares_gen = (x ** 2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(squares_gen):,} bytes") # ~200 bytes
# Both support filtering
even_squares = (x ** 2 for x in range(20) if x % 2 == 0)
print(list(even_squares)) # [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]
# Generator expressions can be passed directly to functions
# (no extra parentheses needed when it is the only argument)
total = sum(x ** 2 for x in range(1000))
print(total) # 332833500
max_val = max(len(word) for word in ["Python", "generators", "are", "powerful"])
print(max_val) # 10
has_negative = any(x < 0 for x in [1, -2, 3, 4])
print(has_negative) # True
import sys
def compare_memory(n):
    """Compare memory usage of list vs generator for n elements."""
    # List comprehension
    data_list = [x * 2 for x in range(n)]
    list_size = sys.getsizeof(data_list)
    # Generator expression
    data_gen = (x * 2 for x in range(n))
    gen_size = sys.getsizeof(data_gen)
    print(f"n={n:>12,} | List: {list_size:>12,} bytes | Generator: {gen_size:>6,} bytes | Ratio: {list_size/gen_size:.0f}x")
compare_memory(100)
compare_memory(10_000)
compare_memory(1_000_000)
compare_memory(10_000_000)
# Output:
# n= 100 | List: 920 bytes | Generator: 200 bytes | Ratio: 5x
# n= 10,000 | List: 87,624 bytes | Generator: 200 bytes | Ratio: 438x
# n= 1,000,000 | List: 8,448,728 bytes | Generator: 200 bytes | Ratio: 42244x
# n= 10,000,000 | List: 80,000,056 bytes | Generator: 200 bytes | Ratio: 400000x
The generator's memory footprint is constant regardless of how many elements it produces. This is the fundamental advantage of lazy evaluation.
The yield from expression, introduced in Python 3.3, delegates iteration to a sub-generator or any iterable. It is cleaner than manually looping over a sub-iterable and yielding each element.
# Without yield from
def chain_manual(*iterables):
    for iterable in iterables:
        for item in iterable:
            yield item
# With yield from — cleaner
def chain_elegant(*iterables):
    for iterable in iterables:
        yield from iterable
# Both produce the same result
result = list(chain_elegant([1, 2, 3], "abc", (10, 20)))
print(result) # [1, 2, 3, 'a', 'b', 'c', 10, 20]
def flatten(nested):
    """Recursively flatten a nested structure."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # Delegate to recursive call
        else:
            yield item
data = [1, [2, 3], [4, [5, 6, [7, 8]]], 9]
print(list(flatten(data))) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
# Works with mixed nesting
mixed = [1, (2, [3, 4]), [5, (6,)], 7]
print(list(flatten(mixed))) # [1, 2, 3, 4, 5, 6, 7]
def header_rows():
    yield "Name,Age,City"
def data_rows():
    yield "Alice,30,New York"
    yield "Bob,25,San Francisco"
    yield "Charlie,35,Chicago"
def footer_rows():
    yield "---END OF REPORT---"
def full_report():
    yield from header_rows()
    yield from data_rows()
    yield from footer_rows()
for line in full_report():
    print(line)
# Name,Age,City
# Alice,30,New York
# Bob,25,San Francisco
# Charlie,35,Chicago
# ---END OF REPORT---
Generators are not just producers — they can also receive values. The send() method resumes a generator and sends a value that becomes the result of the yield expression inside the generator. This turns generators into coroutines that can both produce and consume data.
def running_average():
    """A generator that computes a running average."""
    total = 0
    count = 0
    average = None
    while True:
        value = yield average  # Receive a value, yield the current average
        if value is None:
            break
        total += value
        count += 1
        average = total / count
# Usage
avg = running_average()
next(avg) # Prime the generator (advance to first yield)
print(avg.send(10)) # 10.0
print(avg.send(20)) # 15.0
print(avg.send(30)) # 20.0
print(avg.send(40)) # 25.0
The first next() call is necessary to "prime" the generator — it advances execution to the first yield expression, where the generator is ready to receive a value. After that, send() both sends a value in and gets the next yielded value out.
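A detail worth knowing: for a just-created generator, send(None) is the only send() call that is allowed, and it behaves exactly like next(). So either form can do the priming. A minimal sketch (the doubler coroutine is illustrative):

```python
def doubler():
    """Coroutine that yields double of whatever is sent in."""
    result = None
    while True:
        value = yield result
        result = value * 2

d = doubler()
d.send(None)      # Equivalent to next(d): advances to the first yield
print(d.send(5))  # 10
print(d.send(21)) # 42
```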
def accumulator():
    """A coroutine that accumulates values and reports the running total."""
    total = 0
    while True:
        value = yield total
        if value is None:
            return total  # return value becomes StopIteration.value
        total += value
acc = accumulator()
next(acc) # Prime
print(acc.send(5)) # 5
print(acc.send(10)) # 15
print(acc.send(3)) # 18
# Close the generator gracefully
try:
    acc.send(None)  # Triggers the return statement
except StopIteration as e:
    print(f"Final total: {e.value}")  # Final total: 18
# Practical coroutine: a filter that receives items and forwards matches
def grep_coroutine(pattern):
    """A coroutine that filters lines matching a pattern (case-insensitive)."""
    print(f"Looking for: {pattern}")
    matches = []
    while True:
        line = yield
        if line is None:
            break
        if pattern.lower() in line.lower():  # Case-insensitive match
            matches.append(line)
            print(f" Match: {line}")
    return matches
# Usage
searcher = grep_coroutine("error")
next(searcher) # Prime
searcher.send("INFO: Server started")
searcher.send("ERROR: Connection timeout") # Match
searcher.send("DEBUG: Request received")
searcher.send("ERROR: Disk full") # Match
searcher.send("INFO: Shutting down")
try:
    searcher.send(None)  # Signal completion
except StopIteration as e:
    print(f"All matches: {e.value}")
# Match: ERROR: Connection timeout
# Match: ERROR: Disk full
# All matches: ['ERROR: Connection timeout', 'ERROR: Disk full']
One of the most powerful patterns in Python is chaining generators into a processing pipeline. Each generator reads from the previous one, transforms the data, and passes it along. This works like Unix pipes — data flows through a chain of transformations without any intermediate lists being created in memory.
# Pipeline: Read lines -> filter non-empty -> strip whitespace -> convert to uppercase
def read_lines(text):
    """Stage 1: Split text into lines."""
    for line in text.split("\n"):
        yield line
def filter_non_empty(lines):
    """Stage 2: Remove empty lines."""
    for line in lines:
        if line.strip():
            yield line
def strip_whitespace(lines):
    """Stage 3: Strip leading/trailing whitespace."""
    for line in lines:
        yield line.strip()
def to_uppercase(lines):
    """Stage 4: Convert to uppercase."""
    for line in lines:
        yield line.upper()
# Chain the pipeline
raw_text = """
hello world
Python generators
are powerful
and memory efficient
"""
pipeline = to_uppercase(
    strip_whitespace(
        filter_non_empty(
            read_lines(raw_text)
        )
    )
)
for line in pipeline:
    print(line)
# HELLO WORLD
# PYTHON GENERATORS
# ARE POWERFUL
# AND MEMORY EFFICIENT
# A more realistic pipeline: process log entries
def parse_log_entries(lines):
    """Parse each line into a structured dict."""
    for line in lines:
        parts = line.split(" | ")
        if len(parts) == 3:
            yield {
                "timestamp": parts[0],
                "level": parts[1],
                "message": parts[2]
            }
def filter_errors(entries):
    """Keep only ERROR entries."""
    for entry in entries:
        if entry["level"] == "ERROR":
            yield entry
def format_alerts(entries):
    """Format entries as alert strings."""
    for entry in entries:
        yield f"ALERT [{entry['timestamp']}]: {entry['message']}"
# Simulate log data
log_data = [
"2024-01-15 10:00:01 | INFO | Server started",
"2024-01-15 10:00:05 | ERROR | Database connection failed",
"2024-01-15 10:00:10 | INFO | Retry attempt 1",
"2024-01-15 10:00:15 | ERROR | Database connection failed again",
"2024-01-15 10:00:20 | INFO | Connection restored",
"2024-01-15 10:00:25 | ERROR | Disk space low",
]
# Build the pipeline
alerts = format_alerts(filter_errors(parse_log_entries(log_data)))
for alert in alerts:
    print(alert)
# ALERT [2024-01-15 10:00:05]: Database connection failed
# ALERT [2024-01-15 10:00:15]: Database connection failed again
# ALERT [2024-01-15 10:00:25]: Disk space low
Each stage processes one item at a time. No intermediate lists are created. This means you could pipe a 100 GB log file through this pipeline and it would use a trivial amount of memory.
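For simple stages, the same pattern can be written inline with chained generator expressions — still one item at a time, still no intermediate lists. A sketch with a small hypothetical log:

```python
log_lines = [
    "10:00 | INFO | started",
    "10:05 | ERROR | disk full",
    "10:10 | ERROR | timeout",
]
# Each expression wraps the previous one; nothing runs until iteration starts
parsed = (line.split(" | ") for line in log_lines)
errors = (p for p in parsed if p[1] == "ERROR")
alerts = (f"ALERT [{p[0]}]: {p[2]}" for p in errors)

for alert in alerts:
    print(alert)
# ALERT [10:05]: disk full
# ALERT [10:10]: timeout
```

Named generator functions scale better once a stage needs more than one line of logic, but for quick filtering and mapping the expression form is hard to beat.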
The itertools module is Python's standard library for efficient iterator operations. Every function in it returns an iterator, so they compose naturally with generators and pipelines. Here are the functions you will use most often.
import itertools
# count: count from a start value with a step
for i in itertools.islice(itertools.count(10, 2), 5):
    print(i, end=" ")  # 10 12 14 16 18
print()
# cycle: repeat an iterable forever
colors = itertools.cycle(["red", "green", "blue"])
for _ in range(7):
    print(next(colors), end=" ")  # red green blue red green blue red
print()
# repeat: repeat a value n times (or forever)
fives = list(itertools.repeat(5, 4))
print(fives) # [5, 5, 5, 5]
# Practical use of repeat: initialize a grid
row = list(itertools.repeat(0, 5))
grid = [list(itertools.repeat(0, 5)) for _ in range(3)]
print(grid) # [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
import itertools
# chain: concatenate multiple iterables
combined = list(itertools.chain([1, 2], [3, 4], [5, 6]))
print(combined) # [1, 2, 3, 4, 5, 6]
# chain.from_iterable: chain from a single iterable of iterables
nested = [[1, 2], [3, 4], [5, 6]]
flat = list(itertools.chain.from_iterable(nested))
print(flat) # [1, 2, 3, 4, 5, 6]
# islice: slice an iterator (like list slicing but for iterators)
print(list(itertools.islice(range(100), 5))) # [0, 1, 2, 3, 4]
print(list(itertools.islice(range(100), 10, 20, 3))) # [10, 13, 16, 19]
# takewhile / dropwhile: take/drop based on a predicate
nums = [1, 3, 5, 7, 2, 4, 6, 8]
print(list(itertools.takewhile(lambda x: x < 6, nums))) # [1, 3, 5]
print(list(itertools.dropwhile(lambda x: x < 6, nums))) # [7, 2, 4, 6, 8]
# groupby: group consecutive elements by a key function
data = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5)]
for key, group in itertools.groupby(data, key=lambda x: x[0]):
    print(f"{key}: {list(group)}")
# A: [('A', 1), ('A', 2)]
# B: [('B', 3), ('B', 4)]
# A: [('A', 5)] <-- Note: only groups CONSECUTIVE matches
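Because groupby only merges consecutive runs, sort by the same key first when you want exactly one group per key:

```python
import itertools

data = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5)]
# Sorting by the key makes equal keys adjacent, so groupby sees one run per key
data_sorted = sorted(data, key=lambda x: x[0])
for key, group in itertools.groupby(data_sorted, key=lambda x: x[0]):
    print(f"{key}: {[value for _, value in group]}")
# A: [1, 2, 5]
# B: [3, 4]
```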
import itertools
# combinations: all r-length combinations (no repeats, order doesn't matter)
print(list(itertools.combinations("ABCD", 2)))
# [('A','B'), ('A','C'), ('A','D'), ('B','C'), ('B','D'), ('C','D')]
# combinations_with_replacement: combinations allowing repeats
print(list(itertools.combinations_with_replacement("AB", 3)))
# [('A','A','A'), ('A','A','B'), ('A','B','B'), ('B','B','B')]
# permutations: all r-length arrangements (order matters)
print(list(itertools.permutations("ABC", 2)))
# [('A','B'), ('A','C'), ('B','A'), ('B','C'), ('C','A'), ('C','B')]
# product: Cartesian product (like nested for loops)
print(list(itertools.product("AB", [1, 2])))
# [('A',1), ('A',2), ('B',1), ('B',2)]
# Practical: generate all possible configs
sizes = ["small", "medium", "large"]
colors = ["red", "blue"]
materials = ["cotton", "silk"]
for combo in itertools.product(sizes, colors, materials):
    print(combo)
# ('small', 'red', 'cotton')
# ('small', 'red', 'silk')
# ('small', 'blue', 'cotton')
# ... (12 total combinations)
This is the canonical use case for generators. Instead of loading an entire file into memory, you process it one line at a time.
def read_large_file(file_path):
    """Read a file line by line using a generator."""
    with open(file_path, "r") as f:
        for line in f:
            yield line.strip()
def count_errors_in_log(file_path):
    """Count error lines in a log file without loading it into memory."""
    error_count = 0
    for line in read_large_file(file_path):
        if "ERROR" in line:
            error_count += 1
    return error_count
# For a 10 GB log file, this uses ~1 line of memory at a time
# Instead of loading all 10 GB:
# count = count_errors_in_log("/var/log/huge_application.log")
# Alternative using generator expression:
# error_count = sum(1 for line in read_large_file(path) if "ERROR" in line)
import itertools
def primes():
    """Generate prime numbers indefinitely using an incremental sieve."""
    yield 2
    composites = {}  # Maps composite number -> list of primes that divide it
    candidate = 3
    while True:
        if candidate not in composites:
            # candidate is prime
            yield candidate
            composites[candidate * candidate] = [candidate]
        else:
            # candidate is composite; advance its prime factors to their
            # next odd multiple (step 2*prime keeps us on odd numbers)
            for prime in composites[candidate]:
                composites.setdefault(candidate + 2 * prime, []).append(prime)
            del composites[candidate]
        candidate += 2  # Skip even numbers
# Get the first 20 prime numbers
first_20_primes = list(itertools.islice(primes(), 20))
print(first_20_primes)
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
# Sum of the first 1000 primes
print(sum(itertools.islice(primes(), 1000))) # 3682913
import csv
from io import StringIO
# Simulated CSV data
csv_data = """name,department,salary
Alice,Engineering,120000
Bob,Marketing,85000
Charlie,Engineering,135000
Diana,Marketing,90000
Eve,Engineering,110000
Frank,HR,75000
Grace,Engineering,140000
"""
def read_csv_rows(csv_text):
    """Stage 1: Parse CSV into dictionaries."""
    reader = csv.DictReader(StringIO(csv_text))
    for row in reader:
        yield row
def filter_department(rows, dept):
    """Stage 2: Keep only rows matching the department."""
    for row in rows:
        if row["department"] == dept:
            yield row
def transform_salary(rows):
    """Stage 3: Convert salary to int and add a bonus field."""
    for row in rows:
        salary = int(row["salary"])
        row["salary"] = salary
        row["bonus"] = salary * 0.1  # 10% bonus
        yield row
def aggregate(rows):
    """Stage 4: Compute total salary and average."""
    total = 0
    count = 0
    for row in rows:
        total += row["salary"]
        count += 1
        yield row  # Pass through for downstream consumers
    # After iteration, print the summary
    if count > 0:
        print(f"\nTotal salary: ${total:,}")
        print(f"Average salary: ${total/count:,.0f}")
        print(f"Headcount: {count}")
# Build and run the pipeline
pipeline = aggregate(
    transform_salary(
        filter_department(
            read_csv_rows(csv_data),
            "Engineering"
        )
    )
)
for emp in pipeline:
    print(f"{emp['name']}: ${emp['salary']:,} (bonus: ${emp['bonus']:,.0f})")
# Alice: $120,000 (bonus: $12,000)
# Charlie: $135,000 (bonus: $13,500)
# Eve: $110,000 (bonus: $11,000)
# Grace: $140,000 (bonus: $14,000)
#
# Total salary: $505,000
# Average salary: $126,250
# Headcount: 4
import time
def paginated_api_fetch(base_url, page_size=100):
    """
    Generator that fetches paginated API results.
    Yields individual items across all pages.
    """
    page = 1
    while True:
        # Simulate API call (replace with real requests.get())
        url = f"{base_url}?page={page}&size={page_size}"
        print(f"Fetching: {url}")
        # Simulated response
        if page <= 3:
            results = [{"id": i, "name": f"Item {i}"}
                       for i in range((page-1)*page_size + 1, page*page_size + 1)]
        else:
            results = []  # No more data
        if not results:
            break  # No more pages
        yield from results  # Yield each item individually
        page += 1
        time.sleep(0.1)  # Rate limiting
# The consumer does not need to know about pagination
for item in paginated_api_fetch("https://api.example.com/items", page_size=2):
    print(f" Processing: {item}")
    if item["id"] >= 5:
        break  # Stop early — remaining pages are never fetched!
# Output:
# Fetching: https://api.example.com/items?page=1&size=2
# Processing: {'id': 1, 'name': 'Item 1'}
# Processing: {'id': 2, 'name': 'Item 2'}
# Fetching: https://api.example.com/items?page=2&size=2
# Processing: {'id': 3, 'name': 'Item 3'}
# Processing: {'id': 4, 'name': 'Item 4'}
# Fetching: https://api.example.com/items?page=3&size=2
# Processing: {'id': 5, 'name': 'Item 5'}
Notice the key advantage: when the consumer breaks out of the loop, the generator stops fetching. Pages 4, 5, 6, etc. are never requested. Lazy evaluation means you only do the work that is actually needed.
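Related to stopping early: when you abandon or explicitly close a generator, Python raises GeneratorExit inside it, so finally blocks still run. That means cleanup (closing a connection, flushing a buffer) is not lost when the consumer bails out. A minimal sketch (the connection is simulated with prints):

```python
def fetch_items():
    print("opening connection")
    try:
        for i in range(1000):
            yield i
    finally:
        # Runs on exhaustion, on close(), and on garbage collection
        print("closing connection")

g = fetch_items()
print(next(g))  # prints "opening connection", then 0
print(next(g))  # 1
g.close()       # prints "closing connection" even though we stopped early
```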
Let us put hard numbers on the difference between lists and generators.
import sys
import time
import tracemalloc
def benchmark_list_vs_generator(n):
    """Compare list vs generator for summing n squared numbers."""
    # List approach
    tracemalloc.start()
    start = time.perf_counter()
    result_list = sum([x ** 2 for x in range(n)])
    list_time = time.perf_counter() - start
    list_peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    # Generator approach
    tracemalloc.start()
    start = time.perf_counter()
    result_gen = sum(x ** 2 for x in range(n))
    gen_time = time.perf_counter() - start
    gen_peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    assert result_list == result_gen
    print(f"n = {n:>12,}")
    print(f" List: {list_time:.4f}s | Peak memory: {list_peak:>12,} bytes")
    print(f" Generator: {gen_time:.4f}s | Peak memory: {gen_peak:>12,} bytes")
    print(f" Memory saved: {(1 - gen_peak/list_peak)*100:.1f}%")
    print()
benchmark_list_vs_generator(100_000)
benchmark_list_vs_generator(1_000_000)
benchmark_list_vs_generator(10_000_000)
# Typical output:
# n = 100,000
# List: 0.0234s | Peak memory: 824,464 bytes
# Generator: 0.0228s | Peak memory: 464 bytes
# Memory saved: 99.9%
#
# n = 1,000,000
# List: 0.2451s | Peak memory: 8,448,688 bytes
# Generator: 0.2389s | Peak memory: 464 bytes
# Memory saved: 100.0%
#
# n = 10,000,000
# List: 2.5102s | Peak memory: 80,000,048 bytes
# Generator: 2.4231s | Peak memory: 464 bytes
# Memory saved: 100.0%
Key takeaways from the benchmark: the memory savings are near-total — the generator's peak usage stays constant while the list's grows linearly with n. For one-shot aggregations like sum(), generators are also slightly faster because they avoid the overhead of allocating and populating a list.
Generators have some surprising behaviors that trip up even experienced developers. Here are the ones you must know.
# Generators can only be consumed ONCE
gen = (x ** 2 for x in range(5))
print(list(gen)) # [0, 1, 4, 9, 16]
print(list(gen)) # [] — exhausted! No error, just empty.
# This is a common bug:
def get_numbers():
    yield 1
    yield 2
    yield 3
nums = get_numbers()
print(sum(nums)) # 6
print(sum(nums)) # 0 — the generator is already exhausted!
# Fix: recreate the generator each time, or use a list if you need multiple passes
nums_list = list(get_numbers())
print(sum(nums_list)) # 6
print(sum(nums_list)) # 6
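If you genuinely need two passes but don't want to materialize everything up front, itertools.tee splits one iterator into several independent ones. Note that tee buffers whatever one branch has consumed ahead of the other, so for branches consumed fully one after another it costs as much memory as a list:

```python
import itertools

gen = (x ** 2 for x in range(5))
first, second = itertools.tee(gen)

# Each branch sees the full sequence 0, 1, 4, 9, 16
print(sum(first))   # 30
print(max(second))  # 16 — tee buffered the values the first pass consumed
```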
gen = (x for x in range(10))
# These all fail:
# gen[0] # TypeError: 'generator' object is not subscriptable
# gen[2:5] # TypeError: 'generator' object is not subscriptable
# len(gen) # TypeError: object of type 'generator' has no len()
# Workarounds:
import itertools
# Get the nth element (consumes n elements)
def nth(iterable, n, default=None):
return next(itertools.islice(iterable, n, None), default)
gen = (x ** 2 for x in range(10))
print(nth(gen, 3)) # 9 (the 4th element, 0-indexed)
# Slice an iterator
gen = (x ** 2 for x in range(10))
print(list(itertools.islice(gen, 2, 5))) # [4, 9, 16]
# A subtle bug: storing a generator and trying to use it in multiple places
def get_even_numbers(n):
return (x for x in range(n) if x % 2 == 0)
evens = get_even_numbers(20)
# First use works fine
for x in evens:
if x > 6:
break
print(f"Stopped at {x}") # Stopped at 8
# Second use — CONTINUES from where we left off, not from the beginning!
remaining = list(evens)
print(remaining) # [10, 12, 14, 16, 18]
# If you expected [0, 2, 4, 6, 8, 10, 12, 14, 16, 18], you have a bug.
# Late binding: closures (and generator expression bodies) look up
# free variables when they run, not when they are defined
funcs = []
for i in range(5):
funcs.append(lambda: i) # All lambdas capture the SAME variable i
print([f() for f in funcs]) # [4, 4, 4, 4, 4] — not [0, 1, 2, 3, 4]!
# Fix: use a default argument to capture the current value
funcs = []
for i in range(5):
funcs.append(lambda i=i: i) # Each lambda gets its own copy
print([f() for f in funcs]) # [0, 1, 2, 3, 4]
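The same late-binding rule applies to generator expressions: the body is evaluated lazily, so free variables use their value at iteration time, while the outermost iterable is evaluated immediately at creation time. A small sketch:

```python
# Free variables in the body are resolved when the generator runs:
x = 10
gen = (x + y for y in range(3))
x = 100
print(list(gen))  # [100, 101, 102] — uses the current x, not 10

# The outermost iterable, however, is captured immediately:
r = range(3)
gen2 = (n * 2 for n in r)
r = range(1000)   # rebinding r has no effect on gen2
print(list(gen2))  # [0, 2, 4]
```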
Here are the guidelines I follow when deciding how to use generators in production code.
# GOOD: generator for processing a large file
def process_log_file(path):
with open(path) as f:
for line in f:
if "ERROR" in line:
yield parse_error(line)
# BAD: loading entire file into memory
def process_log_file_bad(path):
with open(path) as f:
lines = f.readlines() # Entire file in memory!
return [parse_error(line) for line in lines if "ERROR" in line]
# GOOD: generator expression passed directly to sum()
total = sum(order.total for order in orders if order.status == "completed")
# UNNECESSARY: creating an intermediate list
total = sum([order.total for order in orders if order.status == "completed"])
import itertools
# GOOD: use itertools.chain instead of nested loops
all_items = itertools.chain(list_a, list_b, list_c)
# GOOD: use itertools.groupby for grouping
for key, group in itertools.groupby(sorted_data, key=extract_key):
process_group(key, list(group))
# GOOD: use itertools.islice for taking the first N items from an iterator
first_ten = list(itertools.islice(infinite_generator(), 10))
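The groupby pattern above has a requirement worth demonstrating: it only groups consecutive items with equal keys, which is why the input is sorted first. A runnable sketch (the word list is just illustrative data):

```python
import itertools

words = ["apple", "banana", "avocado", "blueberry", "apricot"]
# groupby only merges *consecutive* equal keys, so sort by the key first
for letter, group in itertools.groupby(sorted(words), key=lambda w: w[0]):
    print(letter, list(group))
# a ['apple', 'apricot', 'avocado']
# b ['banana', 'blueberry']
```

Skipping the sort would yield a separate group each time the first letter changes, silently splitting your groups.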
# If you need to iterate multiple times, use a class with __iter__
class DataSource:
def __init__(self, path):
self.path = path
def __iter__(self):
with open(self.path) as f:
for line in f:
yield line.strip()
# Each for loop gets a fresh iterator
source = DataSource("data.txt")
count = sum(1 for _ in source) # First pass: count lines
total = sum(len(line) for line in source) # Second pass: total chars
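To see the re-iterable pattern end to end, here is a self-contained version of the same class run against a throwaway temp file (the file contents are just illustrative):

```python
import os
import tempfile

class DataSource:
    """Re-iterable: __iter__ is a generator, so each loop gets a fresh one."""
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                yield line.strip()

with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as tmp:
    tmp.write("alpha\nbeta\ngamma\n")

source = DataSource(tmp.name)
print(sum(1 for _ in source))             # 3  — first pass
print(sum(len(line) for line in source))  # 14 — second pass also works
os.unlink(tmp.name)
```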
def fetch_records(query):
"""
Yield records matching the query from the database.
WARNING: This generator can only be consumed once.
If you need multiple passes, materialize with list().
"""
cursor = db.execute(query)
for row in cursor:
yield transform(row)
Iterators are objects that implement __iter__() and __next__(). They produce values one at a time and raise StopIteration when done. Every for loop in Python uses this protocol.
Generator functions are functions containing yield. They are dramatically simpler to write than class-based iterators. The function's state is automatically saved and restored between next() calls.
Generator expressions use the syntax (expr for x in iterable if condition). They use constant memory regardless of the source size.
The itertools module provides battle-tested building blocks: reach for chain, islice, groupby, combinations, permutations, and product instead of writing your own.
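To make the first point concrete, here is a minimal hand-written iterator next to its generator equivalent (the Countdown name is just for illustration):

```python
class Countdown:
    """Class-based iterator implementing the protocol by hand."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator is its own iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]

# The generator function below is equivalent, with far less boilerplate:
def countdown(start):
    while start > 0:
        yield start
        start -= 1

print(list(countdown(3)))  # [3, 2, 1]
```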