
MySQL JSON

MySQL has supported a native JSON data type since version 5.7.8. A JSON column stores documents more efficiently than the plain JSON text that earlier versions kept in string columns.

MySQL stores JSON documents in an internal binary format that allows quick read access to document elements. The format is structured so that the server can look up values within a document directly by key or array index, without reading the values before or after them in the document.

The space required to store a JSON document is roughly the same as for LONGBLOB or LONGTEXT data.

JSON Data Type

CREATE TABLE events (
    ...
    browser_info JSON,
    ... 
);

Insert into a JSON Column

INSERT INTO events(browser_info) 
VALUES (
   '{ "name": "Safari", "os": "Mac", "resolution": { "x": 1920, "y": 1080 } }'
);

MySQL automatically validates JSON documents stored in JSON columns. Inserting an invalid document produces an error.

JSON Object

Evaluates a (possibly empty) list of key-value pairs and returns a JSON object containing those pairs. An error occurs if any key name is NULL or the number of arguments is odd.

SELECT JSON_OBJECT('id', u.id, 
'firstName', u.first_name, 
'lastName', u.last_name) as jsonUser
FROM user as u;

JSON Array

Evaluates a (possibly empty) list of values and returns a JSON array containing those values.

SELECT JSON_ARRAY(u.id, 
u.first_name, 
u.last_name) as jsonUser
FROM user as u;

JSON Object Agg
Return a result set as a single JSON object
Takes exactly two column names or expressions as arguments. The first is used as a key and the second as a value, and the function returns a JSON object containing one key-value pair per row. Returns NULL if the result contains no rows, or in the event of an error. An error occurs if any key name is NULL or the number of arguments is not equal to 2. To put several columns into each value, wrap them in JSON_OBJECT, as in this query:

SELECT JSON_OBJECTAGG(u.id, JSON_OBJECT('firstName', u.first_name, 'lastName', u.last_name)) as jsonData
FROM user as u;

// output (illustrative, for a table containing only user 1; key order may differ)
{
  "1": {
    "firstName": "John",
    "lastName": "Peter"
  }
}

JSON Array Agg

Return a result set as a single JSON array
Aggregates a result set as a single JSON array whose elements consist of the rows. The order of elements in this array is undefined. The function acts on a column or an expression that evaluates to a single value. Returns NULL if the result contains no rows, or in the event of an error.

SELECT JSON_PRETTY(JSON_OBJECT('userId', u.id, 'cards', cardList)) as jsonData
FROM user as u
LEFT JOIN (SELECT c.user_id, 
    JSON_ARRAYAGG(
        JSON_OBJECT(
        'cardId', c.id,
        'cardNumber', c.card_number)
        ) as cardList
    FROM card as c
    GROUP BY c.user_id) as cards ON u.id = cards.user_id;

// output
{
  "cards": [
    {
      "cardId": 4,
      "cardNumber": "2440531"
    },
    {
      "cardId": 11,
      "cardNumber": "4061190"
    }
  ],
  "userId": 1
}

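How to accomplish JSON_ARRAYAGG before version 5.7.22

JSON_ARRAYAGG, like JSON_OBJECTAGG and JSON_PRETTY, was added in MySQL 5.7.22. Releases 5.7.8 through 5.7.21 already have JSON_OBJECT, so on those versions you can build the array yourself with GROUP_CONCAT and cast the resulting string to JSON. Without the CAST, JSON_OBJECT would embed the array as a quoted string. JSON_PRETTY is not available on those releases either, so this query omits it. Also note that GROUP_CONCAT truncates its result to group_concat_max_len bytes (1024 by default). A truncated result leaves the array incomplete and makes the CAST fail.

SELECT JSON_OBJECT('userId', u.id, 'cards', CAST(cardList AS JSON)) as jsonData
FROM user as u
LEFT JOIN (SELECT c.user_id, 
    CONCAT('[', GROUP_CONCAT(
        JSON_OBJECT(
        'cardId', c.id,
        'cardNumber', c.card_number)
        ), ']') as cardList
    FROM card as c
    GROUP BY c.user_id) as cards ON u.id = cards.user_id;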

JSON Pretty

Provides pretty-printing of JSON values similar to that implemented in PHP and by other languages and database systems. The value supplied must be a JSON value or a valid string representation of a JSON value.

SELECT JSON_PRETTY(JSON_OBJECT('id', u.id, 
'firstName', u.first_name, 
'lastName', u.last_name)) as jsonUser
FROM user as u;

JSON Extract

JSON_EXTRACT(json_doc, path[, path] ...)

Returns data from a JSON document, selected from the parts of the document matched by the path arguments. Returns NULL if any argument is NULL or no paths locate a value in the document. An error occurs if the json_doc argument is not a valid JSON document or any path argument is not a valid path expression.

July 25, 2021

Java – Lambda Expression

1. What is a Lambda Expression?

Imagine you need to give someone quick instructions. You could write a full manual with a title page, table of contents, and chapters — or you could just hand them a sticky note: “Sort these by price, lowest first.” A lambda expression is that sticky note. It is a concise way to represent a small piece of behavior — a function — without the ceremony of defining an entire class or method.

Introduced in Java 8, lambda expressions bring functional programming capabilities to Java. Before Java 8, every piece of behavior had to live inside a class. If you wanted to pass a comparator to a sort method, you had to create an anonymous inner class with boilerplate code. Lambdas eliminate that boilerplate.

Formally defined: A lambda expression is an anonymous function — a function with no name, no access modifier, and no return type declaration. It provides a clear and concise way to implement a single abstract method of a functional interface.

What lambdas give you:

  • Less boilerplate — Replace verbose anonymous classes with one-liners
  • Readability — Code reads closer to what it does, not how it is wired up
  • Functional programming — Pass behavior as arguments, return behavior from methods, store behavior in variables
  • Foundation for Streams — The Stream API (filter, map, reduce) relies heavily on lambdas

Here is a before-and-after comparison to see the difference immediately:

import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LambdaBeforeAfter {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Charlie", "Alice", "Bob");

        // BEFORE Java 8: Anonymous inner class
        Collections.sort(names, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        });
        System.out.println("Sorted (anonymous class): " + names);
        // Output: Sorted (anonymous class): [Alice, Bob, Charlie]

        // AFTER Java 8: Lambda expression
        List<String> names2 = Arrays.asList("Charlie", "Alice", "Bob");
        Collections.sort(names2, (a, b) -> a.compareTo(b));
        System.out.println("Sorted (lambda): " + names2);
        // Output: Sorted (lambda): [Alice, Bob, Charlie]

        // EVEN SHORTER: Method reference
        List<String> names3 = Arrays.asList("Charlie", "Alice", "Bob");
        names3.sort(String::compareTo);
        System.out.println("Sorted (method reference): " + names3);
        // Output: Sorted (method reference): [Alice, Bob, Charlie]
    }
}

Five lines of anonymous class code reduced to a single expression. That is the power of lambdas.

2. Lambda Syntax

The general syntax of a lambda expression is:

(parameters) -> expression
        OR
(parameters) -> { statements; }

The arrow operator -> separates the parameter list from the body. The left side defines what goes in, the right side defines what comes out (or what happens).

2.1 Syntax Variations

Depending on the number of parameters and the complexity of the body, the syntax can be simplified in several ways:

| Variation | Syntax | Example |
|---|---|---|
| No parameters | () -> expression | () -> System.out.println("Hello") |
| Single parameter (no parens needed) | param -> expression | name -> name.toUpperCase() |
| Single parameter (with parens) | (param) -> expression | (name) -> name.toUpperCase() |
| Multiple parameters | (p1, p2) -> expression | (a, b) -> a + b |
| Expression body (implicit return) | (params) -> expression | (x) -> x * x |
| Block body (explicit return) | (params) -> { return expr; } | (x) -> { return x * x; } |
| Block body (void, no return) | (params) -> { statements; } | (msg) -> { System.out.println(msg); } |
| Explicit parameter types | (Type p1, Type p2) -> expr | (String a, String b) -> a.compareTo(b) |

import java.util.function.*;

public class LambdaSyntaxVariations {
    public static void main(String[] args) {

        // 1. No parameters
        Runnable greet = () -> System.out.println("Hello, World!");
        greet.run();
        // Output: Hello, World!

        // 2. Single parameter - parentheses optional
        Consumer<String> print = message -> System.out.println(message);
        print.accept("Lambda with one param");
        // Output: Lambda with one param

        // 3. Single parameter - with parentheses (also valid)
        Consumer<String> print2 = (message) -> System.out.println(message);
        print2.accept("Lambda with parens");
        // Output: Lambda with parens

        // 4. Multiple parameters
        BinaryOperator<Integer> add = (a, b) -> a + b;
        System.out.println("Sum: " + add.apply(3, 7));
        // Output: Sum: 10

        // 5. Expression body - implicit return
        Function<Integer, Integer> square = x -> x * x;
        System.out.println("Square of 5: " + square.apply(5));
        // Output: Square of 5: 25

        // 6. Block body - explicit return required
        Function<Integer, String> classify = x -> {
            if (x > 0) {
                return "Positive";
            } else if (x < 0) {
                return "Negative";
            } else {
                return "Zero";
            }
        };
        System.out.println("10 is: " + classify.apply(10));
        // Output: 10 is: Positive

        // 7. Explicit types (usually unnecessary due to type inference)
        BinaryOperator<String> concat = (String a, String b) -> a + " " + b;
        System.out.println(concat.apply("Hello", "Lambda"));
        // Output: Hello Lambda

        // 8. Multi-line block body with no return (void)
        Consumer<String> logger = (msg) -> {
            String timestamp = java.time.LocalDateTime.now().toString();
            System.out.println("[" + timestamp + "] " + msg);
        };
        logger.accept("Application started");
        // Output (timestamp varies): [2024-01-15T10:30:00.123] Application started
    }
}

2.2 Type Inference

In most cases, the Java compiler can infer the parameter types from the context (the functional interface the lambda implements). You do not need to declare them explicitly.

The compiler looks at the target type — the functional interface type the lambda is being assigned to — and determines the parameter types from its single abstract method.

import java.util.Comparator;
import java.util.function.BiFunction;

public class TypeInference {
    public static void main(String[] args) {

        // The compiler knows this is Comparator<String>, so a and b are String
        Comparator<String> comp1 = (a, b) -> a.compareTo(b);

        // You CAN specify types explicitly -- sometimes useful for clarity
        Comparator<String> comp2 = (String a, String b) -> a.compareTo(b);

        // IMPORTANT: You cannot mix -- either all types or no types
        // Comparator<String> comp3 = (String a, b) -> a.compareTo(b); // COMPILE ERROR

        // Type inference works with generics too (String.repeat() requires Java 11+)
        BiFunction<String, Integer, String> repeat = (text, times) -> text.repeat(times);
        System.out.println(repeat.apply("Ha", 3));
        // Output: HaHaHa
    }
}
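Target typing goes further than parameter types. The same lambda text can implement different interfaces, and a lambda has no type at all until it has a target. The sketch below illustrates both points. The class name TargetTyping and its helper methods are illustrative and not part of the original tutorial. It also uses ToIntFunction, one of the primitive specializations in java.util.function.

```java
import java.util.function.Function;
import java.util.function.ToIntFunction;

public class TargetTyping {

    // Target type Function<String, Integer>: the int result is boxed into an Integer
    static Integer boxedLength(String s) {
        Function<String, Integer> f = str -> str.length();
        return f.apply(s);
    }

    // Same lambda text, target type ToIntFunction<String>: the result stays a primitive int
    static int primitiveLength(String s) {
        ToIntFunction<String> f = str -> str.length();
        return f.applyAsInt(s);
    }

    // A lambda has no type of its own, so Object is not a valid target:
    //   Object bad = () -> System.out.println("hi");   // COMPILE ERROR
    // A cast supplies the functional interface the compiler needs:
    static Object castToRunnable() {
        return (Runnable) () -> System.out.println("hi");
    }

    public static void main(String[] args) {
        System.out.println(boxedLength("Lambda"));                 // 6
        System.out.println(primitiveLength("Lambda"));             // 6
        System.out.println(castToRunnable() instanceof Runnable);  // true
    }
}
```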

3. Functional Interfaces

Lambdas do not exist in a vacuum. Every lambda expression in Java is an implementation of a functional interface. Understanding functional interfaces is essential to understanding lambdas.

3.1 What is a Functional Interface?

A functional interface is an interface that has exactly one abstract method. It can have any number of default methods, static methods, and private methods — but only one abstract method. This single abstract method (SAM) is what the lambda implements.

Key rules:

  • Exactly one abstract method (the SAM)
  • Can have multiple default and static methods
  • Methods inherited from Object (like toString(), equals()) do not count
  • The @FunctionalInterface annotation is optional but recommended: it causes a compile error if the interface does not have exactly one abstract method

// A functional interface - has exactly ONE abstract method
@FunctionalInterface
interface Greeting {
    void greet(String name);  // single abstract method
}

// Still a functional interface - default methods don't count
@FunctionalInterface
interface MathOperation {
    double calculate(double a, double b);  // single abstract method

    default void printResult(double a, double b) {
        System.out.println("Result: " + calculate(a, b));
    }
}

// NOT a functional interface - has TWO abstract methods
// @FunctionalInterface  // This would cause a compile error!
interface NotFunctional {
    void methodOne();
    void methodTwo();
}

// Still a functional interface - toString() comes from Object, doesn't count
@FunctionalInterface
interface Converter<F, T> {
    T convert(F from);

    @Override
    String toString();  // From Object -- does NOT count as abstract
}
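The interfaces above are declared but never used. Here is a minimal sketch that puts them to work, re-declaring them so the block compiles on its own. The lambda supplies the single abstract method, and the default method printResult comes along for free. The class name UsingFunctionalInterfaces and the helpers multiply and parse are hypothetical names chosen for illustration.

```java
@FunctionalInterface
interface Greeting {
    void greet(String name);
}

@FunctionalInterface
interface MathOperation {
    double calculate(double a, double b);

    // Default method: every lambda that implements MathOperation inherits it
    default void printResult(double a, double b) {
        System.out.println("Result: " + calculate(a, b));
    }
}

@FunctionalInterface
interface Converter<F, T> {
    T convert(F from);
}

public class UsingFunctionalInterfaces {

    // The lambda supplies the body of calculate(); nothing else needs implementing
    static double multiply(double a, double b) {
        MathOperation times = (x, y) -> x * y;
        return times.calculate(a, b);
    }

    // A method reference works too: Integer.parseInt matches T convert(F from)
    static Integer parse(String text) {
        Converter<String, Integer> toInt = Integer::parseInt;
        return toInt.convert(text);
    }

    public static void main(String[] args) {
        Greeting hello = name -> System.out.println("Hello, " + name + "!");
        hello.greet("Alice");                  // Hello, Alice!

        MathOperation add = (a, b) -> a + b;
        add.printResult(2, 3);                 // Result: 5.0

        System.out.println(multiply(4, 2.5));  // 10.0
        System.out.println(parse("42") + 1);   // 43
    }
}
```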

3.2 Creating Custom Functional Interfaces

You can create your own functional interfaces for domain-specific behavior. The @FunctionalInterface annotation tells the compiler (and other developers) that this interface is intended for lambda use.

@FunctionalInterface
interface Validator<T> {
    boolean validate(T item);
}

@FunctionalInterface
interface Transformer<T, R> {
    R transform(T input);
}

@FunctionalInterface
interface TriFunction<A, B, C, R> {
    R apply(A a, B b, C c);
}

public class CustomFunctionalInterfaces {
    public static void main(String[] args) {

        // Using custom Validator
        Validator<String> emailValidator = email ->
            email != null && email.contains("@") && email.contains(".");
        System.out.println("valid@email.com: " + emailValidator.validate("valid@email.com"));
        // Output: valid@email.com: true
        System.out.println("invalid: " + emailValidator.validate("invalid"));
        // Output: invalid: false

        // Using custom Transformer
        Transformer<String, Integer> wordCounter = text -> text.split("\\s+").length;
        System.out.println("Word count: " + wordCounter.transform("Java lambdas are powerful"));
        // Output: Word count: 4

        // Using custom TriFunction (Java doesn't provide one by default)
        TriFunction<Integer, Integer, Integer, Integer> clamp =
            (value, min, max) -> Math.max(min, Math.min(max, value));
        System.out.println("Clamp 15 to [0,10]: " + clamp.apply(15, 0, 10));
        // Output: Clamp 15 to [0,10]: 10
        System.out.println("Clamp 5 to [0,10]: " + clamp.apply(5, 0, 10));
        // Output: Clamp 5 to [0,10]: 5
    }
}

3.3 Well-Known Functional Interfaces You Already Use

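Many interfaces that existed before Java 8 qualify as functional interfaces because they declare a single abstract method. Runnable, Callable, and Comparator were also retroactively annotated with @FunctionalInterface. ActionListener is not annotated, but it qualifies all the same:

| Interface | Abstract Method | Package |
|---|---|---|
| Runnable | void run() | java.lang |
| Callable<V> | V call() | java.util.concurrent |
| Comparator<T> | int compare(T o1, T o2) | java.util |
| ActionListener | void actionPerformed(ActionEvent e) | java.awt.event |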

This means you can use lambdas anywhere these interfaces are expected — no code changes needed on the caller side.
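For instance, here is a sketch that passes lambdas to three of the interfaces in the table: a Runnable to a Thread, a Callable to an ExecutorService, and a Comparator to List.sort. The class LegacyInterfaces and its methods are illustrative, and exception handling is kept minimal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LegacyInterfaces {

    // Runnable (Java 1.0): a no-argument, void lambda handed to Thread
    static String runOnThread() {
        StringBuilder log = new StringBuilder();
        Thread worker = new Thread(() -> log.append("ran"));
        worker.start();
        try {
            worker.join(); // join() also makes the worker's write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.toString();
    }

    // Callable<V> (Java 5): like Runnable, but returns a value
    static int computeInBackground() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> 6 * 7;
            Future<Integer> result = executor.submit(task);
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    // Comparator<T> (Java 1.2): a two-argument lambda handed to List.sort
    static List<String> sortByLength(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(runOnThread());          // ran
        System.out.println(computeInBackground());  // 42
        System.out.println(sortByLength(Arrays.asList("banana", "fig", "apple")));
        // [fig, apple, banana]
    }
}
```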

4. Built-in Functional Interfaces

Java 8 introduced the java.util.function package with 43 functional interfaces. You do not need to memorize all of them. Most are specializations of four core interfaces. Master these four and the rest will follow naturally.

4.1 Predicate<T> — Testing a Condition

A Predicate takes one argument and returns a boolean. Use it for filtering, validation, and condition-checking.

| Method | Description |
|---|---|
| boolean test(T t) | The abstract method: evaluates the predicate on the given argument |
| and(Predicate other) | Logical AND: both predicates must be true |
| or(Predicate other) | Logical OR: at least one predicate must be true |
| negate() | Logical NOT: inverts the predicate |
| Predicate.isEqual(target) | Static method: creates a null-safe predicate that tests equality to target |

import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PredicateExamples {
    public static void main(String[] args) {

        // Basic predicate
        Predicate<Integer> isPositive = n -> n > 0;
        System.out.println("5 is positive: " + isPositive.test(5));   // true
        System.out.println("-3 is positive: " + isPositive.test(-3)); // false

        // Composing predicates with and(), or(), negate()
        Predicate<Integer> isEven = n -> n % 2 == 0;
        Predicate<Integer> isPositiveAndEven = isPositive.and(isEven);
        Predicate<Integer> isPositiveOrEven = isPositive.or(isEven);
        Predicate<Integer> isNotPositive = isPositive.negate();

        System.out.println("6 is positive AND even: " + isPositiveAndEven.test(6));   // true
        System.out.println("3 is positive AND even: " + isPositiveAndEven.test(3));   // false
        System.out.println("-4 is positive OR even: " + isPositiveOrEven.test(-4));   // true
        System.out.println("-3 is NOT positive: " + isNotPositive.test(-3));          // true

        // Practical example: filtering a list (List.of() requires Java 9+)
        List<String> names = List.of("Alice", "Bob", "Charlie", "Dave", "Eve");
        Predicate<String> longerThan3 = name -> name.length() > 3;
        Predicate<String> startsWithC = name -> name.startsWith("C");

        List<String> filtered = names.stream()
            .filter(longerThan3.and(startsWithC))
            .collect(Collectors.toList());
        System.out.println("Long names starting with C: " + filtered);
        // Output: Long names starting with C: [Charlie]

        // Predicate.isEqual() - useful for null-safe equality
        Predicate<String> isAlice = Predicate.isEqual("Alice");
        System.out.println("Is Alice: " + isAlice.test("Alice")); // true
        System.out.println("Is Alice: " + isAlice.test(null));    // false
    }
}

4.2 Function<T, R> — Transforming Data

A Function takes one argument of type T and returns a result of type R. Use it for transformations, conversions, and mappings.

| Method | Description |
|---|---|
| R apply(T t) | The abstract method: applies the function to the argument |
| andThen(Function after) | Compose: apply this function first, then apply after |
| compose(Function before) | Compose: apply before first, then apply this function |
| Function.identity() | Static method: returns a function that always returns its input |

import java.util.function.Function;

public class FunctionExamples {
    public static void main(String[] args) {

        // Basic function: String -> Integer
        Function<String, Integer> stringLength = s -> s.length();
        System.out.println("Length of 'Lambda': " + stringLength.apply("Lambda"));
        // Output: Length of 'Lambda': 6

        // Function composition with andThen()
        // Apply first function, then apply second to the result
        Function<String, String> toUpperCase = s -> s.toUpperCase();
        Function<String, String> addExclamation = s -> s + "!";

        Function<String, String> shout = toUpperCase.andThen(addExclamation);
        System.out.println(shout.apply("hello"));
        // Output: HELLO!

        // Function composition with compose()
        // Apply the argument function FIRST, then apply this function
        Function<Integer, Integer> multiplyBy2 = n -> n * 2;
        Function<Integer, Integer> add10 = n -> n + 10;

        // compose: add10 runs first, then multiplyBy2
        Function<Integer, Integer> add10ThenDouble = multiplyBy2.compose(add10);
        System.out.println("compose(5): " + add10ThenDouble.apply(5));
        // Output: compose(5): 30    (5+10=15, 15*2=30)

        // andThen: multiplyBy2 runs first, then add10
        Function<Integer, Integer> doubleThenAdd10 = multiplyBy2.andThen(add10);
        System.out.println("andThen(5): " + doubleThenAdd10.apply(5));
        // Output: andThen(5): 20    (5*2=10, 10+10=20)

        // Function.identity() - returns input unchanged
        Function<String, String> identity = Function.identity();
        System.out.println(identity.apply("unchanged"));
        // Output: unchanged

        // Practical: build a text processing pipeline
        Function<String, String> trim = String::trim;
        Function<String, String> lower = String::toLowerCase;
        Function<String, String> normalize = trim.andThen(lower).andThen(s -> s.replaceAll("\\s+", " "));

        System.out.println("'" + normalize.apply("   Hello   WORLD   ") + "'");
        // Output: 'hello world'
    }
}

4.3 Consumer<T> — Performing an Action

A Consumer takes one argument and returns nothing (void). Use it for actions, side effects, printing, logging, or saving data.

| Method | Description |
|---|---|
| void accept(T t) | The abstract method: performs the action on the argument |
| andThen(Consumer after) | Chain: perform this action, then perform after |

import java.util.List;
import java.util.function.Consumer;

public class ConsumerExamples {
    public static void main(String[] args) {

        // Basic consumer
        Consumer<String> print = s -> System.out.println(s);
        print.accept("Hello from Consumer!");
        // Output: Hello from Consumer!

        // Chaining consumers with andThen()
        Consumer<String> toUpper = s -> System.out.println("Upper: " + s.toUpperCase());
        Consumer<String> toLower = s -> System.out.println("Lower: " + s.toLowerCase());
        Consumer<String> both = toUpper.andThen(toLower);

        both.accept("Lambda");
        // Output:
        // Upper: LAMBDA
        // Lower: lambda

        // Practical: process a list of items
        List<String> emails = List.of("alice@example.com", "bob@example.com", "charlie@example.com");

        Consumer<String> validate = email -> {
            if (!email.contains("@")) {
                System.out.println("INVALID: " + email);
            }
        };
        Consumer<String> sendWelcome = email -> System.out.println("Sending welcome email to: " + email);
        Consumer<String> logAction = email -> System.out.println("Logged: processed " + email);

        Consumer<String> processEmail = validate.andThen(sendWelcome).andThen(logAction);
        emails.forEach(processEmail);
        // Output:
        // Sending welcome email to: alice@example.com
        // Logged: processed alice@example.com
        // Sending welcome email to: bob@example.com
        // Logged: processed bob@example.com
        // Sending welcome email to: charlie@example.com
        // Logged: processed charlie@example.com
    }
}

4.4 Supplier<T> — Providing a Value

A Supplier takes no arguments and returns a value. Use it for lazy evaluation, factory methods, and deferred computation.

| Method | Description |
|---|---|
| T get() | The abstract method: produces a result with no input |

import java.time.LocalDateTime;
import java.util.Random;
import java.util.function.Supplier;

public class SupplierExamples {
    public static void main(String[] args) {

        // Basic supplier
        Supplier<String> helloSupplier = () -> "Hello, World!";
        System.out.println(helloSupplier.get());
        // Output: Hello, World!

        // Supplier for current timestamp
        Supplier<LocalDateTime> now = () -> LocalDateTime.now();
        System.out.println("Current time: " + now.get());
        // Output (varies): Current time: 2024-01-15T10:30:00.123

        // Supplier as a factory
        Supplier<Random> randomFactory = () -> new Random();
        Random r1 = randomFactory.get();
        Random r2 = randomFactory.get();
        System.out.println("Same instance? " + (r1 == r2)); // false -- new object each time

        // Lazy evaluation -- the expensive computation only runs when needed
        Supplier<Double> expensiveCalculation = () -> {
            System.out.println("  ...performing expensive calculation...");
            double result = 0;
            for (int i = 0; i < 1000; i++) {
                result += Math.sqrt(i);
            }
            return result;
        };

        boolean needResult = true;
        if (needResult) {
            System.out.println("Result: " + expensiveCalculation.get());
        }
        // Output:
        //   ...performing expensive calculation...
        // Result: 21065.833...

        // Supplier for default values
        String name = null;
        Supplier<String> defaultName = () -> "Anonymous";
        String displayName = (name != null) ? name : defaultName.get();
        System.out.println("Name: " + displayName);
        // Output: Name: Anonymous
    }
}
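The standard library relies on Supplier for this kind of deferred default. The sketch below contrasts Optional.orElse, whose argument is always evaluated, with Optional.orElseGet, which invokes its Supplier only when the Optional is empty. It assumes a hypothetical expensiveDefault() helper with a call counter so that the difference is visible.

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LazyDefaults {

    // Counts how many times the "expensive" default is actually computed
    static int calls = 0;

    // Hypothetical stand-in for a costly lookup (database call, file read, ...)
    static String expensiveDefault() {
        calls++;
        return "Anonymous";
    }

    // Eager: orElse's argument is evaluated before orElse runs, present or not
    static String eagerName(Optional<String> name) {
        return name.orElse(expensiveDefault());
    }

    // Lazy: orElseGet calls the Supplier only when the Optional is empty
    static String lazyName(Optional<String> name) {
        Supplier<String> fallback = LazyDefaults::expensiveDefault;
        return name.orElseGet(fallback);
    }

    public static void main(String[] args) {
        System.out.println(eagerName(Optional.of("Alice")) + ", calls = " + calls); // Alice, calls = 1
        System.out.println(lazyName(Optional.of("Alice")) + ", calls = " + calls);  // Alice, calls = 1
        System.out.println(lazyName(Optional.empty()) + ", calls = " + calls);      // Anonymous, calls = 2
    }
}
```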

4.5 UnaryOperator<T> and BinaryOperator<T>

UnaryOperator<T> is a specialization of Function<T, T>: its input and output types are the same. BinaryOperator<T> is a specialization of BiFunction<T, T, T>: it combines two values of one type into a third value of that type. Both are convenience interfaces for operations that do not change the type.

import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.UnaryOperator;

public class OperatorExamples {
    public static void main(String[] args) {

        // UnaryOperator: same input and output type
        UnaryOperator<String> toUpper = s -> s.toUpperCase();
        System.out.println(toUpper.apply("lambda"));
        // Output: LAMBDA

        UnaryOperator<Integer> doubleIt = n -> n * 2;
        System.out.println(doubleIt.apply(7));
        // Output: 14

        // UnaryOperator with List.replaceAll()
        List<String> names = Arrays.asList("alice", "bob", "charlie");
        names.replaceAll(String::toUpperCase);
        System.out.println(names);
        // Output: [ALICE, BOB, CHARLIE]

        // BinaryOperator: two inputs of same type, same output type
        BinaryOperator<Integer> max = (a, b) -> a > b ? a : b;
        System.out.println("Max of 5 and 9: " + max.apply(5, 9));
        // Output: Max of 5 and 9: 9

        BinaryOperator<String> join = (a, b) -> a + ", " + b;
        System.out.println(join.apply("Hello", "World"));
        // Output: Hello, World

        // BinaryOperator with reduce()
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);
        int sum = numbers.stream().reduce(0, Integer::sum);
        System.out.println("Sum: " + sum);
        // Output: Sum: 15

        // BinaryOperator.minBy() and maxBy()
        BinaryOperator<String> longerString = BinaryOperator.maxBy(
            (a, b) -> Integer.compare(a.length(), b.length())
        );
        System.out.println(longerString.apply("short", "much longer"));
        // Output: much longer
    }
}

4.6 Bi-Variants: BiFunction, BiPredicate, BiConsumer

Java provides “Bi” versions of Function, Predicate, and Consumer that accept two arguments instead of one.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;

public class BiFunctionExamples {
    public static void main(String[] args) {

        // BiFunction - takes two args, returns a result (String.repeat() requires Java 11+)
        BiFunction<String, Integer, String> repeat = (text, times) -> text.repeat(times);
        System.out.println(repeat.apply("Ha", 3));
        // Output: HaHaHa

        // BiPredicate - takes two args, returns boolean
        BiPredicate<String, Integer> isLongerThan = (str, length) -> str.length() > length;
        System.out.println("'Lambda' longer than 3? " + isLongerThan.test("Lambda", 3));
        // Output: 'Lambda' longer than 3? true
        System.out.println("'Hi' longer than 3? " + isLongerThan.test("Hi", 3));
        // Output: 'Hi' longer than 3? false

        // BiConsumer - takes two args, returns nothing
        BiConsumer<String, Integer> printEntry = (key, value) ->
            System.out.println(key + " = " + value);

        // BiConsumer is especially useful with Map.forEach()
        // (LinkedHashMap keeps insertion order, so the output below is predictable;
        // a plain HashMap makes no ordering guarantee)
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("Alice", 95);
        scores.put("Bob", 87);
        scores.put("Charlie", 92);

        System.out.println("Scores:");
        scores.forEach(printEntry);
        // Output:
        // Scores:
        // Alice = 95
        // Bob = 87
        // Charlie = 92

        // BiFunction with Map.replaceAll()
        Map<String, Integer> prices = new LinkedHashMap<>();
        prices.put("Apple", 100);
        prices.put("Banana", 50);
        prices.put("Cherry", 200);

        // Apply 10% discount to everything
        prices.replaceAll((item, price) -> (int)(price * 0.9));
        System.out.println("Discounted: " + prices);
        // Output: Discounted: {Apple=90, Banana=45, Cherry=180}
    }
}

4.7 Complete Reference Table

Here is a summary of the most commonly used functional interfaces from java.util.function:

| Interface | Abstract Method | Input | Output | Use Case |
|---|---|---|---|---|
| Predicate<T> | test(T) | T | boolean | Filtering, validation |
| BiPredicate<T, U> | test(T, U) | T, U | boolean | Two-argument conditions |
| Function<T, R> | apply(T) | T | R | Transformation, mapping |
| BiFunction<T, U, R> | apply(T, U) | T, U | R | Two-argument transformation |
| Consumer<T> | accept(T) | T | void | Printing, logging, saving |
| BiConsumer<T, U> | accept(T, U) | T, U | void | Map.forEach(), two-arg actions |
| Supplier<T> | get() | none | T | Factories, lazy evaluation |
| UnaryOperator<T> | apply(T) | T | T | Same-type transformation |
| BinaryOperator<T> | apply(T, T) | T, T | T | Reduction, combining |

There are also primitive specializations like IntPredicate, LongFunction, DoubleSupplier, IntUnaryOperator, and others that avoid autoboxing overhead. Use them when working with primitive types in performance-sensitive code.
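As a brief sketch of the difference: IntPredicate and IntUnaryOperator take and return int directly, and IntStream keeps the whole pipeline unboxed. The generic Predicate<Integer> and Function<Integer, Integer> versions box every value into an Integer. The class and method names are illustrative. The actual speedup depends on the workload and the JIT, so measure before optimizing.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class PrimitiveSpecializations {

    // Sum of the squares of the even numbers in [1, n], computed without boxing
    static int sumOfEvenSquares(int n) {
        IntPredicate isEven = x -> x % 2 == 0;   // boolean test(int)
        IntUnaryOperator square = x -> x * x;    // int applyAsInt(int)
        return IntStream.rangeClosed(1, n)
                .filter(isEven)
                .map(square)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquares(10)); // 4 + 16 + 36 + 64 + 100 = 220
    }
}
```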

5. Lambda with Collections

Java 8 added several default methods to the collection interfaces (Iterable, Collection, List, and Map) that accept functional interfaces, which makes lambdas a natural fit for everyday collection operations. None of these methods requires creating a stream, and several of them (removeIf, replaceAll, sort) modify the collection in place.

import java.util.*;

public class LambdaWithCollections {
    public static void main(String[] args) {

        // ========== forEach() ==========
        // Iterable.forEach(Consumer) - perform an action on each element
        List<String> fruits = Arrays.asList("Apple", "Banana", "Cherry", "Date");

        System.out.println("--- forEach ---");
        fruits.forEach(fruit -> System.out.println("Fruit: " + fruit));
        // Output:
        // Fruit: Apple
        // Fruit: Banana
        // Fruit: Cherry
        // Fruit: Date

        // forEach on a Map
        Map<String, Integer> ages = new LinkedHashMap<>();
        ages.put("Alice", 30);
        ages.put("Bob", 25);
        ages.put("Charlie", 35);

        System.out.println("\n--- Map forEach ---");
        ages.forEach((name, age) -> System.out.println(name + " is " + age + " years old"));
        // Output:
        // Alice is 30 years old
        // Bob is 25 years old
        // Charlie is 35 years old


        // ========== removeIf() ==========
        // Collection.removeIf(Predicate) - remove elements that match condition
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));

        numbers.removeIf(n -> n % 2 == 0);  // Remove all even numbers
        System.out.println("\n--- removeIf (removed evens) ---");
        System.out.println(numbers);
        // Output: [1, 3, 5, 7, 9]


        // ========== replaceAll() ==========
        // List.replaceAll(UnaryOperator) - transform each element in place
        List<String> names = new ArrayList<>(Arrays.asList("alice", "bob", "charlie"));

        names.replaceAll(name -> name.substring(0, 1).toUpperCase() + name.substring(1));
        System.out.println("\n--- replaceAll (capitalized) ---");
        System.out.println(names);
        // Output: [Alice, Bob, Charlie]


        // ========== sort() ==========
        // List.sort(Comparator) - sort the list using a lambda comparator
        List<String> cities = new ArrayList<>(Arrays.asList("New York", "London", "Tokyo", "Paris", "Sydney"));

        // Sort alphabetically
        cities.sort((a, b) -> a.compareTo(b));
        System.out.println("\n--- sort (alphabetical) ---");
        System.out.println(cities);
        // Output: [London, New York, Paris, Sydney, Tokyo]

        // Sort by length
        cities.sort((a, b) -> Integer.compare(a.length(), b.length()));
        System.out.println("\n--- sort (by length) ---");
        System.out.println(cities);
        // Output: [Paris, Tokyo, London, Sydney, New York]

        // Using Comparator helper methods (cleaner than raw lambda)
        cities.sort(Comparator.comparingInt(String::length).reversed());
        System.out.println("\n--- sort (by length, descending) ---");
        System.out.println(cities);
        // Output: [New York, London, Sydney, Paris, Tokyo]


        // ========== Map.computeIfAbsent() ==========
        // Compute a value only if the key is not already present
        Map> groups = new HashMap<>();

        groups.computeIfAbsent("fruits", k -> new ArrayList<>()).add("Apple");
        groups.computeIfAbsent("fruits", k -> new ArrayList<>()).add("Banana");
        groups.computeIfAbsent("veggies", k -> new ArrayList<>()).add("Carrot");

        System.out.println("\n--- computeIfAbsent ---");
        System.out.println(groups);
        // Output: {fruits=[Apple, Banana], veggies=[Carrot]}  (HashMap iteration order is not guaranteed)


        // ========== Map.merge() ==========
        // Merge a new value with an existing value
        Map<String, Integer> wordCount = new HashMap<>();
        String[] words = {"apple", "banana", "apple", "cherry", "banana", "apple"};

        for (String word : words) {
            wordCount.merge(word, 1, (oldVal, newVal) -> oldVal + newVal);
        }
        System.out.println("\n--- merge (word count) ---");
        System.out.println(wordCount);
        // Output: {banana=2, apple=3, cherry=1}  (HashMap iteration order is not guaranteed)
    }
}
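HashMap does not guarantee iteration order, so the maps printed above may appear in a different order. Below is a small sketch that makes the output of merge() and computeIfAbsent() predictable by using TreeMap, which keeps its keys sorted. The class and method names are illustrative. It also replaces the merge lambda with the equivalent Integer::sum method reference.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortedMapDefaults {
    // Counts words with merge(); TreeMap iterates its keys in sorted order
    static Map<String, Integer> countWords(String... words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            // First occurrence stores 1; later occurrences add 1 via Integer::sum
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Groups {category, item} pairs; computeIfAbsent creates each list only once per key
    static Map<String, List<String>> group(String[][] pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] pair : pairs) {
            groups.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(countWords("apple", "banana", "apple", "cherry", "banana", "apple"));
        // {apple=3, banana=2, cherry=1}

        System.out.println(group(new String[][] {
            {"fruits", "Apple"}, {"fruits", "Banana"}, {"veggies", "Carrot"}
        }));
        // {fruits=[Apple, Banana], veggies=[Carrot]}
    }
}
```

Use a LinkedHashMap instead if you want insertion order rather than sorted order.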

6. Lambda with Streams

The Stream API is where lambdas truly shine. Streams provide a declarative pipeline for processing collections, and virtually every stream operation accepts a lambda expression. Here are the most common operations showing lambda syntax alongside method reference alternatives.

import java.util.*;
import java.util.stream.Collectors;

public class LambdaWithStreams {
    public static void main(String[] args) {

        List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve", "Alice");

        // ========== filter() -- takes a Predicate ==========
        // Lambda version
        List<String> longNames = names.stream()
            .filter(name -> name.length() > 3)
            .collect(Collectors.toList());
        System.out.println("Filter (lambda): " + longNames);
        // Output: Filter (lambda): [Alice, Charlie, David, Alice]


        // ========== map() -- takes a Function ==========
        // Lambda version
        List<Integer> nameLengths = names.stream()
            .map(name -> name.length())
            .collect(Collectors.toList());
        System.out.println("Map (lambda): " + nameLengths);
        // Output: Map (lambda): [5, 3, 7, 5, 3, 5]

        // Method reference version
        List<String> upperNames = names.stream()
            .map(String::toUpperCase)
            .collect(Collectors.toList());
        System.out.println("Map (method ref): " + upperNames);
        // Output: Map (method ref): [ALICE, BOB, CHARLIE, DAVID, EVE, ALICE]


        // ========== reduce() -- takes a BinaryOperator ==========
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);

        // Lambda version
        int sum = numbers.stream()
            .reduce(0, (a, b) -> a + b);
        System.out.println("Reduce (lambda): " + sum);
        // Output: Reduce (lambda): 15

        // Method reference version
        int sum2 = numbers.stream()
            .reduce(0, Integer::sum);
        System.out.println("Reduce (method ref): " + sum2);
        // Output: Reduce (method ref): 15


        // ========== collect() -- grouping with lambdas ==========
        List<String> allNames = List.of("Alice", "Anna", "Bob", "Bill", "Charlie", "Chris");

        Map<Character, List<String>> grouped = allNames.stream()
            .collect(Collectors.groupingBy(name -> name.charAt(0)));
        System.out.println("Grouped: " + grouped);
        // Output: Grouped: {A=[Alice, Anna], B=[Bob, Bill], C=[Charlie, Chris]}


        // ========== sorted() -- takes a Comparator ==========
        List<String> sorted = allNames.stream()
            .sorted((a, b) -> Integer.compare(a.length(), b.length()))
            .collect(Collectors.toList());
        System.out.println("Sorted by length: " + sorted);
        // Output: Sorted by length: [Bob, Anna, Bill, Alice, Chris, Charlie]
        // (sorted() is stable on ordered streams, so equal-length names keep their encounter order)

        // Comparator helper (cleaner)
        List<String> sorted2 = allNames.stream()
            .sorted(Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder()))
            .collect(Collectors.toList());
        System.out.println("Sorted by length then alpha: " + sorted2);
        // Output: Sorted by length then alpha: [Bob, Anna, Bill, Alice, Chris, Charlie]


        // ========== forEach() -- takes a Consumer ==========
        System.out.println("forEach:");
        names.stream()
            .distinct()
            .forEach(name -> System.out.println("  - " + name));
        // Output:
        // forEach:
        //   - Alice
        //   - Bob
        //   - Charlie
        //   - David
        //   - Eve


        // ========== Combining multiple operations ==========
        String result = names.stream()
            .filter(name -> name.length() > 3)        // Predicate
            .map(String::toUpperCase)                  // Function (method ref)
            .distinct()                                 // Remove duplicates
            .sorted()                                   // Natural order
            .collect(Collectors.joining(", "));         // Join into a string
        System.out.println("Pipeline: " + result);
        // Output: Pipeline: ALICE, CHARLIE, DAVID
    }
}
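Intermediate operations such as filter() and map() are lazy. Their lambdas do not run until a terminal operation like collect() pulls elements through the pipeline. A sequential stream then moves one element at a time through every stage. The sketch below (the class name is illustrative) makes this order visible by recording each lambda call in a list. That side effect is there only for demonstration; production pipelines should avoid it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamLaziness {
    // Returns a log of when each lambda ran relative to the terminal operation
    static List<String> traceCalls() {
        List<String> log = new ArrayList<>();

        // Building the pipeline only describes the work; no lambda runs here
        Stream<String> pipeline = Stream.of("Al", "Bob", "Eve", "Charlie")
            .filter(name -> {
                log.add("filter " + name);
                return name.length() > 2;
            })
            .map(name -> {
                log.add("map " + name);
                return name.toUpperCase();
            });

        log.add("before collect, log size = " + log.size());

        // The terminal operation drives each element through filter, then map
        List<String> result = pipeline.collect(Collectors.toList());
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        traceCalls().forEach(System.out::println);
        // before collect, log size = 0
        // filter Al
        // filter Bob
        // map Bob
        // filter Eve
        // map Eve
        // filter Charlie
        // map Charlie
        // result [BOB, EVE, CHARLIE]
    }
}
```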

7. Variable Capture

A lambda expression can access variables from its enclosing scope — this is called variable capture. However, there are strict rules about which variables can be accessed and how.

7.1 Effectively Final Variables

A lambda can access a local variable from its enclosing scope only if that variable is effectively final — meaning its value is never modified after initialization. You do not need to explicitly declare it final, but you cannot change it.

import java.util.List;
import java.util.function.Consumer;

public class VariableCapture {
    // Instance variable - CAN be modified in lambdas
    private int instanceCounter = 0;

    // Static variable - CAN be modified in lambdas
    private static int staticCounter = 0;

    public void demonstrate() {
        // ===== Local variables must be effectively final =====

        // This works -- prefix is effectively final (never reassigned)
        String prefix = "Hello";
        Consumer<String> greeter = name -> System.out.println(prefix + ", " + name);
        greeter.accept("Alice");
        // Output: Hello, Alice

        // This DOES NOT compile -- count is reassigned, so it is not effectively final
        // int count = 0;
        // Runnable r = () -> System.out.println(count); // ERROR: count must be final or effectively final
        // count = 1;  // this reassignment is what disqualifies count from capture

        // This DOES NOT compile either -- you cannot modify a captured variable inside a lambda
        // int total = 0;
        // List.of(1, 2, 3).forEach(n -> total += n);  // ERROR: Cannot modify local variable


        // ===== Instance variables CAN be modified =====
        List.of(1, 2, 3).forEach(n -> instanceCounter += n);
        System.out.println("Instance counter: " + instanceCounter);
        // Output: Instance counter: 6

        // ===== Static variables CAN be modified =====
        List.of(1, 2, 3).forEach(n -> staticCounter += n);
        System.out.println("Static counter: " + staticCounter);
        // Output: Static counter: 6
    }

    public static void main(String[] args) {
        new VariableCapture().demonstrate();
    }
}

7.2 Why This Restriction?

The restriction exists because a lambda captures the value of a local variable, not the variable itself. Local variables live on the stack and disappear when the method returns, but the lambda might run later (for example, on another thread), so it keeps its own copy. If either the copy or the original could change, the two would silently diverge and create confusing bugs. Java rules this out at compile time.

Instance and static variables are different: they live on the heap, and the lambda reaches them through a reference (this, or the class), so it can read and modify them. Doing so is allowed but not automatically thread-safe; concurrent updates still need synchronization or atomic types.
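The copy-on-capture behavior can be seen directly in the sketch below (class and method names are illustrative). The first method returns a lambda that still works after that method's stack frame is gone. The second uses a loop that copies its counter into a fresh, effectively final variable on each iteration, so every lambda keeps a different value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class CaptureByValue {
    // The returned Supplier outlives this method's stack frame
    static Supplier<String> makeGreeter(String name) {
        String greeting = "Hello, " + name; // effectively final local
        return () -> greeting;              // the lambda holds its own copy of the value
    }

    static List<Supplier<Integer>> makeSuppliers(int n) {
        List<Supplier<Integer>> suppliers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // 'i' is reassigned by the loop, so it cannot be captured directly;
            // 'snapshot' is a fresh effectively final variable on every iteration
            int snapshot = i;
            suppliers.add(() -> snapshot * 10);
        }
        return suppliers;
    }

    public static void main(String[] args) {
        System.out.println(makeGreeter("Alice").get()); // Hello, Alice
        for (Supplier<Integer> s : makeSuppliers(3)) {
            System.out.print(s.get() + " ");           // prints: 0 10 20
        }
        System.out.println();
    }
}
```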

7.3 Workarounds for Mutable State

When you genuinely need to accumulate or modify a value inside a lambda, use one of these approaches:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class VariableCaptureWorkarounds {
    public static void main(String[] args) {

        List<Integer> numbers = List.of(1, 2, 3, 4, 5);

        // Workaround 1: AtomicInteger (preferred for thread-safe counting)
        AtomicInteger atomicSum = new AtomicInteger(0);
        numbers.forEach(n -> atomicSum.addAndGet(n));
        System.out.println("AtomicInteger sum: " + atomicSum.get());
        // Output: AtomicInteger sum: 15

        // Workaround 2: Single-element array (the array reference is effectively final)
        int[] arraySum = {0};
        numbers.forEach(n -> arraySum[0] += n);
        System.out.println("Array wrapper sum: " + arraySum[0]);
        // Output: Array wrapper sum: 15

        // Workaround 3: Use stream reduce() instead (BEST approach -- no side effects)
        int streamSum = numbers.stream().reduce(0, Integer::sum);
        System.out.println("Stream reduce sum: " + streamSum);
        // Output: Stream reduce sum: 15

        // Workaround 4: Mutable container
        List<String> results = new java.util.ArrayList<>();
        numbers.forEach(n -> {
            if (n % 2 == 0) {
                results.add("Even: " + n);
            }
        });
        System.out.println("Results: " + results);
        // Output: Results: [Even: 2, Even: 4]

        // BEST PRACTICE: Prefer stream operations over mutation
        List<String> betterResults = numbers.stream()
            .filter(n -> n % 2 == 0)
            .map(n -> "Even: " + n)
            .collect(java.util.stream.Collectors.toList());
        System.out.println("Better results: " + betterResults);
        // Output: Better results: [Even: 2, Even: 4]
    }
}

8. Lambda vs Anonymous Class

Before lambdas, anonymous inner classes were the primary way to pass behavior as an argument. Both achieve similar goals, but they differ in important ways.

8.1 Side-by-Side Comparison

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LambdaVsAnonymousClass {
    private String instanceField = "I'm an instance field";

    public void compare() {
        List<String> names = Arrays.asList("Charlie", "Alice", "Bob");

        // ========== Anonymous inner class ==========
        names.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                // 'this' refers to the anonymous Comparator instance
                System.out.println("this class: " + this.getClass().getSimpleName());
                return a.compareTo(b);
            }
        });
        System.out.println("Anonymous class sort: " + names);

        // ========== Lambda expression ==========
        List<String> names2 = Arrays.asList("Charlie", "Alice", "Bob");
        names2.sort((a, b) -> {
            // 'this' refers to the enclosing LambdaVsAnonymousClass instance
            System.out.println("this field: " + this.instanceField);
            return a.compareTo(b);
        });
        System.out.println("Lambda sort: " + names2);
    }

    public static void main(String[] args) {
        new LambdaVsAnonymousClass().compare();
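        // Output (each "this class:" / "this field:" line is printed once per comparison,
        // so it repeats several times; getSimpleName() returns "" for an anonymous class):
        // this class:
        // this class:
        // ...
        // Anonymous class sort: [Alice, Bob, Charlie]
        // this field: I'm an instance field
        // this field: I'm an instance field
        // ...
        // Lambda sort: [Alice, Bob, Charlie]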
    }
}

8.2 Detailed Comparison Table

Aspect | Anonymous Class | Lambda Expression
Syntax | Verbose: requires new Interface() { ... } | Concise: (params) -> body
this keyword | Refers to the anonymous class instance | Refers to the enclosing class instance
Interface requirement | Can implement any interface (including multi-method ones) | Can only implement a functional interface (single abstract method)
State | Can declare its own fields and keep state | Cannot declare fields; can only capture effectively final locals and use enclosing members
Compilation | Generates a separate .class file (e.g., Outer$1.class) | Compiled to an invokedynamic call site; no extra .class file on disk (the implementing class is generated at runtime)
Performance | Slightly more overhead (an extra class to load) | Often slightly cheaper (binding deferred to first use via invokedynamic); rarely significant in practice
Readability | Harder to read for simple operations | Much cleaner for simple operations
Shadowing | Can shadow variables from the enclosing scope | Cannot shadow; the body shares the enclosing scope

8.3 When to Use Each

Use a lambda when:

  • The interface has exactly one abstract method (functional interface)
  • The implementation is short (1-3 lines)
  • You do not need this to refer to the implementation itself
  • You do not need to maintain state

Use an anonymous class when:

  • The interface has multiple abstract methods
  • You need this to refer to the implementation instance
  • You need instance fields to maintain state across method calls
  • You want to override multiple methods from an abstract class
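The sketch below covers two of these cases: state and multiple abstract methods. The hypothetical CountingComparator adds a second abstract method to Comparator, so no lambda can implement it. An anonymous class can, and it keeps a per-instance comparison counter in a field.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StatefulAnonymousClass {
    // Hypothetical interface: compare() from Comparator plus a second abstract method.
    // With two abstract methods it is NOT a functional interface.
    public interface CountingComparator<T> extends Comparator<T> {
        int comparisons();
    }

    static CountingComparator<String> countingNaturalOrder() {
        return new CountingComparator<String>() {
            private int count = 0; // per-instance field: a lambda cannot declare one

            @Override
            public int compare(String a, String b) {
                count++;
                return a.compareTo(b);
            }

            @Override
            public int comparisons() {
                return count;
            }
        };
    }

    public static void main(String[] args) {
        CountingComparator<String> cmp = countingNaturalOrder();
        List<String> names = new ArrayList<>(List.of("Charlie", "Alice", "Bob"));
        names.sort(cmp);
        System.out.println(names); // [Alice, Bob, Charlie]
        // The exact count depends on the sorting algorithm, so it is not shown here
        System.out.println("Comparisons: " + cmp.comparisons());
    }
}
```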

8.4 Migration Example

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MigrationExample {
    public static void main(String[] args) {

        List<String> names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));

        // STEP 1: Original anonymous class
        Collections.sort(names, new java.util.Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });
        System.out.println("Step 1 (anonymous): " + names);

        // STEP 2: Replace with lambda
        names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));
        Collections.sort(names, (a, b) -> a.compareToIgnoreCase(b));
        System.out.println("Step 2 (lambda): " + names);

        // STEP 3: Use List.sort() instead of Collections.sort()
        names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));
        names.sort((a, b) -> a.compareToIgnoreCase(b));
        System.out.println("Step 3 (List.sort): " + names);

        // STEP 4: Use method reference
        names = new ArrayList<>(List.of("Charlie", "Alice", "Bob", "David"));
        names.sort(String::compareToIgnoreCase);
        System.out.println("Step 4 (method ref): " + names);

        // All output: [Alice, Bob, Charlie, David]
    }
}

9. Method References

A method reference is a shorthand notation for a lambda expression that simply calls an existing method. If your lambda does nothing more than call a single method, a method reference is cleaner.

There are four types of method references:

Type | Syntax | Lambda Equivalent | Example
Static method | Class::staticMethod | (args) -> Class.staticMethod(args) | Integer::parseInt
Instance method (bound) | object::instanceMethod | (args) -> object.instanceMethod(args) | System.out::println
Instance method (unbound) | Class::instanceMethod | (obj, args) -> obj.instanceMethod(args) | String::toUpperCase
Constructor | Class::new | (args) -> new Class(args) | ArrayList::new

import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class MethodReferenceExamples {
    public static void main(String[] args) {

        List<String> words = List.of("hello", "world", "java", "lambda");

        // ========== 1. Static method reference ==========
        // Lambda:        s -> Integer.parseInt(s)
        // Method ref:    Integer::parseInt
        List<String> numberStrings = List.of("1", "2", "3", "4", "5");
        List<Integer> numbers = numberStrings.stream()
            .map(Integer::parseInt)             // static method reference
            .collect(Collectors.toList());
        System.out.println("Static: " + numbers);
        // Output: Static: [1, 2, 3, 4, 5]


        // ========== 2. Bound instance method reference ==========
        // Lambda:        s -> System.out.println(s)
        // Method ref:    System.out::println
        System.out.println("Bound instance:");
        words.forEach(System.out::println);     // bound to System.out
        // Output:
        // hello
        // world
        // java
        // lambda


        // ========== 3. Unbound instance method reference ==========
        // Lambda:        s -> s.toUpperCase()
        // Method ref:    String::toUpperCase
        List<String> upper = words.stream()
            .map(String::toUpperCase)           // unbound -- called on each element
            .collect(Collectors.toList());
        System.out.println("Unbound: " + upper);
        // Output: Unbound: [HELLO, WORLD, JAVA, LAMBDA]

        // Unbound with two arguments (used in Comparator)
        // Lambda:        (a, b) -> a.compareToIgnoreCase(b)
        // Method ref:    String::compareToIgnoreCase
        List<String> sorted = Arrays.asList("banana", "Apple", "cherry");
        sorted.sort(String::compareToIgnoreCase);
        System.out.println("Sorted: " + sorted);
        // Output: Sorted: [Apple, banana, cherry]


        // ========== 4. Constructor reference ==========
        // Lambda:        () -> new ArrayList<String>()
        // Method ref:    ArrayList::new
        Supplier<List<String>> listFactory = java.util.ArrayList::new;
        List<String> newList = listFactory.get();
        newList.add("Created with constructor reference");
        System.out.println("Constructor: " + newList);
        // Output: Constructor: [Created with constructor reference]

        // Constructor reference with parameters
        Function<String, StringBuilder> sbFactory = StringBuilder::new;
        StringBuilder sb = sbFactory.apply("Initial value");
        System.out.println("StringBuilder: " + sb);
        // Output: StringBuilder: Initial value
    }
}

Rule of thumb: If your lambda is (x) -> someMethod(x) or (x) -> x.someMethod(), it can usually be replaced with a method reference. Use method references when they improve clarity; stick with lambdas when the reference would be confusing.

10. Common Patterns

Lambdas are not just syntactic sugar — they enable cleaner implementations of well-known design patterns. Here are patterns you will use regularly.

10.1 Event Handling / Callbacks

Lambdas simplify callback-style programming. Instead of creating a class for every callback, pass behavior directly.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// A simple event system using lambdas as callbacks
class EventEmitter<T> {
    private final List<Consumer<T>> listeners = new ArrayList<>();

    public void on(Consumer<T> listener) {
        listeners.add(listener);
    }

    public void emit(T event) {
        listeners.forEach(listener -> listener.accept(event));
    }
}

public class EventHandlingPattern {
    public static void main(String[] args) {
        EventEmitter<String> emitter = new EventEmitter<>();

        // Register listeners using lambdas
        emitter.on(msg -> System.out.println("[LOG] " + msg));
        emitter.on(msg -> System.out.println("[ALERT] " + msg.toUpperCase()));
        emitter.on(msg -> {
            if (msg.contains("error")) {
                System.out.println("[ERROR HANDLER] Escalating: " + msg);
            }
        });

        emitter.emit("User logged in");
        // Output:
        // [LOG] User logged in
        // [ALERT] USER LOGGED IN

        System.out.println();

        emitter.emit("Database connection error");
        // Output:
        // [LOG] Database connection error
        // [ALERT] DATABASE CONNECTION ERROR
        // [ERROR HANDLER] Escalating: Database connection error
    }
}
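A common extension of this emitter is to let callers detach listeners. Assuming that need, the sketch below has on() return a Runnable that removes the listener again. The returned lambda captures the listener reference, which is effectively final. The class name and API are illustrative, not taken from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class UnsubscribableEmitter<T> {
    private final List<Consumer<T>> listeners = new ArrayList<>();

    // Registers a listener and returns a handle that unregisters it when run
    public Runnable on(Consumer<T> listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    public void emit(T event) {
        // Iterate over a copy so a listener may unsubscribe while handling an event
        new ArrayList<>(listeners).forEach(listener -> listener.accept(event));
    }

    public int listenerCount() {
        return listeners.size();
    }

    public static void main(String[] args) {
        UnsubscribableEmitter<String> emitter = new UnsubscribableEmitter<>();
        List<String> received = new ArrayList<>();

        Runnable unsubscribe = emitter.on(msg -> received.add("[LOG] " + msg));
        emitter.emit("first");
        unsubscribe.run();      // detach the listener
        emitter.emit("second"); // no listeners left, nothing recorded

        System.out.println(received); // [[LOG] first]
    }
}
```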

10.2 Strategy Pattern

The Strategy pattern defines a family of algorithms and makes them interchangeable. With lambdas, you no longer need a separate class for each strategy.

import java.util.function.BiFunction;

public class StrategyPattern {

    // Before lambdas: separate classes for each strategy
    interface DiscountStrategy {
        double applyDiscount(double price, int quantity);
    }

    // With lambdas: strategies are just functions
    public static void main(String[] args) {

        // Define strategies as lambdas
        BiFunction<Double, Integer, Double> noDiscount =
            (price, qty) -> price * qty;

        BiFunction<Double, Integer, Double> percentageDiscount =
            (price, qty) -> price * qty * 0.9;  // 10% off

        BiFunction<Double, Integer, Double> bulkDiscount =
            (price, qty) -> qty >= 10 ? price * qty * 0.8 : price * qty;  // 20% off for 10+

        BiFunction<Double, Integer, Double> buyOneGetOneFree =
            (price, qty) -> price * (qty - qty / 2);  // Every second item free

        // Use the strategies
        double price = 25.0;

        System.out.println("No discount (5 items): $" + noDiscount.apply(price, 5));
        // Output: No discount (5 items): $125.0

        System.out.println("10% off (5 items): $" + percentageDiscount.apply(price, 5));
        // Output: 10% off (5 items): $112.5

        System.out.println("Bulk (15 items): $" + bulkDiscount.apply(price, 15));
        // Output: Bulk (15 items): $300.0

        System.out.println("BOGO (6 items): $" + buyOneGetOneFree.apply(price, 6));
        // Output: BOGO (6 items): $75.0
    }
}
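The DiscountStrategy interface declared in the example above is never used there. The sketch below shows one way to use it. Because the interface has a single abstract method, lambdas can implement it directly, and storing them in a map lets the caller pick a strategy by name at runtime. The registry, the strategy names, and the fallback rule are all assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StrategyRegistry {
    @FunctionalInterface
    interface DiscountStrategy {
        double applyDiscount(double price, int quantity);
    }

    // Illustrative registry: names and rules are assumptions for this sketch
    static final Map<String, DiscountStrategy> STRATEGIES = new LinkedHashMap<>();
    static {
        STRATEGIES.put("none", (price, qty) -> price * qty);
        STRATEGIES.put("tenPercent", (price, qty) -> price * qty * 0.9);
        STRATEGIES.put("bulk", (price, qty) -> qty >= 10 ? price * qty * 0.8 : price * qty);
    }

    // Unknown names fall back to no discount
    static double checkout(String strategyName, double price, int qty) {
        DiscountStrategy strategy = STRATEGIES.getOrDefault(strategyName, STRATEGIES.get("none"));
        return strategy.applyDiscount(price, qty);
    }

    public static void main(String[] args) {
        System.out.println(checkout("tenPercent", 25.0, 5)); // 112.5
        System.out.println(checkout("bulk", 25.0, 15));      // 300.0
        System.out.println(checkout("unknown", 25.0, 2));    // 50.0
    }
}
```

Using a dedicated interface instead of BiFunction<Double, Integer, Double> avoids boxing and gives the operation a descriptive name.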

10.3 Decorator Pattern

The Decorator pattern wraps behavior around a function. With lambdas, you compose decorators by chaining Function instances.

import java.util.function.Function;

public class DecoratorPattern {

    // A decorator that adds logging around any function
    static <T, R> Function<T, R> withLogging(String name, Function<T, R> fn) {
        return input -> {
            System.out.println("  [LOG] Calling " + name + " with: " + input);
            R result = fn.apply(input);
            System.out.println("  [LOG] " + name + " returned: " + result);
            return result;
        };
    }

    // A decorator that adds timing around any function
    static <T, R> Function<T, R> withTiming(String name, Function<T, R> fn) {
        return input -> {
            long start = System.nanoTime();
            R result = fn.apply(input);
            long elapsed = System.nanoTime() - start;
            System.out.println("  [TIMING] " + name + " took " + elapsed / 1000 + " microseconds");
            return result;
        };
    }

    public static void main(String[] args) {

        // Original function
        Function<String, String> reverseString = s ->
            new StringBuilder(s).reverse().toString();

        // Decorate with logging
        Function<String, String> loggedReverse = withLogging("reverse", reverseString);

        // Decorate with logging AND timing
        Function<String, String> fullReverse = withTiming("reverse", withLogging("reverse", reverseString));

        System.out.println("--- Logged only ---");
        String result = loggedReverse.apply("Lambda");
        System.out.println("Result: " + result);
        // Output:
        //   [LOG] Calling reverse with: Lambda
        //   [LOG] reverse returned: adbmaL
        // Result: adbmaL

        System.out.println("\n--- Logged and timed ---");
        result = fullReverse.apply("Decorator");
        System.out.println("Result: " + result);
        // Output (timing wraps logging, so [TIMING] prints after the inner function returns):
        //   [LOG] Calling reverse with: Decorator
        //   [LOG] reverse returned: rotaroceD
        //   [TIMING] reverse took ... microseconds
        // Result: rotaroceD
    }
}
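Nested calls such as withTiming(..., withLogging(...)) become harder to read as decorators pile up. The sketch below is one possible refinement, using hypothetical names. Each decorator is a UnaryOperator<Function<String, String>>, and a helper applies the decorators in list order. This makes the wrapping order explicit: the last decorator applied becomes the outermost wrapper, so its "enter" step runs first and its "exit" step runs last.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

public class DecoratorChain {
    // Records calls so the wrapping order can be inspected
    static final List<String> events = new ArrayList<>();

    // A decorator here is "a function that wraps a function"
    static UnaryOperator<Function<String, String>> tagging(String tag) {
        return fn -> input -> {
            events.add("enter " + tag);
            String result = fn.apply(input);
            events.add("exit " + tag);
            return result;
        };
    }

    // Applies decorators in order; the last one becomes the outermost wrapper
    @SafeVarargs
    static Function<String, String> decorate(Function<String, String> base,
                                             UnaryOperator<Function<String, String>>... decorators) {
        Function<String, String> current = base;
        for (UnaryOperator<Function<String, String>> decorator : decorators) {
            current = decorator.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Function<String, String> reverse = s -> new StringBuilder(s).reverse().toString();
        Function<String, String> decorated = decorate(reverse, tagging("inner"), tagging("outer"));

        System.out.println(decorated.apply("Lambda")); // adbmaL
        System.out.println(events); // [enter outer, enter inner, exit inner, exit outer]
    }
}
```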

10.4 Lazy Evaluation

Lambdas enable lazy evaluation — deferring computation until the result is actually needed. This can save significant resources when a value might not be used.

import java.util.function.Supplier;

public class LazyEvaluation {

    // Simulates an expensive computation
    static String loadConfiguration() {
        System.out.println("  Loading configuration from disk...");
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "DB_URL=jdbc:mysql://localhost:3306/mydb";
    }

    // Without lazy evaluation: always computes the value
    static void logEager(boolean isDebug, String message) {
        if (isDebug) {
            System.out.println("[DEBUG] " + message);
        }
    }

    // With lazy evaluation: computes only if needed
    static void logLazy(boolean isDebug, Supplier<String> messageSupplier) {
        if (isDebug) {
            System.out.println("[DEBUG] " + messageSupplier.get());
        }
    }

    public static void main(String[] args) {
        boolean debugMode = false;

        // EAGER: loadConfiguration() runs even though debugMode is false
        System.out.println("--- Eager (debug=false) ---");
        logEager(debugMode, "Config: " + loadConfiguration());
        // Output:
        //   Loading configuration from disk...
        // (the value was computed but never used!)

        // LAZY: loadConfiguration() does NOT run because debugMode is false
        System.out.println("\n--- Lazy (debug=false) ---");
        logLazy(debugMode, () -> "Config: " + loadConfiguration());
        // Output: (nothing -- the supplier was never called)

        // LAZY with debug enabled
        debugMode = true;
        System.out.println("\n--- Lazy (debug=true) ---");
        logLazy(debugMode, () -> "Config: " + loadConfiguration());
        // Output:
        //   Loading configuration from disk...
        // [DEBUG] Config: DB_URL=jdbc:mysql://localhost:3306/mydb
    }
}
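A Supplier defers work, but calling get() twice runs the computation twice. A common companion is a hypothetical memoize() helper (not a JDK method), sketched below, that caches the first result. It uses an anonymous class rather than a lambda because it needs mutable fields. It is deliberately not thread-safe.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class MemoizedSupplier {
    // Hypothetical helper (not part of the JDK): runs the delegate at most once,
    // on the first get(), and returns the cached result afterwards.
    // Not thread-safe: concurrent first calls could each run the delegate.
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private boolean computed = false;
            private T value;

            @Override
            public T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    // Calls get() the given number of times and reports how often the delegate actually ran
    static int delegateRuns(int gets) {
        AtomicInteger runs = new AtomicInteger();
        Supplier<String> config = memoize(() -> {
            runs.incrementAndGet(); // stands in for an expensive load
            return "DB_URL=jdbc:mysql://localhost:3306/mydb";
        });
        for (int i = 0; i < gets; i++) {
            config.get();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println(delegateRuns(0)); // 0: nothing requested, nothing computed
        System.out.println(delegateRuns(3)); // 1: computed once, then served from the cache
    }
}
```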

11. Common Mistakes

Even experienced developers make mistakes with lambdas. Here are the most common pitfalls and how to avoid them.

11.1 Checked Exceptions in Lambdas

The built-in functional interfaces (Function, Consumer, Predicate, etc.) do not declare checked exceptions. If your lambda needs to throw a checked exception, it will not compile.

import java.util.List;
import java.util.function.Function;

public class CheckedExceptionMistake {

    // This is a method that throws a checked exception
    static String readFile(String path) throws java.io.IOException {
        // Simulate reading a file
        if (path.contains("missing")) {
            throw new java.io.IOException("File not found: " + path);
        }
        return "Content of " + path;
    }

    // Custom functional interface that allows checked exceptions
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    // Wrapper method to convert a throwing function into a standard Function
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> fn) {
        return t -> {
            try {
                return fn.apply(t);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
    }

    public static void main(String[] args) {

        List<String> paths = List.of("file1.txt", "file2.txt");

        // PROBLEM: This does NOT compile!
        // paths.stream()
        //     .map(path -> readFile(path))  // ERROR: Unhandled IOException
        //     .forEach(System.out::println);

        // SOLUTION 1: Wrap in try-catch inside the lambda
        paths.stream()
            .map(path -> {
                try {
                    return readFile(path);
                } catch (java.io.IOException e) {
                    throw new RuntimeException(e);
                }
            })
            .forEach(System.out::println);
        // Output:
        // Content of file1.txt
        // Content of file2.txt

        // SOLUTION 2: Use a wrapper function (cleaner)
        paths.stream()
            .map(unchecked(CheckedExceptionMistake::readFile))
            .forEach(System.out::println);
        // Output:
        // Content of file1.txt
        // Content of file2.txt
    }
}
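When the checked exception is specifically an IOException, the JDK already provides java.io.UncheckedIOException for this kind of wrapping. The sketch below uses it; IoFunction and uncheckedIo are hypothetical names. It also shows the other half of the pattern: catching the unchecked wrapper after the terminal operation and recovering the original IOException with getCause().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class UncheckedIoWrapper {
    // Same simulated reader as above
    static String readFile(String path) throws IOException {
        if (path.contains("missing")) {
            throw new IOException("File not found: " + path);
        }
        return "Content of " + path;
    }

    // Hypothetical functional interface whose method may throw IOException
    @FunctionalInterface
    interface IoFunction<T, R> {
        R apply(T t) throws IOException;
    }

    // Wraps only IOException, using the JDK's UncheckedIOException
    static <T, R> Function<T, R> uncheckedIo(IoFunction<T, R> fn) {
        return t -> {
            try {
                return fn.apply(t);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    // Unwraps the original IOException after the terminal operation
    static String readAll(List<String> paths) {
        try {
            return paths.stream()
                .map(uncheckedIo(UncheckedIoWrapper::readFile))
                .collect(Collectors.joining("; "));
        } catch (UncheckedIOException e) {
            IOException original = e.getCause();
            return "failed: " + original.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll(List.of("file1.txt", "file2.txt")));
        // Content of file1.txt; Content of file2.txt
        System.out.println(readAll(List.of("file1.txt", "missing.txt")));
        // failed: File not found: missing.txt
    }
}
```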

11.2 Side Effects in Stream Lambdas

Lambdas used in stream operations should be side-effect-free. Modifying external state from inside a stream pipeline leads to unpredictable behavior, especially with parallel streams.

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class SideEffectMistake {
    public static void main(String[] args) {

        List<String> names = List.of("Alice", "Bob", "Charlie", "David");

        // BAD: Modifying an external list from inside forEach()
        List<String> results = new ArrayList<>();
        names.stream()
            .map(String::toUpperCase)
            .forEach(name -> results.add(name));  // side effect!
        System.out.println("Bad (side effect): " + results);
        // Works with a sequential stream, but with a parallel stream concurrent add() calls
        // on a non-thread-safe ArrayList can lose elements or throw exceptions

        // GOOD: Use collect() to build the result
        List<String> betterResults = names.stream()
            .map(String::toUpperCase)
            .collect(Collectors.toList());
        System.out.println("Good (collect): " + betterResults);
        // Output: Good (collect): [ALICE, BOB, CHARLIE, DAVID]

        // BAD: Accumulating a count with side effects
        int[] count = {0};
        names.stream().forEach(n -> count[0]++);
        System.out.println("Bad count: " + count[0]);  // works but fragile

        // GOOD: Use count()
        long goodCount = names.stream().count();
        System.out.println("Good count: " + goodCount);
        // Output: Good count: 4
    }
}
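Counting is a common place for side effects to creep in, for example by calling Map.merge() from forEach(). The sketch below (the class name is illustrative) shows a side-effect-free alternative: Collectors.groupingBy() with Collectors.counting() builds the map itself. Because the collector owns its intermediate containers, the same pipeline produces the same result on a parallel stream.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SideEffectFreeCounting {
    // The collector creates and fills the map; nothing outside the pipeline is mutated.
    // On a parallel stream, groupingBy() builds partial maps for chunks of the input and merges them.
    static Map<String, Long> wordFrequencies(List<String> words, boolean parallel) {
        return (parallel ? words.parallelStream() : words.stream())
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "banana", "apple", "cherry", "banana", "apple");

        // Copied into a TreeMap only so the keys print in sorted order
        System.out.println(new TreeMap<>(wordFrequencies(words, true)));
        // {apple=3, banana=2, cherry=1}

        System.out.println(wordFrequencies(words, true).equals(wordFrequencies(words, false)));
        // true
    }
}
```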

11.3 Overly Complex Lambdas

If a lambda grows beyond 3-4 lines, it is usually too complex to read inline. Extract it into a named method for readability, testability, and reuse.

import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ComplexLambdaMistake {

    // BAD: This lambda is too complex
    static List<String> filterBad(List<String> emails) {
        return emails.stream()
            .filter(email -> {
                if (email == null || email.isBlank()) return false;
                if (!email.contains("@")) return false;
                String[] parts = email.split("@");
                if (parts.length != 2) return false;
                String domain = parts[1];
                if (!domain.contains(".")) return false;
                if (domain.startsWith(".") || domain.endsWith(".")) return false;
                return true;
            })
            .collect(Collectors.toList());
    }

    // GOOD: Extract the logic into a named method
    static boolean isValidEmail(String email) {
        if (email == null || email.isBlank()) return false;
        if (!email.contains("@")) return false;
        String[] parts = email.split("@");
        if (parts.length != 2) return false;
        String domain = parts[1];
        if (!domain.contains(".")) return false;
        return !domain.startsWith(".") && !domain.endsWith(".");
    }

    static List<String> filterGood(List<String> emails) {
        return emails.stream()
            .filter(ComplexLambdaMistake::isValidEmail)  // Clean and readable
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> emails = List.of(
            "alice@example.com",
            "invalid",
            "",
            "bob@test.org",
            "bad@.com",
            "ok@domain.io"
        );

        System.out.println("Valid emails: " + filterGood(emails));
        // Output: Valid emails: [alice@example.com, bob@test.org, ok@domain.io]
    }
}
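Another way to tame a long filter lambda is to split it into small named Predicate constants and combine them with Predicate.and() and negate(), as sketched below. The rules mirror isValidEmail() above; the constant names are illustrative. Because and() short-circuits, the domain() helper only runs after ONE_AT_SIGN has passed, which keeps it from indexing past the end of the split array.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ComposedPredicates {
    // Each rule is small, named, and testable on its own
    static final Predicate<String> NOT_BLANK = e -> e != null && !e.isBlank();
    static final Predicate<String> ONE_AT_SIGN = e -> e.contains("@") && e.split("@").length == 2;
    static final Predicate<String> DOMAIN_HAS_DOT = e -> domain(e).contains(".");
    static final Predicate<String> DOMAIN_DOT_AT_EDGE =
        e -> domain(e).startsWith(".") || domain(e).endsWith(".");

    // Only safe after ONE_AT_SIGN has passed; and() short-circuits, so VALID_EMAIL guarantees that
    static String domain(String email) {
        return email.split("@")[1];
    }

    static final Predicate<String> VALID_EMAIL = NOT_BLANK
        .and(ONE_AT_SIGN)
        .and(DOMAIN_HAS_DOT)
        .and(DOMAIN_DOT_AT_EDGE.negate());

    static List<String> filterValid(List<String> emails) {
        return emails.stream()
            .filter(VALID_EMAIL)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> emails = List.of(
            "alice@example.com", "invalid", "", "bob@test.org", "bad@.com", "ok@domain.io");
        System.out.println("Valid emails: " + filterValid(emails));
        // Valid emails: [alice@example.com, bob@test.org, ok@domain.io]
    }
}
```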

11.4 Forgetting the Functional Interface Requirement

Lambdas can only be used where a functional interface is expected. You cannot use a lambda to implement an interface with multiple abstract methods, or assign a lambda to an Object variable without a cast.

public class FunctionalInterfaceRequirement {
    // Interface with TWO abstract methods -- NOT functional
    interface TwoMethods {
        void methodA();
        void methodB();
    }

    public static void main(String[] args) {

        // ERROR: Cannot use lambda -- TwoMethods is not a functional interface
        // TwoMethods t = () -> System.out.println("Hello"); // COMPILE ERROR

        // ERROR: Cannot assign lambda to Object without cast
        // Object obj = () -> System.out.println("Hello"); // COMPILE ERROR

        // FIX: Cast to a specific functional interface
        Object obj = (Runnable) () -> System.out.println("Hello");
        ((Runnable) obj).run();
        // Output: Hello

        // COMMON GOTCHA: Overloaded methods can cause ambiguity
        // If a method accepts both Runnable and Callable, the compiler might not
        // know which one a no-arg lambda should map to.
    }
}
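This kind of overload ambiguity is easiest to reproduce with two overloads that take same-shaped functional interfaces, as in the sketch below (names are illustrative). With an implicitly typed lambda, both overloads apply and neither is more specific, so the commented-out call is expected to fail with an "ambiguous" compile error. Casting the lambda, or assigning it to a typed variable first, resolves the call.

```java
import java.util.function.Function;
import java.util.function.Predicate;

public class OverloadAmbiguity {
    // Two overloads whose parameters are both one-argument functional interfaces
    static String process(Predicate<String> p) {
        return "Predicate: " + p.test("");
    }

    static String process(Function<String, Boolean> f) {
        return "Function: " + f.apply("");
    }

    public static void main(String[] args) {
        // Expected NOT to compile: for an implicitly typed lambda both overloads apply
        // and neither is more specific, so javac reports the call as ambiguous.
        // process(s -> s.isEmpty());

        // Fix 1: cast the lambda to the intended functional interface
        System.out.println(process((Predicate<String>) s -> s.isEmpty())); // Predicate: true

        // Fix 2: give the lambda a declared type before passing it
        Function<String, Boolean> isEmpty = s -> s.isEmpty();
        System.out.println(process(isEmpty)); // Function: true
    }
}
```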

11.5 Lambda Serialization Issues

Lambdas are not serializable by default. If you need to serialize a lambda (e.g., for distributed computing frameworks), the target functional interface must extend Serializable.

import java.io.*;
import java.util.function.Predicate;

public class SerializationMistake {

    // Regular functional interface -- NOT serializable
    @FunctionalInterface
    interface RegularPredicate<T> {
        boolean test(T t);
    }

    // Serializable functional interface
    @FunctionalInterface
    interface SerializablePredicate<T> extends Predicate<T>, Serializable {
    }

    public static void main(String[] args) {

        // This lambda is NOT serializable
        RegularPredicate<String> notSerializable = s -> s.length() > 5;

        // This lambda IS serializable
        SerializablePredicate<String> serializable = s -> s.length() > 5;

        // Or use an intersection cast (less clean but avoids a custom interface)
        Predicate<String> alsoSerializable = (Predicate<String> & Serializable) s -> s.length() > 5;

        System.out.println("Test 'Lambda': " + serializable.test("Lambda"));
        // Output: Test 'Lambda': true
    }
}
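You can observe the difference at runtime by writing a lambda to an in-memory ObjectOutputStream and reading it back. In the sketch below, the class and method names are illustrative. A lambda whose target type includes Serializable survives the round trip. A plain Predicate lambda fails with NotSerializableException when it is written.

```java
import java.io.*;
import java.util.function.Predicate;

class LambdaSerializationDemo {
    // Serializes an object to bytes, then deserializes it again
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    @SuppressWarnings("unchecked")
    static boolean serializableLambdaSurvives() throws Exception {
        Predicate<String> longWord = (Predicate<String> & Serializable) s -> s.length() > 5;
        Predicate<String> copy = (Predicate<String>) roundTrip(longWord);
        return copy.test("Lambda");  // the deserialized lambda behaves like the original
    }

    static boolean plainLambdaIsRejected() throws Exception {
        Predicate<String> plain = s -> s.length() > 5;
        try {
            roundTrip(plain);
            return false;
        } catch (NotSerializableException e) {
            return true;  // target type lacks Serializable, so writeObject() refuses it
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Serializable lambda round-trip: " + serializableLambdaSurvives());  // true
        System.out.println("Plain lambda rejected: " + plainLambdaIsRejected());              // true
    }
}
```

Deserialization works here because the class that created the lambda is available when the bytes are read back in the same JVM.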

12. Best Practices

Follow these guidelines to write lambdas that are clean, maintainable, and efficient.

| # | Practice | Do | Don't |
|---|----------|----|-------|
| 1 | Keep lambdas short | 1-3 lines max | Write 10+ line lambdas |
| 2 | Use method references | String::toUpperCase | s -> s.toUpperCase() when a reference is clearer |
| 3 | Avoid side effects | collect() to build results | Mutate external state in forEach() |
| 4 | Use meaningful parameter names | (name, age) -> ... | (a, b) -> ... when context is unclear |
| 5 | Extract complex lambdas | Move to a named private method | Inline a 10-line validation lambda |
| 6 | Prefer standard interfaces | Use Predicate, Function, Consumer | Create a custom interface when a standard one fits |
| 7 | Use @FunctionalInterface | Annotate your custom interfaces | Rely on convention alone |
| 8 | Handle exceptions explicitly | Wrapper methods for checked exceptions | Swallow exceptions in catch blocks |
| 9 | Consider readability | Use an anonymous class if a lambda is confusing | Force everything into a lambda |
| 10 | Leverage type inference | (a, b) -> a + b | (Integer a, Integer b) -> a + b when types are obvious |

import java.util.*;
import java.util.function.*;
import java.util.stream.Collectors;

public class LambdaBestPractices {

    // BEST PRACTICE: Extract complex logic into named methods
    static boolean isEligibleForDiscount(Map<String, Object> customer) {
        int age = (int) customer.get("age");
        boolean isMember = (boolean) customer.get("member");
        double totalSpent = (double) customer.get("totalSpent");
        return (age >= 65 || isMember) && totalSpent > 100.0;
    }

    // BEST PRACTICE: Use standard functional interfaces with clear names
    static <T> List<T> filterBy(List<T> items, Predicate<T> criteria) {
        return items.stream()
            .filter(criteria)
            .collect(Collectors.toList());
    }

    // BEST PRACTICE: Compose small, focused predicates
    public static void main(String[] args) {

        List<String> words = List.of("Lambda", "is", "a", "powerful", "feature", "in", "Java");

        // GOOD: Small, focused predicates composed together
        Predicate<String> longerThan2 = word -> word.length() > 2;
        Predicate<String> startsWithLower = word -> Character.isLowerCase(word.charAt(0));

        List<String> result = words.stream()
            .filter(longerThan2.and(startsWithLower))
            .map(String::toUpperCase)           // method reference (cleaner)
            .sorted()                            // natural order
            .collect(Collectors.toList());

        System.out.println("Filtered: " + result);
        // Output: Filtered: [FEATURE, POWERFUL]

        // GOOD: Meaningful parameter names
        Map<String, List<String>> grouped = words.stream()
            .collect(Collectors.groupingBy(word -> word.substring(0, 1).toUpperCase()));
        System.out.println("Grouped: " + grouped);

        // GOOD: Use Comparator helpers instead of raw lambdas
        List<String> sortedByLength = new ArrayList<>(words);
        sortedByLength.sort(
            Comparator.comparingInt(String::length)
                      .thenComparing(Comparator.naturalOrder())
        );
        System.out.println("Sorted: " + sortedByLength);
        // Output: Sorted: [a, in, is, Java, Lambda, feature, powerful]
    }
}
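Practice 8 in the table recommends wrapper methods for checked exceptions. One common shape for such a wrapper is sketched below. ThrowingFunction, unchecked() and parsePositive() are hypothetical names chosen for this example and are not JDK APIs. The wrapper adapts a function that may throw a checked exception into a standard Function. Any failure is rethrown as an unchecked exception with the original kept as its cause, so nothing is silently swallowed.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class CheckedExceptionWrapper {
    // Hypothetical interface: like Function, but apply() may throw a checked exception
    @FunctionalInterface
    interface ThrowingFunction<T, R> {
        R apply(T t) throws Exception;
    }

    // Adapts a ThrowingFunction to java.util.function.Function
    static <T, R> Function<T, R> unchecked(ThrowingFunction<T, R> fn) {
        return t -> {
            try {
                return fn.apply(t);
            } catch (RuntimeException e) {
                throw e;  // already unchecked: pass through unchanged
            } catch (Exception e) {
                throw new IllegalStateException("Failed on input: " + t, e);
            }
        };
    }

    // Hypothetical method that declares a checked exception
    static int parsePositive(String s) throws Exception {
        int value = Integer.parseInt(s.trim());
        if (value <= 0) {
            throw new Exception("Not positive: " + s);
        }
        return value;
    }

    static List<Integer> parseAll(List<String> inputs) {
        return inputs.stream()
            .map(unchecked(CheckedExceptionWrapper::parsePositive))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parseAll(List.of("1", " 2", "3")));  // [1, 2, 3]
        try {
            parseAll(List.of("4", "-5"));
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getCause().getMessage());  // Caught: Not positive: -5
        }
    }
}
```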

13. Complete Practical Example: Student Data Processing

Let us put everything together with a real-world example. We will build a student records processing system that demonstrates lambdas for filtering, sorting, transforming, grouping, and reporting.

import java.util.*;
import java.util.function.*;
import java.util.stream.Collectors;

public class StudentDataProcessing {

    // ========== Student record ==========
    static class Student {
        private final String name;
        private final String major;
        private final double gpa;
        private final int age;
        private final List<String> courses;

        Student(String name, String major, double gpa, int age, List<String> courses) {
            this.name = name;
            this.major = major;
            this.gpa = gpa;
            this.age = age;
            this.courses = courses;
        }

        public String getName()          { return name; }
        public String getMajor()         { return major; }
        public double getGpa()           { return gpa; }
        public int getAge()              { return age; }
        public List<String> getCourses() { return courses; }

        @Override
        public String toString() {
            return String.format("%s (Major: %s, GPA: %.1f, Age: %d)", name, major, gpa, age);
        }
    }

    // ========== Custom functional interface for reporting ==========
    @FunctionalInterface
    interface ReportGenerator {
        String generate(List<Student> data);
    }

    // ========== Utility: generic filter + transform pipeline ==========
    static <T, R> List<R> pipeline(List<T> data, Predicate<T> filter, Function<T, R> transform) {
        return data.stream()
            .filter(filter)
            .map(transform)
            .collect(Collectors.toList());
    }

    // ========== Main ==========
    public static void main(String[] args) {

        // Create sample data
        List<Student> students = List.of(
            new Student("Alice",   "Computer Science", 3.8, 21, List.of("Java", "Algorithms", "Databases")),
            new Student("Bob",     "Mathematics",      3.2, 22, List.of("Calculus", "Statistics", "Algorithms")),
            new Student("Charlie", "Computer Science", 3.5, 20, List.of("Java", "Networks", "AI")),
            new Student("Diana",   "Physics",          3.9, 23, List.of("Quantum", "Calculus", "Statistics")),
            new Student("Eve",     "Computer Science", 2.8, 21, List.of("Java", "Web Dev", "Databases")),
            new Student("Frank",   "Mathematics",      3.6, 22, List.of("Calculus", "Algorithms", "Statistics")),
            new Student("Grace",   "Physics",          3.1, 20, List.of("Quantum", "Mechanics", "Calculus")),
            new Student("Hank",    "Computer Science", 3.7, 23, List.of("Java", "AI", "Networks")),
            new Student("Ivy",     "Mathematics",      3.4, 21, List.of("Statistics", "Algebra", "Calculus")),
            new Student("Jack",    "Physics",          2.9, 22, List.of("Mechanics", "Quantum", "Statistics"))
        );

        System.out.println("=== STUDENT DATA PROCESSING SYSTEM ===\n");


        // ===== 1. FILTERING with Predicate =====
        System.out.println("--- 1. Honor Roll (GPA >= 3.5) ---");
        Predicate<Student> isHonorRoll = student -> student.getGpa() >= 3.5;

        students.stream()
            .filter(isHonorRoll)
            .forEach(s -> System.out.println("  " + s));
        // Output:
        //   Alice (Major: Computer Science, GPA: 3.8, Age: 21)
        //   Charlie (Major: Computer Science, GPA: 3.5, Age: 20)
        //   Diana (Major: Physics, GPA: 3.9, Age: 23)
        //   Frank (Major: Mathematics, GPA: 3.6, Age: 22)
        //   Hank (Major: Computer Science, GPA: 3.7, Age: 23)


        // ===== 2. COMPOSED PREDICATES =====
        System.out.println("\n--- 2. CS students on Honor Roll ---");
        Predicate<Student> isCS = s -> s.getMajor().equals("Computer Science");
        Predicate<Student> csHonor = isCS.and(isHonorRoll);

        students.stream()
            .filter(csHonor)
            .forEach(s -> System.out.println("  " + s));
        // Output:
        //   Alice (Major: Computer Science, GPA: 3.8, Age: 21)
        //   Charlie (Major: Computer Science, GPA: 3.5, Age: 20)
        //   Hank (Major: Computer Science, GPA: 3.7, Age: 23)


        // ===== 3. SORTING with Comparator lambdas =====
        System.out.println("\n--- 3. All students sorted by GPA (descending) ---");
        students.stream()
            .sorted(Comparator.comparingDouble(Student::getGpa).reversed())
            .forEach(s -> System.out.println("  " + s));
        // Output:
        //   Diana (Major: Physics, GPA: 3.9, Age: 23)
        //   Alice (Major: Computer Science, GPA: 3.8, Age: 21)
        //   Hank (Major: Computer Science, GPA: 3.7, Age: 23)
        //   Frank (Major: Mathematics, GPA: 3.6, Age: 22)
        //   Charlie (Major: Computer Science, GPA: 3.5, Age: 20)
        //   Ivy (Major: Mathematics, GPA: 3.4, Age: 21)
        //   Bob (Major: Mathematics, GPA: 3.2, Age: 22)
        //   Grace (Major: Physics, GPA: 3.1, Age: 20)
        //   Jack (Major: Physics, GPA: 2.9, Age: 22)
        //   Eve (Major: Computer Science, GPA: 2.8, Age: 21)


        // ===== 4. TRANSFORMATION with Function =====
        System.out.println("\n--- 4. Student names in uppercase ---");
        Function<Student, String> toNameUpper = s -> s.getName().toUpperCase();

        List<String> upperNames = students.stream()
            .map(toNameUpper)
            .collect(Collectors.toList());
        System.out.println("  " + upperNames);
        // Output: [ALICE, BOB, CHARLIE, DIANA, EVE, FRANK, GRACE, HANK, IVY, JACK]


        // ===== 5. GROUPING with Collectors =====
        System.out.println("\n--- 5. Students grouped by major ---");
        Map<String, List<Student>> byMajor = students.stream()
            .collect(Collectors.groupingBy(Student::getMajor, TreeMap::new, Collectors.toList())); // TreeMap: majors in alphabetical order

        byMajor.forEach((major, list) -> {
            System.out.println("  " + major + ":");
            list.forEach(s -> System.out.println("    - " + s.getName() + " (GPA: " + s.getGpa() + ")"));
        });
        // Output:
        //   Computer Science:
        //     - Alice (GPA: 3.8)
        //     - Charlie (GPA: 3.5)
        //     - Eve (GPA: 2.8)
        //     - Hank (GPA: 3.7)
        //   Mathematics:
        //     - Bob (GPA: 3.2)
        //     - Frank (GPA: 3.6)
        //     - Ivy (GPA: 3.4)
        //   Physics:
        //     - Diana (GPA: 3.9)
        //     - Grace (GPA: 3.1)
        //     - Jack (GPA: 2.9)


        // ===== 6. STATISTICS with reduce and Collectors =====
        System.out.println("\n--- 6. GPA Statistics by Major ---");
        Map<String, DoubleSummaryStatistics> statsByMajor = students.stream()
            .collect(Collectors.groupingBy(
                Student::getMajor,
                TreeMap::new,                                   // sorted keys for a stable output order
                Collectors.summarizingDouble(Student::getGpa)
            ));

        statsByMajor.forEach((major, stats) ->
            System.out.printf("  %s: avg=%.2f, min=%.1f, max=%.1f%n",
                major, stats.getAverage(), stats.getMin(), stats.getMax())
        );
        // Output:
        //   Computer Science: avg=3.45, min=2.8, max=3.8
        //   Mathematics: avg=3.40, min=3.2, max=3.6
        //   Physics: avg=3.30, min=2.9, max=3.9


        // ===== 7. PIPELINE utility with Predicate + Function =====
        System.out.println("\n--- 7. Pipeline: CS student names with high GPA ---");
        List csHonorNames = pipeline(
            students,
            isCS.and(isHonorRoll),      // composed Predicate
            Student::getName             // method reference as Function
        );
        System.out.println("  " + csHonorNames);
        // Output: [Alice, Charlie, Hank]


        // ===== 8. COURSE ANALYSIS with flatMap and lambdas =====
        System.out.println("\n--- 8. Most popular courses ---");
        Map<String, Long> courseCounts = students.stream()
            .flatMap(s -> s.getCourses().stream())
            .collect(Collectors.groupingBy(
                course -> course,                 // grouping key
                Collectors.counting()             // count per group
            ));

        courseCounts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()     // most popular first
                    .thenComparing(Map.Entry.<String, Long>comparingByKey()))  // ties: alphabetical
            .forEach(entry -> System.out.println("  " + entry.getKey() + ": " + entry.getValue() + " students"));
        // Output:
        //   Calculus: 5 students
        //   Statistics: 5 students
        //   Java: 4 students
        //   Algorithms: 3 students
        //   Quantum: 3 students
        //   ...


        // ===== 9. CUSTOM REPORT with functional interface =====
        System.out.println("\n--- 9. Custom Honor Roll Report ---");
        ReportGenerator honorRollReport = data -> {
            StringBuilder sb = new StringBuilder();
            sb.append("Honor Roll Report\n");
            sb.append("=================\n");

            List<Student> honorStudents = data.stream()
                .filter(isHonorRoll)
                .sorted(Comparator.comparingDouble(Student::getGpa).reversed())
                .collect(Collectors.toList());

            sb.append(String.format("Total honor students: %d / %d%n", honorStudents.size(), data.size()));
            sb.append(String.format("Percentage: %.0f%%%n%n",
                (double) honorStudents.size() / data.size() * 100));

            honorStudents.forEach(s ->
                sb.append(String.format("  %-10s | %-20s | GPA: %.1f%n",
                    s.getName(), s.getMajor(), s.getGpa()))
            );

            return sb.toString();
        };

        System.out.println(honorRollReport.generate(students));


        // ===== 10. CONSUMER chaining for notifications =====
        System.out.println("--- 10. Student notifications ---");
        Consumer<Student> emailNotification = s ->
            System.out.println("  [EMAIL] Congratulations " + s.getName() + "! You made the honor roll.");
        Consumer<Student> smsNotification = s ->
            System.out.println("  [SMS] " + s.getName() + ", check your email for honor roll details.");
        Consumer<Student> logNotification = s ->
            System.out.println("  [LOG] Notification sent to " + s.getName());

        Consumer<Student> notifyAll = emailNotification.andThen(smsNotification).andThen(logNotification);

        students.stream()
            .filter(csHonor)
            .forEach(notifyAll);
        // Output:
        //   [EMAIL] Congratulations Alice! You made the honor roll.
        //   [SMS] Alice, check your email for honor roll details.
        //   [LOG] Notification sent to Alice
        //   [EMAIL] Congratulations Charlie! You made the honor roll.
        //   [SMS] Charlie, check your email for honor roll details.
        //   [LOG] Notification sent to Charlie
        //   [EMAIL] Congratulations Hank! You made the honor roll.
        //   [SMS] Hank, check your email for honor roll details.
        //   [LOG] Notification sent to Hank


        // ===== Summary =====
        System.out.println("\n=== LAMBDA CONCEPTS DEMONSTRATED ===");
        System.out.println("1.  Predicate          - filtering students by GPA");
        System.out.println("2.  Predicate.and()     - combining CS + honor roll filters");
        System.out.println("3.  Comparator lambda   - sorting by GPA descending");
        System.out.println("4.  Function            - transforming student to name");
        System.out.println("5.  Collectors.groupingBy - grouping by major");
        System.out.println("6.  summarizingDouble   - GPA statistics per major");
        System.out.println("7.  Pipeline utility    - generic filter + transform method");
        System.out.println("8.  flatMap + lambda    - course frequency analysis");
        System.out.println("9.  Custom @FunctionalInterface - report generation");
        System.out.println("10. Consumer.andThen()  - chained notification actions");
    }
}

Quick Reference

| Concept | Summary | Example |
|---------|---------|---------|
| Lambda syntax | Parameters -> body | (a, b) -> a + b |
| Functional interface | Interface with one abstract method | @FunctionalInterface |
| Predicate<T> | T -> boolean | n -> n > 0 |
| Function<T, R> | T -> R | s -> s.length() |
| Consumer<T> | T -> void | s -> System.out.println(s) |
| Supplier<T> | () -> T | () -> new ArrayList<>() |
| UnaryOperator<T> | T -> T | s -> s.toUpperCase() |
| BinaryOperator<T> | (T, T) -> T | (a, b) -> a + b |
| Method reference | Shorthand for a lambda that only calls one existing method | String::toUpperCase |
| Effectively final | Local variables captured by lambdas cannot be reassigned | Use AtomicInteger or stream reduce() |
| this keyword | In a lambda, refers to the enclosing instance (not the lambda) | Unlike anonymous classes, where this is the anonymous instance |
| Checked exceptions | Standard functional interfaces don't allow checked exceptions | Use a wrapper or a custom interface |
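The "Effectively final" row is easiest to see in code. The sketch below uses illustrative class and method names. It shows the AtomicInteger workaround and the usually cleaner reduce() alternative, and both versions count the words longer than three characters.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class EffectivelyFinalDemo {
    // Workaround: the AtomicInteger reference never changes, only its value does
    static int countWithAtomicInteger(List<String> words) {
        // int count = 0;
        // words.forEach(w -> { if (w.length() > 3) count++; });  // does not compile
        AtomicInteger count = new AtomicInteger();
        words.forEach(w -> {
            if (w.length() > 3) {
                count.incrementAndGet();
            }
        });
        return count.get();
    }

    // Usually cleaner: let the stream compute the result, with no captured mutable state
    static int countWithReduce(List<String> words) {
        return words.stream()
            .map(w -> w.length() > 3 ? 1 : 0)
            .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = List.of("Java", "is", "fun", "and", "powerful");
        System.out.println(countWithAtomicInteger(words));  // 2
        System.out.println(countWithReduce(words));         // 2
    }
}
```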
July 23, 2021

Java – Stream

1. What is the Stream API?

Imagine an assembly line in a factory. Raw materials enter at one end, pass through a series of workstations — each performing a specific operation like cutting, painting, or inspecting — and a finished product comes out the other end. The assembly line does not store the materials; it processes them as they flow through.

The Java Stream API, introduced in Java 8, works much like that assembly line. A Stream is a sequence of elements that supports a pipeline of operations to process data declaratively: you describe what you want, not how to do it step by step.

Key characteristics of Streams:

  • Not a data structure — A Stream does not store elements. It pulls elements from a source (like a List or array) and pushes them through a pipeline of operations.
  • Pipeline of operations — You chain multiple operations together. Each operation takes input, transforms it, and passes the result to the next operation.
  • Lazy evaluation — Intermediate operations are not executed until a terminal operation is invoked. This allows the Stream to optimize the processing (e.g., short-circuiting).
  • Does not modify the source — Streaming over a List does not add, remove, or change elements in that List. The original data remains untouched.
  • Can only be consumed once — Once a terminal operation is called, the Stream is spent. To process the data again, you must create a new Stream.
  • Supports parallelism — You can switch to parallel execution with a single method call, leveraging multi-core processors.
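You can observe two of these characteristics directly: lazy, short-circuiting evaluation and single use. In this sketch the names are illustrative. The counter inside filter() is a deliberate side effect used only to watch evaluation. It is not a pattern for production code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class StreamCharacteristicsDemo {
    // Counts how many elements filter() examines before findFirst() has a match
    static int filterCallsForFindFirst(List<String> names) {
        AtomicInteger calls = new AtomicInteger();  // demo-only side effect
        names.stream()
             .filter(n -> {
                 calls.incrementAndGet();
                 return n.length() > 3;
             })
             .findFirst();                          // short-circuits after the first match
        return calls.get();
    }

    // Building a pipeline without a terminal operation runs nothing
    static int filterCallsWithoutTerminalOp(List<String> names) {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> pipeline = names.stream().filter(n -> {
            calls.incrementAndGet();
            return true;
        });
        return calls.get();                         // still 0: no terminal operation yet
    }

    // A stream cannot be reused once a terminal operation has run
    static boolean secondUseFails() {
        Stream<String> stream = Stream.of("a", "b");
        stream.count();
        try {
            stream.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("Bob", "Alice", "Charlie", "David");
        System.out.println(filterCallsForFindFirst(names));       // 2 (Bob, then Alice)
        System.out.println(filterCallsWithoutTerminalOp(names));  // 0
        System.out.println(secondUseFails());                     // true
    }
}
```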

Stream Pipeline Structure

Every Stream pipeline has three parts:

| Part | Description | Example |
|------|-------------|---------|
| Source | Where the data comes from | list.stream(), Arrays.stream(arr) |
| Intermediate operations | Transform the stream (lazy, return a new Stream) | filter(), map(), sorted() |
| Terminal operation | Produces a result or side effect (triggers execution) | collect(), forEach(), count() |

import java.util.Arrays;
import java.util.List;

public class StreamIntro {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Charlie", "David", "Eve");

        // Stream pipeline: source -> intermediate ops -> terminal op
        long count = names.stream()          // Source: create stream from list
                         .filter(n -> n.length() > 3)  // Intermediate: keep names longer than 3 chars
                         .map(String::toUpperCase)      // Intermediate: convert to uppercase
                         .count();                      // Terminal: count remaining elements

        System.out.println("Count: " + count);
        // Output: Count: 3

        // The original list is unchanged
        System.out.println("Original: " + names);
        // Output: Original: [Alice, Bob, Charlie, David, Eve]
    }
}

2. Creating Streams

Before you can process data with the Stream API, you need to create a Stream. Java provides multiple ways to do this depending on your data source.

2.1 From Collections

This is the most common approach. Every Collection (List, Set, Queue) provides a stream() method, which it inherits as a default method from the Collection interface.

import java.util.*;

public class StreamFromCollections {
    public static void main(String[] args) {
        // From a List
        List<String> list = List.of("Java", "Python", "Go");
        list.stream().forEach(System.out::println);

        // From a Set
        Set<Integer> set = Set.of(1, 2, 3, 4, 5);
        set.stream().filter(n -> n % 2 == 0).forEach(System.out::println);

        // From a Map (via entrySet, keySet, or values)
        Map<String, Integer> map = Map.of("Alice", 90, "Bob", 85);
        map.entrySet().stream()
           .filter(e -> e.getValue() > 87)
           .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
        // Output: Alice: 90
    }
}

2.2 From Arrays

import java.util.Arrays;
import java.util.stream.Stream;

public class StreamFromArrays {
    public static void main(String[] args) {
        String[] colors = {"Red", "Green", "Blue"};

        // Using Arrays.stream()
        Arrays.stream(colors).forEach(System.out::println);

        // Partial array: from index 1 (inclusive) to 3 (exclusive)
        Arrays.stream(colors, 1, 3).forEach(System.out::println);
        // Output: Green, Blue

        // Using Stream.of()
        Stream.of("One", "Two", "Three").forEach(System.out::println);

        // From a primitive array -- returns IntStream, not Stream<Integer>
        int[] numbers = {10, 20, 30};
        int sum = Arrays.stream(numbers).sum();
        System.out.println("Sum: " + sum); // Output: Sum: 60
    }
}
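Arrays.stream(int[]) produces an IntStream, so a couple of everyday tasks look slightly different. The sketch below uses illustrative method names. It calls boxed() to turn an IntStream into a Stream<Integer> before collecting to a List. It also shows that the index-range overload works on primitive arrays.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PrimitiveArrayStreams {
    // IntStream has no collect(Collector) overload, so box the ints first
    static List<Integer> toList(int[] values) {
        return Arrays.stream(values)      // IntStream
                     .boxed()             // Stream<Integer>
                     .collect(Collectors.toList());
    }

    // The (array, fromInclusive, toExclusive) overload also exists for int[]
    static int sumRange(int[] values, int from, int to) {
        return Arrays.stream(values, from, to).sum();
    }

    public static void main(String[] args) {
        int[] numbers = {10, 20, 30};
        System.out.println(toList(numbers));          // [10, 20, 30]
        System.out.println(sumRange(numbers, 1, 3));  // 50
    }
}
```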

2.3 Stream Factory Methods

import java.util.stream.Stream;
import java.util.stream.IntStream;
import java.util.List;

public class StreamFactoryMethods {
    public static void main(String[] args) {
        // Stream.empty() -- useful as a return value instead of null
        Stream<String> empty = Stream.empty();
        System.out.println("Empty count: " + empty.count()); // Output: Empty count: 0

        // Stream.of() -- create from individual elements
        Stream<String> languages = Stream.of("Java", "Python", "Go");

        // Stream.generate() -- infinite stream from a Supplier
        // MUST use limit() or it runs forever!
        Stream.generate(Math::random)
              .limit(3)
              .forEach(n -> System.out.printf("%.2f%n", n));

        // Stream.iterate() -- infinite stream with a seed and unary operator
        // Java 8 style (no predicate -- must use limit)
        Stream.iterate(1, n -> n * 2)
              .limit(5)
              .forEach(System.out::println);
        // Output: 1, 2, 4, 8, 16

        // Java 9+ style (with predicate -- like a for loop)
        Stream.iterate(1, n -> n <= 100, n -> n * 2)
              .forEach(System.out::println);
        // Output: 1, 2, 4, 8, 16, 32, 64

        // IntStream.range() and rangeClosed()
        IntStream.range(1, 5).forEach(System.out::println);       // 1, 2, 3, 4
        IntStream.rangeClosed(1, 5).forEach(System.out::println);  // 1, 2, 3, 4, 5
    }
}

2.4 Streams from Files

You can create a Stream of lines from a file using Files.lines(). This is memory-efficient because it reads lines lazily rather than loading the entire file into memory.

import java.nio.file.Files;
import java.nio.file.Paths;
import java.io.IOException;
import java.util.stream.Stream;

public class StreamFromFiles {
    public static void main(String[] args) {
        // Files.lines() returns a Stream<String> -- one element per line
        // Use try-with-resources because the stream must be closed
        try (Stream<String> lines = Files.lines(Paths.get("data.txt"))) {
            lines.filter(line -> !line.isBlank())
                 .map(String::trim)
                 .forEach(System.out::println);
        } catch (IOException e) {
            System.err.println("Error reading file: " + e.getMessage());
        }
    }
}
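The example above expects an existing data.txt. This self-contained sketch writes a small temporary file first, so it runs anywhere. The class name and file contents are illustrative. The reading method uses the same Files.lines() pattern with try-with-resources, so the file handle is closed even if processing fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilesLinesDemo {
    // Reads a file lazily, dropping blank lines and trimming the rest
    static List<String> readNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {  // closes the underlying file
            return lines.filter(line -> !line.isBlank())
                        .map(String::trim)
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("stream-demo", ".txt");
        try {
            Files.write(temp, List.of("  first line ", "", "   ", "second line"));
            System.out.println(readNonBlankLines(temp));  // [first line, second line]
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```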

Stream Creation Summary

| Method | Returns | Use Case |
|--------|---------|----------|
| collection.stream() | Stream<T> | Most common: stream from any Collection |
| Arrays.stream(array) | Stream<T>, or IntStream/LongStream/DoubleStream for primitive arrays | Stream from an array |
| Stream.of(a, b, c) | Stream<T> | Stream from individual values |
| Stream.empty() | Stream<T> | Empty stream (null-safe return value) |
| Stream.generate(supplier) | Stream<T> | Infinite stream from a Supplier |
| Stream.iterate(seed, op) | Stream<T> | Infinite stream with iterative computation |
| IntStream.range(a, b) | IntStream | Range of ints [a, b) |
| IntStream.rangeClosed(a, b) | IntStream | Range of ints [a, b] |
| Files.lines(path) | Stream<String> | Lazy line-by-line file reading |

3. Intermediate Operations

Intermediate operations transform a Stream into another Stream. They are lazy — nothing happens until a terminal operation triggers the pipeline. You can chain as many intermediate operations as you need.

3.1 filter()

filter(Predicate<T>) keeps only the elements that match the given condition. Think of it as a sieve — elements that pass the test go through; those that do not are discarded.

import java.util.List;
import java.util.stream.Collectors;

public class FilterExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Keep only even numbers
        List<Integer> evens = numbers.stream()
                                     .filter(n -> n % 2 == 0)
                                     .collect(Collectors.toList());
        System.out.println("Evens: " + evens);
        // Output: Evens: [2, 4, 6, 8, 10]

        // Chaining multiple filters (equivalent to && in the predicate)
        List<Integer> result = numbers.stream()
                                      .filter(n -> n > 3)
                                      .filter(n -> n < 8)
                                      .collect(Collectors.toList());
        System.out.println("Between 3 and 8: " + result);
        // Output: Between 3 and 8: [4, 5, 6, 7]

        // Filter with objects
        List<String> names = List.of("Alice", "Bob", "Charlie", "Ana", "Albert");
        List<String> aNames = names.stream()
                                   .filter(name -> name.startsWith("A"))
                                   .collect(Collectors.toList());
        System.out.println("A-names: " + aNames);
        // Output: A-names: [Alice, Ana, Albert]
    }
}
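You can check the comment about chained filters directly. This sketch uses illustrative class and constant names. It builds the same "between 3 and 8" selection in three ways: two chained filter() calls, one filter() with Predicate.and(), and one filter() with && inside the lambda. All three keep the same elements in the same order.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterEquivalence {
    static final Predicate<Integer> GREATER_THAN_3 = n -> n > 3;
    static final Predicate<Integer> LESS_THAN_8 = n -> n < 8;

    // Two chained filter() calls
    static List<Integer> chained(List<Integer> numbers) {
        return numbers.stream()
                      .filter(GREATER_THAN_3)
                      .filter(LESS_THAN_8)
                      .collect(Collectors.toList());
    }

    // One filter() with predicates composed by Predicate.and()
    static List<Integer> composed(List<Integer> numbers) {
        return numbers.stream()
                      .filter(GREATER_THAN_3.and(LESS_THAN_8))
                      .collect(Collectors.toList());
    }

    // One filter() with && inside the lambda
    static List<Integer> inlined(List<Integer> numbers) {
        return numbers.stream()
                      .filter(n -> n > 3 && n < 8)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(chained(numbers));   // [4, 5, 6, 7]
        System.out.println(composed(numbers));  // [4, 5, 6, 7]
        System.out.println(inlined(numbers));   // [4, 5, 6, 7]
    }
}
```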

3.2 map()

map(Function<T, R>) transforms each element from type T to type R. It applies the given function to every element and produces a new Stream of the results. This is one of the most frequently used operations.

import java.util.List;
import java.util.stream.Collectors;

public class MapExample {
    public static void main(String[] args) {
        List<String> names = List.of("alice", "bob", "charlie");

        // Transform: String -> String (uppercase)
        List<String> upper = names.stream()
                                  .map(String::toUpperCase)
                                  .collect(Collectors.toList());
        System.out.println(upper);
        // Output: [ALICE, BOB, CHARLIE]

        // Transform: String -> Integer (get length)
        List<Integer> lengths = names.stream()
                                     .map(String::length)
                                     .collect(Collectors.toList());
        System.out.println(lengths);
        // Output: [5, 3, 7]

        // Transform: Integer -> String
        List<Integer> numbers = List.of(1, 2, 3);
        List<String> labels = numbers.stream()
                                     .map(n -> "Item #" + n)
                                     .collect(Collectors.toList());
        System.out.println(labels);
        // Output: [Item #1, Item #2, Item #3]
    }
}

3.3 flatMap()

flatMap(Function<T, Stream<R>>) is used when each element maps to multiple elements (a stream of values). It “flattens” nested structures into a single stream. This is essential when you have lists of lists, or when a mapping function returns a collection for each element.

import java.util.List;
import java.util.stream.Collectors;

public class FlatMapExample {
    public static void main(String[] args) {
        // Problem: We have a list of lists and want a single flat list
        List<List<String>> nested = List.of(
            List.of("Java", "Kotlin"),
            List.of("Python", "Ruby"),
            List.of("Go", "Rust")
        );

        // Using map() -- gives Stream<List<String>>, NOT what we want
        // Using flatMap() -- gives Stream<String>, flattened!
        List<String> flat = nested.stream()
                                  .flatMap(List::stream)  // Each list becomes a stream, all merged
                                  .collect(Collectors.toList());
        System.out.println(flat);
        // Output: [Java, Kotlin, Python, Ruby, Go, Rust]

        // Real-world: extracting all words from sentences
        List<String> sentences = List.of("Hello World", "Java Streams are powerful");
        List<String> words = sentences.stream()
                                      .flatMap(s -> List.of(s.split(" ")).stream())
                                      .collect(Collectors.toList());
        System.out.println(words);
        // Output: [Hello, World, Java, Streams, are, powerful]

        // Real-world: customers with multiple orders
        // Each customer has a list of orders; we want all orders in one stream
        // customers.stream().flatMap(c -> c.getOrders().stream())
    }
}
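The customers-and-orders case mentioned in the last comment can be sketched as follows. Customer, Order, getOrders() and the sample data are hypothetical types and values made up for this illustration. flatMap() turns a Stream<Customer> into a single Stream<Order>. A customer with no orders simply contributes nothing.

```java
import java.util.List;
import java.util.stream.Collectors;

class FlatMapOrders {
    // Hypothetical domain types, used only for this illustration
    static final class Order {
        final String id;
        final double amount;
        Order(String id, double amount) { this.id = id; this.amount = amount; }
    }

    static final class Customer {
        final String name;
        final List<Order> orders;
        Customer(String name, List<Order> orders) { this.name = name; this.orders = orders; }
        List<Order> getOrders() { return orders; }
    }

    static List<Customer> sampleCustomers() {
        return List.of(
            new Customer("Alice", List.of(new Order("A1", 20.0), new Order("A2", 5.0))),
            new Customer("Bob", List.of()),  // no orders: contributes nothing
            new Customer("Carol", List.of(new Order("C1", 12.5))));
    }

    // Stream<Customer> -> Stream<Order> -> order ids
    static List<String> allOrderIds(List<Customer> customers) {
        return customers.stream()
                        .flatMap(c -> c.getOrders().stream())
                        .map(o -> o.id)
                        .collect(Collectors.toList());
    }

    // Same flattening, then a primitive sum over every order
    static double totalRevenue(List<Customer> customers) {
        return customers.stream()
                        .flatMap(c -> c.getOrders().stream())
                        .mapToDouble(o -> o.amount)
                        .sum();
    }

    public static void main(String[] args) {
        List<Customer> customers = sampleCustomers();
        System.out.println(allOrderIds(customers));   // [A1, A2, C1]
        System.out.println(totalRevenue(customers));  // 37.5
    }
}
```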

3.4 sorted()

sorted() sorts elements in natural order (for types implementing Comparable). You can also pass a custom Comparator for complex sorting.

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortedExample {
    public static void main(String[] args) {
        // Natural order (ascending)
        List<Integer> numbers = List.of(5, 3, 8, 1, 9, 2);
        List<Integer> sorted = numbers.stream()
                                      .sorted()
                                      .collect(Collectors.toList());
        System.out.println(sorted);
        // Output: [1, 2, 3, 5, 8, 9]

        // Reverse order
        List<Integer> descending = numbers.stream()
                                          .sorted(Comparator.reverseOrder())
                                          .collect(Collectors.toList());
        System.out.println(descending);
        // Output: [9, 8, 5, 3, 2, 1]

        // Sorting strings by length
        List<String> names = List.of("Charlie", "Bob", "Alice", "Eve");
        List<String> byLength = names.stream()
                                     .sorted(Comparator.comparingInt(String::length))
                                     .collect(Collectors.toList());
        System.out.println(byLength);
        // Output: [Bob, Eve, Alice, Charlie]

        // Sorting by length, then alphabetically for ties
        List<String> byLengthThenAlpha = names.stream()
            .sorted(Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder()))
            .collect(Collectors.toList());
        System.out.println(byLengthThenAlpha);
        // Output: [Bob, Eve, Alice, Charlie]
    }
}
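Comparators compose in the same way for custom objects. In the sketch below, the Employee type and its data are hypothetical. It sorts by department (A to Z), then by salary from high to low, and finally by name to break any remaining ties. Method references such as Employee::getDept give Comparator.comparing() enough type information to be chained directly.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class MultiKeySort {
    // Hypothetical type for illustration
    static final class Employee {
        private final String name;
        private final String dept;
        private final int salary;

        Employee(String name, String dept, int salary) {
            this.name = name;
            this.dept = dept;
            this.salary = salary;
        }

        String getName() { return name; }
        String getDept() { return dept; }
        int getSalary()  { return salary; }
    }

    // Department A-Z, then salary high to low, then name A-Z
    static final Comparator<Employee> BY_DEPT_THEN_SALARY_DESC =
        Comparator.comparing(Employee::getDept)
                  .thenComparing(Comparator.comparingInt(Employee::getSalary).reversed())
                  .thenComparing(Employee::getName);

    static List<Employee> sampleStaff() {
        return List.of(
            new Employee("Ann", "Eng", 120),
            new Employee("Bo",  "Ops", 90),
            new Employee("Cy",  "Eng", 150),
            new Employee("Di",  "Eng", 120),
            new Employee("Ed",  "Ops", 95));
    }

    static List<String> sortedNames(List<Employee> staff) {
        return staff.stream()
                    .sorted(BY_DEPT_THEN_SALARY_DESC)
                    .map(Employee::getName)
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(sampleStaff()));  // [Cy, Ann, Di, Ed, Bo]
    }
}
```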

3.5 distinct()

distinct() removes duplicate elements from the stream. It relies on the equals() and hashCode() methods to determine equality. For custom objects, you must override these methods for distinct() to work correctly.

import java.util.List;
import java.util.stream.Collectors;

public class DistinctExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 2, 4, 3, 5, 1);
        List<Integer> unique = numbers.stream()
                                      .distinct()
                                      .collect(Collectors.toList());
        System.out.println(unique);
        // Output: [1, 2, 3, 4, 5]

        // With strings (equals/hashCode already implemented)
        List<String> words = List.of("hello", "world", "hello", "java", "world");
        List<String> uniqueWords = words.stream()
                                        .distinct()
                                        .collect(Collectors.toList());
        System.out.println(uniqueWords);
        // Output: [hello, world, java]
    }
}
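The equals()/hashCode() requirement can be shown with two hypothetical point classes that hold the same fields. RawPoint keeps Object's identity-based equality, so distinct() treats every instance as unique. Point overrides both methods, so instances with equal coordinates collapse into one.

```java
import java.util.List;
import java.util.Objects;

class DistinctCustomObjects {
    // Hypothetical class WITHOUT equals/hashCode: identity-based equality
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Same fields, WITH equals/hashCode based on x and y
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() { return Objects.hash(x, y); }
    }

    static long distinctRawPoints() {
        return List.of(new RawPoint(1, 2), new RawPoint(1, 2), new RawPoint(3, 4))
                   .stream().distinct().count();  // every instance counts as different
    }

    static long distinctPoints() {
        return List.of(new Point(1, 2), new Point(1, 2), new Point(3, 4))
                   .stream().distinct().count();  // equal coordinates collapse
    }

    public static void main(String[] args) {
        System.out.println(distinctRawPoints());  // 3
        System.out.println(distinctPoints());     // 2
    }
}
```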

3.6 peek()

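peek(Consumer<T>) lets you perform a side effect on each element as it passes through, without changing the stream's contents. Its primary use is debugging: inspecting elements at a certain stage of the pipeline. Avoid using peek() for business logic, because it may not run at all if the pipeline is optimized away. For example, since Java 9, count() may skip executing the pipeline when it can compute the count directly from the source.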

import java.util.List;
import java.util.stream.Collectors;

public class PeekExample {
    public static void main(String[] args) {
        List<String> result = List.of("one", "two", "three", "four")
            .stream()
            .filter(s -> s.length() > 3)
            .peek(s -> System.out.println("After filter: " + s))
            .map(String::toUpperCase)
            .peek(s -> System.out.println("After map: " + s))
            .collect(Collectors.toList());

        // Output:
        // After filter: three
        // After map: THREE
        // After filter: four
        // After map: FOUR

        System.out.println("Result: " + result);
        // Output: Result: [THREE, FOUR]
    }
}
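The warning that peek() may not run has a concrete cause. Since Java 9, count() is allowed to compute its result directly from a source whose size is already known, skipping the pipeline entirely. The sketch below (class and method names are illustrative) counts peek() invocations with an AtomicInteger. Adding a filter() makes the size unknown, which forces a full traversal.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class PeekSkippedExample {
    // Returns {result of count(), number of times the peek() action ran}
    static long[] countWithPeek(boolean addFilter) {
        AtomicInteger peekCalls = new AtomicInteger();
        Stream<String> stream = List.of("one", "two", "three").stream()
                                    .peek(s -> peekCalls.incrementAndGet());
        if (addFilter) {
            // filter() makes the size unknown, so count() must visit every element
            stream = stream.filter(s -> true);
        }
        long count = stream.count();
        return new long[] { count, peekCalls.get() };
    }

    public static void main(String[] args) {
        long[] unfiltered = countWithPeek(false);
        long[] filtered = countWithPeek(true);
        // Unfiltered: the peek count may be 0, because count() can read the size from the source
        System.out.println("No filter:   count=" + unfiltered[0] + ", peek calls=" + unfiltered[1]);
        System.out.println("With filter: count=" + filtered[0] + ", peek calls=" + filtered[1]); // count=3, peek calls=3
    }
}
```

Skipping the unfiltered pipeline is an implementation choice the JDK is permitted to make, so the sketch prints that counter rather than relying on a specific value.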

3.7 limit() and skip()

limit(n) truncates the stream to at most n elements. skip(n) discards the first n elements. Together, they form a powerful pagination pattern.

import java.util.List;
import java.util.stream.Collectors;

public class LimitSkipExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // First 3 elements
        List<Integer> firstThree = numbers.stream()
                                          .limit(3)
                                          .collect(Collectors.toList());
        System.out.println("First 3: " + firstThree);
        // Output: First 3: [1, 2, 3]

        // Skip first 7 elements
        List<Integer> lastThree = numbers.stream()
                                         .skip(7)
                                         .collect(Collectors.toList());
        System.out.println("Last 3: " + lastThree);
        // Output: Last 3: [8, 9, 10]

        // Pagination pattern: page 2, page size 3 (items 4, 5, 6)
        int pageSize = 3;
        int pageNumber = 2; // 1-based
        List<Integer> page = numbers.stream()
                                    .skip((long) (pageNumber - 1) * pageSize)
                                    .limit(pageSize)
                                    .collect(Collectors.toList());
        System.out.println("Page 2: " + page);
        // Output: Page 2: [4, 5, 6]
    }
}
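The pagination arithmetic can be wrapped in a small reusable method. This is a sketch with a hypothetical helper name, page(). It validates its arguments, and it shows that a page past the end of the data comes back empty rather than throwing.

```java
import java.util.List;
import java.util.stream.Collectors;

public class Paginator {
    // Hypothetical helper: returns page 'pageNumber' (1-based) of size 'pageSize'
    static <T> List<T> page(List<T> items, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        return items.stream()
                    .skip((long) (pageNumber - 1) * pageSize) // drop all earlier pages
                    .limit(pageSize)                          // keep at most one page
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(page(numbers, 4, 3)); // [10]  (last, partial page)
        System.out.println(page(numbers, 5, 3)); // []    (past the end)
    }
}
```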

3.8 mapToInt(), mapToLong(), mapToDouble()

These operations convert a Stream<T> to a primitive stream (IntStream, LongStream, DoubleStream). Primitive streams avoid autoboxing overhead and provide specialized methods like sum(), average(), and max().

import java.util.List;

public class MapToPrimitiveExample {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "David");

        // mapToInt: get lengths as IntStream
        int totalChars = names.stream()
                              .mapToInt(String::length)
                              .sum();
        System.out.println("Total characters: " + totalChars);
        // Output: Total characters: 20

        // average returns OptionalDouble
        names.stream()
             .mapToInt(String::length)
             .average()
             .ifPresent(avg -> System.out.printf("Average length: %.1f%n", avg));
        // Output: Average length: 5.0

        // mapToDouble: useful for decimal calculations
        List<Integer> prices = List.of(100, 200, 300);
        double totalWithTax = prices.stream()
                                    .mapToDouble(p -> p * 1.08)
                                    .sum();
        System.out.printf("Total with tax: %.2f%n", totalWithTax);
        // Output: Total with tax: 648.00
    }
}
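The heading also lists mapToLong(), and the reason to use it goes beyond boxing: IntStream.sum() uses int arithmetic and overflows silently. A minimal sketch using two deliberately large values:

```java
import java.util.List;

public class MapToLongOverflow {
    static int intSum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();   // int arithmetic: can wrap around
    }

    static long longSum(List<Integer> values) {
        return values.stream().mapToLong(Integer::longValue).sum(); // long arithmetic
    }

    public static void main(String[] args) {
        List<Integer> big = List.of(Integer.MAX_VALUE, Integer.MAX_VALUE);
        System.out.println("mapToInt sum:  " + intSum(big));  // -2 (overflowed)
        System.out.println("mapToLong sum: " + longSum(big)); // 4294967294
    }
}
```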

Intermediate Operations Summary

Operation | Input | Output | Purpose
filter(Predicate) | Stream<T> | Stream<T> | Keep elements matching a condition
map(Function) | Stream<T> | Stream<R> | Transform each element
flatMap(Function) | Stream<T> | Stream<R> | Flatten nested streams
sorted() / sorted(Comparator) | Stream<T> | Stream<T> | Sort elements
distinct() | Stream<T> | Stream<T> | Remove duplicates
peek(Consumer) | Stream<T> | Stream<T> | Debug / inspect
limit(long) | Stream<T> | Stream<T> | Truncate to at most n elements
skip(long) | Stream<T> | Stream<T> | Skip the first n elements
mapToInt(ToIntFunction) | Stream<T> | IntStream | Convert to a primitive int stream
mapToLong / mapToDouble | Stream<T> | LongStream / DoubleStream | Convert to a primitive long / double stream

4. Terminal Operations

Terminal operations are the final step of a stream pipeline. They trigger the execution of all intermediate operations and produce a result (a value, a collection, or a side effect). Once a terminal operation is called, the stream is consumed and cannot be reused.

4.1 forEach()

forEach(Consumer<T>) performs an action on each element. It is the stream equivalent of a for-each loop. Note that forEach does not guarantee order when used with parallel streams. Use forEachOrdered() if order matters.

import java.util.List;

public class ForEachExample {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");

        // Simple forEach
        names.stream().forEach(System.out::println);
        // Output: Alice, Bob, Charlie

        // forEach with lambda
        names.stream().forEach(name -> System.out.println("Hello, " + name + "!"));
        // Output:
        // Hello, Alice!
        // Hello, Bob!
        // Hello, Charlie!

        // Warning: forEach on parallel stream -- order NOT guaranteed
        names.parallelStream().forEach(System.out::println);
        // Output: order may vary!

        // Use forEachOrdered to maintain encounter order
        names.parallelStream().forEachOrdered(System.out::println);
        // Output: Alice, Bob, Charlie (guaranteed order)
    }
}

4.2 collect()

collect() is the most versatile terminal operation. It transforms the stream elements into a collection, string, or other summary result using a Collector. The Collectors utility class provides dozens of ready-made collectors.

import java.util.*;
import java.util.stream.Collectors;

public class CollectExample {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "Alice", "David");

        // Collect to List
        List<String> list = names.stream()
                                 .filter(n -> n.length() > 3)
                                 .collect(Collectors.toList());
        System.out.println("List: " + list);
        // Output: List: [Alice, Charlie, Alice, David]

        // Collect to Set (removes duplicates)
        Set<String> set = names.stream()
                               .collect(Collectors.toSet());
        System.out.println("Set: " + set);
        // Output: a set containing Alice, Bob, Charlie, David (iteration order is not guaranteed)

        // Collect to unmodifiable List (Java 10+)
        List<String> immutable = names.stream()
                                      .collect(Collectors.toUnmodifiableList());

        // Collect to Map (name -> length)
        Map<String, Integer> nameToLength = names.stream()
            .distinct()
            .collect(Collectors.toMap(
                name -> name,           // key mapper
                String::length          // value mapper
            ));
        System.out.println("Map: " + nameToLength);
        // Output: Map with Alice=5, Bob=3, Charlie=7, David=5 (entry order is not guaranteed)

        // Joining strings
        String joined = names.stream()
                             .distinct()
                             .collect(Collectors.joining(", "));
        System.out.println("Joined: " + joined);
        // Output: Joined: Alice, Bob, Charlie, David

        // Joining with prefix and suffix
        String formatted = names.stream()
                                .distinct()
                                .collect(Collectors.joining(", ", "[", "]"));
        System.out.println("Formatted: " + formatted);
        // Output: Formatted: [Alice, Bob, Charlie, David]
    }
}
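Collectors.toSet() and Collectors.toMap() make no promise about iteration order, so their printed results can appear in any order. When you need a specific collection type or a deterministic order, Collectors.toCollection() accepts a constructor reference. The sketch below also confirms that a list built with toUnmodifiableList() (Java 10+) rejects changes. The helper names are illustrative.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class CollectionChoiceExample {
    static TreeSet<String> sortedUnique(List<String> names) {
        // toCollection lets you pick the concrete collection; TreeSet keeps elements sorted
        return names.stream().collect(Collectors.toCollection(TreeSet::new));
    }

    // Tries to add an element; returns false if the list refuses the change
    static boolean isModifiable(List<String> list) {
        try {
            list.add("Zoe");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "Alice", "David");
        System.out.println(sortedUnique(names)); // [Alice, Bob, Charlie, David]

        List<String> immutable = names.stream().collect(Collectors.toUnmodifiableList());
        System.out.println("Modifiable? " + isModifiable(immutable)); // Modifiable? false
    }
}
```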

4.3 reduce()

reduce() combines all elements of a stream into a single result by repeatedly applying a binary operation. It is the building block behind sum(), max(), and count() — those are all specialized reductions.

import java.util.List;
import java.util.Optional;

public class ReduceExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);

        // With identity value: returns the result directly, never an Optional
        int sum = numbers.stream()
                         .reduce(0, Integer::sum);
        System.out.println("Sum: " + sum);
        // Output: Sum: 15

        // Without identity: returns Optional (might be empty)
        Optional<Integer> product = numbers.stream()
                                           .reduce((a, b) -> a * b);
        product.ifPresent(p -> System.out.println("Product: " + p));
        // Output: Product: 120

        // Finding the maximum
        Optional<Integer> max = numbers.stream()
                                       .reduce(Integer::max);
        System.out.println("Max: " + max.orElse(0));
        // Output: Max: 5

        // String concatenation with reduce
        List<String> words = List.of("Java", "Stream", "API");
        String sentence = words.stream()
                               .reduce("", (a, b) -> a.isEmpty() ? b : a + " " + b);
        System.out.println(sentence);
        // Output: Java Stream API

        // How reduce works step-by-step for sum:
        // Step 1: identity(0) + 1 = 1
        // Step 2: 1 + 2 = 3
        // Step 3: 3 + 3 = 6
        // Step 4: 6 + 4 = 10
        // Step 5: 10 + 5 = 15
    }
}
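Both forms above return the same type as the stream's elements. To reduce into a different type, for example summing the lengths of a Stream<String> into an Integer, reduce() has a three-argument overload. It takes an identity, an accumulator that folds one element into a partial result, and a combiner that merges partial results when the stream runs in parallel. The sketch below assumes the usual contract: the identity is a true identity for the combiner, and both functions are associative.

```java
import java.util.List;
import java.util.stream.Stream;

public class ThreeArgReduceExample {
    static int totalLength(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream.reduce(
            0,                                          // identity: 0 + x == x
            (partial, word) -> partial + word.length(), // accumulator: fold one String into an Integer
            Integer::sum                                // combiner: merge two partial sums
        );
    }

    public static void main(String[] args) {
        List<String> words = List.of("Java", "Stream", "API");
        System.out.println(totalLength(words, false)); // 13
        System.out.println(totalLength(words, true));  // 13
    }
}
```

In practice, mapToInt(String::length).sum() is simpler here. The three-argument form earns its place when no primitive specialization fits the result type.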

4.4 count(), findFirst(), findAny()

import java.util.List;
import java.util.Optional;

public class CountFindExample {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve");

        // count() -- number of elements
        long count = names.stream()
                          .filter(n -> n.length() > 3)
                          .count();
        System.out.println("Names longer than 3: " + count);
        // Output: Names longer than 3: 3

        // findFirst() -- first element in encounter order, returns Optional
        Optional<String> first = names.stream()
                                      .filter(n -> n.startsWith("C"))
                                      .findFirst();
        System.out.println("First C-name: " + first.orElse("none"));
        // Output: First C-name: Charlie

        // findAny() -- any matching element (useful in parallel streams)
        Optional<String> any = names.parallelStream()
                                    .filter(n -> n.length() == 3)
                                    .findAny();
        System.out.println("Any 3-letter name: " + any.orElse("none"));
        // Output: Any 3-letter name: Bob or Eve (a parallel findAny() may return either)
    }
}
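Because findFirst() and findAny() are short-circuiting, they can finish even on an infinite stream, provided some element matches. The sketch below (the helper name is illustrative) searches an unbounded Stream.iterate() sequence. If no element ever matched, the same pipeline would never terminate.

```java
import java.util.Optional;
import java.util.stream.Stream;

public class ShortCircuitFindExample {
    static Optional<Integer> firstMultipleOf(int divisor, int greaterThan) {
        return Stream.iterate(1, n -> n + 1)                       // infinite: 1, 2, 3, ...
                     .filter(n -> n > greaterThan && n % divisor == 0)
                     .findFirst();                                 // stops at the first match
    }

    public static void main(String[] args) {
        System.out.println(firstMultipleOf(7, 50).orElse(-1)); // 56
    }
}
```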

4.5 anyMatch(), allMatch(), noneMatch()

These are short-circuiting terminal operations that return a boolean. They stop processing as soon as the answer is determined.

import java.util.List;

public class MatchExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(2, 4, 6, 8, 10);

        // anyMatch: is there at least one element > 7?
        boolean hasLarge = numbers.stream().anyMatch(n -> n > 7);
        System.out.println("Any > 7? " + hasLarge);
        // Output: Any > 7? true

        // allMatch: are ALL elements even?
        boolean allEven = numbers.stream().allMatch(n -> n % 2 == 0);
        System.out.println("All even? " + allEven);
        // Output: All even? true

        // noneMatch: are there NO negative numbers?
        boolean noNegatives = numbers.stream().noneMatch(n -> n < 0);
        System.out.println("No negatives? " + noNegatives);
        // Output: No negatives? true
    }
}
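Two details are easy to get wrong here. On an empty stream, anyMatch() returns false, while allMatch() and noneMatch() return true (vacuous truth). Also, a sequential anyMatch() stops evaluating the predicate at the first hit. The sketch uses an AtomicInteger only to count predicate calls for demonstration; outside a demo, keep predicates free of side effects.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MatchEdgeCases {
    // Counts how many elements anyMatch() inspects before it can answer
    static int evaluationsUntilAnyMatch(List<Integer> numbers, int threshold) {
        AtomicInteger evaluations = new AtomicInteger();
        numbers.stream().anyMatch(n -> {
            evaluations.incrementAndGet(); // demo-only side effect
            return n > threshold;
        });
        return evaluations.get();
    }

    public static void main(String[] args) {
        List<Integer> empty = List.of();
        System.out.println(empty.stream().anyMatch(n -> n > 0));  // false
        System.out.println(empty.stream().allMatch(n -> n > 0));  // true
        System.out.println(empty.stream().noneMatch(n -> n > 0)); // true

        // 2, 4, 6 fail; 8 is the first element > 7, so the predicate runs 4 times
        System.out.println(evaluationsUntilAnyMatch(List.of(2, 4, 6, 8, 10), 7)); // 4
    }
}
```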

4.6 min(), max(), and toArray()

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class MinMaxToArrayExample {
    public static void main(String[] args) {
        List<String> names = List.of("Charlie", "Bob", "Alice", "David");

        // min -- requires a Comparator
        Optional<String> shortest = names.stream()
                                         .min(Comparator.comparingInt(String::length));
        System.out.println("Shortest: " + shortest.orElse("none"));
        // Output: Shortest: Bob

        // max
        Optional<String> longest = names.stream()
                                        .max(Comparator.comparingInt(String::length));
        System.out.println("Longest: " + longest.orElse("none"));
        // Output: Longest: Charlie

        // toArray -- convert stream to array
        String[] nameArray = names.stream()
                                  .filter(n -> n.length() > 3)
                                  .toArray(String[]::new);
        System.out.println("Array length: " + nameArray.length);
        // Output: Array length: 3
    }
}

Terminal Operations Summary

Operation | Return Type | Purpose
forEach(Consumer) | void | Perform action on each element
collect(Collector) | R | Accumulate into a collection or summary
reduce(identity, BinaryOp) | T | Combine all elements into one value
count() | long | Count elements
findFirst() | Optional<T> | First element (encounter order)
findAny() | Optional<T> | Any element (optimized for parallel)
anyMatch(Predicate) | boolean | At least one matches?
allMatch(Predicate) | boolean | All match?
noneMatch(Predicate) | boolean | None match?
min(Comparator) | Optional<T> | Minimum element
max(Comparator) | Optional<T> | Maximum element
toArray() | Object[] or T[] | Convert to array

5. Collectors In-Depth

The Collectors class is the powerhouse of the Stream API. Beyond basic toList() and toSet(), it provides sophisticated collectors for grouping, partitioning, summarizing, and more. Mastering these collectors will dramatically improve the expressiveness of your code.

5.1 groupingBy()

groupingBy() groups stream elements by a classification function, producing a Map<K, List<T>>. This is the stream equivalent of SQL's GROUP BY.

import java.util.*;
import java.util.stream.Collectors;

public class GroupingByExample {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "Anna", "Ben", "Chris");

        // Group by first letter
        Map<Character, List<String>> byFirstLetter = names.stream()
            .collect(Collectors.groupingBy(name -> name.charAt(0)));
        System.out.println(byFirstLetter);
        // Output: {A=[Alice, Anna], B=[Bob, Ben], C=[Charlie, Chris]}

        // Group by string length
        Map<Integer, List<String>> byLength = names.stream()
            .collect(Collectors.groupingBy(String::length));
        System.out.println(byLength);
        // Output: {3=[Bob, Ben], 4=[Anna], 5=[Alice, Chris], 7=[Charlie]}

        // groupingBy with downstream collector: count per group
        Map<Character, Long> countByLetter = names.stream()
            .collect(Collectors.groupingBy(
                name -> name.charAt(0),
                Collectors.counting()
            ));
        System.out.println(countByLetter);
        // Output: {A=2, B=2, C=2}

        // groupingBy with downstream collector: join names per group
        Map<Character, String> joinedByLetter = names.stream()
            .collect(Collectors.groupingBy(
                name -> name.charAt(0),
                Collectors.joining(", ")
            ));
        System.out.println(joinedByLetter);
        // Output: {A=Alice, Anna, B=Bob, Ben, C=Charlie, Chris}

        // Multi-level grouping: group by length, then by first letter
        Map<Integer, Map<Character, List<String>>> multiLevel = names.stream()
            .collect(Collectors.groupingBy(
                String::length,
                Collectors.groupingBy(name -> name.charAt(0))
            ));
        System.out.println(multiLevel);
        // Output: {3={B=[Bob, Ben]}, 4={A=[Anna]}, 5={A=[Alice], C=[Chris]}, 7={C=[Charlie]}}
    }
}
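groupingBy() also has a three-argument form that takes a map factory, which helps when the default map's unspecified key order is not acceptable. The sketch below combines a TreeMap factory with Collectors.mapping(), so each group holds transformed values instead of the original elements. It reuses the names from the example above; the method name is illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupingByOptionsExample {
    static Map<Integer, List<String>> upperNamesByLength(List<String> names) {
        return names.stream().collect(Collectors.groupingBy(
            String::length,                          // classifier: the group key
            TreeMap::new,                            // map factory: keys kept in sorted order
            Collectors.mapping(String::toUpperCase,  // downstream: transform each element...
                               Collectors.toList())  // ...then collect the results into a List
        ));
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "Anna", "Ben", "Chris");
        System.out.println(upperNamesByLength(names));
        // {3=[BOB, BEN], 4=[ANNA], 5=[ALICE, CHRIS], 7=[CHARLIE]}
    }
}
```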

5.2 partitioningBy()

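partitioningBy() is a special case of groupingBy() that splits elements into exactly two groups based on a Predicate -- a true group and a false group. Without a downstream collector the result is a Map<Boolean, List<T>>; with a downstream collector the values are whatever that collector produces (for example Long with counting()).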

import java.util.*;
import java.util.stream.Collectors;

public class PartitioningByExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Partition into even and odd
        Map<Boolean, List<Integer>> evenOdd = numbers.stream()
            .collect(Collectors.partitioningBy(n -> n % 2 == 0));
        System.out.println("Even: " + evenOdd.get(true));
        System.out.println("Odd: " + evenOdd.get(false));
        // Output:
        // Even: [2, 4, 6, 8, 10]
        // Odd: [1, 3, 5, 7, 9]

        // Partition with downstream collector: count each group
        Map<Boolean, Long> counts = numbers.stream()
            .collect(Collectors.partitioningBy(
                n -> n > 5,
                Collectors.counting()
            ));
        System.out.println("Greater than 5: " + counts.get(true));
        System.out.println("5 or less: " + counts.get(false));
        // Output:
        // Greater than 5: 5
        // 5 or less: 5
    }
}
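One practical difference from groupingBy(): partitioningBy() always creates both the true and the false entry, even when one side is empty. groupingBy() only creates keys that actually occur. A minimal sketch with an illustrative method name:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionKeysExample {
    static Map<Boolean, List<Integer>> splitByOdd(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.partitioningBy(n -> n % 2 != 0));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Integer>> parts = splitByOdd(List.of(2, 4, 6));
        System.out.println(parts.get(true));  // []   (key present, group empty)
        System.out.println(parts.get(false)); // [2, 4, 6]

        // groupingBy with the same predicate only creates keys that occur
        Map<Boolean, List<Integer>> grouped = List.of(2, 4, 6).stream()
            .collect(Collectors.groupingBy(n -> n % 2 != 0));
        System.out.println(grouped.containsKey(true)); // false
    }
}
```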

5.3 toMap() with Merge Function

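When collecting with the two-argument Collectors.toMap(keyMapper, valueMapper), a duplicate key causes an IllegalStateException. To handle collisions, pass a merge function as the third argument; it decides how the two values for the same key are combined.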

import java.util.*;
import java.util.stream.Collectors;

public class ToMapMergeExample {
    public static void main(String[] args) {
        List<String> words = List.of("hello", "world", "hello", "java", "world");

        // Problem: duplicate keys without merge function throws exception
        // Solution: provide a merge function
        Map<String, Integer> wordCount = words.stream()
            .collect(Collectors.toMap(
                word -> word,              // key: the word itself
                word -> 1,                 // value: count of 1
                Integer::sum               // merge: add counts for duplicate keys
            ));
        System.out.println(wordCount);
        // Output: hello=2, world=2, java=1 in some order (the default Map has no ordering guarantee)

        // Collecting to a specific Map implementation (LinkedHashMap preserves order)
        Map<String, Integer> orderedCount = words.stream()
            .collect(Collectors.toMap(
                word -> word,
                word -> 1,
                Integer::sum,
                LinkedHashMap::new          // supplier for the Map type
            ));
        System.out.println(orderedCount);
        // Output: {hello=2, world=2, java=1} (LinkedHashMap keeps insertion order)
    }
}
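For reference, the sketch below shows the failure without a merge function, together with a groupingBy() alternative for the common word-count case. The exception message differs between JDK versions, so the sketch prints it rather than matching it. Method names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapDuplicateKeyExample {
    // No merge function: a repeated key throws IllegalStateException
    static Map<String, Integer> lengthsNoMerge(List<String> words) {
        return words.stream().collect(Collectors.toMap(Function.identity(), String::length));
    }

    // Counting alternative: groupingBy + counting() never collides
    static Map<String, Long> countWords(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("hello", "world", "hello");
        try {
            lengthsNoMerge(words);
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
        }
        System.out.println(countWords(words).get("hello")); // 2
    }
}
```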

5.4 summarizingInt() and Other Summarizers

summarizingInt(), summarizingLong(), and summarizingDouble() collect comprehensive statistics in a single pass -- count, sum, min, max, and average.

import java.util.*;
import java.util.stream.Collectors;

public class SummarizingExample {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve");

        IntSummaryStatistics stats = names.stream()
            .collect(Collectors.summarizingInt(String::length));

        System.out.println("Count: " + stats.getCount());     // 5
        System.out.println("Sum: " + stats.getSum());         // 23 (5 + 3 + 7 + 5 + 3)
        System.out.println("Min: " + stats.getMin());         // 3
        System.out.println("Max: " + stats.getMax());         // 7
        System.out.printf("Average: %.1f%n", stats.getAverage()); // 4.6
    }
}
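One caveat with the summary-statistics objects: on empty input, getMin() and getMax() return Integer.MAX_VALUE and Integer.MIN_VALUE as sentinels, and getAverage() returns 0.0. Check getCount() before trusting them. The sketch reuses the names list above; the helper name is illustrative.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class EmptySummaryExample {
    static IntSummaryStatistics lengthStats(List<String> names) {
        return names.stream().collect(Collectors.summarizingInt(String::length));
    }

    public static void main(String[] args) {
        IntSummaryStatistics empty = lengthStats(List.of());
        System.out.println(empty.getCount());   // 0
        System.out.println(empty.getMin());     // 2147483647 (Integer.MAX_VALUE sentinel)
        System.out.println(empty.getMax());     // -2147483648 (Integer.MIN_VALUE sentinel)
        System.out.println(empty.getAverage()); // 0.0

        IntSummaryStatistics stats = lengthStats(List.of("Alice", "Bob", "Charlie", "David", "Eve"));
        if (stats.getCount() > 0) {             // guard before reporting min/max
            System.out.println(stats.getMin() + ".." + stats.getMax()); // 3..7
        }
    }
}
```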

6. Parallel Streams

Parallel streams split the data into multiple chunks and process them simultaneously on different threads, by default using the common ForkJoinPool (ForkJoinPool.commonPool()). This can significantly speed up processing of large datasets on multi-core machines -- but parallelism is not free and can hurt performance when used incorrectly.

Creating Parallel Streams

import java.util.List;
import java.util.stream.IntStream;

public class ParallelStreamExample {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        // Method 1: parallelStream() from collection
        long sum1 = numbers.parallelStream()
                           .mapToLong(Integer::longValue)
                           .sum();

        // Method 2: .parallel() on an existing stream
        long sum2 = numbers.stream()
                           .parallel()
                           .mapToLong(Integer::longValue)
                           .sum();

        System.out.println("Sum1: " + sum1 + ", Sum2: " + sum2);
        // Output: Sum1: 55, Sum2: 55

        // Demonstrating parallel execution with thread names
        System.out.println("--- Sequential ---");
        IntStream.range(1, 5).forEach(i ->
            System.out.println(i + " on " + Thread.currentThread().getName()));

        System.out.println("--- Parallel ---");
        IntStream.range(1, 5).parallel().forEach(i ->
            System.out.println(i + " on " + Thread.currentThread().getName()));
        // Parallel output shows different thread names (ForkJoinPool.commonPool-worker-*)
    }
}

When to Use (and Not Use) Parallel Streams

Use Parallel When | Avoid Parallel When
Large datasets (as a rough rule of thumb, 100,000+ elements) | Small datasets (overhead > benefit)
CPU-intensive operations per element | I/O-bound operations (network, file)
Operations are independent (no shared state) | Operations depend on encounter order
Source is easy to split (ArrayList, arrays) | Source is hard to split (LinkedList, Stream.iterate)
Stateless intermediate operations | Stateful operations (sorted, distinct, limit)

Common mistake: Using parallel streams with shared mutable state. This leads to race conditions and incorrect results.

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelStreamDanger {
    public static void main(String[] args) {
        // WRONG: modifying a shared list from a parallel stream
        List<Integer> unsafeList = new ArrayList<>();
        IntStream.range(0, 1000)
                 .parallel()
                 .forEach(unsafeList::add);  // Race condition!
        System.out.println("Unsafe size: " + unsafeList.size());
        // Output: might be less than 1000 or throw ArrayIndexOutOfBoundsException!

        // RIGHT: use collect() instead
        List<Integer> safeList = IntStream.range(0, 1000)
                                          .parallel()
                                          .boxed()
                                          .collect(Collectors.toList());
        System.out.println("Safe size: " + safeList.size());
        // Output: Safe size: 1000
    }
}
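Shared counters suffer from the same race as the shared list. The sketch below contrasts three ways to count even numbers in a parallel stream. A plain int[] counter is a data race, so its result can vary and it is not asserted. An AtomicInteger is correct, but every thread contends on one variable. The stream's own count() reduction is correct and needs no shared state. Method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class ParallelCountingExample {
    static long countEvensWithCount() {
        return IntStream.range(0, 1_000_000).parallel()
                        .filter(i -> i % 2 == 0)
                        .count();                                 // reduction handled by the stream
    }

    static int countEvensWithAtomic() {
        AtomicInteger counter = new AtomicInteger();
        IntStream.range(0, 1_000_000).parallel()
                 .filter(i -> i % 2 == 0)
                 .forEach(i -> counter.incrementAndGet());       // thread-safe but contended
        return counter.get();
    }

    static int countEvensUnsafe() {
        int[] counter = {0};
        IntStream.range(0, 1_000_000).parallel()
                 .filter(i -> i % 2 == 0)
                 .forEach(i -> counter[0]++);                     // WRONG: data race, updates can be lost
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("count():       " + countEvensWithCount());  // 500000
        System.out.println("AtomicInteger: " + countEvensWithAtomic()); // 500000
        System.out.println("int[] (racy):  " + countEvensUnsafe());     // often less than 500000
    }
}
```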

7. Optional with Streams

Many terminal stream operations return an Optional -- a container that may or may not hold a value. This forces you to handle the "no result" case explicitly instead of receiving null, which helps prevent NullPointerException.

import java.util.List;
import java.util.Optional;

public class OptionalWithStreams {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");

        // findFirst returns Optional
        Optional<String> first = names.stream()
                                      .filter(n -> n.startsWith("Z"))
                                      .findFirst();

        // Handle the Optional
        String result = first.orElse("No match found");
        System.out.println(result);
        // Output: No match found

        // ifPresent -- only execute if a value exists
        names.stream()
             .filter(n -> n.startsWith("C"))
             .findFirst()
             .ifPresent(name -> System.out.println("Found: " + name));
        // Output: Found: Charlie

        // map on Optional -- transform the value if present
        Optional<Integer> length = names.stream()
            .filter(n -> n.startsWith("A"))
            .findFirst()
            .map(String::length);
        System.out.println("Length: " + length.orElse(0));
        // Output: Length: 5

        // orElseThrow -- throw exception if empty
        // names.stream().filter(n -> n.startsWith("Z")).findFirst()
        //       .orElseThrow(() -> new IllegalArgumentException("No Z names"));

        // Java 9+: Optional.stream() -- converts Optional to a 0-or-1 element stream
        // Useful for flatMapping a stream of Optionals
        List<Optional<String>> optionals = List.of(
            Optional.of("Hello"),
            Optional.empty(),
            Optional.of("World")
        );
        List<String> values = optionals.stream()
                                       .flatMap(Optional::stream)
                                       .toList();   // Stream.toList() requires Java 16+
        System.out.println(values);
        // Output: [Hello, World]
    }
}
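A related Optional pitfall: the argument to orElse() is evaluated before the call, even when a value is present. orElseGet() invokes its Supplier only when the Optional is empty. The sketch below uses a hypothetical expensiveDefault() method that counts its own invocations to make the difference visible.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class OrElseVsOrElseGet {
    static final AtomicInteger fallbackCalls = new AtomicInteger();

    // Hypothetical "expensive" fallback, standing in for e.g. a database lookup
    static String expensiveDefault() {
        fallbackCalls.incrementAndGet();
        return "default";
    }

    static Optional<String> firstStartingWith(List<String> names, String prefix) {
        return names.stream().filter(n -> n.startsWith(prefix)).findFirst();
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");

        // orElse: expensiveDefault() runs even though "Alice" is present
        String a = firstStartingWith(names, "A").orElse(expensiveDefault());
        System.out.println(a + ", fallback calls: " + fallbackCalls.get()); // Alice, fallback calls: 1

        // orElseGet: the Supplier is not called because a value is present
        String b = firstStartingWith(names, "A").orElseGet(OrElseVsOrElseGet::expensiveDefault);
        System.out.println(b + ", fallback calls: " + fallbackCalls.get()); // Alice, fallback calls: 1

        // orElseGet on an empty Optional: now the Supplier runs
        String c = firstStartingWith(names, "Z").orElseGet(OrElseVsOrElseGet::expensiveDefault);
        System.out.println(c + ", fallback calls: " + fallbackCalls.get()); // default, fallback calls: 2
    }
}
```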

8. Primitive Streams

Java provides three specialized stream types for primitives: IntStream, LongStream, and DoubleStream. These avoid the overhead of autoboxing (converting int to Integer and back) and provide specialized methods like sum(), average(), and summaryStatistics().

import java.util.OptionalDouble;
import java.util.OptionalInt;
import java.util.stream.IntStream;
import java.util.List;

public class PrimitiveStreamExample {
    public static void main(String[] args) {
        // IntStream creation
        IntStream range = IntStream.rangeClosed(1, 10);

        // sum, average, min, max
        int sum = IntStream.rangeClosed(1, 10).sum();
        System.out.println("Sum 1-10: " + sum);
        // Output: Sum 1-10: 55

        OptionalDouble avg = IntStream.of(85, 90, 78, 92, 88).average();
        System.out.println("Average: " + avg.orElse(0));
        // Output: Average: 86.6

        OptionalInt max = IntStream.of(85, 90, 78, 92, 88).max();
        System.out.println("Max: " + max.orElse(0));
        // Output: Max: 92

        // summaryStatistics() -- all stats in one pass
        var stats = IntStream.of(85, 90, 78, 92, 88).summaryStatistics();
        System.out.println("Count: " + stats.getCount());
        System.out.println("Sum: " + stats.getSum());
        System.out.println("Min: " + stats.getMin());
        System.out.println("Max: " + stats.getMax());
        System.out.printf("Avg: %.1f%n", stats.getAverage());
        // Output (one per line): Count: 5, Sum: 433, Min: 78, Max: 92, Avg: 86.6

        // boxed() -- convert IntStream to Stream<Integer>
        List<Integer> boxedList = IntStream.rangeClosed(1, 5)
                                           .boxed()
                                           .toList();
        System.out.println("Boxed: " + boxedList);
        // Output: Boxed: [1, 2, 3, 4, 5]

        // mapToObj -- convert each int to an object
        List<String> labels = IntStream.rangeClosed(1, 3)
                                       .mapToObj(i -> "Item " + i)
                                       .toList();
        System.out.println("Labels: " + labels);
        // Output: Labels: [Item 1, Item 2, Item 3]

        // Converting between Stream and primitive streams
        List<String> names = List.of("Alice", "Bob", "Charlie");
        IntStream lengths = names.stream().mapToInt(String::length);
        System.out.println("Total chars: " + lengths.sum());
        // Output: Total chars: 15
    }
}
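Two IntStream details are worth knowing. range(a, b) excludes b, while rangeClosed(a, b) includes it. String.chars() exposes a string's characters as an IntStream. A short sketch with illustrative helper names:

```java
import java.util.stream.IntStream;

public class RangeExample {
    static int sumRange(int from, int to)       { return IntStream.range(from, to).sum(); }       // excludes 'to'
    static int sumRangeClosed(int from, int to) { return IntStream.rangeClosed(from, to).sum(); } // includes 'to'

    static long countVowels(String text) {
        // String.chars() yields each char as an int
        return text.chars()
                   .filter(c -> "aeiouAEIOU".indexOf(c) >= 0)
                   .count();
    }

    public static void main(String[] args) {
        System.out.println(sumRange(1, 5));             // 10  (1+2+3+4)
        System.out.println(sumRangeClosed(1, 5));       // 15  (1+2+3+4+5)
        System.out.println(countVowels("Stream API"));  // 4   (e, a, A, I)
    }
}
```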

9. Common Patterns

This section demonstrates practical, real-world patterns you will use repeatedly in production code. These patterns solve common data-processing problems elegantly with streams.

9.1 Filtering and Collecting

import java.util.*;
import java.util.stream.Collectors;

public class CommonPatterns {
    record Product(String name, String category, double price) {}

    public static void main(String[] args) {
        List<Product> products = List.of(
            new Product("Laptop", "Electronics", 999.99),
            new Product("Headphones", "Electronics", 79.99),
            new Product("Coffee Maker", "Kitchen", 49.99),
            new Product("Blender", "Kitchen", 39.99),
            new Product("Monitor", "Electronics", 349.99),
            new Product("Toaster", "Kitchen", 29.99)
        );

        // Filter by category and sort by price
        List<Product> electronics = products.stream()
            .filter(p -> p.category().equals("Electronics"))
            .sorted(Comparator.comparingDouble(Product::price))
            .collect(Collectors.toList());
        electronics.forEach(p -> System.out.println(p.name() + " $" + p.price()));
        // Output:
        // Headphones $79.99
        // Monitor $349.99
        // Laptop $999.99

        // Find the top 2 most expensive products
        List<String> topTwo = products.stream()
            .sorted(Comparator.comparingDouble(Product::price).reversed())
            .limit(2)
            .map(Product::name)
            .collect(Collectors.toList());
        System.out.println("Top 2: " + topTwo);
        // Output: Top 2: [Laptop, Monitor]

        // Group by category and calculate average price per category
        Map<String, Double> avgByCategory = products.stream()
            .collect(Collectors.groupingBy(
                Product::category,
                Collectors.averagingDouble(Product::price)
            ));
        avgByCategory.forEach((cat, avg) ->
            System.out.printf("%s avg: $%.2f%n", cat, avg));
        // Output (category order may vary):
        // Electronics avg: $476.66
        // Kitchen avg: $39.99

        // Create a comma-separated string of product names
        String productList = products.stream()
            .map(Product::name)
            .collect(Collectors.joining(", "));
        System.out.println("Products: " + productList);
        // Output: Products: Laptop, Headphones, Coffee Maker, Blender, Monitor, Toaster

        // Convert to a Map: name -> price
        Map<String, Double> priceMap = products.stream()
            .collect(Collectors.toMap(Product::name, Product::price));
        System.out.println("Laptop price: $" + priceMap.get("Laptop"));
        // Output: Laptop price: $999.99
    }
}
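Sorting by several keys uses the same comparators, chained with thenComparing(). This sketch redeclares the same Product record so that it compiles on its own (Java 16+). It sorts by category from A to Z and then by price from highest to lowest. Method names are illustrative.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class MultiKeySortExample {
    // Same shape as the Product record in the example above (records need Java 16+)
    record Product(String name, String category, double price) {}

    static List<Product> sampleProducts() {
        return List.of(
            new Product("Laptop", "Electronics", 999.99),
            new Product("Headphones", "Electronics", 79.99),
            new Product("Coffee Maker", "Kitchen", 49.99),
            new Product("Blender", "Kitchen", 39.99),
            new Product("Monitor", "Electronics", 349.99),
            new Product("Toaster", "Kitchen", 29.99)
        );
    }

    static List<String> namesByCategoryThenPriceDesc(List<Product> products) {
        return products.stream()
            .sorted(Comparator.comparing(Product::category)                                // primary key: category A to Z
                              .thenComparing(Comparator.comparingDouble(Product::price).reversed())) // tie-breaker: highest price first
            .map(Product::name)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(namesByCategoryThenPriceDesc(sampleProducts()));
        // [Laptop, Monitor, Headphones, Coffee Maker, Blender, Toaster]
    }
}
```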

9.2 Flattening Nested Collections

import java.util.*;
import java.util.stream.Collectors;

public class FlatteningPattern {
    record Student(String name, List<String> courses) {}

    public static void main(String[] args) {
        List<Student> students = List.of(
            new Student("Alice", List.of("Math", "Physics", "CS")),
            new Student("Bob", List.of("CS", "English", "Math")),
            new Student("Charlie", List.of("Biology", "Chemistry"))
        );

        // Get all unique courses offered
        Set<String> allCourses = students.stream()
            .flatMap(s -> s.courses().stream())
            .collect(Collectors.toSet());
        System.out.println("All courses: " + allCourses);
        // Output (order may vary): All courses: [Biology, CS, Chemistry, English, Math, Physics]

        // Find students taking "CS"
        List<String> csStudents = students.stream()
            .filter(s -> s.courses().contains("CS"))
            .map(Student::name)
            .collect(Collectors.toList());
        System.out.println("CS students: " + csStudents);
        // Output: CS students: [Alice, Bob]
    }
}
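The opposite question, which students take each course, can be answered by flattening to (course, student) pairs and grouping them. The sketch below shows one possible approach. It uses Map.entry() (Java 9+) as a lightweight pair type and a TreeMap so the courses print in sorted order. The record is redeclared so the block is self-contained, and the method names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class InvertRelationExample {
    // Same shape as the Student record above (records need Java 16+)
    record Student(String name, List<String> courses) {}

    static List<Student> sampleStudents() {
        return List.of(
            new Student("Alice", List.of("Math", "Physics", "CS")),
            new Student("Bob", List.of("CS", "English", "Math")),
            new Student("Charlie", List.of("Biology", "Chemistry"))
        );
    }

    static Map<String, List<String>> studentsByCourse(List<Student> students) {
        return students.stream()
            // one (course, studentName) pair per enrollment
            .flatMap(s -> s.courses().stream().map(course -> Map.entry(course, s.name())))
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,                                            // group by course
                TreeMap::new,                                                 // course names in sorted order
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())  // keep only the student names
            ));
    }

    public static void main(String[] args) {
        System.out.println(studentsByCourse(sampleStudents()));
        // {Biology=[Charlie], CS=[Alice, Bob], Chemistry=[Charlie], English=[Bob], Math=[Alice, Bob], Physics=[Alice]}
    }
}
```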

10. Stream vs Loop

Streams are not always better than loops, and loops are not always better than streams. Understanding when to use each is a sign of a mature Java developer.

Criteria | Stream | Traditional Loop
Readability | Excellent for data transformations (filter, map, collect) | Better for simple iterations with side effects
Debugging | Harder -- stack traces are less clear, peek() helps | Easier -- set breakpoints, inspect variables
Performance | Slight overhead for small datasets; parallel() helps with large | Generally faster for simple operations on small data
Mutability | Encourages immutability (functional style) | Naturally works with mutable state
Short-circuiting | Built-in (findFirst, anyMatch, limit) | Manual (break, return)
Parallelism | Trivial -- just call parallel() | Complex -- manual thread management
State management | Stateless operations preferred | Stateful iteration is natural

The Same Problem: Both Ways

import java.util.*;
import java.util.stream.Collectors;

public class StreamVsLoop {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "David", "Eve", "Frank");

        // Task: Get uppercase names that are longer than 3 characters

        // --- Loop approach ---
        List<String> resultLoop = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                resultLoop.add(name.toUpperCase());
            }
        }
        System.out.println("Loop: " + resultLoop);

        // --- Stream approach ---
        List<String> resultStream = names.stream()
            .filter(n -> n.length() > 3)
            .map(String::toUpperCase)
            .collect(Collectors.toList());
        System.out.println("Stream: " + resultStream);

        // Both output: [ALICE, CHARLIE, DAVID, FRANK]
        // Stream is more readable here -- the intent is clear at a glance
    }
}

Rule of thumb: Use streams for data transformation pipelines (filter, map, collect, group). Use loops when you need to maintain complex local state, perform multiple related side effects, or when the logic is inherently imperative (like building a graph or managing indices).

11. Common Mistakes

These are mistakes that even experienced developers make when working with the Stream API. Understanding them will save you hours of debugging.

Mistake 1: Reusing a Stream

A stream can only be consumed once. Attempting to reuse it throws an IllegalStateException.

import java.util.List;
import java.util.stream.Stream;

public class ReuseStreamMistake {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");
        Stream<String> stream = names.stream().filter(n -> n.length() > 3);

        // First use -- works fine
        long count = stream.count();
        System.out.println("Count: " + count);

        // Second use -- THROWS IllegalStateException!
        // stream.forEach(System.out::println);
        // java.lang.IllegalStateException: stream has already been operated upon or closed

        // Fix: create a new stream each time
        long count2 = names.stream().filter(n -> n.length() > 3).count();
        names.stream().filter(n -> n.length() > 3).forEach(System.out::println);
    }
}
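Sometimes the same pipeline really is needed more than once. A common workaround is to keep a Supplier<Stream<T>> that rebuilds the stream on demand, so each terminal operation gets its own fresh stream. Here is a minimal sketch; the class and method names are illustrative.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReusablePipeline {
    // The Supplier stores the recipe for the pipeline, not a stream instance.
    // Every get() call builds a fresh, unconsumed stream.
    static Supplier<Stream<String>> longNames(List<String> names) {
        return () -> names.stream().filter(n -> n.length() > 3);
    }

    public static void main(String[] args) {
        Supplier<Stream<String>> pipeline = longNames(List.of("Alice", "Bob", "Charlie"));

        long count = pipeline.get().count();                                 // first stream
        List<String> collected = pipeline.get().collect(Collectors.toList()); // second stream

        System.out.println("Count: " + count);     // Count: 2
        System.out.println("Names: " + collected); // Names: [Alice, Charlie]
    }
}
```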

Mistake 2: Side Effects in Intermediate Operations

Intermediate operations like map() and filter() should be stateless and free of side effects. Modifying external state from these operations leads to unpredictable behavior, especially with parallel streams.

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class SideEffectMistake {
    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");

        // WRONG: modifying external state inside map()
        List<String> sideEffectList = new ArrayList<>();
        names.stream()
             .map(n -> {
                 sideEffectList.add(n);  // Side effect! Don't do this.
                 return n.toUpperCase();
             })
             .collect(Collectors.toList());

        // RIGHT: use collect() to gather results
        List<String> upper = names.stream()
                                  .map(String::toUpperCase)
                                  .collect(Collectors.toList());
    }
}
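Collecting is also what makes a pipeline safe to parallelize. In the sketch below, every lambda is a pure function of its input, and collect() merges the per-thread partial results itself. The class and method names are illustrative. Because the source is ordered, Collectors.toList() also preserves encounter order, which a shared ArrayList mutated from inside map() cannot promise.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelCollectDemo {
    // Pure lambda + collect(): no shared mutable state, so parallel execution is safe.
    static List<Integer> squaresInParallel(int n) {
        return IntStream.rangeClosed(1, n)
            .parallel()
            .map(i -> i * i)      // depends only on its input
            .boxed()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> squares = squaresInParallel(10_000);
        System.out.println("Size: " + squares.size());              // Size: 10000
        System.out.println("First five: " + squares.subList(0, 5)); // First five: [1, 4, 9, 16, 25]
    }
}
```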

Mistake 3: Infinite Streams Without limit()

import java.util.stream.Stream;

public class InfiniteStreamMistake {
    public static void main(String[] args) {
        // WRONG: this never terminates; collecting an unbounded stream
        // (for example with collect()) would eventually throw OutOfMemoryError
        // Stream.generate(Math::random).forEach(System.out::println);

        // RIGHT: bound generate() and two-argument iterate() with limit()
        Stream.generate(Math::random)
              .limit(5)
              .forEach(n -> System.out.printf("%.2f%n", n));

        // Or use the Java 9+ iterate with a predicate
        Stream.iterate(1, n -> n <= 100, n -> n * 2)
              .forEach(System.out::println);
    }
}
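Java 9 also added takeWhile(), which ends a stream at the first element that fails a condition. Unlike filter(), it stops pulling elements at that point, so it is safe on an infinite source. With filter(n -> n < bound) in its place, the pipeline below would never finish. Here is a small sketch; the class and method names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TakeWhileDemo {
    // Powers of two below a bound, taken from an infinite iterate() stream.
    static List<Integer> powersOfTwoBelow(int bound) {
        return Stream.iterate(1, n -> n * 2)   // infinite: 1, 2, 4, 8, ...
            .takeWhile(n -> n < bound)         // stops at the first value >= bound
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwoBelow(100)); // [1, 2, 4, 8, 16, 32, 64]
    }
}
```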

Mistake 4: Modifying the Source During Streaming

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ModifySourceMistake {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("Alice", "Bob", "Charlie"));

        // WRONG: modifying the source while streaming -- ConcurrentModificationException!
        // names.stream()
        //      .filter(n -> n.startsWith("A"))
        //      .forEach(n -> names.remove(n));

        // RIGHT: collect results, then modify
        List<String> toRemove = names.stream()
            .filter(n -> n.startsWith("A"))
            .collect(Collectors.toList());
        names.removeAll(toRemove);
        System.out.println(names);
        // Output: [Bob, Charlie]

        // Or use removeIf() which is simpler
        // names.removeIf(n -> n.startsWith("A"));
    }
}
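Sometimes you need to remove elements while looping, for example when the removal decision depends on extra state. The safe loop-based form calls the Iterator's own remove() method rather than the list's. The sketch below compares it with removeIf(); the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SafeRemovalDemo {
    // Loop version: Iterator.remove() keeps the iterator's bookkeeping in sync,
    // so no ConcurrentModificationException is thrown.
    static List<String> removeWithIterator(List<String> source, String prefix) {
        List<String> names = new ArrayList<>(source);
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            if (it.next().startsWith(prefix)) {
                it.remove();
            }
        }
        return names;
    }

    // Collection API version: removeIf() does the same in one call.
    static List<String> removeWithRemoveIf(List<String> source, String prefix) {
        List<String> names = new ArrayList<>(source);
        names.removeIf(n -> n.startsWith(prefix));
        return names;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie", "Anna");
        System.out.println(removeWithIterator(names, "A")); // [Bob, Charlie]
        System.out.println(removeWithRemoveIf(names, "A")); // [Bob, Charlie]
    }
}
```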

Mistake 5: Performance Traps

import java.util.List;
import java.util.stream.Collectors;

public class PerformanceTrapMistake {
    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 4, 5);

        // SLOW: unnecessary boxing -- Stream instead of IntStream
        int sum1 = numbers.stream()
                          .map(n -> n * 2)       // boxes/unboxes Integer repeatedly
                          .reduce(0, Integer::sum);

        // FAST: use primitive stream
        int sum2 = numbers.stream()
                          .mapToInt(n -> n * 2)  // works with primitive int
                          .sum();

        // WASTEFUL: sorting the entire stream just to find the max
        // numbers.stream().sorted(Comparator.reverseOrder()).findFirst();

        // EFFICIENT: use max() directly
        // numbers.stream().max(Comparator.naturalOrder());

        System.out.println("Sum: " + sum2);
        // Output: Sum: 30
    }
}
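The two commented-out lines near the end of the example can be made concrete. All three methods below return the largest value, but only the first sorts the whole stream to get it. The sketch assumes Java 10+ for the no-argument orElseThrow(), and the class name is illustrative.

```java
import java.util.Comparator;
import java.util.List;

class MaxWithoutSorting {
    static final List<Integer> NUMBERS = List.of(4, 9, 1, 7, 3);

    // Wasteful: sorts every element (O(n log n)) just to read the first one.
    static int maxBySorting() {
        return NUMBERS.stream()
            .sorted(Comparator.reverseOrder())
            .findFirst()
            .orElseThrow();
    }

    // Efficient: max() makes a single O(n) pass.
    static int maxDirect() {
        return NUMBERS.stream()
            .max(Comparator.naturalOrder())
            .orElseThrow();
    }

    // Primitive stream: IntStream.max() compares unboxed ints and returns an OptionalInt.
    static int maxPrimitive() {
        return NUMBERS.stream()
            .mapToInt(Integer::intValue)
            .max()
            .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(maxBySorting() + " " + maxDirect() + " " + maxPrimitive()); // 9 9 9
    }
}
```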

Common Mistakes Summary

| Mistake | Symptom | Fix |
|---|---|---|
| Reusing a consumed stream | IllegalStateException | Create a new stream each time |
| Side effects in map/filter | Unpredictable results in parallel | Use collect() for results, keep lambdas pure |
| Infinite stream without limit | Program hangs or OutOfMemoryError | Bound it with limit(), or use the Java 9+ iterate() with a predicate |
| Modifying source during stream | ConcurrentModificationException | Collect first, then modify; or use removeIf() |
| Unnecessary boxing | Poor performance | Use mapToInt()/mapToLong()/mapToDouble() |
| Sorting just to get min/max | O(n log n) instead of O(n) | Use min()/max() directly |

12. Best Practices

Following these best practices will help you write stream code that is clean, efficient, and maintainable.

1. Keep Operations Simple

Each lambda in a stream pipeline should do one thing. If your lambda is more than 2-3 lines, extract it into a named method.

import java.util.List;
import java.util.stream.Collectors;

public class BestPractices {

    // POOR: complex inline lambda
    // list.stream().filter(e -> e.getAge() > 18 && e.getSalary() > 50000
    //     && e.getDepartment().equals("Engineering")).collect(Collectors.toList());

    // BETTER: extract to a method
    static boolean isSeniorEngineer(Employee e) {
        return e.age() > 18
            && e.salary() > 50000
            && e.department().equals("Engineering");
    }

    record Employee(String name, int age, double salary, String department) {}

    public static void main(String[] args) {
        List<Employee> employees = List.of(
            new Employee("Alice", 30, 85000, "Engineering"),
            new Employee("Bob", 25, 45000, "Marketing")
        );

        List<Employee> seniors = employees.stream()
            .filter(BestPractices::isSeniorEngineer)
            .collect(Collectors.toList());
    }
}
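Another way to keep a filter readable is to name each condition as a Predicate and combine them with and(), or(), and negate(). These are default methods on java.util.function.Predicate. In the sketch below, the predicate names and the extra sample employee are illustrative, and the Employee record mirrors the one in the example above.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateComposition {
    record Employee(String name, int age, double salary, String department) {}

    // Each predicate checks exactly one condition and can be tested on its own.
    static final Predicate<Employee> IS_ADULT = e -> e.age() > 18;
    static final Predicate<Employee> IS_WELL_PAID = e -> e.salary() > 50000;
    static final Predicate<Employee> IS_ENGINEER = e -> e.department().equals("Engineering");

    // and() combines the predicates; evaluation short-circuits left to right.
    static List<String> seniorEngineerNames(List<Employee> employees) {
        return employees.stream()
            .filter(IS_ADULT.and(IS_WELL_PAID).and(IS_ENGINEER))
            .map(Employee::name)
            .collect(Collectors.toList());
    }

    static List<Employee> sample() {
        return List.of(
            new Employee("Alice", 30, 85000, "Engineering"),
            new Employee("Bob", 25, 45000, "Marketing"),
            new Employee("Carol", 40, 90000, "Sales"));
    }

    public static void main(String[] args) {
        System.out.println(seniorEngineerNames(sample())); // [Alice]
        // negate() flips a condition: everyone who is not an engineer
        System.out.println(sample().stream()
            .filter(IS_ENGINEER.negate())
            .map(Employee::name)
            .collect(Collectors.toList()));                 // [Bob, Carol]
    }
}
```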

2. Use Method References When Possible

Method references are more concise and communicate intent better than equivalent lambdas.

import java.util.List;
import java.util.stream.Collectors;

public class MethodReferencePractice {
    public static void main(String[] args) {
        List<String> names = List.of("alice", "bob", "charlie");

        // Lambda (works but verbose)
        List<String> upper1 = names.stream()
            .map(s -> s.toUpperCase())
            .collect(Collectors.toList());

        // Method reference (cleaner)
        List<String> upper2 = names.stream()
            .map(String::toUpperCase)
            .collect(Collectors.toList());

        // More examples:
        // s -> System.out.println(s)  ->  System.out::println
        // s -> s.length()             ->  String::length
        // s -> Integer.parseInt(s)    ->  Integer::parseInt
        // () -> new ArrayList<>()     ->  ArrayList::new
    }
}
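Constructor references such as ArrayList::new show up most often with Collectors.toCollection(), which lets you choose the concrete collection type. The short sketch below uses TreeSet::new to get a sorted, duplicate-free result; the class and method names are illustrative.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ConstructorReferenceDemo {
    static TreeSet<String> sortedUniqueUpper(List<String> names) {
        return names.stream()
            .map(String::toUpperCase)                        // instance method reference
            .collect(Collectors.toCollection(TreeSet::new)); // constructor reference
    }

    public static void main(String[] args) {
        List<String> names = List.of("charlie", "alice", "bob", "alice");
        System.out.println(sortedUniqueUpper(names)); // [ALICE, BOB, CHARLIE]
    }
}
```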

3. Format Multi-Line Streams for Readability

Each operation in a stream pipeline should be on its own line, with consistent indentation. This makes the pipeline easy to read and modify.

import java.util.List;
import java.util.stream.Collectors;

public class FormattingPractice {
    record Employee(String name, String dept, double salary) {}

    public static void main(String[] args) {
        List<Employee> employees = List.of(
            new Employee("Alice", "Engineering", 95000),
            new Employee("Bob", "Marketing", 65000),
            new Employee("Charlie", "Engineering", 85000)
        );

        // POOR: all on one line
        // List<String> result = employees.stream().filter(e -> e.dept().equals("Engineering")).map(Employee::name).sorted().collect(Collectors.toList());

        // GOOD: one operation per line, each starting with a dot
        List<String> result = employees.stream()
            .filter(e -> e.dept().equals("Engineering"))
            .map(Employee::name)
            .sorted()
            .collect(Collectors.toList());
        System.out.println(result);
        // Output: [Alice, Charlie]
    }
}

Best Practices Summary

| # | Practice | Why |
|---|---|---|
| 1 | Keep lambdas short; extract complex logic to named methods | Readability, testability, reusability |
| 2 | Use method references (String::toUpperCase) | Cleaner, more concise |
| 3 | Avoid side effects in intermediate operations | Predictable behavior, safe parallelism |
| 4 | Use primitive streams for numbers (mapToInt) | Avoids autoboxing overhead |
| 5 | Do not over-use streams; simple loops are fine | Not everything benefits from streams |
| 6 | Format one operation per line | Readability, easy to add/remove steps |
| 7 | Prefer collect() over forEach() + mutation | Thread-safe, functional style |
| 8 | Use Optional results properly (orElse, ifPresent) | Avoid NullPointerException |
| 9 | Use parallel streams only when justified | Parallelism has overhead; profile first |
| 10 | Favor toList() (Java 16+) over collect(Collectors.toList()) | Shorter, and returns an unmodifiable list |
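Practice 8 (using Optional results properly) is worth a short illustration. Terminal operations such as findFirst() and max() return an Optional. The sketch below unwraps it with map() and orElse() instead of calling get() blindly; the class and method names are illustrative.

```java
import java.util.List;
import java.util.Optional;

class OptionalResultDemo {
    // Returns the first name with the given prefix in upper case, or a fallback.
    static String firstWithPrefix(List<String> names, String prefix) {
        Optional<String> match = names.stream()
            .filter(n -> n.startsWith(prefix))
            .findFirst();
        return match
            .map(String::toUpperCase) // transform only if a value is present
            .orElse("(none)");        // fallback instead of get() throwing
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");
        System.out.println(firstWithPrefix(names, "C")); // CHARLIE
        System.out.println(firstWithPrefix(names, "Z")); // (none)

        // ifPresent() runs the action only when there is a value; here there is none
        names.stream().filter(n -> n.length() > 10).findFirst()
            .ifPresent(System.out::println);
    }
}
```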

13. Complete Practical Example: Employee Analytics System

Let us tie everything together with a real-world example. We will build an Employee analytics system that uses streams to answer common business questions: filtering by department, calculating salary statistics, grouping, partitioning, finding top performers, and generating a report.

import java.util.*;
import java.util.stream.Collectors;

public class EmployeeAnalytics {

    // --- Employee class ---
    static class Employee {
        private final String name;
        private final String department;
        private final double salary;
        private final int yearsOfExperience;

        public Employee(String name, String department, double salary, int yearsOfExperience) {
            this.name = name;
            this.department = department;
            this.salary = salary;
            this.yearsOfExperience = yearsOfExperience;
        }

        public String getName()              { return name; }
        public String getDepartment()        { return department; }
        public double getSalary()            { return salary; }
        public int getYearsOfExperience()    { return yearsOfExperience; }
        public boolean isSenior()            { return yearsOfExperience >= 5; }

        @Override
        public String toString() {
            return String.format("%-12s | %-12s | $%,10.2f | %2d yrs",
                name, department, salary, yearsOfExperience);
        }
    }

    public static void main(String[] args) {
        // --- Sample data ---
        List<Employee> employees = List.of(
            new Employee("Alice",    "Engineering",  120000, 8),
            new Employee("Bob",      "Engineering",   95000, 3),
            new Employee("Charlie",  "Engineering",  110000, 6),
            new Employee("Diana",    "Marketing",     85000, 10),
            new Employee("Eve",      "Marketing",     72000, 2),
            new Employee("Frank",    "Sales",         78000, 5),
            new Employee("Grace",    "Sales",         82000, 7),
            new Employee("Henry",    "HR",            68000, 4),
            new Employee("Ivy",      "HR",            71000, 6),
            new Employee("Jack",     "Engineering",  135000, 12)
        );

        System.out.println("=== EMPLOYEE ANALYTICS REPORT ===\n");

        // -------------------------------------------------------
        // 1. FILTER: Engineers earning above $100K
        // -------------------------------------------------------
        System.out.println("--- 1. High-Earning Engineers (>$100K) ---");
        List<Employee> highEarningEngineers = employees.stream()
            .filter(e -> e.getDepartment().equals("Engineering"))
            .filter(e -> e.getSalary() > 100000)
            .sorted(Comparator.comparingDouble(Employee::getSalary).reversed())
            .collect(Collectors.toList());

        highEarningEngineers.forEach(System.out::println);
        // Output:
        // Jack         | Engineering  | $135,000.00 | 12 yrs
        // Alice        | Engineering  | $120,000.00 |  8 yrs
        // Charlie      | Engineering  | $110,000.00 |  6 yrs

        // -------------------------------------------------------
        // 2. MAP + COLLECT: Average salary per department
        // -------------------------------------------------------
        System.out.println("\n--- 2. Average Salary by Department ---");
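        Map<String, Double> avgSalaryByDept = employees.stream()
            .collect(Collectors.groupingBy(
                Employee::getDepartment,
                Collectors.averagingDouble(Employee::getSalary)
            ));

        // Explicit type witness: without it, comparingByValue().reversed() does not compile
        avgSalaryByDept.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())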
            .forEach(e -> System.out.printf("  %-12s $%,.2f%n", e.getKey(), e.getValue()));
        // Output:
        //   Engineering  $115,000.00
        //   Sales        $80,000.00
        //   Marketing    $78,500.00
        //   HR           $69,500.00

        // -------------------------------------------------------
        // 3. GROUPING: Employees grouped by department
        // -------------------------------------------------------
        System.out.println("\n--- 3. Employees by Department ---");
        Map<String, List<String>> namesByDept = employees.stream()
            .collect(Collectors.groupingBy(
                Employee::getDepartment,
                Collectors.mapping(Employee::getName, Collectors.toList())
            ));

        namesByDept.forEach((dept, names) ->
            System.out.printf("  %-12s %s%n", dept, names));
        // Output (groupingBy() returns a HashMap, so department order may vary):
        //   Engineering  [Alice, Bob, Charlie, Jack]
        //   Marketing    [Diana, Eve]
        //   Sales        [Frank, Grace]
        //   HR           [Henry, Ivy]

        // -------------------------------------------------------
        // 4. REDUCE: Highest-paid employee
        // -------------------------------------------------------
        System.out.println("\n--- 4. Highest Paid Employee ---");
        employees.stream()
            .max(Comparator.comparingDouble(Employee::getSalary))
            .ifPresent(e -> System.out.println("  " + e));
        // Output:
        //   Jack         | Engineering  | $135,000.00 | 12 yrs

        // -------------------------------------------------------
        // 5. PARTITIONING: Senior vs Junior (5+ years = senior)
        // -------------------------------------------------------
        System.out.println("\n--- 5. Senior vs Junior ---");
        Map<Boolean, List<Employee>> seniorPartition = employees.stream()
            .collect(Collectors.partitioningBy(Employee::isSenior));

        System.out.println("  Senior (" + seniorPartition.get(true).size() + "):");
        seniorPartition.get(true).forEach(e -> System.out.println("    " + e.getName()));

        System.out.println("  Junior (" + seniorPartition.get(false).size() + "):");
        seniorPartition.get(false).forEach(e -> System.out.println("    " + e.getName()));
        // Output:
        //   Senior (7):
        //     Alice
        //     Charlie
        //     Diana
        //     Frank
        //     Grace
        //     Ivy
        //     Jack
        //   Junior (3):
        //     Bob
        //     Eve
        //     Henry

        // -------------------------------------------------------
        // 6. STATISTICS: Salary summary
        // -------------------------------------------------------
        System.out.println("\n--- 6. Salary Statistics ---");
        DoubleSummaryStatistics salaryStats = employees.stream()
            .mapToDouble(Employee::getSalary)
            .summaryStatistics();

        System.out.printf("  Count:   %d%n", salaryStats.getCount());
        System.out.printf("  Total:   $%,.2f%n", salaryStats.getSum());
        System.out.printf("  Min:     $%,.2f%n", salaryStats.getMin());
        System.out.printf("  Max:     $%,.2f%n", salaryStats.getMax());
        System.out.printf("  Average: $%,.2f%n", salaryStats.getAverage());
        // Output:
        //   Count:   10
        //   Total:   $916,000.00
        //   Min:     $68,000.00
        //   Max:     $135,000.00
        //   Average: $91,600.00

        // -------------------------------------------------------
        // 7. TOP N: Top 3 highest salaries
        // -------------------------------------------------------
        System.out.println("\n--- 7. Top 3 Highest Salaries ---");
        employees.stream()
            .sorted(Comparator.comparingDouble(Employee::getSalary).reversed())
            .limit(3)
            .forEach(e -> System.out.println("  " + e));
        // Output:
        //   Jack         | Engineering  | $135,000.00 | 12 yrs
        //   Alice        | Engineering  | $120,000.00 |  8 yrs
        //   Charlie      | Engineering  | $110,000.00 |  6 yrs

        // -------------------------------------------------------
        // 8. STRING JOINING: Department roster
        // -------------------------------------------------------
        System.out.println("\n--- 8. Department Roster ---");
        Map<String, String> rosters = employees.stream()
            .collect(Collectors.groupingBy(
                Employee::getDepartment,
                Collectors.mapping(
                    Employee::getName,
                    Collectors.joining(", ")
                )
            ));
        rosters.forEach((dept, roster) ->
            System.out.printf("  %-12s %s%n", dept, roster));
        // Output (HashMap iteration order, so department order may vary):
        //   Engineering  Alice, Bob, Charlie, Jack
        //   Marketing    Diana, Eve
        //   Sales        Frank, Grace
        //   HR           Henry, Ivy

        // -------------------------------------------------------
        // 9. COMPLEX: Department with highest average salary
        // -------------------------------------------------------
        System.out.println("\n--- 9. Highest-Paying Department ---");
        employees.stream()
            .collect(Collectors.groupingBy(
                Employee::getDepartment,
                Collectors.averagingDouble(Employee::getSalary)
            ))
            .entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .ifPresent(e -> System.out.printf("  %s with avg $%,.2f%n", e.getKey(), e.getValue()));
        // Output:
        //   Engineering with avg $115,000.00

        // -------------------------------------------------------
        // 10. BOOLEAN CHECKS: Quick analytics
        // -------------------------------------------------------
        System.out.println("\n--- 10. Quick Checks ---");
        boolean anyOver130K = employees.stream()
            .anyMatch(e -> e.getSalary() > 130000);
        System.out.println("  Anyone earning >$130K? " + anyOver130K);  // true

        boolean allAbove50K = employees.stream()
            .allMatch(e -> e.getSalary() > 50000);
        System.out.println("  All earning >$50K? " + allAbove50K);      // true

        long totalExperience = employees.stream()
            .mapToInt(Employee::getYearsOfExperience)
            .sum();
        System.out.println("  Total years of experience: " + totalExperience); // 63

        System.out.println("\n=== END OF REPORT ===");
    }
}
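Collectors.groupingBy(classifier, downstream) makes no promise about the Map type it returns; current JDKs return a HashMap. As a result, the department order printed in sections 3 and 8 may differ from the comments. When a stable order matters, the three-argument overload accepts a map factory. This minimal sketch assumes a trimmed-down Employee record rather than the full class above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class SortedGroupingDemo {
    // Trimmed-down stand-in for the Employee class in the example above.
    record Employee(String name, String department) {}

    // TreeMap::new as the map factory yields departments in alphabetical order.
    static Map<String, List<String>> namesByDepartment(List<Employee> employees) {
        return employees.stream()
            .collect(Collectors.groupingBy(
                Employee::department,
                TreeMap::new,
                Collectors.mapping(Employee::name, Collectors.toList())));
    }

    static List<Employee> sample() {
        return List.of(
            new Employee("Alice", "Engineering"),
            new Employee("Diana", "Marketing"),
            new Employee("Frank", "Sales"),
            new Employee("Henry", "HR"),
            new Employee("Bob", "Engineering"));
    }

    public static void main(String[] args) {
        namesByDepartment(sample()).forEach((dept, names) ->
            System.out.printf("  %-12s %s%n", dept, names));
        //   Engineering  [Alice, Bob]
        //   HR           [Henry]
        //   Marketing    [Diana]
        //   Sales        [Frank]
    }
}
```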

Concepts Demonstrated in the Practical Example

| # | Concept | Where Used |
|---|---|---|
| 1 | filter() | Section 1 -- filtering by department and salary |
| 2 | sorted() with Comparator | Sections 1, 2, 7 -- sorting by salary |
| 3 | collect(Collectors.toList()) | Sections 1, 3 -- gathering results |
| 4 | groupingBy() | Sections 2, 3, 8, 9 -- grouping by department |
| 5 | averagingDouble() | Sections 2, 9 -- average salary |
| 6 | mapping() downstream | Sections 3, 8 -- extracting names within groups |
| 7 | max() with Comparator | Section 4 -- highest-paid employee |
| 8 | partitioningBy() | Section 5 -- senior vs junior split |
| 9 | summaryStatistics() | Section 6 -- comprehensive salary stats |
| 10 | limit() | Section 7 -- top 3 |
| 11 | Collectors.joining() | Section 8 -- comma-separated roster |
| 12 | Chained stream operations | Section 9 -- collect then stream the result |
| 13 | anyMatch(), allMatch() | Section 10 -- boolean checks |
| 14 | mapToInt() + sum() | Section 10 -- total experience |

Quick Reference

| Category | Operation | Type | Returns |
|---|---|---|---|
| Create | collection.stream() | Source | Stream<T> |
| Create | Stream.of(a, b, c) | Source | Stream<T> |
| Create | IntStream.rangeClosed(1, 10) | Source | IntStream |
| Transform | filter(Predicate) | Intermediate | Stream<T> |
| Transform | map(Function) | Intermediate | Stream<R> |
| Transform | flatMap(Function) | Intermediate | Stream<R> |
| Transform | sorted() | Intermediate | Stream<T> |
| Transform | distinct() | Intermediate | Stream<T> |
| Transform | limit(n) / skip(n) | Intermediate | Stream<T> |
| Collect | collect(Collectors.toList()) | Terminal | List<T> |
| Collect | collect(Collectors.toSet()) | Terminal | Set<T> |
| Collect | collect(Collectors.toMap(...)) | Terminal | Map<K,V> |
| Collect | collect(Collectors.groupingBy(...)) | Terminal | Map<K,List<T>> |
| Collect | collect(Collectors.joining(...)) | Terminal | String |
| Reduce | reduce(identity, BinaryOp) | Terminal | T |
| Reduce | count() | Terminal | long |
| Reduce | min(Comparator) / max(Comparator) | Terminal | Optional<T> |
| Search | findFirst() / findAny() | Terminal | Optional<T> |
| Match | anyMatch / allMatch / noneMatch | Terminal | boolean |
| Action | forEach(Consumer) | Terminal | void |
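One entry in the reference deserves a caution. The two-argument Collectors.toMap(keyMapper, valueMapper) throws IllegalStateException when two elements map to the same key. The three-argument overload takes a merge function that decides which value to keep. In the small sketch below, the sample words and names are arbitrary.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapMergeDemo {
    // Key: first letter; value: word length. On a duplicate key, keep the larger length.
    static Map<Character, Integer> longestByInitial(List<String> words) {
        return words.stream()
            .collect(Collectors.toMap(
                w -> w.charAt(0),   // key mapper
                String::length,     // value mapper
                Integer::max));     // merge function for duplicate keys
    }

    // The two-argument form has no merge function, so duplicates are an error.
    static boolean twoArgFormFails(List<String> words) {
        try {
            words.stream().collect(Collectors.toMap(w -> w.charAt(0), String::length));
            return false;
        } catch (IllegalStateException e) {
            return true; // message starts with "Duplicate key"
        }
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana");
        System.out.println(longestByInitial(words).get('a')); // 7
        System.out.println(twoArgFormFails(words));           // true
    }
}
```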
July 21, 2021

Elasticsearch Sorting

By default, search results are returned sorted by relevance, with the most relevant docs first.

Relevance Score

The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.

A query clause generates a _score for each document. How that score is calculated depends on the type of query clause. Different query clauses are used for different purposes: a fuzzy query might determine the _score by calculating how similar the spelling of the found word is to the original search term; a terms query would incorporate the percentage of terms that were found. However, what we usually mean by relevance is the algorithm that we use to calculate how similar the contents of a full-text field are to a full-text query string.

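The classic similarity algorithm used in Elasticsearch is known as term frequency/inverse document frequency, or TF/IDF. Since Elasticsearch 5.0 the default similarity is BM25, which builds on the same ideas. TF/IDF takes the following factors into account:

  • Term frequency: how often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
  • Inverse document frequency: how often does the term appear across all documents in the index? The more often, the less relevant; common terms carry less weight than rare ones.
  • Field-length norm: how long is the field? The longer the field, the less likely it is that the words in it are relevant.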
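Classic TF/IDF combines three factors: term frequency, inverse document frequency, and field-length norm. Each can be sketched as plain arithmetic. The formulas below follow the classic Lucene TF/IDF description in Elasticsearch: The Definitive Guide:

- tf = sqrt(frequency)
- idf = 1 + ln(numDocs / (docFreq + 1))
- norm = 1 / sqrt(numTerms)

The real scorer combines these with further factors, and newer versions use BM25 instead. Treat this as an illustration, not Elasticsearch's implementation.

```java
class ClassicTfIdf {
    // Term frequency: more occurrences means a higher weight, with diminishing returns.
    static double tf(int frequency) {
        return Math.sqrt(frequency);
    }

    // Inverse document frequency: terms found in fewer documents weigh more.
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Field-length norm: a match in a short field counts more than in a long one.
    static double norm(int numTermsInField) {
        return 1.0 / Math.sqrt(numTermsInField);
    }

    public static void main(String[] args) {
        System.out.println(tf(4));   // 2.0
        System.out.println(norm(4)); // 0.5
        // A rare term (in 9 of 1000 docs) outweighs a common one (in 499 of 1000 docs)
        System.out.println(idf(1000, 9) > idf(1000, 499)); // true
    }
}
```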

Order

Sorting allows you to add one or more sorts on specific fields. Each sort can be in ascending or descending order. Sorts are defined per field, with the special field name _score to sort by score and _doc to sort by index order.

The order option can have either asc or desc.

The order defaults to desc when sorting on the _score, and defaults to asc when sorting on anything else.

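The filtered query found in older tutorials was removed in Elasticsearch 5.0; a bool query with a filter clause is the current equivalent.

GET users/_search
{
  "query": {
    "bool": {
      "filter": { "term": { "id": 1 } }
    }
  },
  "sort": [
    { "date": { "order": "desc" } }
  ]
}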

Perhaps we want to combine the _score from a query with the date, and show all matching results sorted first by date, then by relevance.

GET /_search
{
  "query": {
    "bool": {
      "must":   { "match": { "description": "student" } },
      "filter": { "term": { "id": 2 } }
    }
  },
  "sort": [
    { "date": { "order": "desc" } },
    { "_score": { "order": "desc" } }
  ]
}

Order is important. Results are sorted by the first criterion first. Only results whose first sort value is identical will then be sorted by the second criterion, and so on. Multilevel sorting doesn’t have to involve the _score. You could sort by using several different fields, on geo-distance or on a custom value calculated in a script.
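The same multilevel rule is what Java's Comparator.thenComparing() expresses, which makes it a handy way to reason about a sort array. The sketch below orders hits by date descending and falls back to score descending only when dates tie. It uses a hypothetical Hit record, not an Elasticsearch client class, and the sample data is made up.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class MultiLevelSortDemo {
    // Hypothetical search hit: a document id, its date, and its relevance score.
    record Hit(String id, LocalDate date, double score) {}

    // Analogous to: "sort": [ {"date": {"order": "desc"}}, {"_score": {"order": "desc"}} ]
    static final Comparator<Hit> DATE_THEN_SCORE =
        Comparator.comparing(Hit::date).reversed()
            .thenComparing(Comparator.comparingDouble(Hit::score).reversed());

    static List<String> sortedIds(List<Hit> hits) {
        return hits.stream()
            .sorted(DATE_THEN_SCORE)
            .map(Hit::id)
            .collect(Collectors.toList());
    }

    static List<Hit> sample() {
        return List.of(
            new Hit("a", LocalDate.of(2021, 7, 1), 0.4),
            new Hit("b", LocalDate.of(2021, 7, 2), 0.1),
            new Hit("c", LocalDate.of(2021, 7, 1), 0.9));
    }

    public static void main(String[] args) {
        // b has the newest date; a and c tie on date, so score decides: c before a
        System.out.println(sortedIds(sample())); // [b, c, a]
    }
}
```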

Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked for sorting the document it belongs to. The mode option can have the following values:

  • min: Pick the lowest value.
  • max: Pick the highest value.
  • sum: Use the sum of all values as the sort value. Only applicable for number-based array fields.
  • avg: Use the average of all values as the sort value. Only applicable for number-based array fields.
  • median: Use the median of all values as the sort value. Only applicable for number-based array fields.

The default sort mode in the ascending sort order is min — the lowest value is picked. The default sort mode in the descending order is max — the highest value is picked.
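To picture what mode does, the sketch below reduces a multi-valued field to a single sort key in plain Java. The field is a hypothetical array of prices per document. This is an imitation of the idea, not Elasticsearch code. It covers min, max, sum, and avg, and leaves out median. It applies the defaults described above: min for ascending order and max for descending.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class SortModeDemo {
    // A subset of the modes listed above; median is left out of this sketch.
    enum Mode { MIN, MAX, SUM, AVG }

    // Hypothetical document with a multi-valued numeric field.
    record Doc(String id, double[] prices) {}

    // Reduce the array to one sort key, as the chosen mode would.
    static double sortKey(double[] values, Mode mode) {
        return switch (mode) {
            case MIN -> Arrays.stream(values).min().orElseThrow();
            case MAX -> Arrays.stream(values).max().orElseThrow();
            case SUM -> Arrays.stream(values).sum();
            case AVG -> Arrays.stream(values).average().orElseThrow();
        };
    }

    // Mirrors the documented defaults: min for ascending, max for descending.
    static List<String> sortIds(List<Doc> docs, boolean ascending) {
        Mode mode = ascending ? Mode.MIN : Mode.MAX;
        Comparator<Doc> byKey = Comparator.comparingDouble(d -> sortKey(d.prices(), mode));
        return docs.stream()
            .sorted(ascending ? byKey : byKey.reversed())
            .map(Doc::id)
            .collect(Collectors.toList());
    }

    static List<Doc> sample() {
        return List.of(
            new Doc("x", new double[] {5, 15}),
            new Doc("y", new double[] {10, 20}));
    }

    public static void main(String[] args) {
        System.out.println(sortIds(sample(), true));  // [x, y]  (min keys: x=5, y=10)
        System.out.println(sortIds(sample(), false)); // [y, x]  (max keys: y=20, x=15)
    }
}
```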

Note that filters have no bearing on _score. When a query consists only of filter clauses, every matching document receives the same constant score; in other words, all documents are considered to be equally relevant.

 

Sorting Numeric Fields

For numeric fields it is also possible to cast the values from one type to another using the numeric_type option. This option accepts the following values: ["double", "long", "date", "date_nanos"] and can be useful for searches across multiple data streams or indices where the sort field is mapped differently.

Geo Distance Sorting

Sometimes you want to sort by how close a location is to a single point (lat/lon). Elasticsearch supports this with the _geo_distance sort:

GET elasticsearch_learning/_search
{
"sort":[{
  "_geo_distance" : {
    "addresses.location" : [
      {
        "lat" : 40.414897,
        "lon" : -111.881186
      }
    ],
    "unit" : "m",
    "distance_type" : "arc",
    "order" : "desc",
    "nested" : {
      "path" : "addresses",
      "filter" : {
        "geo_distance" : {
          "addresses.location" : [
            -111.881186,
            40.414897
          ],
          "distance" : 1609.344,
          "distance_type" : "arc",
          "validation_method" : "STRICT",
          "ignore_unmapped" : false,
          "boost" : 1.0
        }
      }
    },
    "validation_method" : "STRICT",
    "ignore_unmapped" : false
  }
}]
}

 

/**
 * https://www.elastic.co/guide/en/elasticsearch/reference/7.x/query-dsl-nested-query.html<br>
 * https://www.elastic.co/guide/en/elasticsearch/reference/7.3/search-request-body.html#geo-sorting<br>
 * Sort results based on how close locations are to a certain point.
 */
@Test
void sortQueryWithGeoLocation() {

    int pageNumber = 0;
    int pageSize = 10;

    SearchRequest searchRequest = new SearchRequest(database);
    searchRequest.allowPartialSearchResults(true);
    searchRequest.indicesOptions(IndicesOptions.lenientExpandOpen());

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.from(pageNumber * pageSize);
    searchSourceBuilder.size(pageSize);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    /**
     * fetch only a few fields
     */
    searchSourceBuilder.fetchSource(new String[]{"id", "firstName", "lastName", "rating", "dateOfBirth", "addresses.street", "addresses.zipcode", "addresses.city"}, new String[]{""});

    /**
     * Lehi skate park: 40.414897, -111.881186<br>
     * get locations/addresses close to skate park(from a radius).<br>
     */

    searchSourceBuilder.sort(new GeoDistanceSortBuilder("addresses.location", 40.414897,
            -111.881186).order(SortOrder.DESC)
           .setNestedSort(
                   new NestedSortBuilder("addresses").setFilter(QueryBuilders.geoDistanceQuery("addresses.location").point(40.414897, -111.881186).distance(1, DistanceUnit.MILES))));
    
    log.info("\n{\n\"sort\":{}\n}", searchSourceBuilder.sorts().toString());

    searchRequest.source(searchSourceBuilder);

    searchRequest.preference("nested-address");

    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        log.info("hits={}, isTimedOut={}, totalShards={}, totalHits={}", searchResponse.getHits().getHits().length, searchResponse.isTimedOut(), searchResponse.getTotalShards(),
                searchResponse.getHits().getTotalHits().value);

        List<User> users = getResponseResult(searchResponse.getHits());

        log.info("results={}", ObjectUtils.toJson(users));

    } catch (IOException e) {
        log.warn("IOException, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    } catch (Exception e) {
        log.warn("Exception, msg={}", e.getLocalizedMessage());
        e.printStackTrace();
    }

}
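The _geo_distance sort orders documents by their distance to the given point. The sketch below is a rough, self-contained illustration of the arc distance it refers to: it computes great-circle distance in meters with the haversine formula and sorts a few points by it, closest first. Closest first corresponds to "order": "asc"; the example above uses "desc", which puts the farthest matches first. The Earth radius constant and every sample point other than the Lehi skate park coordinates are assumptions, and Elasticsearch's own computation may differ in detail.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class GeoDistanceSortDemo {
    // Assumed mean Earth radius in meters.
    static final double EARTH_RADIUS_M = 6_371_008.7714;

    // Hypothetical named location.
    record Place(String name, double lat, double lon) {}

    // Great-circle distance using the haversine formula.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
              * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Sort places by distance to an origin, closest first (ascending).
    static List<String> closestFirst(List<Place> places, double originLat, double originLon) {
        return places.stream()
            .sorted(Comparator.comparingDouble(
                (Place p) -> haversineMeters(originLat, originLon, p.lat(), p.lon())))
            .map(Place::name)
            .collect(Collectors.toList());
    }

    static List<Place> sample() {
        return List.of(
            new Place("far", 40.50, -111.88),
            new Place("near", 40.415, -111.8812),
            new Place("mid", 40.43, -111.88));
    }

    public static void main(String[] args) {
        double lat = 40.414897, lon = -111.881186; // Lehi skate park, from the example above
        System.out.println(closestFirst(sample(), lat, lon)); // [near, mid, far]
    }
}
```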

 

Query with explain

Adding explain produces a lot of output for every hit, which can look overwhelming, but it is worth taking the time to understand what it all means. Don’t worry if it doesn’t all make sense now; you can refer to this section when you need it. We’ll work through the output for one hit bit by bit.

GET users/_search?explain
{
   "query": { "match": { "description": "student" } }
}

Producing the explain output is expensive. It is a debugging tool only. Don’t leave it turned on in production.

Fielddata

To make sorting efficient, Elasticsearch loads all the values for the field that you want to sort on into memory. This is referred to as fielddata. Elasticsearch doesn’t just load the values for the documents that matched a particular query. It loads the values from every document in your index, regardless of the document type.

The reason that Elasticsearch loads all values into memory is that uninverting the index from disk is slow. Even though you may need the values for only a few docs for the current request, you will probably need access to the values for other docs on the next request, so it makes sense to load all the values into memory at once, and to keep them there.

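All you need to know is what fielddata is, and to be aware that it can be memory hungry. We will talk about how to determine the amount of memory that fielddata is using, how to limit the amount of memory that is available to it, and how to preload fielddata to improve the user experience. Keep in mind that this describes early Elasticsearch versions: in current versions, sorting on keyword, numeric, and date fields reads from doc values, an on-disk columnar structure, and fielddata is only used for text fields, where it is disabled by default.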

Source Code on Github

 

March 21, 2021

Python Advanced – Generators & Iterators

Introduction

If you have ever tried to process a 10 GB log file by reading it entirely into memory, you already know why generators and iterators matter. They are Python’s answer to a fundamental problem: how do you work with sequences of data without materializing everything in memory at once?

An iterator is any object that produces values one at a time through a standard protocol. A generator is a special kind of iterator that you create with a function containing yield statements. Together, they let you build lazy pipelines that process data element by element, consuming only the memory needed for a single item at a time.

This is not just an academic concept. Every for loop in Python uses the iterator protocol under the hood. When you iterate over a file, a database cursor, or a range of numbers, you are already using iterators. Understanding how they work gives you the ability to write code that scales to datasets of any size without blowing up your memory footprint.

In this tutorial, we will cover the iterator protocol from the ground up, build custom iterators and generators, chain them into processing pipelines, and explore the itertools module. By the end, you will have a complete mental model for lazy evaluation in Python.


1. The Iterator Protocol

The iterator protocol is deceptively simple. It consists of two methods:

  • __iter__() — Returns the iterator object itself. This is what makes an object usable in a for loop.
  • __next__() — Returns the next value in the sequence. When there are no more values, it raises StopIteration.

That is the entire contract. Any object that implements both methods is an iterator. Any object that implements __iter__() (even if it returns a separate iterator object) is an iterable.

The distinction matters: a list is an iterable (it has __iter__() that returns a list iterator), but it is not itself an iterator (it does not have __next__()). The iterator is a separate object that tracks the current position.

# The iterator protocol in action
numbers = [10, 20, 30]

# Get an iterator from the iterable
it = iter(numbers)       # Calls numbers.__iter__()

print(next(it))          # 10  — Calls it.__next__()
print(next(it))          # 20
print(next(it))          # 30
# print(next(it))        # Raises StopIteration

# This is exactly what a for loop does internally:
# 1. Calls iter() on the iterable to get an iterator
# 2. Calls next() repeatedly until StopIteration
# 3. Catches StopIteration silently and exits the loop

for num in [10, 20, 30]:
    print(num)
# Equivalent to the manual iter()/next() calls above

Understanding StopIteration is key. It is not an error — it is the signal that tells Python the sequence is exhausted. The for loop catches it automatically, but if you call next() manually, you need to handle it yourself or pass a default value:

# Handling StopIteration manually
it = iter([1, 2])

print(next(it))           # 1
print(next(it))           # 2
print(next(it, "done"))   # "done" — default value instead of StopIteration

# Without a default, you must catch the exception
it = iter([1])
try:
    print(next(it))       # 1
    print(next(it))       # StopIteration raised here
except StopIteration:
    print("Iterator exhausted")

Making a Class Iterable

To make your own class work with for loops, implement the iterator protocol. Here is a class that counts up from a start value to a stop value:

class CountUp:
    """An iterator that counts from start to stop (inclusive)."""
    
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.current = start
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.current > self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

# Use it in a for loop
for num in CountUp(1, 5):
    print(num, end=" ")  # 1 2 3 4 5

# Use it with list() to materialize all values
print(list(CountUp(10, 15)))  # [10, 11, 12, 13, 14, 15]

# Use it with sum(), max(), any(), etc.
print(sum(CountUp(1, 100)))   # 5050

2. Built-in Iterators

Python’s built-in types are all iterable. The iter() function extracts an iterator from any iterable, and next() advances it one step.

# Lists
list_iter = iter([1, 2, 3])
print(next(list_iter))  # 1
print(next(list_iter))  # 2

# Strings (iterate character by character)
str_iter = iter("Python")
print(next(str_iter))  # 'P'
print(next(str_iter))  # 'y'

# Dictionaries (iterate over keys by default)
data = {"name": "Alice", "age": 30, "role": "engineer"}
dict_iter = iter(data)
print(next(dict_iter))  # 'name'
print(next(dict_iter))  # 'age'

# Iterate over values or key-value pairs
for value in data.values():
    print(value, end=" ")  # Alice 30 engineer

for key, value in data.items():
    print(f"{key}={value}", end=" ")  # name=Alice age=30 role=engineer

# Sets (order is not guaranteed)
set_iter = iter({3, 1, 4, 1, 5})
print(next(set_iter))  # Could be any element

# Files are iterators (they yield lines)
with open("example.txt", "w") as f:
    f.write("line 1\nline 2\nline 3\n")

with open("example.txt") as f:
    for line in f:  # f is its own iterator
        print(line.strip())
    # line 1
    # line 2
    # line 3

Notice that files are their own iterators — calling iter(f) returns f itself. This is why you can iterate over a file directly in a for loop. It also means you can only iterate through a file once without resetting the file pointer.
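The claim about files is easy to verify. The sketch below assumes a throwaway file created with the standard tempfile module (its contents are illustrative). It confirms that iter(f) is f, shows that a second pass yields nothing, and uses f.seek(0) to rewind before a third pass.

```python
import os
import tempfile

# Write a small temporary file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("line 1\nline 2\nline 3\n")
    path = tmp.name

with open(path) as f:
    is_own_iterator = iter(f) is f                 # True: the file is its own iterator
    first_pass = [line.strip() for line in f]
    second_pass = [line.strip() for line in f]     # already at end of file
    f.seek(0)                                      # rewind the file pointer
    third_pass = [line.strip() for line in f]

os.remove(path)

print(is_own_iterator)  # True
print(first_pass)       # ['line 1', 'line 2', 'line 3']
print(second_pass)      # []
print(third_pass)       # ['line 1', 'line 2', 'line 3']
```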


3. Creating Custom Iterators

Let us build a few more custom iterators to solidify the pattern. The Fibonacci class implements both __iter__() and __next__(); MyRange then shows a shortcut that implements only __iter__().

Fibonacci Iterator

class Fibonacci:
    """An iterator that produces Fibonacci numbers up to a maximum value."""
    
    def __init__(self, max_value):
        self.max_value = max_value
        self.a = 0
        self.b = 1
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.a > self.max_value:
            raise StopIteration
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        return value

print(list(Fibonacci(100)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Works with any function that consumes an iterable
print(sum(Fibonacci(1000)))  # 2583

Range Reimplementation

class MyRange:
    """A simplified reimplementation of range()."""
    
    def __init__(self, start, stop=None, step=1):
        if stop is None:
            self.start = 0
            self.stop = start
        else:
            self.start = start
            self.stop = stop
        if step == 0:
            raise ValueError("step must not be zero")  # mirrors range()
        self.step = step
    
    def __iter__(self):
        # Return a new iterator each time — this allows reuse
        current = self.start
        while (self.step > 0 and current < self.stop) or \
              (self.step < 0 and current > self.stop):
            yield current  # Using yield here makes __iter__ a generator
            current += self.step
    
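    def __len__(self):
        # Ceiling division; the rounding offset depends on the sign of step
        if self.step > 0:
            return max(0, (self.stop - self.start + self.step - 1) // self.step)
        return max(0, (self.stop - self.start + self.step + 1) // self.step)
    
    def __repr__(self):
        return f"MyRange({self.start}, {self.stop}, {self.step})"

# Forward range
print(list(MyRange(5)))         # [0, 1, 2, 3, 4]
print(list(MyRange(2, 8)))      # [2, 3, 4, 5, 6, 7]
print(list(MyRange(0, 10, 3)))  # [0, 3, 6, 9]
print(len(MyRange(0, 10, 3)))   # 4

# Reverse range
print(list(MyRange(10, 0, -2))) # [10, 8, 6, 4, 2]
print(len(MyRange(10, 0, -2)))  # 5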

# Reusable (unlike a plain iterator)
r = MyRange(3)
print(list(r))  # [0, 1, 2]
print(list(r))  # [0, 1, 2] — works again because __iter__ creates a new generator

Notice the MyRange trick: instead of implementing __next__() directly, the __iter__() method uses yield, which makes it a generator function. Each call to __iter__() creates a fresh generator object, so the range is reusable. This is a common and powerful pattern.
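To see why a generator-based __iter__() matters, here is a minimal sketch with a hypothetical Squares class (not part of the original example). Two iterators taken from the same object keep independent positions, which a class whose __iter__() returns self cannot do.

```python
class Squares:
    """Hypothetical iterable: squares of 0..n-1, re-iterable via a generator __iter__."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Each call returns a brand-new generator with its own position
        for i in range(self.n):
            yield i * i


sq = Squares(4)
a = iter(sq)
b = iter(sq)
print(a is b)                    # False
print(next(a), next(a))          # 0 1
print(next(b))                   # 0  (b is unaffected by advancing a)
print(list(sq))                  # [0, 1, 4, 9]
print(hasattr(sq, "__next__"))   # False: Squares is an iterable, not an iterator
```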


4. Generator Functions

Writing custom iterator classes is verbose. You need __init__, __iter__, __next__, manual state management, and StopIteration handling. Generators solve this by letting you write iterator logic as a simple function with yield statements.

When Python encounters a yield in a function body, that function becomes a generator function. Calling it does not execute the body — it returns a generator object that implements the iterator protocol automatically.

def count_up(start, stop):
    """A generator that counts from start to stop."""
    current = start
    while current <= stop:
        yield current       # Pause here, return current value
        current += 1        # Resume here on next() call

# Calling the function returns a generator object (does NOT run the body)
gen = count_up(1, 5)
print(type(gen))  # <class 'generator'>

# The generator implements the iterator protocol
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3

# Use in a for loop
for num in count_up(1, 5):
    print(num, end=" ")  # 1 2 3 4 5

How Generators Work Internally

When you call next() on a generator, execution proceeds from the current position until it hits a yield statement. At that point, the yielded value is returned to the caller, and the generator's entire state (local variables, instruction pointer) is frozen. The next next() call resumes from exactly where it left off.

def demonstrate_state():
    print("Step 1: Starting")
    yield "first"
    print("Step 2: Resumed after first yield")
    yield "second"
    print("Step 3: Resumed after second yield")
    yield "third"
    print("Step 4: About to finish")
    # No more yields — StopIteration will be raised

gen = demonstrate_state()

print(next(gen))
# Step 1: Starting
# 'first'

print(next(gen))
# Step 2: Resumed after first yield
# 'second'

print(next(gen))
# Step 3: Resumed after second yield
# 'third'

# print(next(gen))
# Step 4: About to finish
# Raises StopIteration

Generator State

You can inspect a generator's state using the inspect module:

import inspect

def simple_gen():
    yield 1
    yield 2

gen = simple_gen()
print(inspect.getgeneratorstate(gen))  # GEN_CREATED

next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED

next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED

try:
    next(gen)
except StopIteration:
    pass
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED

A generator moves through four states: GEN_CREATED (just created, not started), GEN_RUNNING (currently executing), GEN_SUSPENDED (paused at a yield), and GEN_CLOSED (finished or closed).
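GEN_RUNNING is only visible from inside the generator while its body executes, which is why the example above never prints it. The following sketch is a contrived illustration: the hypothetical watcher() generator receives a list (box) that will hold a reference to the generator itself, so it can inspect its own state. It also shows that close() moves a suspended generator straight to GEN_CLOSED.

```python
import inspect

def watcher(box):
    # box[0] will hold this very generator; inspecting it from inside shows GEN_RUNNING
    yield inspect.getgeneratorstate(box[0])

box = []
gen = watcher(box)
box.append(gen)

created_state = inspect.getgeneratorstate(gen)  # before the first next()
running_state = next(gen)                       # state observed from inside the body
gen.close()                                     # stop the suspended generator early
closed_state = inspect.getgeneratorstate(gen)

print(created_state, running_state, closed_state)
# GEN_CREATED GEN_RUNNING GEN_CLOSED
```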

Fibonacci as a Generator

Compare the class-based Fibonacci iterator from earlier with the generator version:

# Generator version — drastically simpler
def fibonacci(max_value=None):
    a, b = 0, 1
    while max_value is None or a <= max_value:
        yield a
        a, b = b, a + b

# Finite sequence
print(list(fibonacci(100)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Infinite sequence (use itertools.islice to take a finite portion)
import itertools
print(list(itertools.islice(fibonacci(), 15)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]

The generator version is 4 lines of logic compared to 12+ lines for the class. No __init__, no __iter__, no __next__, no StopIteration — Python handles all of it.


5. Generator Expressions

Generator expressions are to generators what list comprehensions are to lists. They use the same syntax as list comprehensions, but with parentheses instead of square brackets. The critical difference is that a generator expression produces values lazily — one at a time — while a list comprehension builds the entire list in memory.

import sys

# List comprehension — builds entire list in memory
squares_list = [x ** 2 for x in range(1_000_000)]
print(f"List size: {sys.getsizeof(squares_list):,} bytes")  # ~8,448,728 bytes

# Generator expression — produces values on demand
squares_gen = (x ** 2 for x in range(1_000_000))
print(f"Generator size: {sys.getsizeof(squares_gen):,} bytes")  # ~200 bytes

# Both support filtering
even_squares = (x ** 2 for x in range(20) if x % 2 == 0)
print(list(even_squares))  # [0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

# Generator expressions can be passed directly to functions
# (no extra parentheses needed when it is the only argument)
total = sum(x ** 2 for x in range(1000))
print(total)  # 332833500

max_val = max(len(word) for word in ["Python", "generators", "are", "powerful"])
print(max_val)  # 10

has_negative = any(x < 0 for x in [1, -2, 3, 4])
print(has_negative)  # True

Memory Comparison

import sys

def compare_memory(n):
    """Compare memory usage of list vs generator for n elements."""
    
    # List comprehension
    data_list = [x * 2 for x in range(n)]
    list_size = sys.getsizeof(data_list)
    
    # Generator expression
    data_gen = (x * 2 for x in range(n))
    gen_size = sys.getsizeof(data_gen)
    
    print(f"n={n:>12,}  |  List: {list_size:>12,} bytes  |  Generator: {gen_size:>6,} bytes  |  Ratio: {list_size/gen_size:.0f}x")

compare_memory(100)
compare_memory(10_000)
compare_memory(1_000_000)
compare_memory(10_000_000)

# Example output (exact byte counts vary by Python version and platform):
# n=         100  |  List:          920 bytes  |  Generator:    200 bytes  |  Ratio: 5x
# n=      10,000  |  List:       87,624 bytes  |  Generator:    200 bytes  |  Ratio: 438x
# n=   1,000,000  |  List:    8,448,728 bytes  |  Generator:    200 bytes  |  Ratio: 42244x
# n=  10,000,000  |  List:   80,000,056 bytes  |  Generator:    200 bytes  |  Ratio: 400000x

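The generator's memory footprint is constant regardless of how many elements it produces. This is the fundamental advantage of lazy evaluation. Keep in mind that sys.getsizeof() reports only the list object itself (its internal array of pointers), not the integer objects those pointers refer to, so the real gap is even larger than these numbers suggest.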


6. yield from

The yield from expression, introduced in Python 3.3, delegates iteration to a sub-generator or any iterable. It is cleaner than manually looping over a sub-iterable and yielding each element.

# Without yield from
def chain_manual(*iterables):
    for iterable in iterables:
        for item in iterable:
            yield item

# With yield from — cleaner
def chain_elegant(*iterables):
    for iterable in iterables:
        yield from iterable

# Both produce the same result
result = list(chain_elegant([1, 2, 3], "abc", (10, 20)))
print(result)  # [1, 2, 3, 'a', 'b', 'c', 10, 20]

Flattening Nested Structures

def flatten(nested):
    """Recursively flatten a nested structure."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)  # Delegate to recursive call
        else:
            yield item

data = [1, [2, 3], [4, [5, 6, [7, 8]]], 9]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Works with mixed nesting
mixed = [1, (2, [3, 4]), [5, (6,)], 7]
print(list(flatten(mixed)))  # [1, 2, 3, 4, 5, 6, 7]

Delegating to Sub-generators

def header_rows():
    yield "Name,Age,City"

def data_rows():
    yield "Alice,30,New York"
    yield "Bob,25,San Francisco"
    yield "Charlie,35,Chicago"

def footer_rows():
    yield "---END OF REPORT---"

def full_report():
    yield from header_rows()
    yield from data_rows()
    yield from footer_rows()

for line in full_report():
    print(line)
# Name,Age,City
# Alice,30,New York
# Bob,25,San Francisco
# Charlie,35,Chicago
# ---END OF REPORT---
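yield from does more than forward values: the whole yield from expression evaluates to the sub-generator's return value (PEP 380). A small sketch, with hypothetical read_batch() and read_all() generators, shows a delegating generator collecting those return values.

```python
def read_batch(items):
    """Sub-generator: yields each item, then returns how many it produced."""
    count = 0
    for item in items:
        yield item
        count += 1
    return count

def read_all(batches):
    """Delegating generator: records each sub-generator's return value."""
    counts = []
    for batch in batches:
        produced = yield from read_batch(batch)  # receives `return count`
        counts.append(produced)
    return counts  # becomes StopIteration.value for the caller

gen = read_all([["a", "b"], ["c"], []])
items = []
while True:
    try:
        items.append(next(gen))
    except StopIteration as stop:
        counts = stop.value
        break

print(items)   # ['a', 'b', 'c']
print(counts)  # [2, 1, 0]
```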

7. Sending Values to Generators

Generators are not just producers — they can also receive values. The send() method resumes a generator and sends a value that becomes the result of the yield expression inside the generator. This turns generators into coroutines that can both produce and consume data.

def running_average():
    """A generator that computes a running average."""
    total = 0
    count = 0
    average = None
    while True:
        value = yield average   # Receive a value, yield the current average
        if value is None:
            break
        total += value
        count += 1
        average = total / count

# Usage
avg = running_average()
next(avg)              # Prime the generator (advance to first yield)

print(avg.send(10))    # 10.0
print(avg.send(20))    # 15.0
print(avg.send(30))    # 20.0
print(avg.send(40))    # 25.0

The first next() call is necessary to "prime" the generator — it advances execution to the first yield expression, where the generator is ready to receive a value. After that, send() both sends a value in and gets the next yielded value out.
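Skipping the priming step is a common mistake. The sketch below shows what happens, then wraps priming in a small decorator. The primed() decorator and echo() generator are illustrative helpers written for this example, not standard-library APIs.

```python
import functools

def primed(func):
    """Hypothetical decorator: create the generator and advance it to its first yield."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # equivalent to gen.send(None)
        return gen
    return wrapper

def echo():
    """Yields back whatever value was last sent in."""
    received = None
    while True:
        received = yield received

# Sending a real value before the first yield is an error
unprimed = echo()
try:
    unprimed.send("hello")
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # can't send non-None value to a just-started generator

# send(None) is accepted on a fresh generator and behaves like next()
print(unprimed.send(None))  # None

# With the decorator, the generator is ready for send() immediately
ready = primed(echo)()
print(ready.send("hello"))  # hello
```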

Coroutine Pattern

def accumulator():
    """A coroutine that accumulates values and reports the running total."""
    total = 0
    while True:
        value = yield total
        if value is None:
            return total        # return value becomes StopIteration.value
        total += value

acc = accumulator()
next(acc)              # Prime

print(acc.send(5))     # 5
print(acc.send(10))    # 15
print(acc.send(3))     # 18

# Close the generator gracefully
try:
    acc.send(None)     # Triggers the return statement
except StopIteration as e:
    print(f"Final total: {e.value}")  # Final total: 18

# Practical coroutine: a filter that receives lines and collects the matches
def grep_coroutine(pattern):
    """A coroutine that collects lines containing pattern (case-insensitive)."""
    print(f"Looking for: {pattern}")
    matches = []
    while True:
        line = yield
        if line is None:
            break
        if pattern.lower() in line.lower():
            matches.append(line)
            print(f"  Match: {line}")
    return matches

# Usage
searcher = grep_coroutine("error")
next(searcher)  # Prime (prints "Looking for: error")

searcher.send("INFO: Server started")
searcher.send("ERROR: Connection timeout")   # Match
searcher.send("DEBUG: Request received")
searcher.send("ERROR: Disk full")             # Match
searcher.send("INFO: Shutting down")

try:
    searcher.send(None)  # Signal completion
except StopIteration as e:
    print(f"All matches: {e.value}")
# Looking for: error
#   Match: ERROR: Connection timeout
#   Match: ERROR: Disk full
# All matches: ['ERROR: Connection timeout', 'ERROR: Disk full']

8. Generator Pipelines

One of the most powerful patterns in Python is chaining generators into a processing pipeline. Each generator reads from the previous one, transforms the data, and passes it along. This works like Unix pipes — data flows through a chain of transformations without any intermediate lists being created in memory.

# Pipeline: Read lines -> filter non-empty -> strip whitespace -> convert to uppercase

def read_lines(text):
    """Stage 1: Split text into lines."""
    for line in text.split("\n"):
        yield line

def filter_non_empty(lines):
    """Stage 2: Remove empty lines."""
    for line in lines:
        if line.strip():
            yield line

def strip_whitespace(lines):
    """Stage 3: Strip leading/trailing whitespace."""
    for line in lines:
        yield line.strip()

def to_uppercase(lines):
    """Stage 4: Convert to uppercase."""
    for line in lines:
        yield line.upper()

# Chain the pipeline
raw_text = """
  hello world  
  
  Python generators  
  are powerful  
  
  and memory efficient  
"""

pipeline = to_uppercase(
    strip_whitespace(
        filter_non_empty(
            read_lines(raw_text)
        )
    )
)

for line in pipeline:
    print(line)
# HELLO WORLD
# PYTHON GENERATORS
# ARE POWERFUL
# AND MEMORY EFFICIENT

Data Processing Pipeline

# A more realistic pipeline: process log entries

def parse_log_entries(lines):
    """Parse each line into a structured dict."""
    for line in lines:
        parts = line.split(" | ")
        if len(parts) == 3:
            yield {
                "timestamp": parts[0],
                "level": parts[1],
                "message": parts[2]
            }

def filter_errors(entries):
    """Keep only ERROR entries."""
    for entry in entries:
        if entry["level"] == "ERROR":
            yield entry

def format_alerts(entries):
    """Format entries as alert strings."""
    for entry in entries:
        yield f"ALERT [{entry['timestamp']}]: {entry['message']}"

# Simulate log data
log_data = [
    "2024-01-15 10:00:01 | INFO | Server started",
    "2024-01-15 10:00:05 | ERROR | Database connection failed",
    "2024-01-15 10:00:10 | INFO | Retry attempt 1",
    "2024-01-15 10:00:15 | ERROR | Database connection failed again",
    "2024-01-15 10:00:20 | INFO | Connection restored",
    "2024-01-15 10:00:25 | ERROR | Disk space low",
]

# Build the pipeline
alerts = format_alerts(filter_errors(parse_log_entries(log_data)))

for alert in alerts:
    print(alert)
# ALERT [2024-01-15 10:00:05]: Database connection failed
# ALERT [2024-01-15 10:00:15]: Database connection failed again
# ALERT [2024-01-15 10:00:25]: Disk space low

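Each stage processes one item at a time, and no intermediate lists are created. As long as the source itself is lazy (for example, an open file object rather than the in-memory log_data list used here), you could pipe a 100 GB log file through this pipeline while holding only one entry in memory at a time.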
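To make the large-file case concrete, the same stages can read from a file instead of a list. This sketch assumes a small temporary log file standing in for a large one, and rewrites filter_errors() and format_alerts() as equivalent generator expressions for brevity. File lines end in a newline, so the source strips it before parsing; the alerts list only collects the few matching lines for display.

```python
import os
import tempfile

def parse_log_entries(lines):
    """Parse 'timestamp | level | message' lines into dicts."""
    for line in lines:
        parts = line.split(" | ")
        if len(parts) == 3:
            yield {"timestamp": parts[0], "level": parts[1], "message": parts[2]}

def filter_errors(entries):
    return (entry for entry in entries if entry["level"] == "ERROR")

def format_alerts(entries):
    return (f"ALERT [{e['timestamp']}]: {e['message']}" for e in entries)

# A small temporary file stands in for a large log on disk
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("2024-01-15 10:00:01 | INFO | Server started\n")
    tmp.write("2024-01-15 10:00:05 | ERROR | Database connection failed\n")
    tmp.write("2024-01-15 10:00:25 | ERROR | Disk space low\n")
    log_path = tmp.name

alerts = []
with open(log_path) as f:
    # Lazy source: the file yields one line at a time
    lines = (line.rstrip("\n") for line in f)
    for alert in format_alerts(filter_errors(parse_log_entries(lines))):
        print(alert)
        alerts.append(alert)
# ALERT [2024-01-15 10:00:05]: Database connection failed
# ALERT [2024-01-15 10:00:25]: Disk space low

os.remove(log_path)
```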


9. The itertools Module

The itertools module is Python's standard-library toolkit for efficient iterator operations. Its functions return iterators (tee() returns a tuple of them), so they compose naturally with generators and pipelines. Here are the functions you will use most often.

Infinite Iterators

import itertools

# count: count from a start value with a step
for i in itertools.islice(itertools.count(10, 2), 5):
    print(i, end=" ")  # 10 12 14 16 18
print()

# cycle: repeat an iterable forever
colors = itertools.cycle(["red", "green", "blue"])
for _ in range(7):
    print(next(colors), end=" ")  # red green blue red green blue red
print()

# repeat: repeat a value n times (or forever)
fives = list(itertools.repeat(5, 4))
print(fives)  # [5, 5, 5, 5]

# Practical use of repeat: initialize a grid
grid = [list(itertools.repeat(0, 5)) for _ in range(3)]
print(grid)  # [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]

Terminating Iterators

import itertools

# chain: concatenate multiple iterables
combined = list(itertools.chain([1, 2], [3, 4], [5, 6]))
print(combined)  # [1, 2, 3, 4, 5, 6]

# chain.from_iterable: chain from a single iterable of iterables
nested = [[1, 2], [3, 4], [5, 6]]
flat = list(itertools.chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5, 6]

# islice: slice an iterator (like list slicing but for iterators)
print(list(itertools.islice(range(100), 5)))         # [0, 1, 2, 3, 4]
print(list(itertools.islice(range(100), 10, 20, 3))) # [10, 13, 16, 19]

# takewhile / dropwhile: take/drop based on a predicate
nums = [1, 3, 5, 7, 2, 4, 6, 8]
print(list(itertools.takewhile(lambda x: x < 6, nums)))  # [1, 3, 5]
print(list(itertools.dropwhile(lambda x: x < 6, nums)))  # [7, 2, 4, 6, 8]

# groupby: group consecutive elements by a key function
data = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5)]
for key, group in itertools.groupby(data, key=lambda x: x[0]):
    print(f"{key}: {list(group)}")
# A: [('A', 1), ('A', 2)]
# B: [('B', 3), ('B', 4)]
# A: [('A', 5)]           <-- Note: only groups CONSECUTIVE matches
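If you want one group per key regardless of position, sort by the same key first. A minimal sketch, using operator.itemgetter as the key function (sorted() is stable, so values keep their original order within each key):

```python
import itertools
from operator import itemgetter

data = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5)]
by_letter = itemgetter(0)

# Sorting brings all equal keys together, so groupby sees each key once
grouped = {
    key: [value for _, value in group]
    for key, group in itertools.groupby(sorted(data, key=by_letter), key=by_letter)
}
print(grouped)  # {'A': [1, 2, 5], 'B': [3, 4]}
```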

Combinatoric Iterators

import itertools

# combinations: all r-length combinations (no repeats, order doesn't matter)
print(list(itertools.combinations("ABCD", 2)))
# [('A','B'), ('A','C'), ('A','D'), ('B','C'), ('B','D'), ('C','D')]

# combinations_with_replacement: combinations allowing repeats
print(list(itertools.combinations_with_replacement("AB", 3)))
# [('A','A','A'), ('A','A','B'), ('A','B','B'), ('B','B','B')]

# permutations: all r-length arrangements (order matters)
print(list(itertools.permutations("ABC", 2)))
# [('A','B'), ('A','C'), ('B','A'), ('B','C'), ('C','A'), ('C','B')]

# product: Cartesian product (like nested for loops)
print(list(itertools.product("AB", [1, 2])))
# [('A',1), ('A',2), ('B',1), ('B',2)]

# Practical: generate all possible configs
sizes = ["small", "medium", "large"]
colors = ["red", "blue"]
materials = ["cotton", "silk"]

for combo in itertools.product(sizes, colors, materials):
    print(combo)
# ('small', 'red', 'cotton')
# ('small', 'red', 'silk')
# ('small', 'blue', 'cotton')
# ... (12 total combinations)

10. Practical Examples

Reading Large Files Line by Line

This is the canonical use case for generators. Instead of loading an entire file into memory, you process it one line at a time.

def read_large_file(file_path):
    """Read a file line by line using a generator."""
    with open(file_path, "r") as f:
        for line in f:
            yield line.strip()

def count_errors_in_log(file_path):
    """Count error lines in a log file without loading it into memory."""
    error_count = 0
    for line in read_large_file(file_path):
        if "ERROR" in line:
            error_count += 1
    return error_count

# For a 10 GB log file, this uses ~1 line of memory at a time
# Instead of loading all 10 GB:
# count = count_errors_in_log("/var/log/huge_application.log")

# Alternative using generator expression:
# error_count = sum(1 for line in read_large_file(path) if "ERROR" in line)

Infinite Sequence Generators

import itertools

def primes():
    """Generate prime numbers indefinitely using a sieve approach."""
    yield 2
    composites = {}  # Maps composite number -> list of primes that divide it
    candidate = 3
    while True:
        if candidate not in composites:
            # candidate is prime
            yield candidate
            composites[candidate * candidate] = [candidate]
        else:
            # candidate is composite; move each prime factor to its next odd
            # multiple (step 2 * prime, since even candidates are never checked)
            for prime in composites[candidate]:
                composites.setdefault(candidate + 2 * prime, []).append(prime)
            del composites[candidate]
        candidate += 2  # Skip even numbers

# Get the first 20 prime numbers
first_20_primes = list(itertools.islice(primes(), 20))
print(first_20_primes)
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

# Sum of the first 1000 primes
print(sum(itertools.islice(primes(), 1000)))  # 3682913

Data Pipeline: Read CSV, Filter, Transform, Aggregate

import csv
from io import StringIO

# Simulated CSV data
csv_data = """name,department,salary
Alice,Engineering,120000
Bob,Marketing,85000
Charlie,Engineering,135000
Diana,Marketing,90000
Eve,Engineering,110000
Frank,HR,75000
Grace,Engineering,140000
"""

def read_csv_rows(csv_text):
    """Stage 1: Parse CSV into dictionaries."""
    reader = csv.DictReader(StringIO(csv_text))
    for row in reader:
        yield row

def filter_department(rows, dept):
    """Stage 2: Keep only rows matching the department."""
    for row in rows:
        if row["department"] == dept:
            yield row

def transform_salary(rows):
    """Stage 3: Convert salary to int and add a bonus field."""
    for row in rows:
        salary = int(row["salary"])
        row["salary"] = salary
        row["bonus"] = salary * 0.1  # 10% bonus
        yield row

def aggregate(rows):
    """Stage 4: Compute total salary and average."""
    total = 0
    count = 0
    for row in rows:
        total += row["salary"]
        count += 1
        yield row  # Pass through for downstream consumers
    # After iteration, print the summary
    if count > 0:
        print(f"\nTotal salary: ${total:,}")
        print(f"Average salary: ${total/count:,.0f}")
        print(f"Headcount: {count}")

# Build and run the pipeline
pipeline = aggregate(
    transform_salary(
        filter_department(
            read_csv_rows(csv_data),
            "Engineering"
        )
    )
)

for emp in pipeline:
    print(f"{emp['name']}: ${emp['salary']:,} (bonus: ${emp['bonus']:,.0f})")

# Alice: $120,000 (bonus: $12,000)
# Charlie: $135,000 (bonus: $13,500)
# Eve: $110,000 (bonus: $11,000)
# Grace: $140,000 (bonus: $14,000)
#
# Total salary: $505,000
# Average salary: $126,250
# Headcount: 4

Pagination Generator for API Results

import time

def paginated_api_fetch(base_url, page_size=100):
    """
    Generator that fetches paginated API results.
    Yields individual items across all pages.
    """
    page = 1
    while True:
        # Simulate API call (replace with real requests.get())
        url = f"{base_url}?page={page}&size={page_size}"
        print(f"Fetching: {url}")
        
        # Simulated response
        if page <= 3:
            results = [{"id": i, "name": f"Item {i}"} 
                       for i in range((page-1)*page_size + 1, page*page_size + 1)]
        else:
            results = []  # No more data
        
        if not results:
            break  # No more pages
        
        yield from results  # Yield each item individually
        page += 1
        time.sleep(0.1)  # Rate limiting

# The consumer does not need to know about pagination
for item in paginated_api_fetch("https://api.example.com/items", page_size=2):
    print(f"  Processing: {item}")
    if item["id"] >= 5:
        break  # Stop early — remaining pages are never fetched!

# Output:
# Fetching: https://api.example.com/items?page=1&size=2
#   Processing: {'id': 1, 'name': 'Item 1'}
#   Processing: {'id': 2, 'name': 'Item 2'}
# Fetching: https://api.example.com/items?page=2&size=2
#   Processing: {'id': 3, 'name': 'Item 3'}
#   Processing: {'id': 4, 'name': 'Item 4'}
# Fetching: https://api.example.com/items?page=3&size=2
#   Processing: {'id': 5, 'name': 'Item 5'}

Notice the key advantage: when the consumer breaks out of the loop, the generator stops fetching. Pages 4, 5, 6, etc. are never requested. Lazy evaluation means you only do the work that is actually needed.
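One detail about breaking out early: the generator is left suspended, not finished, so any cleanup in a finally block (closing a connection, say) runs only when the generator is closed. CPython usually does that when the generator is garbage-collected, but contextlib.closing() makes it happen deterministically when the with block exits. The fetch_pages() generator below is a hypothetical stand-in that records its own actions.

```python
from contextlib import closing

events = []

def fetch_pages():
    """Hypothetical paginated source that records what it does."""
    events.append("open connection")
    try:
        for page in range(1, 100):
            events.append(f"fetch page {page}")
            yield from [f"item {page}a", f"item {page}b"]
    finally:
        # Runs when the generator finishes OR is closed early
        events.append("close connection")

# closing() calls the generator's close() method when the with block exits
with closing(fetch_pages()) as items:
    for item in items:
        if item == "item 2a":
            break  # stop early: pages 3..99 are never fetched

print(events)
# ['open connection', 'fetch page 1', 'fetch page 2', 'close connection']
```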


11. Performance Comparison

Let us put hard numbers on the difference between lists and generators.

import time
import tracemalloc

def benchmark_list_vs_generator(n):
    """Compare list vs generator for summing n squared numbers."""
    
    # List approach
    tracemalloc.start()
    start = time.perf_counter()
    result_list = sum([x ** 2 for x in range(n)])
    list_time = time.perf_counter() - start
    list_peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    
    # Generator approach
    tracemalloc.start()
    start = time.perf_counter()
    result_gen = sum(x ** 2 for x in range(n))
    gen_time = time.perf_counter() - start
    gen_peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    
    assert result_list == result_gen
    
    print(f"n = {n:>12,}")
    print(f"  List:      {list_time:.4f}s | Peak memory: {list_peak:>12,} bytes")
    print(f"  Generator: {gen_time:.4f}s | Peak memory: {gen_peak:>12,} bytes")
    print(f"  Memory saved: {(1 - gen_peak/list_peak)*100:.1f}%")
    print()

benchmark_list_vs_generator(100_000)
benchmark_list_vs_generator(1_000_000)
benchmark_list_vs_generator(10_000_000)

# Example output (timings and byte counts vary by machine and Python version):
# n =      100,000
#   List:      0.0234s | Peak memory:      824,464 bytes
#   Generator: 0.0228s | Peak memory:          464 bytes
#   Memory saved: 99.9%
#
# n =    1,000,000
#   List:      0.2451s | Peak memory:    8,448,688 bytes
#   Generator: 0.2389s | Peak memory:          464 bytes
#   Memory saved: 100.0%
#
# n =   10,000,000
#   List:      2.5102s | Peak memory:   80,000,048 bytes
#   Generator: 2.4231s | Peak memory:          464 bytes
#   Memory saved: 100.0%

Key takeaways from the benchmark:

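  • Memory: The generator's peak memory stays at a small constant (a few hundred bytes in the run above) regardless of dataset size, while the list's peak grows linearly with n.
  • Speed: The two approaches are usually close, and which one wins depends on the Python version and input size. A list comprehension can be slightly faster on modest inputs because resuming a generator for every item has its own overhead; the generator's advantage is that it never allocates the large list. These timings were also taken with tracemalloc running, which slows allocation-heavy code.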
  • When lists win: If you need random access, multiple passes over the data, or the dataset fits comfortably in memory, a list is simpler and sometimes faster due to cache locality.

12. Common Pitfalls

Generators have some surprising behaviors that trip up even experienced developers. Here are the ones you must know.

Generator Exhaustion

# Generators can only be consumed ONCE
gen = (x ** 2 for x in range(5))

print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] — exhausted! No error, just empty.

# This is a common bug:
def get_numbers():
    yield 1
    yield 2
    yield 3

nums = get_numbers()
print(sum(nums))  # 6
print(sum(nums))  # 0 — the generator is already exhausted!

# Fix: recreate the generator each time, or use a list if you need multiple passes
nums_list = list(get_numbers())
print(sum(nums_list))  # 6
print(sum(nums_list))  # 6
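itertools.tee() splits one single-use iterator into independent copies. It buffers values that one copy has consumed and the other has not, so it works best when the copies advance roughly in step; if one copy will be exhausted before the other starts, the itertools documentation notes that list() is faster. A sketch with a hypothetical readings() generator computing consecutive differences (Python 3.10+ also provides itertools.pairwise() for exactly this pattern):

```python
import itertools

def readings():
    """Hypothetical sensor readings produced one at a time."""
    yield from [10, 12, 15, 15, 20]

# tee() gives two independent iterators over the same single-use source
current, ahead = itertools.tee(readings(), 2)
next(ahead, None)  # advance the second copy by one item

# The copies advance in step, so tee only buffers about one value at a time
deltas = [b - a for a, b in zip(current, ahead)]
print(deltas)  # [2, 3, 0, 5]
```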

Cannot Index, Slice, or Get Length

gen = (x for x in range(10))

# These all fail:
# gen[0]      # TypeError: 'generator' object is not subscriptable
# gen[2:5]    # TypeError: 'generator' object is not subscriptable
# len(gen)    # TypeError: object of type 'generator' has no len()

# Workarounds:
import itertools

# Get the nth element (consumes n elements)
def nth(iterable, n, default=None):
    return next(itertools.islice(iterable, n, None), default)

gen = (x ** 2 for x in range(10))
print(nth(gen, 3))  # 9 (the 4th element, 0-indexed)

# Slice an iterator
gen = (x ** 2 for x in range(10))
print(list(itertools.islice(gen, 2, 5)))  # [4, 9, 16]

The Reuse Gotcha

# A subtle bug: storing a generator and trying to use it in multiple places

def get_even_numbers(n):
    return (x for x in range(n) if x % 2 == 0)

evens = get_even_numbers(20)

# First use works fine
for x in evens:
    if x > 6:
        break
print(f"Stopped at {x}")  # Stopped at 8

# Second use — CONTINUES from where we left off, not from the beginning!
remaining = list(evens)
print(remaining)  # [10, 12, 14, 16, 18]

# If you expected [0, 2, 4, 6, 8, 10, 12, 14, 16, 18], you have a bug.

Late Binding in Closures and Generator Expressions

# Closures look up free variables when they are called, not when they are created
funcs = []
for i in range(5):
    funcs.append(lambda: i)  # All lambdas capture the SAME variable i

print([f() for f in funcs])  # [4, 4, 4, 4, 4] — not [0, 1, 2, 3, 4]!

# Fix: use a default argument to capture the current value
funcs = []
for i in range(5):
    funcs.append(lambda i=i: i)  # Each lambda gets its own copy

print([f() for f in funcs])  # [0, 1, 2, 3, 4]
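Generator expressions follow the same late-binding rule, with one exception: the iterable in the first for clause is evaluated immediately, when the expression is created, while everything else (conditions, other names) is looked up only as values are pulled. A small sketch:

```python
threshold = 3
# range(10) is evaluated now; `threshold` is looked up only as items are pulled
above = (x for x in range(10) if x > threshold)
threshold = 7
late_result = list(above)
print(late_result)  # [8, 9]

data = [1, 2, 3]
# The first for clause's iterable (this list object) is captured at creation
scaled = (x * 10 for x in data)
data = [100, 200]  # rebinding the name afterwards changes nothing
early_result = list(scaled)
print(early_result)  # [10, 20, 30]
```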

13. Best Practices

Here are the guidelines I follow when deciding how to use generators in production code.

Use Generators for Large or Potentially Infinite Datasets

# GOOD: generator for processing a large file
# (parse_error() is a placeholder for your own line-parsing function)
def process_log_file(path):
    with open(path) as f:
        for line in f:
            if "ERROR" in line:
                yield parse_error(line)

# BAD: loading entire file into memory
def process_log_file_bad(path):
    with open(path) as f:
        lines = f.readlines()  # Entire file in memory!
    return [parse_error(line) for line in lines if "ERROR" in line]
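To see the generator version run end to end, here is a self-contained sketch. It writes a tiny sample log to a temporary file. The parse_error shown here is a hypothetical stand-in that keeps the text after the "ERROR" marker. Because the caller controls how far the generator advances, it can stop after the first few matches. Calling close() on an unfinished generator lets its `with` block run and close the file.

```python
import os
import tempfile
from itertools import islice

def parse_error(line):
    # Hypothetical parser: keep the text after the "ERROR" marker
    return line.split("ERROR", 1)[1].strip()

def process_log_file(path):
    # Lazily yield parsed ERROR lines; the file is iterated line by line
    with open(path) as f:
        for line in f:
            if "ERROR" in line:
                yield parse_error(line)

# Write a small sample log to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\nERROR crash\n")
    log_path = tmp.name

try:
    gen = process_log_file(log_path)
    first_two = list(islice(gen, 2))  # stop after two matches; later lines are never processed
    gen.close()                       # triggers the generator's cleanup, closing the file
    all_errors = list(process_log_file(log_path))
finally:
    os.remove(log_path)

print(first_two)   # ['disk full', 'timeout']
print(all_errors)  # ['disk full', 'timeout', 'crash']
```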

Prefer Generator Expressions for Simple Transformations

# GOOD: generator expression passed directly to sum()
total = sum(order.total for order in orders if order.status == "completed")

# UNNECESSARY: creating an intermediate list
total = sum([order.total for order in orders if order.status == "completed"])
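Both forms compute the same total. The difference is what exists in memory while they do it. Below is a runnable comparison, assuming a hypothetical Order record with `total` and `status` fields. sys.getsizeof reports only the size of the container object itself. That is exactly the point here: the generator object stays small however many items it will produce, while the list grows with the data.

```python
import sys
from dataclasses import dataclass

@dataclass
class Order:
    # Hypothetical order record for illustration
    total: float
    status: str

orders = [Order(total=float(i), status="completed" if i % 2 == 0 else "pending")
          for i in range(10_000)]

# Generator expression: sum() pulls one value at a time
total_gen = sum(order.total for order in orders if order.status == "completed")

# List comprehension: builds the full intermediate list before summing
total_list = sum([order.total for order in orders if order.status == "completed"])

print(total_gen == total_list)  # True

# Compare the container objects themselves (not the floats they refer to)
gen = (order.total for order in orders if order.status == "completed")
lst = [order.total for order in orders if order.status == "completed"]
print(sys.getsizeof(gen), sys.getsizeof(lst))  # exact numbers vary by Python version
```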

Use itertools Instead of Reinventing the Wheel

import itertools

# GOOD: use itertools.chain to iterate over several iterables in sequence
# (instead of concatenating them or writing one loop per list)
all_items = itertools.chain(list_a, list_b, list_c)

# GOOD: use itertools.groupby for grouping; it only groups consecutive
# items, so the data must already be sorted by the same key
for key, group in itertools.groupby(sorted_data, key=extract_key):
    process_group(key, list(group))

# GOOD: use itertools.islice for taking the first N items from an iterator
first_ten = list(itertools.islice(infinite_generator(), 10))
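The snippets above use placeholder names (list_a, sorted_data, extract_key, infinite_generator). Here is a concrete version of the same three patterns with small literal data. itertools.count() serves as the infinite iterator. The groupby part also shows what goes wrong when you forget to sort: the same key shows up in several separate groups.

```python
import itertools

# chain: iterate over several iterables as if they were one
list_a, list_b, list_c = [1, 2], [3], [4, 5]
all_items = list(itertools.chain(list_a, list_b, list_c))
print(all_items)  # [1, 2, 3, 4, 5]

def first_letter(word):
    return word[0]

# groupby: sort by the same key first, then group consecutive equal keys
words = ["banana", "apple", "cherry", "avocado", "blueberry"]
grouped = {key: list(group)
           for key, group in itertools.groupby(sorted(words, key=first_letter), key=first_letter)}
print(grouped)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Without sorting, the same key appears in several separate groups
unsorted_keys = [key for key, _ in itertools.groupby(words, key=first_letter)]
print(unsorted_keys)  # ['b', 'a', 'c', 'a', 'b']

# islice: take a finite prefix of an infinite iterator such as itertools.count()
first_ten_squares = [n * n for n in itertools.islice(itertools.count(), 10)]
print(first_ten_squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```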

Make Reusable Iterables When Needed

# If you need to iterate multiple times, use a class with __iter__
class DataSource:
    def __init__(self, path):
        self.path = path
    
    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                yield line.strip()

# Each for loop gets a fresh iterator
source = DataSource("data.txt")
count = sum(1 for _ in source)        # First pass: count lines
total = sum(len(line) for line in source)  # Second pass: total characters (after strip)
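The same idea works without a file. Because `__iter__` is written as a generator function, every `for` loop (or call to sum()) gets a brand-new generator. A plain generator, by contrast, is its own iterator and is empty on the second pass. That difference also gives a quick check for single-pass objects: an iterator returns itself from iter(), while a re-iterable container does not. Squares and is_one_shot below are illustrative names, not library APIs.

```python
class Squares:
    # Reusable iterable: each call to __iter__ returns a fresh generator
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield i * i

def is_one_shot(obj):
    # Iterators return themselves from iter(); re-iterable containers don't
    return iter(obj) is obj

reusable = Squares(4)
one_shot = (i * i for i in range(4))

print(sum(reusable), sum(reusable))                  # 14 14
print(sum(one_shot), sum(one_shot))                  # 14 0
print(is_one_shot(reusable), is_one_shot(one_shot))  # False True
```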

Document Generator Exhaustion Behavior

def fetch_records(query):
    """
    Yield records matching the query from the database.
    
    WARNING: This generator can only be consumed once.
    If you need multiple passes, materialize with list().
    """
    cursor = db.execute(query)
    for row in cursor:
        yield transform(row)
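The fetch_records example relies on a `db` object and a `transform()` function that aren't defined here. The sketch below swaps in an in-memory database from Python's built-in sqlite3 module and passes the connection explicitly. The `users` table and the dict-building step are hypothetical stand-ins. It shows the behavior the docstring warns about: the second pass over the same generator returns nothing.

```python
import sqlite3

def fetch_records(conn, query):
    """
    Yield records matching the query from the database.

    WARNING: This generator can only be consumed once.
    If you need multiple passes, materialize with list().
    """
    for row in conn.execute(query):
        # Hypothetical transform: turn each (id, name) row into a dict
        yield {"id": row[0], "name": row[1]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])

records = fetch_records(conn, "SELECT id, name FROM users ORDER BY id")
first_pass = list(records)
second_pass = list(records)
print(first_pass)   # [{'id': 1, 'name': 'ada'}, {'id': 2, 'name': 'grace'}]
print(second_pass)  # [], the generator is already exhausted
conn.close()
```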

14. Key Takeaways

  • Iterators are objects that implement __iter__() and __next__(). They produce values one at a time and raise StopIteration when done. Every for loop in Python uses this protocol.
  • Generators are iterators created with yield. They are dramatically simpler to write than class-based iterators. The function's state is automatically saved and restored between next() calls.
  • Generator expressions provide a compact syntax for simple generators: (expr for x in iterable if condition). The generator object itself stays small and constant-sized no matter how many items it yields.
  • yield from delegates to sub-generators and is essential for flattening nested structures and composing generators cleanly.
  • send() turns generators into coroutines that can receive values as well as produce them. This is a powerful pattern for stateful data processing.
  • Generator pipelines chain multiple generators together like Unix pipes. Data flows through the pipeline one element at a time, keeping memory usage flat.
  • itertools provides battle-tested, C-optimized iterator utilities. Use chain, islice, groupby, combinations, permutations, and product instead of writing your own.
  • Memory matters. Datasets that do not fit in memory have to be streamed, and generators are the simplest way to do that in Python. Even for smaller datasets, generators avoid allocating intermediate collections.
  • Generators exhaust. You can only iterate through a generator once. If you need multiple passes, either recreate the generator or materialize it into a list.
  • Use generators by default when processing sequences of data. Switch to lists only when you need random access, multiple iterations, or the dataset is small enough that the simplicity of a list outweighs the memory cost.
March 21, 2021