Python – Modules & Packages

As your Python projects grow beyond a single script, you need a way to organize code into logical, reusable units. Copy-pasting functions between files is a maintenance disaster waiting to happen. This is where modules and packages come in — they are Python’s answer to code organization, reusability, and namespace management. Every serious Python project relies on them, and understanding how they work is essential for writing professional-grade software.

In this tutorial, we will cover everything from basic imports to creating your own distributable packages, managing dependencies with virtual environments, and avoiding the common pitfalls that trip up even experienced developers.

What is a Module?

A module is simply a .py file containing Python definitions — functions, classes, variables, and executable statements. The file name (minus the .py extension) becomes the module name. If you have a file called math_utils.py, you have a module called math_utils. That is it — there is no special registration step or configuration required.

Every Python file you have ever written is already a module. The only difference between a “script” and a “module” is how you use it: a script is executed directly, while a module is imported by other code.

# math_utils.py - this file IS a module

PI = 3.141592653589793

def circle_area(radius):
    """Calculate the area of a circle."""
    return PI * radius ** 2

def rectangle_area(length, width):
    """Calculate the area of a rectangle."""
    return length * width

def fahrenheit_to_celsius(f):
    """Convert Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9

Now any other Python file can import and use these definitions without rewriting them.

Importing Modules

Python provides several ways to import modules, each with different trade-offs in terms of readability, namespace pollution, and convenience.

The import Statement

The import statement is the most straightforward way to bring in a module. You access its contents using dot notation, which makes it clear where each name comes from.

import math_utils

area = math_utils.circle_area(5)
print(f"Circle area: {area}")  # Circle area: 78.53981633974483

temp = math_utils.fahrenheit_to_celsius(212)
print(f"212°F = {temp}°C")  # 212°F = 100.0°C

The from…import Statement

When you only need specific items from a module, use from...import. This brings the names directly into your namespace so you do not need the module prefix.

from math_utils import circle_area, fahrenheit_to_celsius

area = circle_area(5)
temp = fahrenheit_to_celsius(100)

print(f"Area: {area}")   # Area: 78.53981633974483
print(f"Temp: {temp}")   # Temp: 37.77777777777778

Aliasing with as

You can rename a module or an imported name using as. This is useful when module names are long or when you want to avoid name collisions.

# Alias a module
import math_utils as mu

area = mu.circle_area(10)

# Alias a specific import
from math_utils import fahrenheit_to_celsius as f2c

temp = f2c(98.6)
print(f"Body temp: {temp:.1f}°C")  # Body temp: 37.0°C

You will see this convention everywhere in the Python ecosystem: import numpy as np, import pandas as pd, import matplotlib.pyplot as plt. These aliases are so standard that using different ones will confuse other developers reading your code.

The Wildcard import *

You can import everything from a module with from module import *. This pulls all public names (those not starting with an underscore) into your namespace.

from math_utils import *

# Now circle_area, rectangle_area, fahrenheit_to_celsius, and PI
# are all available directly
print(circle_area(3))     # 28.274333882308138
print(PI)                 # 3.14159265358979

Avoid wildcard imports in production code. They pollute your namespace, make it impossible to tell where a name came from, and can silently overwrite existing names. The only acceptable use is quick exploration in an interactive session (the REPL).
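To see the overwriting danger concretely, here is a small sketch: a locally defined function is silently replaced by a name pulled in via the wildcard.

```python
# A locally defined function named sqrt
def sqrt(x):
    return "my custom sqrt"

# The wildcard import silently rebinds sqrt to math.sqrt
from math import *

print(sqrt(16))  # 4.0 - the local definition is gone, with no warning
```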

The Module Search Path

When you write import math_utils, Python needs to find math_utils.py somewhere on disk. It searches the following locations, in order:

  1. The directory containing the script that was executed (or the current directory in an interactive session)
  2. Directories listed in the PYTHONPATH environment variable (if set)
  3. The installation-dependent default directories (site-packages, standard library)

You can inspect and modify this search path at runtime via sys.path.

import sys

# Print the module search path
for path in sys.path:
    print(path)

# Add a custom directory to the search path at runtime
sys.path.append("/home/folau/my_custom_libs")

Modifying sys.path at runtime is a quick fix, not a best practice. For production code, install your modules properly as packages or configure the PYTHONPATH environment variable.
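As a sketch of the PYTHONPATH approach (the directory here is hypothetical), an environment variable set before launching Python lands on sys.path without any code changes:

```shell
# Prepend a hypothetical directory of local modules to the search path
export PYTHONPATH="/home/folau/my_custom_libs:$PYTHONPATH"

# Every Python process started from this shell now searches it
python3 -c "import sys; print('/home/folau/my_custom_libs' in sys.path)"
```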

Creating Your Own Module

Let us build a practical module step by step. Create a file called string_helpers.py.

# string_helpers.py
import re

def slugify(text):
    """Convert a string to a URL-friendly slug."""
    text = text.lower().strip()
    text = re.sub(r'[^\w\s-]', '', text)
    text = re.sub(r'[\s_]+', '-', text)
    text = re.sub(r'-+', '-', text)
    return text

def truncate(text, max_length=100, suffix="..."):
    """Truncate text to max_length, adding suffix if truncated."""
    if len(text) <= max_length:
        return text
    return text[:max_length - len(suffix)].rsplit(' ', 1)[0] + suffix

def title_case(text):
    """Convert text to title case, handling common prepositions."""
    small_words = {'a', 'an', 'the', 'and', 'but', 'or', 'for', 'nor',
                   'on', 'at', 'to', 'by', 'in', 'of', 'up'}
    words = text.split()
    result = []
    for i, word in enumerate(words):
        if i == 0 or word.lower() not in small_words:
            result.append(word.capitalize())
        else:
            result.append(word.lower())
    return ' '.join(result)

def count_words(text):
    """Count the number of words in a string."""
    return len(text.split())


# This block runs ONLY when the file is executed directly,
# NOT when it is imported as a module
if __name__ == "__main__":
    print("Testing string_helpers module:")
    print(slugify("Hello World! This is a Test"))   # hello-world-this-is-a-test
    print(truncate("This is a very long string that should be truncated", 30))
    print(title_case("the quick brown fox jumps over the lazy dog"))
    print(count_words("Python modules are powerful"))  # 4

The __name__ == "__main__" Pattern

This is one of Python's most important idioms. Every module has a built-in __name__ attribute. When a file is run directly (e.g., python string_helpers.py), __name__ is set to "__main__". When the file is imported as a module, __name__ is set to the module's name (e.g., "string_helpers").

This pattern lets you include test code or a CLI interface in the same file as your module without it running on import.

# Using the module in another file
from string_helpers import slugify, truncate

title = "Python Modules & Packages: A Complete Guide"
slug = slugify(title)
print(slug)  # python-modules-packages-a-complete-guide

summary = truncate("This comprehensive tutorial covers everything you need...", 40)
print(summary)  # This comprehensive tutorial covers...

Packages

A package is a directory that contains Python modules and a special __init__.py file. Packages let you organize related modules into a hierarchical directory structure — think of them as "folders of modules."
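The definition above can be verified in a few lines: a directory plus __init__.py really is all it takes. This sketch builds a throwaway package in a temp directory (the name demo_pkg and its contents are made up for the demonstration):

```python
import sys
import tempfile
from pathlib import Path

# Build a minimal package on disk: a directory with __init__.py
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("VERSION = '0.1'\n")
(pkg / "greet.py").write_text("def hello():\n    return 'hi'\n")

# Make the temp directory importable, then use the package normally
sys.path.insert(0, str(tmp))
import demo_pkg
from demo_pkg.greet import hello

print(demo_pkg.VERSION)  # 0.1
print(hello())           # hi
```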

Basic Package Structure

myproject/
├── utils/
│   ├── __init__.py
│   ├── string_helpers.py
│   ├── math_helpers.py
│   └── file_helpers.py
├── models/
│   ├── __init__.py
│   ├── user.py
│   └── product.py
└── main.py

The __init__.py file tells Python that the directory should be treated as a package. It can be empty, or it can contain initialization code and define what gets exported when someone uses from package import *.

The __init__.py File

# utils/__init__.py

# You can import commonly used items here for convenience
from .string_helpers import slugify, truncate
from .math_helpers import circle_area

# Define what 'from utils import *' exports
__all__ = ['slugify', 'truncate', 'circle_area']

# Package-level constants
VERSION = "1.0.0"

With this __init__.py, users of your package get a cleaner import experience.

# Without __init__.py convenience imports:
from utils.string_helpers import slugify

# With __init__.py convenience imports:
from utils import slugify  # much cleaner

Importing from Packages

# Import a specific module from a package
from utils import string_helpers
string_helpers.slugify("Hello World")

# Import a specific function from a module in a package
from utils.string_helpers import slugify
slugify("Hello World")

# Import the package itself (uses __init__.py)
import utils
utils.slugify("Hello World")  # only works if __init__.py exports it

Sub-packages

Packages can contain other packages, creating a hierarchy as deep as you need.

myproject/
├── services/
│   ├── __init__.py
│   ├── auth/
│   │   ├── __init__.py
│   │   ├── jwt_handler.py
│   │   └── oauth.py
│   └── payments/
│       ├── __init__.py
│       ├── stripe_client.py
│       └── paypal_client.py
└── main.py

# Importing from sub-packages
from services.auth.jwt_handler import create_token
from services.payments.stripe_client import charge_customer

Relative Imports

Inside a package, you can use relative imports to reference sibling modules. A single dot (.) refers to the current package, two dots (..) to the parent package.

# Inside services/auth/jwt_handler.py

# Relative import from the same package (auth)
from .oauth import get_oauth_token

# Relative import from the parent package (services)
from ..payments.stripe_client import charge_customer

Important: Relative imports only work inside packages. They will fail if you try to run the file directly as a script. Always prefer absolute imports unless you have a strong reason to use relative ones.
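A runnable sketch makes the point: the relative import below works only because cart.py lives inside a package. The shop package and its modules are hypothetical, assembled in a temp directory for the demonstration.

```python
import sys
import tempfile
from pathlib import Path

# Assemble a throwaway package whose modules use a relative import
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "shop"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "tax.py").write_text("RATE = 0.5\n")
(pkg / "cart.py").write_text(
    "from .tax import RATE\n"          # relative import of a sibling module
    "def total(subtotal):\n"
    "    return subtotal * (1 + RATE)\n"
)

sys.path.insert(0, str(tmp))
from shop.cart import total   # fine: cart is imported as part of a package
print(total(100))             # 150.0
```

Running cart.py directly as a script, by contrast, would raise the "attempted relative import" error described above.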

The Standard Library

Python ships with an extensive standard library — often described as "batteries included." Here are the modules you will reach for most often in real-world projects.

os — Operating System Interface

import os

# Get environment variables
db_host = os.environ.get("DB_HOST", "localhost")
debug = os.environ.get("DEBUG", "false")

# Work with file paths (prefer pathlib for new code)
current_dir = os.getcwd()
home_dir = os.path.expanduser("~")
full_path = os.path.join(current_dir, "data", "output.csv")

# Check if files/directories exist
print(os.path.exists("/tmp/myfile.txt"))
print(os.path.isdir("/tmp"))

# Create directories
os.makedirs("output/reports", exist_ok=True)

# List directory contents
files = os.listdir(".")
print(files)

sys — System-Specific Parameters

import sys

# Python version info
print(sys.version)        # 3.12.0 (main, Oct 2 2023, ...)
print(sys.version_info)   # sys.version_info(major=3, minor=12, ...)

# Command-line arguments
print(sys.argv)  # ['script.py', 'arg1', 'arg2']

# Module search path
print(sys.path)

# Exit the program with a status code
# sys.exit(0)  # 0 = success, non-zero = error

# Platform information
print(sys.platform)   # 'darwin', 'linux', 'win32'
print(sys.maxsize)    # Maximum integer size

json — JSON Encoding and Decoding

import json

# Python dict to JSON string
user = {"name": "Folau", "age": 30, "skills": ["Python", "Java", "SQL"]}
json_string = json.dumps(user, indent=2)
print(json_string)

# JSON string to Python dict
data = json.loads('{"status": "active", "count": 42}')
print(data["status"])  # active

# Read JSON from a file
with open("config.json", "r") as f:
    config = json.load(f)

# Write JSON to a file
with open("output.json", "w") as f:
    json.dump(user, f, indent=2)

datetime — Date and Time

from datetime import datetime, timedelta, date

# Current date and time
now = datetime.now()
print(now)  # 2026-02-26 10:30:45.123456

# Formatting dates
formatted = now.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)  # 2026-02-26 10:30:45

# Parsing date strings
parsed = datetime.strptime("2026-02-26", "%Y-%m-%d")
print(parsed)  # 2026-02-26 00:00:00

# Date arithmetic
tomorrow = date.today() + timedelta(days=1)
next_week = datetime.now() + timedelta(weeks=1)
thirty_days_ago = datetime.now() - timedelta(days=30)

# Difference between dates
deadline = datetime(2026, 12, 31)
remaining = deadline - datetime.now()
print(f"Days remaining: {remaining.days}")

math — Mathematical Functions

import math

print(math.pi)          # 3.141592653589793
print(math.e)           # 2.718281828459045
print(math.sqrt(144))   # 12.0
print(math.ceil(4.2))   # 5
print(math.floor(4.8))  # 4
print(math.log(100, 10))  # 2.0
print(math.factorial(5))  # 120
print(math.gcd(48, 18))   # 6

random — Random Number Generation

import random

# Random integer in range [1, 100]
print(random.randint(1, 100))

# Random float in [0.0, 1.0)
print(random.random())

# Random choice from a sequence
colors = ["red", "green", "blue", "yellow"]
print(random.choice(colors))

# Shuffle a list in place
cards = list(range(1, 53))
random.shuffle(cards)
print(cards[:5])  # first 5 cards after shuffle

# Sample without replacement
lottery = random.sample(range(1, 50), 6)
print(sorted(lottery))

pathlib — Modern File Path Handling

from pathlib import Path

# Create Path objects
home = Path.home()
project = Path("/home/folau/projects/myapp")
config_file = project / "config" / "settings.json"

print(config_file)          # /home/folau/projects/myapp/config/settings.json
print(config_file.name)     # settings.json
print(config_file.stem)     # settings
print(config_file.suffix)   # .json
print(config_file.parent)   # /home/folau/projects/myapp/config

# Check existence
print(project.exists())
print(config_file.is_file())

# Create directories
output_dir = project / "output"
output_dir.mkdir(parents=True, exist_ok=True)

# Read and write files
readme = project / "README.md"
# readme.write_text("# My Project\n")
# content = readme.read_text()

# Glob for file patterns
python_files = list(project.glob("**/*.py"))
print(f"Found {len(python_files)} Python files")

collections — Specialized Container Types

from collections import Counter, defaultdict, namedtuple, deque

# Counter - count occurrences
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_counts = Counter(words)
print(word_counts)                # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
print(word_counts.most_common(2)) # [('apple', 3), ('banana', 2)]

# defaultdict - dict with default values for missing keys
grouped = defaultdict(list)
students = [("math", "Alice"), ("science", "Bob"), ("math", "Charlie")]
for subject, student in students:
    grouped[subject].append(student)
print(dict(grouped))  # {'math': ['Alice', 'Charlie'], 'science': ['Bob']}

# namedtuple - lightweight immutable objects
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(f"x={p.x}, y={p.y}")  # x=3, y=4

# deque - double-ended queue with O(1) appends/pops on both ends
queue = deque(["first", "second", "third"])
queue.append("fourth")       # add to right
queue.appendleft("zeroth")   # add to left
print(queue.popleft())       # zeroth - remove from left

itertools — Iterator Building Blocks

import itertools

# chain - combine multiple iterables
combined = list(itertools.chain([1, 2], [3, 4], [5, 6]))
print(combined)  # [1, 2, 3, 4, 5, 6]

# product - cartesian product
sizes = ["S", "M", "L"]
colors = ["red", "blue"]
combos = list(itertools.product(sizes, colors))
print(combos)  # [('S', 'red'), ('S', 'blue'), ('M', 'red'), ...]

# groupby - group consecutive elements
data = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("A", 5)]
data.sort(key=lambda x: x[0])  # must be sorted first
for key, group in itertools.groupby(data, key=lambda x: x[0]):
    print(f"{key}: {list(group)}")

# islice - slice an iterator
first_five = list(itertools.islice(range(100), 5))
print(first_five)  # [0, 1, 2, 3, 4]

functools — Higher-Order Functions

from functools import lru_cache, partial, reduce

# lru_cache - memoize expensive function calls
@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(50))  # 12586269025 - computed instantly with caching

# partial - create a new function with some arguments pre-filled
def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)
cube = partial(power, exponent=3)
print(square(5))  # 25
print(cube(3))    # 27

# reduce - apply a function cumulatively to a sequence
numbers = [1, 2, 3, 4, 5]
product = reduce(lambda a, b: a * b, numbers)
print(product)  # 120

Installing Third-Party Packages

While the standard library is extensive, real-world projects almost always need third-party packages. Python's package installer, pip, downloads and installs packages from the Python Package Index (PyPI).

Basic pip Usage

# Install a package
pip install requests

# Install a specific version
pip install requests==2.31.0

# Install minimum version
pip install "requests>=2.28.0"

# Upgrade a package
pip install --upgrade requests

# Uninstall a package
pip uninstall requests

# Show installed package info
pip show requests

# List all installed packages
pip list

requirements.txt

A requirements.txt file lists all the packages your project depends on, one per line. This is the standard way to share dependencies so anyone can recreate your environment.

# requirements.txt
requests==2.31.0
flask==3.0.0
sqlalchemy==2.0.23
pytest==7.4.3
python-dotenv==1.0.0

# Install all dependencies from requirements.txt
pip install -r requirements.txt

# Generate requirements.txt from currently installed packages
pip freeze > requirements.txt

Warning: Running pip freeze dumps every installed package, including transitive dependencies. For a cleaner approach, manually maintain your requirements.txt with only your direct dependencies and use tools like pip-compile (from pip-tools) to resolve the full dependency tree.

Virtual Environments

A virtual environment is an isolated Python environment with its own set of installed packages. Without virtual environments, all your projects share the same global Python installation, which leads to version conflicts: Project A needs requests==2.28, but Project B needs requests==2.31. Virtual environments solve this completely.

Creating and Using Virtual Environments

# Create a virtual environment named 'venv'
python3 -m venv venv

# Activate it (macOS / Linux)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate

# Your prompt changes to show the active environment
# (venv) $

# Now pip installs packages into the virtual environment only
pip install requests flask

# Verify isolation - packages are installed in the venv
pip list

# Deactivate when done
deactivate

Why Virtual Environments Matter

  • Dependency isolation: Each project gets its own set of packages at specific versions, preventing conflicts.
  • Reproducibility: Combined with requirements.txt, anyone can recreate the exact same environment.
  • Clean system Python: Your system Python installation stays clean and uncluttered.
  • Safe experimentation: You can install and test packages without affecting other projects.

Add venv to .gitignore

Never commit your virtual environment directory to version control. It contains platform-specific binaries and can be hundreds of megabytes. Instead, commit requirements.txt and let each developer create their own virtual environment.

# .gitignore
venv/
.venv/
env/
__pycache__/
*.pyc
.env

Package Management Best Practices

Pin Your Dependencies

Always specify exact versions in your requirements.txt for production deployments. Unpinned dependencies can break your application when a new version introduces a breaking change.

# BAD - unpinned, any version could be installed
requests
flask

# GOOD - pinned to exact versions
requests==2.31.0
flask==3.0.0

# ACCEPTABLE - minimum version constraints for libraries
requests>=2.28.0,<3.0.0
flask>=3.0.0,<4.0.0

Separate Development Dependencies

Keep your production and development dependencies separate. You do not need pytest or black on your production server.

# requirements.txt - production dependencies
requests==2.31.0
flask==3.0.0
sqlalchemy==2.0.23
gunicorn==21.2.0

# requirements-dev.txt - development dependencies
-r requirements.txt
pytest==7.4.3
black==23.12.0
flake8==6.1.0
mypy==1.7.1

# Install dev dependencies (includes production deps via -r)
pip install -r requirements-dev.txt

Using pip-compile for Dependency Resolution

The pip-tools package provides pip-compile, which resolves your dependencies and their transitive dependencies into a fully pinned requirements.txt.

# Install pip-tools
pip install pip-tools

# Create a requirements.in with your direct dependencies
# requirements.in
# flask
# sqlalchemy
# requests

# Compile to a fully resolved requirements.txt
pip-compile requirements.in

# The output requirements.txt will include all transitive
# dependencies with pinned versions and hash checking

Creating a Distributable Package

When you want to share your code as a reusable package that others can install with pip, you need a proper project structure with packaging metadata.

Modern Package Structure

my-awesome-package/
├── pyproject.toml          # Package metadata and build config
├── README.md
├── LICENSE
├── src/
│   └── my_package/
│       ├── __init__.py
│       ├── core.py
│       └── utils.py
├── tests/
│   ├── __init__.py
│   ├── test_core.py
│   └── test_utils.py
└── requirements.txt

pyproject.toml (Modern Approach)

The pyproject.toml file is the modern standard for Python project configuration. It replaces the older setup.py approach.

# pyproject.toml
[build-system]
requires = ["setuptools>=68.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "my-awesome-package"
version = "1.0.0"
description = "A short description of the package"
readme = "README.md"
license = {text = "MIT"}
requires-python = ">=3.9"
authors = [
    {name = "Folau Kaveinga", email = "folau@example.com"}
]
dependencies = [
    "requests>=2.28.0",
    "click>=8.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0.0",
    "black>=23.0.0",
]

setup.py (Legacy Approach)

You may still encounter setup.py in older projects. It serves the same purpose but uses imperative Python code instead of declarative TOML.

# setup.py
from setuptools import setup, find_packages

setup(
    name="my-awesome-package",
    version="1.0.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    install_requires=[
        "requests>=2.28.0",
        "click>=8.0.0",
    ],
    python_requires=">=3.9",
)

Building and Installing

# Build the package
python -m build

# Install in development mode (editable install)
pip install -e .

# Install with optional dev dependencies
pip install -e ".[dev]"

Practical Examples

Project Structure for a Web App

Here is a realistic structure for a Flask web application that demonstrates proper use of packages and modules.

webapp/
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
├── .env
├── .gitignore
├── src/
│   └── webapp/
│       ├── __init__.py           # App factory
│       ├── config.py             # Configuration classes
│       ├── models/
│       │   ├── __init__.py
│       │   ├── user.py
│       │   └── product.py
│       ├── routes/
│       │   ├── __init__.py
│       │   ├── auth.py
│       │   └── api.py
│       ├── services/
│       │   ├── __init__.py
│       │   ├── email_service.py
│       │   └── payment_service.py
│       └── utils/
│           ├── __init__.py
│           ├── validators.py
│           └── formatters.py
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_models/
│   ├── test_routes/
│   └── test_services/
└── scripts/
    ├── seed_db.py
    └── run_migrations.py

# src/webapp/__init__.py - App factory pattern
from flask import Flask
from .config import Config

def create_app(config_class=Config):
    app = Flask(__name__)
    app.config.from_object(config_class)

    # Register blueprints (route modules)
    from .routes.auth import auth_bp
    from .routes.api import api_bp
    app.register_blueprint(auth_bp, url_prefix="/auth")
    app.register_blueprint(api_bp, url_prefix="/api")

    return app

# src/webapp/config.py
import os
from pathlib import Path
from dotenv import load_dotenv

load_dotenv()

class Config:
    SECRET_KEY = os.environ.get("SECRET_KEY", "dev-secret-key")
    DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///app.db")
    DEBUG = False

class DevelopmentConfig(Config):
    DEBUG = True

class ProductionConfig(Config):
    DEBUG = False
    SECRET_KEY = os.environ["SECRET_KEY"]  # must be set in production

Creating a Utility Module with Helper Functions

# src/webapp/utils/validators.py
import re
from typing import Optional

def validate_email(email: str) -> bool:
    """Validate email format."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

def validate_password(password: str) -> Optional[str]:
    """
    Validate password strength.
    Returns None if valid, error message if invalid.
    """
    if len(password) < 8:
        return "Password must be at least 8 characters"
    if not re.search(r'[A-Z]', password):
        return "Password must contain at least one uppercase letter"
    if not re.search(r'[a-z]', password):
        return "Password must contain at least one lowercase letter"
    if not re.search(r'\d', password):
        return "Password must contain at least one digit"
    return None

def validate_username(username: str) -> Optional[str]:
    """
    Validate username format.
    Returns None if valid, error message if invalid.
    """
    if len(username) < 3:
        return "Username must be at least 3 characters"
    if len(username) > 30:
        return "Username must be at most 30 characters"
    if not re.match(r'^[a-zA-Z0-9_]+$', username):
        return "Username can only contain letters, numbers, and underscores"
    return None

# src/webapp/utils/__init__.py
from .validators import validate_email, validate_password, validate_username
from .formatters import format_currency, format_date

__all__ = [
    'validate_email',
    'validate_password',
    'validate_username',
    'format_currency',
    'format_date',
]

# Using the utility module elsewhere in the project
from webapp.utils import validate_email, validate_password

email = "folau@example.com"
if validate_email(email):
    print(f"{email} is valid")

password = "MyStr0ngPass!"
error = validate_password(password)
if error:
    print(f"Invalid password: {error}")
else:
    print("Password is strong enough")

Managing Dependencies for a Project

# Step 1: Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Step 2: Install your project dependencies
pip install flask sqlalchemy requests python-dotenv

# Step 3: Install development tools
pip install pytest black flake8 mypy

# Step 4: Record production dependencies
# (hand-maintain direct deps; a raw 'pip freeze' would pin everything)
cat > requirements.txt << 'EOF'
flask==3.0.0
sqlalchemy==2.0.23
requests==2.31.0
python-dotenv==1.0.0
EOF

# Step 5: Create dev requirements
echo "-r requirements.txt" > requirements-dev.txt
echo "pytest==7.4.3" >> requirements-dev.txt
echo "black==23.12.0" >> requirements-dev.txt
echo "flake8==6.1.0" >> requirements-dev.txt
echo "mypy==1.7.1" >> requirements-dev.txt

# Step 6: Verify everything works
pip install -r requirements-dev.txt
pytest

Common Pitfalls

1. Circular Imports

Circular imports happen when module A imports module B, and module B imports module A. Python tolerates some circular imports, but they frequently fail with ImportError or AttributeError at runtime, depending on which module happens to be imported first.

# models/user.py
from models.order import Order  # imports order module

class User:
    def get_orders(self):
        return Order.find_by_user(self.id)

# models/order.py
from models.user import User  # imports user module - CIRCULAR!

class Order:
    def get_user(self):
        return User.find_by_id(self.user_id)

Solutions:

  • Move the import inside the function that needs it (lazy import).
  • Restructure your code to break the circular dependency — often by creating a third module that both can import from.
  • Use TYPE_CHECKING for type hints that cause circular imports.

# Solution 1: Lazy import inside the function
class Order:
    def get_user(self):
        from models.user import User  # import here, not at the top
        return User.find_by_id(self.user_id)

# Solution 2: Use TYPE_CHECKING for type hints
from __future__ import annotations
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from models.user import User  # only imported during type checking, not at runtime

class Order:
    def get_user(self) -> "User":
        from models.user import User
        return User.find_by_id(self.user_id)

2. Name Shadowing

Creating a file with the same name as a standard library module will shadow it, causing confusing import errors.

# If you have a file named 'random.py' in your project:
import random  # This imports YOUR random.py, NOT the standard library!

random.randint(1, 10)  # AttributeError: module 'random' has no attribute 'randint'

Solution: Never name your files after standard library modules. Common offenders: random.py, email.py, test.py, string.py, collections.py, json.py. If you have already done this, rename your file and delete the corresponding __pycache__ directory.
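A quick way to diagnose a suspected shadow is to check the module's __file__ attribute, which reveals which file Python actually imported:

```python
import random

# If this prints a path inside your project instead of the standard
# library directory, a local file is shadowing the real module.
print(random.__file__)
```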

3. Forgetting __init__.py

In Python 3, directories without __init__.py are treated as "namespace packages" — a feature designed for splitting a single logical package across multiple directories. This is almost never what you want. Without __init__.py, some tools (like pytest, mypy, and IDE auto-importers) may not recognize your directory as a package.

# BAD - missing __init__.py
utils/
├── helpers.py
└── formatters.py

# GOOD - proper package
utils/
├── __init__.py
├── helpers.py
└── formatters.py

4. Relative vs Absolute Import Confusion

Relative imports (from . import module) only work inside packages and fail when you run a file directly as a script.

# This fails:
python src/webapp/routes/auth.py
# ImportError: attempted relative import with no known parent package

# This works - run from the project root as a module:
python -m webapp.routes.auth

Best Practices

1. Prefer absolute imports: They are more readable and work regardless of where the script is run from. Use from webapp.utils import validate_email instead of from ..utils import validate_email.

2. Keep modules small and focused: A module with 2,000 lines of unrelated functions is hard to navigate. Split it into smaller, focused modules grouped by responsibility. A module named validators.py should contain validation logic, not database queries.

3. Use __all__ to define your public API: The __all__ list in a module or __init__.py explicitly declares which names are part of the public API. This controls what gets exported with from module import * and serves as documentation for other developers.

# utils/validators.py
__all__ = ['validate_email', 'validate_password']

def validate_email(email):
    ...

def validate_password(password):
    ...

def _internal_helper():
    """Not exported - underscore prefix signals 'private'."""
    ...

4. Always use virtual environments: Every project should have its own virtual environment. No exceptions. It takes 10 seconds to set up and saves hours of debugging dependency conflicts.

5. Structure imports consistently: Follow PEP 8 import ordering — standard library imports first, then third-party packages, then local imports, with a blank line between each group.

# Standard library
import os
import sys
from datetime import datetime
from pathlib import Path

# Third-party packages
import requests
from flask import Flask, jsonify
from sqlalchemy import create_engine

# Local imports
from webapp.utils import validate_email
from webapp.models.user import User

6. Avoid import side effects: Importing a module should not perform heavy operations like connecting to a database, making HTTP requests, or writing to files. Move such operations into functions that are called explicitly.
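A sketch of the lazy, connect-on-first-use pattern this suggests (the dictionary below is a stand-in for a real connection object, and get_connection is a hypothetical name):

```python
# BAD - would connect the moment anyone imports this module:
# connection = connect_to_db("prod.example.com")

# GOOD - importing is cheap; callers connect explicitly when ready
_connection = None

def get_connection(host="prod.example.com"):
    """Create the connection on first use and reuse it afterwards."""
    global _connection
    if _connection is None:
        # Stand-in for the real connection setup
        _connection = {"host": host, "status": "connected"}
    return _connection
```

Importing this module now does no work at all; the first call to get_connection pays the setup cost, and later calls reuse the cached object.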

7. Document your package structure: For larger projects, include a brief description of each package and module in the project README or in the package's __init__.py docstring.

Key Takeaways

  • A module is any .py file. A package is a directory of modules marked with an __init__.py file.
  • Use import module for namespace clarity, from module import name for convenience. Avoid import * in production code.
  • The __name__ == "__main__" guard lets a file serve as both a module and a runnable script.
  • Python's standard library is vast — learn modules like pathlib, json, collections, itertools, and functools to write more Pythonic code.
  • Use pip to install third-party packages and requirements.txt to track dependencies.
  • Virtual environments are non-negotiable for professional Python development — use python -m venv for every project.
  • Pin your dependency versions for reproducible deployments. Separate production and development dependencies.
  • Use pyproject.toml for new packages — it is the modern standard replacing setup.py.
  • Watch out for circular imports, name shadowing, and missing __init__.py files — they are the most common module-related bugs.
  • Follow PEP 8 import ordering: standard library, third-party, local — with blank lines between groups.
  • Keep modules small and focused, use __all__ to define your public API, and prefer absolute imports over relative ones.
March 15, 2021

Python – Decorators

If you have spent any time reading Python code — whether it is a Flask web app, a Django project, or a well-tested library — you have seen the @ symbol sitting above function definitions. That is a decorator, and it is one of the most powerful and elegant features in the Python language. Decorators let you modify or extend the behavior of functions and classes without changing their source code. They are the backbone of cross-cutting concerns like logging, authentication, caching, rate limiting, and input validation. Once you truly understand decorators, you will write cleaner, more reusable, and more Pythonic code.

In this tutorial, we will build decorators from the ground up — starting with the prerequisite concepts, moving through simple and advanced patterns, and finishing with real-world examples you can drop into production code today.

Prerequisites: First-Class Functions and Closures

Before we dive into decorators, you need to be comfortable with two foundational concepts: first-class functions and closures. If you have read the Python – Function tutorial, you already know that Python functions are first-class objects. Here is a quick recap.

First-class functions mean you can assign functions to variables, pass them as arguments, and return them from other functions — just like any other value.

def greet(name):
    return f"Hello, {name}!"

# Assign to a variable
say_hello = greet
print(say_hello("Folau"))  # Hello, Folau!

# Pass as an argument
def call_func(func, arg):
    return func(arg)

print(call_func(greet, "World"))  # Hello, World!

A closure is a function that remembers the variables from the enclosing scope even after that scope has finished executing. This is what makes decorators possible.

def make_greeter(greeting):
    def greeter(name):
        return f"{greeting}, {name}!"
    return greeter

hello = make_greeter("Hello")
good_morning = make_greeter("Good morning")

print(hello("Folau"))         # Hello, Folau!
print(good_morning("Folau"))  # Good morning, Folau!

The inner function greeter “closes over” the greeting variable. Even after make_greeter returns, the inner function retains access to greeting. This is exactly the mechanism decorators rely on.
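You can see the captured variable directly by inspecting the closure attributes Python attaches to the inner function:

```python
def make_greeter(greeting):
    def greeter(name):
        return f"{greeting}, {name}!"
    return greeter

hello = make_greeter("Hello")

# Each cell in __closure__ holds one captured variable
print(hello.__closure__[0].cell_contents)  # Hello
print(hello.__code__.co_freevars)          # ('greeting',)
```

The "free variables" listed in co_freevars are exactly the names the inner function closed over.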

Your First Decorator

A decorator is simply a function that takes another function as its argument, wraps it with additional behavior, and returns the wrapper. Let us build one step by step.

def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

def say_hello():
    print("Hello!")

# Manually apply the decorator
say_hello = my_decorator(say_hello)
say_hello()
# Output:
# Something is happening before the function is called.
# Hello!
# Something is happening after the function is called.

Here is what happens: my_decorator receives the original say_hello function, defines a wrapper that adds behavior before and after calling func(), and returns that wrapper. When we reassign say_hello = my_decorator(say_hello), the name say_hello now points to wrapper. Every subsequent call to say_hello() runs the wrapper code.

The @ Syntax

Writing say_hello = my_decorator(say_hello) every time is verbose. Python provides syntactic sugar with the @ symbol. The following two approaches are identical.

# Without @ syntax
def say_hello():
    print("Hello!")
say_hello = my_decorator(say_hello)

# With @ syntax (identical behavior)
@my_decorator
def say_hello():
    print("Hello!")

The @my_decorator line is just shorthand. When Python sees it, it calls my_decorator(say_hello) and rebinds the name say_hello to whatever the decorator returns. Clean, readable, and Pythonic.

Of course, most real functions accept arguments. A proper decorator must handle arbitrary arguments using *args and **kwargs.

def my_decorator(func):
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper

@my_decorator
def add(a, b):
    return a + b

print(add(3, 5))
# Output:
# Calling add
# add returned 8
# 8

By accepting *args and **kwargs, the wrapper forwards any positional and keyword arguments to the original function. Always capture and return the result of func(*args, **kwargs) — otherwise you will silently swallow the return value.

functools.wraps: Preserving Function Identity

There is a subtle problem with our decorator. After decoration, the function’s __name__, __doc__, and other metadata point to the wrapper, not the original function.

def my_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def say_hello():
    """Greet the user."""
    print("Hello!")

print(say_hello.__name__)  # wrapper  (not 'say_hello'!)
print(say_hello.__doc__)   # None     (not 'Greet the user.'!)

This breaks introspection, help() output, debugging tools, and any framework that relies on function names (like Flask route registration). The fix is functools.wraps, which copies the original function’s metadata onto the wrapper.

import functools

def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@my_decorator
def say_hello():
    """Greet the user."""
    print("Hello!")

print(say_hello.__name__)  # say_hello
print(say_hello.__doc__)   # Greet the user.

Always use @functools.wraps(func) in every decorator you write. This is non-negotiable. It preserves __name__, __doc__, __module__, __qualname__, __dict__, and __wrapped__ (which gives access to the original unwrapped function).
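The __wrapped__ attribute set by functools.wraps lets you reach the original function when you need to bypass the decoration entirely:

```python
import functools

def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs) * 2  # wrapper doubles the result
    return wrapper

@my_decorator
def get_value():
    return 10

print(get_value())              # 20 - goes through the wrapper
print(get_value.__wrapped__())  # 10 - calls the original, undecorated function
```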

Decorators with Arguments

Sometimes you need to configure a decorator. For example, you might want a retry decorator where you specify the number of retries, or a logging decorator where you specify the log level. This requires an extra layer of nesting — a function that returns a decorator.

import functools

def repeat(n):
    """Decorator that calls the function n times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(3)
def say_hello(name):
    print(f"Hello, {name}!")

say_hello("Folau")
# Output:
# Hello, Folau!
# Hello, Folau!
# Hello, Folau!

Here is the flow: repeat(3) is called first and returns decorator. Then Python calls decorator(say_hello), which returns wrapper. The name say_hello is rebound to wrapper. The triple nesting — outer function, decorator, wrapper — is the standard pattern for parameterized decorators.

Another practical example: a decorator that controls the log level.

import functools
import logging

def log_calls(level=logging.INFO):
    """Decorator that logs function calls at the specified level."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger = logging.getLogger(func.__module__)
            logger.log(level, f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
            result = func(*args, **kwargs)
            logger.log(level, f"{func.__name__} returned {result}")
            return result
        return wrapper
    return decorator

@log_calls(level=logging.DEBUG)
def process_data(data):
    return [x * 2 for x in data]
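A related refinement, not used above but common in libraries, is a decorator that works both with and without arguments, so @repeat and @repeat(n=3) are both valid. A minimal sketch of the dispatch, using a keyword-only parameter (one common way to do it):

```python
import functools

def repeat(func=None, *, n=2):
    """Usable as both @repeat and @repeat(n=3)."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = f(*args, **kwargs)
            return result
        return wrapper

    if func is not None:
        # Used as @repeat with no parentheses: func IS the decorated function
        return decorator(func)
    # Used as @repeat(n=3): return the decorator to be applied next
    return decorator

@repeat
def ping():
    print("ping")

@repeat(n=3)
def pong():
    print("pong")

ping()  # prints "ping" twice
pong()  # prints "pong" three times
```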

Class-Based Decorators

You can also implement decorators as classes by defining the __call__ method. This is useful when the decorator needs to maintain state across calls or when the logic is complex enough that a class provides better organization.

import functools

class CountCalls:
    """Decorator that counts how many times a function is called."""

    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.call_count = 0

    def __call__(self, *args, **kwargs):
        self.call_count += 1
        print(f"{self.func.__name__} has been called {self.call_count} time(s)")
        return self.func(*args, **kwargs)

@CountCalls
def say_hello(name):
    print(f"Hello, {name}!")

say_hello("Folau")
say_hello("World")
say_hello("Python")
# Output:
# say_hello has been called 1 time(s)
# Hello, Folau!
# say_hello has been called 2 time(s)
# Hello, World!
# say_hello has been called 3 time(s)
# Hello, Python!

print(say_hello.call_count)  # 3

Notice we use functools.update_wrapper(self, func) in __init__ instead of @functools.wraps (which is designed for functions, not classes). The effect is the same — it copies over __name__, __doc__, and other attributes.

Class-based decorators with arguments require a slightly different pattern:

import functools

class RateLimit:
    """Decorator that limits how often a function can be called."""

    def __init__(self, max_calls, period=60):
        self.max_calls = max_calls
        self.period = period
        self.calls = []

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            import time
            now = time.time()
            # Remove calls outside the time window
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) >= self.max_calls:
                raise RuntimeError(
                    f"Rate limit exceeded: {self.max_calls} calls per {self.period}s"
                )
            self.calls.append(now)
            return func(*args, **kwargs)
        return wrapper

@RateLimit(max_calls=5, period=60)
def api_request(endpoint):
    print(f"Requesting {endpoint}")
    return {"status": "ok"}

When the decorator takes arguments (@RateLimit(max_calls=5, period=60)), __init__ receives the arguments and __call__ receives the function. When there are no arguments (@CountCalls), __init__ receives the function directly.

Built-in Decorators

Python ships with several decorators that you should know and use regularly.

@property

Turns a method into a read-only attribute, enabling getter/setter patterns without changing the calling syntax.

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("Radius cannot be negative")
        self._radius = value

    @property
    def area(self):
        import math
        return math.pi * self._radius ** 2

c = Circle(5)
print(c.radius)     # 5
print(c.area)       # 78.5398...
c.radius = 10       # Uses the setter
print(c.area)       # 314.1592...
# c.radius = -1     # Raises ValueError

@classmethod and @staticmethod

@classmethod receives the class as its first argument instead of an instance. It is commonly used for alternative constructors. @staticmethod does not receive the instance or the class — it is just a regular function namespaced inside the class.

class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    @classmethod
    def from_dict(cls, data):
        """Alternative constructor from a dictionary."""
        return cls(data["name"], data["email"])

    @classmethod
    def from_string(cls, user_string):
        """Alternative constructor from 'name:email' format."""
        name, email = user_string.split(":")
        return cls(name.strip(), email.strip())

    @staticmethod
    def is_valid_email(email):
        """Validate email format (no instance or class needed)."""
        return "@" in email and "." in email

# Using class methods
user1 = User.from_dict({"name": "Folau", "email": "folau@example.com"})
user2 = User.from_string("Folau : folau@example.com")
print(user1.name)  # Folau
print(user2.name)  # Folau

# Using static method
print(User.is_valid_email("folau@example.com"))  # True
print(User.is_valid_email("invalid"))             # False

@functools.lru_cache

Caches the return values of a function based on its arguments. This is incredibly useful for expensive computations or recursive algorithms.

import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Without caching, this would take exponential time
print(fibonacci(50))  # 12586269025
print(fibonacci(100)) # 354224848179261915075

# Inspect cache statistics
print(fibonacci.cache_info())
# CacheInfo(hits=99, misses=101, maxsize=128, currsize=101)

Since Python 3.9, you can also use @functools.cache as a simpler unbounded cache (equivalent to @lru_cache(maxsize=None)).
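A quick sketch of @functools.cache on a recursive function (Python 3.9+):

```python
import functools

@functools.cache
def factorial(n):
    return n * factorial(n - 1) if n else 1

print(factorial(10))  # 3628800 - every recursive step is cached
print(factorial(12))  # 479001600 - reuses factorial(10), only two new steps
```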

Stacking Decorators

You can apply multiple decorators to a single function by stacking them. The decorators are applied bottom-up (the one closest to the function runs first), but they execute top-down when the function is called.

import functools

def bold(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return f"<b>{func(*args, **kwargs)}</b>"
    return wrapper

def italic(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return f"<i>{func(*args, **kwargs)}</i>"
    return wrapper

@bold
@italic
def greet(name):
    return f"Hello, {name}"

print(greet("Folau"))
# Output: <b><i>Hello, Folau</i></b>

This is equivalent to greet = bold(italic(greet)). The italic decorator wraps the original function first, then bold wraps the result. When you call greet("Folau"), execution flows through bold's wrapper, then italic's wrapper, then the original function.
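You can verify the equivalence by applying the decorators manually, innermost first:

```python
import functools

def bold(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return f"<b>{func(*args, **kwargs)}</b>"
    return wrapper

def italic(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return f"<i>{func(*args, **kwargs)}</i>"
    return wrapper

def plain_greet(name):
    return f"Hello, {name}"

# Manual application: same result as stacking @bold over @italic
manual = bold(italic(plain_greet))
print(manual("Folau"))  # <b><i>Hello, Folau</i></b>
```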

A more practical example: combining a timer and a logger.

import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} seconds")
        return result
    return wrapper

def logger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"[LOG] Calling {func.__name__}({args}, {kwargs})")
        result = func(*args, **kwargs)
        print(f"[LOG] {func.__name__} returned {result}")
        return result
    return wrapper

@timer
@logger
def compute_sum(n):
    """Compute the sum of numbers from 0 to n."""
    return sum(range(n + 1))

compute_sum(1000000)
# Output:
# [LOG] Calling compute_sum((1000000,), {})
# [LOG] compute_sum returned 500000500000
# compute_sum took 0.0312 seconds

The order matters. Here, logger runs inside timer, so the timer measures both the logging overhead and the function execution. If you swap them, timer would run inside logger: the timer would then measure only the function itself, and its timing line would print between the two log messages.
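Swapping the two decorators makes the difference visible. With logger on the outside, the timing line now appears between the two log messages, and the timer no longer includes the logging overhead:

```python
import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.4f} seconds")
        return result
    return wrapper

def logger(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"[LOG] Calling {func.__name__}({args}, {kwargs})")
        result = func(*args, **kwargs)
        print(f"[LOG] {func.__name__} returned {result}")
        return result
    return wrapper

@logger   # now the outer decorator
@timer    # now the inner decorator
def compute_sum(n):
    return sum(range(n + 1))

compute_sum(1000000)
# [LOG] Calling compute_sum((1000000,), {})
# compute_sum took 0.0... seconds   <- printed before the return is logged
# [LOG] compute_sum returned 500000500000
```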

Practical Decorator Examples

Now let us build decorators you will actually use in real projects. Each one solves a common cross-cutting concern.

Timer Decorator

Measures how long a function takes to execute. Essential for performance profiling.

import functools
import time

def timer(func):
    """Print the execution time of the decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        elapsed = end_time - start_time
        print(f"[TIMER] {func.__name__} executed in {elapsed:.6f} seconds")
        return result
    return wrapper

@timer
def slow_computation(n):
    """Simulate a slow computation."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total

result = slow_computation(1_000_000)
# [TIMER] slow_computation executed in 0.142356 seconds
print(result)

Logger Decorator

Automatically logs every function call with its arguments and return value.

import functools
import logging

logging.basicConfig(level=logging.DEBUG)

def log_calls(func):
    """Log function calls, arguments, and return values."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args_repr = [repr(a) for a in args]
        kwargs_repr = [f"{k}={v!r}" for k, v in kwargs.items()]
        signature = ", ".join(args_repr + kwargs_repr)
        logging.info(f"Calling {func.__name__}({signature})")
        try:
            result = func(*args, **kwargs)
            logging.info(f"{func.__name__} returned {result!r}")
            return result
        except Exception as e:
            logging.exception(f"{func.__name__} raised {type(e).__name__}: {e}")
            raise
    return wrapper

@log_calls
def divide(a, b):
    return a / b

divide(10, 3)   # INFO: Calling divide(10, 3)
                # INFO: divide returned 3.3333333333333335
divide(10, 0)   # INFO: Calling divide(10, 0)
                # ERROR: divide raised ZeroDivisionError: division by zero
                # (the exception is logged, then re-raised to the caller)

Retry Decorator with Exponential Backoff

Retries a function on failure with increasing wait times. Perfect for network calls, API requests, and database connections.

import functools
import time
import random

def retry(max_retries=3, base_delay=1, backoff_factor=2, exceptions=(Exception,)):
    """Retry a function with exponential backoff on failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    last_exception = e
                    if attempt < max_retries:
                        # Exponential backoff with jitter
                        delay = base_delay * (backoff_factor ** attempt)
                        jitter = random.uniform(0, delay * 0.1)
                        wait_time = delay + jitter
                        print(
                            f"[RETRY] {func.__name__} failed (attempt {attempt + 1}/{max_retries + 1}): {e}"
                            f" -- retrying in {wait_time:.2f}s"
                        )
                        time.sleep(wait_time)
                    else:
                        print(
                            f"[RETRY] {func.__name__} failed after {max_retries + 1} attempts"
                        )
            raise last_exception
        return wrapper
    return decorator

@retry(max_retries=3, base_delay=1, exceptions=(ConnectionError, TimeoutError))
def fetch_data(url):
    """Simulate an unreliable network call."""
    if random.random() < 0.7:
        raise ConnectionError("Connection refused")
    return {"data": "success", "url": url}

# May succeed or fail depending on random chance
# result = fetch_data("https://api.example.com/data")

Authentication/Authorization Decorator

Checks if a user is authenticated and authorized before allowing access to a function.

import functools

def require_auth(role=None):
    """Decorator that checks authentication and optional role-based authorization."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user, *args, **kwargs):
            # Check authentication
            if not user.get("authenticated", False):
                raise PermissionError(f"Authentication required for {func.__name__}")

            # Check authorization (role)
            if role and user.get("role") != role:
                raise PermissionError(
                    f"Role '{role}' required for {func.__name__}. "
                    f"Current role: '{user.get('role')}'"
                )

            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@require_auth(role="admin")
def delete_user(current_user, user_id):
    print(f"User {user_id} deleted by {current_user['name']}")
    return True

@require_auth()
def view_profile(current_user):
    print(f"Viewing profile of {current_user['name']}")
    return current_user

# Authenticated admin -- works
admin = {"name": "Folau", "authenticated": True, "role": "admin"}
delete_user(admin, user_id=42)
# Output: User 42 deleted by Folau

# Authenticated but wrong role -- raises PermissionError
viewer = {"name": "Guest", "authenticated": True, "role": "viewer"}
try:
    delete_user(viewer, user_id=42)
except PermissionError as e:
    print(e)  # Role 'admin' required for delete_user. Current role: 'viewer'

# Not authenticated -- raises PermissionError
anonymous = {"name": "Anon", "authenticated": False}
try:
    view_profile(anonymous)
except PermissionError as e:
    print(e)  # Authentication required for view_profile

Memoization/Caching Decorator

Caches function results to avoid redundant computations. This is a simplified version of functools.lru_cache to show how caching works under the hood.

import functools

def memoize(func):
    """Cache function results based on arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Create a hashable key from args and kwargs
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    # Expose cache for inspection and clearing
    wrapper.cache = cache
    wrapper.clear_cache = cache.clear
    return wrapper

@memoize
def expensive_computation(n):
    """Simulate an expensive computation."""
    print(f"Computing for n={n}...")
    import time
    time.sleep(1)  # Simulate slow operation
    return sum(i ** 2 for i in range(n))

# First call -- computes and caches
result1 = expensive_computation(1000)  # Computing for n=1000...

# Second call -- returns cached result instantly
result2 = expensive_computation(1000)  # No output -- cached!

print(result1 == result2)  # True
print(f"Cache size: {len(expensive_computation.cache)}")  # 1

# Clear cache when needed
expensive_computation.clear_cache()

Rate Limiter Decorator

Prevents a function from being called more than a specified number of times within a time window. Essential for API clients.

import functools
import time
from collections import deque

def rate_limit(max_calls, period=60):
    """Limit function calls to max_calls within period seconds."""
    def decorator(func):
        call_times = deque()

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = time.time()

            # Remove timestamps outside the current window
            while call_times and now - call_times[0] >= period:
                call_times.popleft()

            if len(call_times) >= max_calls:
                wait_time = period - (now - call_times[0])
                raise RuntimeError(
                    f"Rate limit exceeded for {func.__name__}. "
                    f"Try again in {wait_time:.1f} seconds."
                )

            call_times.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(max_calls=3, period=10)
def call_api(endpoint):
    print(f"Calling {endpoint}")
    return {"status": "ok"}

# These three calls succeed
call_api("/users")    # Calling /users
call_api("/posts")    # Calling /posts
call_api("/comments") # Calling /comments

# This fourth call within 10 seconds raises RuntimeError
try:
    call_api("/tags")
except RuntimeError as e:
    print(e)  # Rate limit exceeded for call_api. Try again in 9.8 seconds.

Input Validation Decorator

Validates function arguments against expected types and custom rules before the function executes.

import functools
import inspect

def validate_types(**expected_types):
    """Validate that function arguments match the specified types."""
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()

            for param_name, value in bound.arguments.items():
                if param_name in expected_types:
                    expected = expected_types[param_name]
                    if not isinstance(value, expected):
                        raise TypeError(
                            f"Argument '{param_name}' must be {expected.__name__}, "
                            f"got {type(value).__name__}"
                        )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@validate_types(name=str, age=int, email=str)
def create_user(name, age, email):
    return {"name": name, "age": age, "email": email}

# Valid call
user = create_user("Folau", 30, "folau@example.com")
print(user)  # {'name': 'Folau', 'age': 30, 'email': 'folau@example.com'}

# Invalid call -- raises TypeError
try:
    create_user("Folau", "thirty", "folau@example.com")
except TypeError as e:
    print(e)  # Argument 'age' must be int, got str

You can also build more sophisticated validators that check ranges, patterns, or custom predicates.

import functools
import inspect

def validate(rules):
    """Validate arguments using custom rule functions."""
    def decorator(func):
        sig = inspect.signature(func)  # resolve the signature once, at decoration time

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Combine args with parameter names
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()

            for param_name, check in rules.items():
                if param_name in bound.arguments:
                    value = bound.arguments[param_name]
                    is_valid, message = check(value)
                    if not is_valid:
                        raise ValueError(f"Invalid '{param_name}': {message}")
            return func(*args, **kwargs)
        return wrapper
    return decorator

# Define validation rules
def positive_number(value):
    return (value > 0, f"must be positive, got {value}")

def non_empty_string(value):
    return (isinstance(value, str) and len(value.strip()) > 0, "must be a non-empty string")

@validate({
    "amount": positive_number,
    "currency": non_empty_string,
})
def process_payment(amount, currency, description=""):
    print(f"Processing {currency} {amount}: {description}")
    return True

process_payment(99.99, "USD", description="Order #123")
# Processing USD 99.99: Order #123

try:
    process_payment(-50, "USD")
except ValueError as e:
    print(e)  # Invalid 'amount': must be positive, got -50

Decorators in the Real World

Decorators are not just an academic exercise. They are used extensively in Python's most popular frameworks and libraries.

Flask Routes

Flask uses decorators to map URL routes to handler functions.

from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    return "Welcome to the homepage!"

@app.route("/users/<int:user_id>", methods=["GET"])
def get_user(user_id):
    return f"User {user_id}"

@app.route("/api/data", methods=["POST"])
def create_data():
    return {"status": "created"}, 201

Under the hood, @app.route("/") is a parameterized decorator. It registers the function in Flask's URL routing table.
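To demystify it, here is a toy sketch (not Flask's actual implementation) of a route decorator that records handlers in a dictionary:

```python
class TinyApp:
    """Toy illustration of how a routing decorator can work."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        def decorator(func):
            self.routes[path] = func  # register the handler
            return func               # return the function unchanged
        return decorator

app = TinyApp()

@app.route("/")
def home():
    return "Welcome to the homepage!"

# Dispatching a "request" is then just a dictionary lookup
print(app.routes["/"]())  # Welcome to the homepage!
```

Note that this decorator does not wrap the function at all; it only registers it as a side effect and returns it unchanged, which is also what Flask's route decorator does.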

Django Views

Django provides decorators for authentication, HTTP method enforcement, and caching.

from django.contrib.auth.decorators import login_required
from django.views.decorators.http import require_http_methods
from django.views.decorators.cache import cache_page
from django.shortcuts import render

@login_required
@require_http_methods(["GET"])
@cache_page(60 * 15)  # Cache for 15 minutes
def dashboard(request):
    return render(request, "dashboard.html")

Pytest Fixtures and Parametrize

Pytest uses decorators for test parametrization and marking.

import pytest

@pytest.fixture
def sample_user():
    return {"name": "Folau", "email": "folau@example.com"}

@pytest.mark.parametrize("input_val,expected", [
    (1, 1),
    (2, 4),
    (3, 9),
    (4, 16),
])
def test_square(input_val, expected):
    assert input_val ** 2 == expected

@pytest.mark.slow
def test_large_dataset():
    # This test takes a long time to run
    pass

Common Pitfalls

Even experienced Python developers trip over these issues with decorators. Knowing them in advance will save you hours of debugging.

1. Forgetting functools.wraps

This is the most common mistake. Without @functools.wraps(func), the decorated function loses its identity.

# BAD -- no functools.wraps
def bad_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@bad_decorator
def my_function():
    """My function's docstring."""
    pass

print(my_function.__name__)  # wrapper (wrong!)
print(my_function.__doc__)   # None    (wrong!)
help(my_function)            # Shows wrapper's help, not my_function's

# GOOD -- always use functools.wraps
import functools

def good_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@good_decorator
def my_function():
    """My function's docstring."""
    pass

print(my_function.__name__)  # my_function (correct!)
print(my_function.__doc__)   # My function's docstring. (correct!)

2. Incorrect Decorator Order

When stacking decorators, order matters. The decorator closest to the function is applied first.

import functools
import time

def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"Time: {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

def require_login(func):
    @functools.wraps(func)
    def wrapper(user, *args, **kwargs):
        if not user.get("authenticated"):
            raise PermissionError("Login required")
        return func(user, *args, **kwargs)
    return wrapper

# CORRECT order: auth is checked before the clock starts
@require_login
@timer
def get_dashboard(user):
    time.sleep(0.1)
    return "Dashboard data"

# WRONG order: the outer timer starts first, so the measured
# time includes the auth check overhead
@timer
@require_login
def get_dashboard_wrong(user):
    time.sleep(0.1)
    return "Dashboard data"

Think about it like layers of an onion. The outermost decorator runs first when the function is called. Put cross-cutting concerns like timing and logging on the outside, and domain-specific checks like authentication closer to the function.

3. Decorating Methods vs Functions

When decorating instance methods, remember that self is passed as the first argument. Your wrapper must handle it correctly through *args.

import functools

def log_method(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):  # 'self' is captured in *args
        print(f"Calling {func.__qualname__}")
        return func(*args, **kwargs)
    return wrapper

class UserService:
    @log_method
    def get_user(self, user_id):
        return {"id": user_id, "name": "Folau"}

service = UserService()
service.get_user(42)  # Calling UserService.get_user

If your decorator explicitly names the first parameter (e.g., def wrapper(request, ...)), it will break when applied to a method because self will be passed as request. Always use *args, **kwargs to keep decorators generic.
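To make the breakage concrete, here is a minimal sketch (with a hypothetical log_request decorator) showing a wrapper that names its first parameter working on a plain function but failing on a method:

```python
import functools

def log_request(func):
    # BAD: the wrapper names its first parameter, assuming it is the request
    @functools.wraps(func)
    def wrapper(request, *args, **kwargs):
        print(f"Method: {request.split()[0]}")  # fine for plain functions...
        return func(request, *args, **kwargs)
    return wrapper

@log_request
def handle(request):                 # plain function: works as intended
    return f"done: {request}"

print(handle("GET /users"))          # Method: GET, then done: GET /users

class Service:
    @log_request
    def handle(self, request):       # method: 'self' lands in 'request'!
        return f"done: {request}"

try:
    Service().handle("GET /users")
except AttributeError as e:
    # The Service instance has no .split() -- the arguments shifted by one
    print("Broken on methods:", e)
```

With `*args, **kwargs` in the wrapper instead, both call sites work unchanged.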

4. Not Returning the Function Result

A decorator that forgets to return func(*args, **kwargs) will cause the decorated function to always return None.

# BAD -- missing return
def bad_timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        func(*args, **kwargs)  # Result is discarded!
        print(f"Time: {time.perf_counter() - start:.4f}s")
        # No return statement -- returns None!
    return wrapper

# GOOD -- always return the result
def good_timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)  # Capture result
        print(f"Time: {time.perf_counter() - start:.4f}s")
        return result  # Return it!
    return wrapper

Best Practices

1. Always use @functools.wraps(func): This preserves the original function's metadata. There is no excuse for skipping it.

2. Keep decorators simple and focused: A decorator should do one thing. If you need logging and authentication and caching, write three separate decorators and stack them. This follows the Single Responsibility Principle.

3. Accept *args and **kwargs: Always use *args and **kwargs in your wrapper function so the decorator works with any function signature.

4. Return the wrapped function's result: Always capture and return func(*args, **kwargs). Forgetting this is a silent bug that causes decorated functions to return None.

5. Document your decorator's behavior: Add a docstring to the decorator explaining what it does, what arguments it accepts (if parameterized), and any side effects. Someone reading @retry(max_retries=3) should be able to look at the decorator's docstring and immediately understand what will happen.

6. Test decorators independently: Write unit tests for your decorators separate from the functions they decorate. You can access the original function via __wrapped__ (provided by functools.wraps) when you need to test the undecorated version.

import functools

def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

# Access the original function through __wrapped__
@my_decorator
def original_function():
    return 42

# Test the decorator's behavior
assert original_function() is not None

# Test the original function without the decorator
assert original_function.__wrapped__() == 42

7. Be careful with stateful decorators: If your decorator maintains state (like a counter or cache), be aware that the state is shared across all calls. This can cause issues in multi-threaded applications. Use threading.Lock if thread safety is required.
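As a sketch of the stateful-decorator concern, here is a call-counting decorator (hypothetical name count_calls) whose shared counter is guarded by a threading.Lock so concurrent callers cannot race on the increment:

```python
import functools
import threading

def count_calls(func):
    """Count invocations; a Lock guards the shared counter across threads."""
    lock = threading.Lock()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:                  # without this, two threads could race
            wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper

@count_calls
def ping():
    return "pong"

# Eight threads each call ping() once; the lock keeps the count exact
threads = [threading.Thread(target=ping) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ping.calls)  # 8
```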

8. Prefer function-based decorators for simplicity: Use class-based decorators only when you need to maintain significant state or when the logic is complex enough to benefit from class organization. For most use cases, function-based decorators are clearer.

Key Takeaways

  • A decorator is a function that takes a function, adds behavior, and returns a modified function. The @ syntax is just syntactic sugar.
  • Decorators rely on two Python features: first-class functions (functions as objects) and closures (inner functions remembering enclosing scope).
  • Always use @functools.wraps(func) in every decorator to preserve the original function's __name__, __doc__, and other metadata.
  • Parameterized decorators require three levels of nesting: the outer function takes arguments, returns a decorator, which returns a wrapper.
  • Class-based decorators use __call__ and are best when you need to maintain state across calls.
  • Python includes powerful built-in decorators: @property, @classmethod, @staticmethod, and @functools.lru_cache.
  • When stacking decorators, they are applied bottom-up but execute top-down. Order matters.
  • Common practical decorators include timers, loggers, retry logic, authentication, caching, rate limiting, and input validation.
  • Major frameworks like Flask, Django, and pytest use decorators extensively — understanding them is essential for working with these tools.
  • Watch out for common pitfalls: forgetting wraps, wrong decorator order, not returning results, and issues with methods vs functions.
  • Keep decorators simple, focused, and well-documented. Stack multiple simple decorators rather than building one monolithic decorator.

Source code on Github

March 14, 2021

Python – Lambda Functions

In the previous tutorial on Python Functions, we briefly touched on lambda functions. Now it is time to go deep. Lambda functions — also called anonymous functions — are one of Python’s most concise and expressive features. They let you define a small, throwaway function in a single expression, right where you need it, without the ceremony of a full def block. You will encounter them constantly in production code: as sort keys, filter predicates, map transformations, callback handlers, and more.

The key insight is this: a lambda is not a different kind of function. It is simply a syntactic shorthand for defining a function object inline. Under the hood, Python treats lambda functions identically to named functions — they are first-class objects, they create closures, and they follow the same scoping rules. The difference is purely in how you write them.
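To see that claim concretely, here is a quick check that a lambda produces an ordinary function object and forms closures exactly like a def function:

```python
double = lambda x: x * 2

def double_def(x):
    return x * 2

# Both are instances of the same built-in function type
print(type(double) is type(double_def))  # True

# A lambda captures enclosing variables just like a def function
def make_adder(n):
    return lambda x: x + n

add5 = make_adder(5)
print(add5(10))                      # 15
print(add5.__closure__ is not None)  # True -- it captured n
```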

In this tutorial, we will explore every facet of lambda functions — syntax, use cases, integration with built-in functions, practical patterns, common pitfalls, and when you should reach for alternatives instead. By the end, you will know exactly when a lambda is the right tool and when it is not.

Lambda Syntax

The syntax for a lambda function is straightforward.

lambda arguments: expression

There are three parts: the lambda keyword, zero or more comma-separated arguments, and a single expression that is evaluated and returned. There is no return statement — the expression’s result is implicitly returned. There is no function name — hence “anonymous.”

# A lambda that doubles a number
double = lambda x: x * 2
print(double(5))   # 10

# A lambda with no arguments
get_pi = lambda: 3.14159
print(get_pi())    # 3.14159

# A lambda with multiple arguments
add = lambda a, b: a + b
print(add(3, 7))   # 10

Notice that assigning a lambda to a variable (like double = lambda x: x * 2) is technically discouraged by PEP 8. If you need to give a function a name, use def. The real power of lambdas is using them inline, as we will see throughout this tutorial.

Lambda vs Regular Functions

Let us compare lambdas and regular functions side by side so the trade-offs are clear.

Feature      | Lambda Function                  | Regular Function (def)
-------------|----------------------------------|-------------------------------
Syntax       | lambda args: expr                | def name(args): ...
Name         | Anonymous (shown as <lambda>)    | Named (shown in tracebacks)
Body         | Single expression only           | Multiple statements allowed
Return       | Implicit (expression result)     | Explicit return required
Docstrings   | Not supported                    | Fully supported
Type hints   | Not supported                    | Fully supported
Decorators   | Cannot use @ syntax directly     | Can be decorated
Readability  | Best for short, simple logic     | Best for anything complex
Debugging    | Harder (no name in stack traces) | Easier (name in stack traces)
Reusability  | Designed for one-off use         | Designed for reuse

Here is the same logic written both ways.

# Regular function
def square(x):
    return x * x

# Equivalent lambda
square_lambda = lambda x: x * x

# Both produce the same result
print(square(4))         # 16
print(square_lambda(4))  # 16

# But check the __name__ attribute
print(square.__name__)         # square
print(square_lambda.__name__)  # <lambda>

Rule of thumb: Use a lambda when the function is so simple that giving it a name would add more noise than clarity. Use def for everything else.

Using Lambda with Built-in Functions

This is where lambda functions earn their keep. Python’s built-in higher-order functions — sorted(), map(), filter(), min(), max() — all accept a function argument, and lambda is the most concise way to provide one inline.

sorted() with key parameter

The key parameter of sorted() accepts a function that extracts a comparison key from each element.

# Sort strings by length
words = ["python", "is", "a", "powerful", "language"]
sorted_by_length = sorted(words, key=lambda w: len(w))
print(sorted_by_length)
# ['a', 'is', 'python', 'powerful', 'language']

# Sort tuples by second element
students = [("Alice", 88), ("Bob", 95), ("Charlie", 72)]
sorted_by_grade = sorted(students, key=lambda s: s[1], reverse=True)
print(sorted_by_grade)
# [('Bob', 95), ('Alice', 88), ('Charlie', 72)]

# Case-insensitive sort
names = ["charlie", "Alice", "bob", "David"]
sorted_names = sorted(names, key=lambda n: n.lower())
print(sorted_names)
# ['Alice', 'bob', 'charlie', 'David']

map() with lambda

map() applies a function to every item in an iterable and returns an iterator of results.

# Square every number
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # [1, 4, 9, 16, 25]

# Convert temperatures from Celsius to Fahrenheit
celsius = [0, 20, 37, 100]
fahrenheit = list(map(lambda c: round(c * 9/5 + 32, 1), celsius))
print(fahrenheit)  # [32.0, 68.0, 98.6, 212.0]

# Extract keys from a list of dicts
users = [{"name": "Alice"}, {"name": "Bob"}, {"name": "Charlie"}]
names = list(map(lambda u: u["name"], users))
print(names)  # ['Alice', 'Bob', 'Charlie']

Note that list comprehensions are often more Pythonic than map() with a lambda. The equivalent of the first example is [x ** 2 for x in numbers]. Use whichever reads more clearly in context.

filter() with lambda

filter() returns an iterator of elements for which the function returns True.

# Keep only even numbers
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens)  # [2, 4, 6, 8, 10]

# Filter out empty strings
data = ["hello", "", "world", "", "python", ""]
non_empty = list(filter(lambda s: s, data))
print(non_empty)  # ['hello', 'world', 'python']

# Keep only adults
people = [("Alice", 30), ("Bob", 17), ("Charlie", 22), ("Diana", 15)]
adults = list(filter(lambda p: p[1] >= 18, people))
print(adults)  # [('Alice', 30), ('Charlie', 22)]

min() and max() with key

Like sorted(), min() and max() accept a key function to determine which element is smallest or largest.

# Find the longest word
words = ["Python", "is", "absolutely", "fantastic"]
longest = max(words, key=lambda w: len(w))
print(longest)  # absolutely

# Find the cheapest product
products = [
    {"name": "Laptop", "price": 999},
    {"name": "Mouse", "price": 29},
    {"name": "Keyboard", "price": 79},
    {"name": "Monitor", "price": 349}
]
cheapest = min(products, key=lambda p: p["price"])
print(cheapest)  # {'name': 'Mouse', 'price': 29}

# Find the student with the highest GPA
students = [("Alice", 3.8), ("Bob", 3.9), ("Charlie", 3.5)]
top_student = max(students, key=lambda s: s[1])
print(f"{top_student[0]} with GPA {top_student[1]}")
# Bob with GPA 3.9

Lambda with Multiple Arguments

Lambdas can take two or more parameters, separated by commas, just like regular function parameters.

# Two arguments
multiply = lambda a, b: a * b
print(multiply(6, 7))  # 42

# Three arguments
full_name = lambda first, middle, last: f"{first} {middle} {last}"
print(full_name("Folau", "L", "Kaveinga"))  # Folau L Kaveinga

# With default arguments
power = lambda base, exp=2: base ** exp
print(power(3))     # 9  (3 squared)
print(power(3, 3))  # 27 (3 cubed)

# Using *args in a lambda
sum_all = lambda *args: sum(args)
print(sum_all(1, 2, 3, 4, 5))  # 15

You can also use **kwargs in a lambda, though at that point you should seriously consider whether a named function would be clearer.

# Lambda with **kwargs (legal but rarely practical)
build_greeting = lambda **kwargs: f"Hello, {kwargs.get('name', 'World')}!"
print(build_greeting(name="Folau"))  # Hello, Folau!
print(build_greeting())              # Hello, World!

Conditional Expressions in Lambda

Since a lambda body must be a single expression, you use Python’s ternary operator (value_if_true if condition else value_if_false) for conditional logic.

# Simple conditional
classify = lambda x: "even" if x % 2 == 0 else "odd"
print(classify(4))  # even
print(classify(7))  # odd

# Grade classification
grade = lambda score: "A" if score >= 90 else "B" if score >= 80 else "C" if score >= 70 else "F"
print(grade(95))  # A
print(grade(85))  # B
print(grade(72))  # C
print(grade(60))  # F

# Absolute value (manual implementation)
absolute = lambda x: x if x >= 0 else -x
print(absolute(-5))  # 5
print(absolute(3))   # 3

# Clamp a value to a range
clamp = lambda value, low, high: max(low, min(high, value))
print(clamp(15, 0, 10))   # 10
print(clamp(-3, 0, 10))   # 0
print(clamp(5, 0, 10))    # 5

While nested ternaries work (as in the grade example above), they become hard to read quickly. If you have more than two conditions, a named function with if/elif/else is almost always the better choice.

Immediately Invoked Lambda (IIFE Pattern)

You can define and call a lambda in one expression, similar to JavaScript’s Immediately Invoked Function Expressions (IIFEs). This is occasionally useful for inline computation or creating a scope.

# Immediately invoked lambda
result = (lambda x, y: x + y)(3, 5)
print(result)  # 8

# Useful in default argument initialization
import os
config = {
    "debug": (lambda: os.environ.get("DEBUG", "false").lower() == "true")(),
    "port": (lambda: int(os.environ.get("PORT", "8080")))()
}
print(config)  # {'debug': False, 'port': 8080}

# Inline computation in a data structure
data = {
    "sum": (lambda nums: sum(nums))([1, 2, 3, 4, 5]),
    "avg": (lambda nums: sum(nums) / len(nums))([1, 2, 3, 4, 5])
}
print(data)  # {'sum': 15, 'avg': 3.0}

This pattern is not common in Python. You will see it occasionally in configuration builders or when initializing computed values in data structures, but most of the time a regular function call or a comprehension is clearer.

Lambda in Data Processing

Lambda functions are particularly useful when processing collections of structured data — sorting, transforming, grouping, and filtering records.

# Sorting complex data by multiple criteria
employees = [
    {"name": "Alice", "dept": "Engineering", "salary": 95000},
    {"name": "Bob", "dept": "Marketing", "salary": 72000},
    {"name": "Charlie", "dept": "Engineering", "salary": 110000},
    {"name": "Diana", "dept": "Marketing", "salary": 68000},
    {"name": "Eve", "dept": "Engineering", "salary": 95000}
]

# Sort by department, then by salary descending
sorted_employees = sorted(
    employees,
    key=lambda e: (e["dept"], -e["salary"])
)
for emp in sorted_employees:
    print(f"  {emp['dept']:12} | {emp['name']:8} | ${emp['salary']:,}")
# Engineering  | Charlie  | $110,000
# Engineering  | Alice    | $95,000
# Engineering  | Eve      | $95,000
# Marketing    | Bob      | $72,000
# Marketing    | Diana    | $68,000

# Transforming collections
raw_data = ["  Alice  ", "BOB", "  charlie", "DIANA  "]
cleaned = list(map(lambda s: s.strip().title(), raw_data))
print(cleaned)  # ['Alice', 'Bob', 'Charlie', 'Diana']

# Grouping with a lambda (using itertools.groupby)
from itertools import groupby

transactions = [
    {"type": "credit", "amount": 500},
    {"type": "debit", "amount": 200},
    {"type": "credit", "amount": 300},
    {"type": "debit", "amount": 150},
    {"type": "credit", "amount": 700}
]

# Sort first (groupby requires sorted input)
sorted_tx = sorted(transactions, key=lambda t: t["type"])
for tx_type, group in groupby(sorted_tx, key=lambda t: t["type"]):
    items = list(group)
    total = sum(t["amount"] for t in items)
    print(f"{tx_type}: {len(items)} transactions, total ${total}")
# credit: 3 transactions, total $1500
# debit: 2 transactions, total $350

Practical Examples

Sort a List of Dicts by Multiple Keys

Sorting by multiple fields is one of the most common real-world uses of lambda.

products = [
    {"name": "Widget", "category": "A", "price": 25.99},
    {"name": "Gadget", "category": "B", "price": 49.99},
    {"name": "Doohickey", "category": "A", "price": 15.50},
    {"name": "Thingamajig", "category": "B", "price": 49.99},
    {"name": "Gizmo", "category": "A", "price": 25.99}
]

# Sort by category ascending, then price ascending, then name ascending
sorted_products = sorted(
    products,
    key=lambda p: (p["category"], p["price"], p["name"])
)

for p in sorted_products:
    print(f"  {p['category']} | ${p['price']:6.2f} | {p['name']}")
# A | $ 15.50 | Doohickey
# A | $ 25.99 | Gizmo
# A | $ 25.99 | Widget
# B | $ 49.99 | Gadget
# B | $ 49.99 | Thingamajig

Data Transformation Pipeline

You can chain map() and filter() to build a lightweight data pipeline.

orders = [
    {"customer": "Alice", "total": 150.00, "status": "completed"},
    {"customer": "Bob", "total": 89.50, "status": "pending"},
    {"customer": "Charlie", "total": 220.00, "status": "completed"},
    {"customer": "Diana", "total": 45.00, "status": "cancelled"},
    {"customer": "Eve", "total": 310.00, "status": "completed"}
]

# Pipeline: filter completed orders -> apply 10% discount -> extract summaries
result = list(
    map(
        lambda o: f"{o['customer']}: ${o['total'] * 0.9:.2f}",
        filter(
            lambda o: o["status"] == "completed",
            orders
        )
    )
)
print(result)
# ['Alice: $135.00', 'Charlie: $198.00', 'Eve: $279.00']

# The same pipeline using list comprehension (often more readable)
result_v2 = [
    f"{o['customer']}: ${o['total'] * 0.9:.2f}"
    for o in orders
    if o["status"] == "completed"
]
print(result_v2)
# ['Alice: $135.00', 'Charlie: $198.00', 'Eve: $279.00']

Event Handler Callbacks

Lambdas are a natural fit for short callback functions, especially in GUI frameworks or event-driven architectures.

# Simulating a simple event system
class EventEmitter:
    def __init__(self):
        self.handlers = {}

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def emit(self, event, *args):
        for handler in self.handlers.get(event, []):
            handler(*args)

emitter = EventEmitter()

# Register lambda callbacks
emitter.on("user_login", lambda user: print(f"Welcome back, {user}!"))
emitter.on("user_login", lambda user: print(f"Logging: {user} logged in"))
emitter.on("error", lambda code, msg: print(f"Error {code}: {msg}"))

emitter.emit("user_login", "Folau")
# Welcome back, Folau!
# Logging: Folau logged in

emitter.emit("error", 404, "Page not found")
# Error 404: Page not found

Quick String Operations

# Normalize a list of email addresses
emails = ["Alice@Example.COM", "  bob@test.org  ", "CHARLIE@DOMAIN.NET"]
normalized = list(map(lambda e: e.strip().lower(), emails))
print(normalized)
# ['alice@example.com', 'bob@test.org', 'charlie@domain.net']

# Extract domain from email
domains = list(map(lambda e: e.split("@")[1], normalized))
print(domains)
# ['example.com', 'test.org', 'domain.net']

# Sort strings by their last character
words = ["hello", "lambda", "python", "code"]
sorted_by_last = sorted(words, key=lambda w: w[-1])
print(sorted_by_last)
# ['lambda', 'code', 'python', 'hello']

# Pad strings to uniform length
items = ["cat", "elephant", "dog", "hippopotamus"]
padded = list(map(lambda s: s.ljust(15, "."), items))
for p in padded:
    print(p)
# cat............
# elephant.......
# dog............
# hippopotamus...

When NOT to Use Lambda

Lambda functions are a sharp tool, but like all sharp tools, they can cause damage when misused. Here are situations where you should use a named function instead.

1. Complex logic that requires multiple expressions

# BAD - trying to cram too much into a lambda
process = lambda x: x.strip().lower().replace(" ", "_") if isinstance(x, str) else str(x).strip()

# GOOD - use a named function
def process(x):
    """Normalize a value into a clean, lowercase, underscored string."""
    if isinstance(x, str):
        return x.strip().lower().replace(" ", "_")
    return str(x).strip()

2. When you need to reuse the function in multiple places

# BAD - assigning lambda to a variable for reuse (PEP 8 violation: E731)
calculate_tax = lambda amount: amount * 0.08

# GOOD - use def when you need a reusable, named function
def calculate_tax(amount):
    """Calculate sales tax at 8%."""
    return amount * 0.08

3. When debugging matters

Lambda functions show up as <lambda> in stack traces, making debugging harder. If the function is in a code path that might fail, give it a proper name so the traceback is useful.
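Here is a small demonstration (a sketch, using a deliberate division error) of how the anonymous name hurts a traceback compared to a named key function:

```python
import traceback

# An error raised inside a lambda shows up anonymously in the traceback
try:
    sorted([3, 1, 2], key=lambda x: x / 0)
except ZeroDivisionError:
    tb = traceback.format_exc()
print("<lambda>" in tb)   # True -- which lambda? The traceback cannot say.

# The same error inside a named function points straight at the culprit
def risky_key(x):
    return x / 0

try:
    sorted([3, 1, 2], key=risky_key)
except ZeroDivisionError:
    tb2 = traceback.format_exc()
print("risky_key" in tb2)  # True -- the name appears in the traceback
```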

4. When you need documentation

Lambdas cannot have docstrings. If the function’s purpose is not immediately obvious from context, a named function with a docstring is the responsible choice.

5. PEP 8 guidance

PEP 8, Python’s official style guide, explicitly discourages assigning lambdas to names: “Always use a def statement instead of an assignment statement that binds a lambda expression directly to an identifier.” Linting tools like flake8 will flag this as error E731.

Alternatives to Lambda

Python provides several alternatives that can replace lambdas and sometimes produce cleaner code.

The operator Module

The operator module provides function equivalents of common operators. These are faster than lambdas because they are implemented in C.

import operator

# Instead of: lambda a, b: a + b
print(operator.add(3, 5))  # 8

# Instead of: sorted(items, key=lambda x: x[1])
from operator import itemgetter
students = [("Alice", 88), ("Bob", 95), ("Charlie", 72)]
sorted_students = sorted(students, key=itemgetter(1))
print(sorted_students)
# [('Charlie', 72), ('Alice', 88), ('Bob', 95)]

# Instead of: sorted(objects, key=lambda o: o.name)
from operator import attrgetter

class Student:
    def __init__(self, name, gpa):
        self.name = name
        self.gpa = gpa

students = [Student("Alice", 3.8), Student("Bob", 3.9), Student("Charlie", 3.5)]
sorted_students = sorted(students, key=attrgetter("gpa"))
for s in sorted_students:
    print(f"  {s.name}: {s.gpa}")
# Charlie: 3.5
# Alice: 3.8
# Bob: 3.9

# Multiple keys with itemgetter
data = [("A", 2, 300), ("B", 1, 200), ("A", 1, 100)]
sorted_data = sorted(data, key=itemgetter(0, 1))
print(sorted_data)
# [('A', 1, 100), ('A', 2, 300), ('B', 1, 200)]
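The speed claim can be checked with timeit. Exact numbers vary by machine and Python version, so this sketch only prints both timings rather than asserting a winner; itemgetter is usually faster because the key call stays in C:

```python
import timeit
from operator import itemgetter

data = [(i, i % 7) for i in range(1000)]

# Time the same sort with a lambda key and with itemgetter
t_lambda = timeit.timeit(lambda: sorted(data, key=lambda p: p[1]), number=200)
t_getter = timeit.timeit(lambda: sorted(data, key=itemgetter(1)), number=200)

print(f"lambda key:     {t_lambda:.4f}s")
print(f"itemgetter key: {t_getter:.4f}s")

# Both produce identical orderings
print(sorted(data, key=itemgetter(1)) == sorted(data, key=lambda p: p[1]))
```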

functools.partial

functools.partial creates a new function with some arguments pre-filled. This is cleaner than a lambda that just wraps another function call.

from functools import partial

# Instead of: lambda x: int(x, base=2)
binary_to_int = partial(int, base=2)
print(binary_to_int("1010"))  # 10
print(binary_to_int("1111"))  # 15

# Instead of: lambda x: round(x, 2)
round_2 = partial(round, ndigits=2)
print(round_2(3.14159))  # 3.14

# Pre-fill a logging function
import logging
error_log = partial(logging.log, logging.ERROR)
# error_log("Something went wrong")  # logs at ERROR level

Named Functions

Sometimes the simplest alternative is the best. A well-named function, even a short one, is more readable than a lambda when used in multiple places or when the logic is not immediately obvious.

# Instead of a lambda for a sort key
def by_last_name(full_name):
    """Extract last name for sorting."""
    return full_name.split()[-1].lower()

names = ["John Smith", "Alice Johnson", "Bob Adams"]
sorted_names = sorted(names, key=by_last_name)
print(sorted_names)
# ['Bob Adams', 'Alice Johnson', 'John Smith']

Common Pitfalls

1. Late Binding in Closures

This is the single most common lambda gotcha. When a lambda references a variable from an enclosing scope, it captures the variable itself, not its current value. The variable is looked up at call time, not at definition time.

# THE BUG
functions = []
for i in range(5):
    functions.append(lambda: i)

# All lambdas see the FINAL value of i
print([f() for f in functions])
# [4, 4, 4, 4, 4]  -- NOT [0, 1, 2, 3, 4]!

# THE FIX: capture the current value as a default argument
functions = []
for i in range(5):
    functions.append(lambda i=i: i)

print([f() for f in functions])
# [0, 1, 2, 3, 4]  -- correct!

# Another common scenario with event handlers
buttons = {}
for label in ["Save", "Delete", "Cancel"]:
    # BUG: all buttons would print "Cancel"
    # buttons[label] = lambda: print(f"Clicked: {label}")

    # FIX: capture label's current value
    buttons[label] = lambda lbl=label: print(f"Clicked: {lbl}")

buttons["Save"]()    # Clicked: Save
buttons["Delete"]()  # Clicked: Delete

This is not a lambda-specific issue — it affects all closures in Python — but it comes up most often with lambdas because they are frequently created inside loops.
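To prove the point, here is the identical bug and fix written with def instead of lambda:

```python
# The same late-binding bug with def -- nothing lambda-specific about it
functions = []
for i in range(3):
    def f():
        return i          # 'i' is looked up when f is CALLED, not defined
    functions.append(f)

print([f() for f in functions])  # [2, 2, 2]

# Same fix: bind the current value as a default argument
functions = []
for i in range(3):
    def f(i=i):
        return i
    functions.append(f)

print([f() for f in functions])  # [0, 1, 2]
```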

2. No Statements Allowed

A lambda body must be a single expression, so statements are off-limits: assignments, raise, assert, import, and any multi-line logic all cause a SyntaxError. (print was a statement in Python 2; in Python 3 it is a function, so calling print() inside a lambda is perfectly legal.)

# These will cause SyntaxError
# lambda x: x = 5              # assignment not allowed
# lambda x: import math        # import not allowed
# lambda x: assert x > 0       # assert not allowed

# Workarounds (but consider using def instead)
# For raising exceptions, you can use a helper or an expression trick
validate = lambda x: x if x > 0 else (_ for _ in ()).throw(ValueError(f"Expected positive, got {x}"))
# But really, just use def:
def validate(x):
    if x <= 0:
        raise ValueError(f"Expected positive, got {x}")
    return x

3. No Type Hints

Lambda functions do not support type annotations. If type safety matters in your codebase (and it should), this is a significant limitation.

# Cannot add type hints to a lambda
# lambda x: int -> int: x * 2  # SyntaxError

# Use def when type hints are important
def double(x: int) -> int:
    return x * 2
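One partial workaround worth knowing: while the lambda itself cannot carry annotations, the name it is bound to can be annotated with typing.Callable, and type checkers such as mypy will then check call sites against that signature (the lambda body itself remains unchecked, and PEP 8 still prefers def here):

```python
from typing import Callable

# Annotate the variable, not the lambda: call sites get type-checked
double: Callable[[int], int] = lambda x: x * 2

print(double(21))  # 42
```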

Best Practices

Here is a concise guide to using lambda functions effectively in production Python code.

1. Keep lambdas short and simple. If the expression is not immediately obvious at a glance, use a named function. A lambda should be understandable in under three seconds.

# Good - immediately clear
sorted(users, key=lambda u: u["last_name"])

# Bad - takes too long to parse
sorted(users, key=lambda u: (u["active"], -u["login_count"], u["name"].lower()))
# Better as a named function
def user_sort_key(u):
    return (u["active"], -u["login_count"], u["name"].lower())
sorted(users, key=user_sort_key)

2. Prefer named functions for reuse. If you find yourself writing the same lambda in multiple places, extract it into a def.

3. Use lambdas for short callbacks and sort keys. This is their sweet spot. When you need a quick, one-off function for sorted(), map(), filter(), min(), max(), or a callback argument, lambda is ideal.

4. Consider operator.itemgetter and operator.attrgetter for attribute and index access. They are faster and more explicit than an equivalent lambda.

5. Watch out for late binding in loops. Always capture loop variables as default arguments when creating lambdas inside a loop.

6. Never nest lambdas. A lambda that returns a lambda is legal Python, but it is an unreadable nightmare. Use named functions.

# Don't do this
make_adder = lambda x: lambda y: x + y

# Do this instead
def make_adder(x):
    def adder(y):
        return x + y
    return adder

7. Use list comprehensions over map/filter with lambdas when it improves readability.

# map + lambda
result = list(map(lambda x: x ** 2, range(10)))

# List comprehension (preferred for simple transformations)
result = [x ** 2 for x in range(10)]

Key Takeaways

  • A lambda function is an anonymous, single-expression function defined with the lambda keyword.
  • The syntax is lambda arguments: expression — no return statement, no function name, no docstring.
  • Lambdas are first-class objects, just like functions created with def. Under the hood, they are identical.
  • Their sweet spot is inline usage with higher-order functions: sorted(), map(), filter(), min(), max().
  • Use the ternary operator (x if condition else y) for conditional logic inside a lambda.
  • Beware of late binding in closures — capture loop variables as default arguments to avoid subtle bugs.
  • Lambdas cannot contain statements (assignments, imports, raise, assert) or type hints.
  • PEP 8 discourages assigning lambdas to names — use def when you need a named, reusable function.
  • Consider alternatives like operator.itemgetter, operator.attrgetter, and functools.partial for cleaner code.
  • The golden rule: if a lambda is not immediately readable, replace it with a named function. Readability always wins.

Source code on Github

March 13, 2021

Python – String Methods

Introduction

Strings are one of the most frequently used data types in Python — and in programming in general. Whether you are parsing user input, building API responses, reading files, or constructing SQL queries, you are working with strings. Mastering string methods is not optional; it is a core skill that separates beginners from competent developers.

The single most important thing to understand about Python strings is that they are immutable. Once a string object is created in memory, it cannot be changed. Every method that appears to “modify” a string actually returns a new string object. This has real consequences for performance and for how you think about your code.

name = "Folau"
# This does NOT modify the original string
upper_name = name.upper()

print(name)        # Folau  -- unchanged
print(upper_name)  # FOLAU  -- new string object
print(id(name) == id(upper_name))  # False -- different objects in memory

Keep immutability in mind throughout this tutorial. It will explain why certain patterns (like concatenation in loops) are slow, and why methods like join() exist.
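Here is a small sketch of why immutability makes join() the idiomatic choice: repeated += must build a brand-new string on each iteration (quadratic in the worst case, though CPython can sometimes optimize it), while join() allocates the result once.

```python
words = ["immutable", "strings", "make", "this", "slow"]

# Concatenation in a loop: each += discards the old string, allocates a new one
sentence = ""
for w in words:
    sentence += w + " "
sentence = sentence.strip()

# join() builds the result in a single pass -- the idiomatic approach
sentence_fast = " ".join(words)

print(sentence == sentence_fast)  # True -- same result, very different cost
```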


String Creation

Python gives you several ways to create strings. Each has its place.

# Single quotes -- most common for short strings
name = 'Folau'

# Double quotes -- identical behavior, useful when string contains apostrophes
message = "It's a great day to code"

# Triple quotes -- multiline strings, also used for docstrings
bio = """Software developer
who enjoys building
clean, testable code."""

# Triple single quotes work too
query = '''SELECT *
FROM users
WHERE active = 1'''

print(bio)
# Software developer
# who enjoys building
# clean, testable code.

Raw strings treat backslashes as literal characters. This is essential for regular expressions and Windows file paths.

# Without raw string -- \n is interpreted as a newline and \t as a tab
path = "C:\new_folder\test"
print(path)
# C:
# ew_folder	est

# With raw string -- backslashes are literal
path = r"C:\new_folder\test"
print(path)  # C:\new_folder\test

# Raw strings are critical for regex patterns
import re
pattern = r"\d{3}-\d{4}"  # Without r, \d would be an invalid escape

Byte strings represent raw bytes rather than Unicode text. You will encounter these when working with network sockets, binary files, or encoding/decoding operations.

# Byte string
data = b"Hello"
print(type(data))  # <class 'bytes'>

# Convert between str and bytes
text = "Python"
encoded = text.encode("utf-8")   # str to bytes
decoded = encoded.decode("utf-8")  # bytes to str
print(encoded)   # b'Python'
print(decoded)   # Python

 

String Indexing and Slicing

Strings are sequences, which means you can access individual characters by index and extract substrings with slicing. This is fundamental — you will use it constantly.

text = "Python"

# Positive indexing (left to right, starting at 0)
print(text[0])   # P
print(text[1])   # y
print(text[5])   # n

# Negative indexing (right to left, starting at -1)
print(text[-1])  # n  (last character)
print(text[-2])  # o  (second to last)
print(text[-6])  # P  (same as text[0])

Slicing syntax: string[start:stop:step]

  • start — inclusive (defaults to 0)
  • stop — exclusive (defaults to end of string)
  • step — how many characters to skip (defaults to 1)
text = "Hello, World!"

# Basic slicing
print(text[0:5])    # Hello
print(text[7:12])   # World
print(text[:5])     # Hello  (start defaults to 0)
print(text[7:])     # World! (stop defaults to end)

# Slicing with step
print(text[::2])    # Hlo ol!  (every 2nd character)
print(text[1::2])   # el,Wrd   (every 2nd character, starting at index 1)

# Reverse a string
print(text[::-1])   # !dlroW ,olleH

# Practical: extract domain from email
email = "dev@lovemesomecoding.com"
domain = email[email.index("@") + 1:]
print(domain)  # lovemesomecoding.com
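Note that index() raises a ValueError if the separator is missing. When that is a possibility, str.partition() is a safer alternative to index() plus slicing, since it always returns a 3-tuple. A quick sketch:

```python
email = "dev@lovemesomecoding.com"

# partition() splits on the first occurrence: (before, separator, after)
local, sep, domain = email.partition("@")
print(local)   # dev
print(domain)  # lovemesomecoding.com

# No exception when the separator is missing -- sep and after are empty
local, sep, domain = "no-at-sign".partition("@")
print(sep == "")  # True
```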

 

String Formatting

String formatting is how you embed variables and expressions inside strings. Python has evolved through several approaches. Use f-strings for new code — they are the most readable and performant.

f-strings (Python 3.6+) — Recommended

name = "Folau"
age = 30
salary = 95000.50

# Basic variable interpolation
print(f"My name is {name} and I am {age} years old.")

# Expressions inside braces
print(f"Next year I will be {age + 1}")

# Formatting numbers
print(f"Salary: ${salary:,.2f}")       # Salary: $95,000.50
print(f"Hex: {255:#x}")                # Hex: 0xff
print(f"Percentage: {0.856:.1%}")      # Percentage: 85.6%

# Padding and alignment
print(f"{'left':<20}|")     # left                |
print(f"{'center':^20}|")   #        center        |
print(f"{'right':>20}|")    #                right|

# Multiline f-strings
user_info = (
    f"Name: {name}\n"
    f"Age: {age}\n"
    f"Salary: ${salary:,.2f}"
)
print(user_info)

.format() method — Still common in existing codebases

# Positional arguments
print("Hello, {}! You are {} years old.".format("Folau", 30))

# Named arguments
print("Hello, {name}! You are {age} years old.".format(name="Folau", age=30))

# Index-based
print("{0} loves {1}. {0} also loves {2}.".format("Folau", "Python", "Java"))

# Number formatting
print("Price: ${:,.2f}".format(1999.99))  # Price: $1,999.99

% formatting — Legacy, avoid in new code

# You will see this in older codebases
name = "Folau"
age = 30
print("Hello, %s! You are %d years old." % (name, age))
print("Pi is approximately %.4f" % 3.14159)

# Why avoid it: limited features, error-prone with tuples, less readable

Template strings — Safe substitution for user-provided templates

from string import Template

# Use when the format string comes from user input (security)
template = Template("Hello, $name! Welcome to $site.")
result = template.substitute(name="Folau", site="lovemesomecoding.com")
print(result)  # Hello, Folau! Welcome to lovemesomecoding.com.

# safe_substitute won't raise KeyError for missing keys
result = template.safe_substitute(name="Folau")
print(result)  # Hello, Folau! Welcome to $site.

 

Common String Methods

Python strings have over 40 built-in methods. Here are the ones you will use most, organized by category.

 

Case Methods

These return a new string with the casing changed. Remember: the original string is never modified.

text = "hello, World! welcome to PYTHON."

print(text.upper())       # HELLO, WORLD! WELCOME TO PYTHON.
print(text.lower())       # hello, world! welcome to python.
print(text.title())       # Hello, World! Welcome To Python.
print(text.capitalize())  # Hello, world! welcome to python.  (only first char)
print(text.swapcase())    # HELLO, wORLD! WELCOME TO python.

# Practical: case-insensitive comparison
user_input = "Yes"
if user_input.lower() == "yes":
    print("User confirmed")  # This runs

# casefold() -- aggressive lowercasing for case-insensitive matching
# Handles special Unicode characters better than lower()
german = "Straße"
print(german.lower())     # straße
print(german.casefold())  # strasse  -- better for comparison

 

Search Methods

These methods help you find substrings and check string content.

text = "Python is powerful. Python is readable. Python is fun."

# find() -- returns index of first occurrence, or -1 if not found
print(text.find("Python"))       # 0
print(text.find("Python", 1))    # 20  (search starting from index 1)
print(text.find("Java"))         # -1  (not found)

# rfind() -- searches from the right
print(text.rfind("Python"))      # 40  (last occurrence)

# index() -- like find(), but raises ValueError if not found
print(text.index("Python"))      # 0
# text.index("Java")             # ValueError! Use find() if missing is possible

# count() -- how many times a substring appears
print(text.count("Python"))      # 3
print(text.count("is"))          # 3

# startswith() and endswith()
url = "https://lovemesomecoding.com/python"
print(url.startswith("https"))   # True
print(url.endswith(".com/python"))  # True

# You can pass a tuple of prefixes/suffixes
filename = "script.py"
print(filename.endswith((".py", ".js", ".ts")))  # True

# 'in' operator -- the most Pythonic way to check membership
print("powerful" in text)   # True
print("Java" in text)       # False
print("Java" not in text)   # True

 

Modification Methods

These methods return new strings with content added, removed, or replaced.

# strip() -- removes leading and trailing whitespace (or specified characters)
messy = "   Hello, World!   "
print(messy.strip())          # "Hello, World!"
print(messy.lstrip())         # "Hello, World!   "
print(messy.rstrip())         # "   Hello, World!"

# Strip specific characters
csv_value = "###price###"
print(csv_value.strip("#"))   # "price"

# replace(old, new, count)
text = "I love Java. Java is great."
print(text.replace("Java", "Python"))        # I love Python. Python is great.
print(text.replace("Java", "Python", 1))     # I love Python. Java is great. (only first)

# split() -- breaks string into a list
csv_line = "name,age,city,country"
fields = csv_line.split(",")
print(fields)  # ['name', 'age', 'city', 'country']

# Split with maxsplit
log = "2024-01-15 ERROR Something went wrong in the system"
parts = log.split(" ", 2)  # Split into at most 3 parts
print(parts)  # ['2024-01-15', 'ERROR', 'Something went wrong in the system']

# splitlines() -- splits on line boundaries
multiline = "Line 1\nLine 2\nLine 3"
print(multiline.splitlines())  # ['Line 1', 'Line 2', 'Line 3']

# join() -- the inverse of split()
words = ["Python", "is", "awesome"]
print(" ".join(words))       # Python is awesome
print(", ".join(words))      # Python, is, awesome
print("\n".join(words))      # Each word on its own line

# Practical: build a file path
parts = ["home", "folau", "projects", "app"]
path = "/".join(parts)
print(f"/{path}")  # /home/folau/projects/app

 

Validation Methods

These return True or False and are great for input validation.

# isalpha() -- only alphabetic characters (no spaces, no numbers)
print("Hello".isalpha())      # True
print("Hello World".isalpha()) # False (space)
print("Hello123".isalpha())   # False (digits)

# isdigit() -- only digit characters
print("12345".isdigit())      # True
print("123.45".isdigit())     # False (decimal point)
print("-123".isdigit())       # False (minus sign)

# isnumeric() -- broader than isdigit(), includes Unicode numerals
print("12345".isnumeric())    # True

# isalnum() -- alphanumeric (letters or digits)
print("Python3".isalnum())    # True
print("Python 3".isalnum())   # False (space)

# isspace() -- only whitespace characters
print("   ".isspace())        # True
print("  a  ".isspace())      # False

# isupper() / islower()
print("HELLO".isupper())      # True
print("hello".islower())      # True
print("Hello".isupper())      # False
print("Hello".islower())      # False

# Practical: validate a username
def is_valid_username(username):
    """Username must be 3-20 chars, alphanumeric or underscore."""
    if not 3 <= len(username) <= 20:
        return False
    return all(c.isalnum() or c == "_" for c in username)

print(is_valid_username("folau_dev"))    # True
print(is_valid_username("fo"))           # False (too short)
print(is_valid_username("hello world"))  # False (space)

 

Alignment and Padding Methods

Useful for formatting output, building CLI tools, or creating text-based tables.

# center(width, fillchar)
print("Python".center(20))        #        Python
print("Python".center(20, "-"))   # -------Python-------

# ljust(width, fillchar) and rjust(width, fillchar)
print("Name".ljust(15) + "Age")   # Name           Age
print("42".rjust(10, "0"))        # 0000000042

# zfill(width) -- pad with zeros on the left
print("42".zfill(5))     # 00042
print("-42".zfill(5))    # -0042  (handles negative sign correctly)

# Practical: format a simple table
headers = ["Name", "Age", "City"]
rows = [
    ["Folau", "30", "Salt Lake City"],
    ["Sione", "28", "San Francisco"],
    ["Mele", "25", "New York"],
]

# Print header
print(" | ".join(h.ljust(15) for h in headers))
print("-" * 51)

# Print rows
for row in rows:
    print(" | ".join(val.ljust(15) for val in row))

# Output:
# Name            | Age             | City
# ---------------------------------------------------
# Folau           | 30              | Salt Lake City
# Sione           | 28              | San Francisco
# Mele            | 25              | New York

 

String Concatenation

There are multiple ways to combine strings. The approach you choose matters for performance.

# The + operator -- fine for a few strings
first = "Hello"
last = "World"
greeting = first + ", " + last + "!"
print(greeting)  # Hello, World!

# The * operator -- repeat a string
divider = "-" * 40
print(divider)   # ----------------------------------------

# join() -- the right way to combine many strings
words = ["Python", "is", "fast", "and", "readable"]
sentence = " ".join(words)
print(sentence)  # Python is fast and readable

Why join() is better than + in loops:

Because strings are immutable, every + operation creates a new string object and copies all the data. In a loop with N iterations, this means O(N²) time complexity. join() pre-calculates the total size, allocates once, and copies once — O(N) time.

import time

n = 100_000

# BAD: concatenation in a loop -- O(n squared), slow
start = time.time()
result = ""
for i in range(n):
    result += str(i)
bad_time = time.time() - start

# GOOD: collect and join -- O(n), fast
start = time.time()
parts = []
for i in range(n):
    parts.append(str(i))
result = "".join(parts)
good_time = time.time() - start

# BEST: generator expression with join
start = time.time()
result = "".join(str(i) for i in range(n))
best_time = time.time() - start

print(f"Concatenation: {bad_time:.4f}s")
print(f"List + join:   {good_time:.4f}s")
print(f"Generator join: {best_time:.4f}s")

# Typical output:
# Concatenation: 0.0350s
# List + join:   0.0120s
# Generator join: 0.0110s
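Another standard-library option for incremental building is io.StringIO, which behaves like an in-memory text file. It is handy when a string is assembled across many function calls rather than in a single loop. A minimal sketch:

```python
import io

# StringIO accumulates writes in an internal buffer
buf = io.StringIO()
for i in range(5):
    buf.write(str(i))

# getvalue() returns the accumulated contents as one string
result = buf.getvalue()
print(result)  # 01234
```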

 

Regular Expressions Basics

When built-in string methods are not powerful enough, Python’s re module provides regular expressions for advanced pattern matching. Regex is a deep topic, but here are the essentials every developer needs.

import re

text = "Contact us at support@example.com or sales@example.com"

# search() -- find the first match
match = re.search(r"[\w.]+@[\w.]+", text)
if match:
    print(match.group())  # support@example.com

# match() -- only matches at the START of the string
result = re.match(r"Contact", text)
print(result.group() if result else "No match")  # Contact

result = re.match(r"support", text)
print(result)  # None -- "support" is not at the start

# findall() -- find ALL matches, returns a list of strings
emails = re.findall(r"[\w.]+@[\w.]+", text)
print(emails)  # ['support@example.com', 'sales@example.com']

# sub() -- search and replace with regex
cleaned = re.sub(r"[\w.]+@[\w.]+", "[REDACTED]", text)
print(cleaned)  # Contact us at [REDACTED] or [REDACTED]

# compile() -- pre-compile a pattern for repeated use (better performance)
email_pattern = re.compile(r"[\w.]+@[\w.]+")
print(email_pattern.findall(text))  # ['support@example.com', 'sales@example.com']

Common regex patterns you should know:

import re

# \d  -- digit            \D -- non-digit
# \w  -- word char (a-z, A-Z, 0-9, _)  \W -- non-word char
# \s  -- whitespace       \S -- non-whitespace
# .   -- any char except newline
# ^   -- start of string  $ -- end of string
# +   -- one or more      * -- zero or more      ? -- zero or one
# {n} -- exactly n        {n,m} -- between n and m

# Extract phone numbers
text = "Call 555-1234 or 555-5678 for info"
phones = re.findall(r"\d{3}-\d{4}", text)
print(phones)  # ['555-1234', '555-5678']

# Validate a date format (YYYY-MM-DD)
date_pattern = re.compile(r"^\d{4}-\d{2}-\d{2}$")
print(bool(date_pattern.match("2024-01-15")))  # True
print(bool(date_pattern.match("01-15-2024")))  # False

# Groups -- capture specific parts of a match
log = "2024-01-15 ERROR: Connection timed out"
match = re.match(r"(\d{4}-\d{2}-\d{2})\s+(\w+):\s+(.*)", log)
if match:
    date, level, message = match.groups()
    print(f"Date: {date}")      # Date: 2024-01-15
    print(f"Level: {level}")    # Level: ERROR
    print(f"Message: {message}")  # Message: Connection timed out

 

Practical Examples

Email Validator

import re

def is_valid_email(email):
    """
    Validate an email address.
    Rules:
    - Must have exactly one @
    - Local part: letters, digits, dots, hyphens, underscores
    - Domain: letters, digits, hyphens, with at least one dot
    - TLD: 2-10 alphabetic characters
    """
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,10}$"
    return bool(re.match(pattern, email))

# Test cases
test_emails = [
    "user@example.com",          # True
    "first.last@company.co.uk",  # True
    "dev+tag@gmail.com",         # True
    "invalid@",                  # False
    "@no-local.com",             # False
    "spaces in@email.com",       # False
    "no@dots",                   # False
]

for email in test_emails:
    status = "VALID" if is_valid_email(email) else "INVALID"
    print(f"  {status}: {email}")

Text Cleaner

import re
import string

def clean_text(text):
    """
    Clean raw text for processing:
    1. Remove punctuation
    2. Normalize whitespace (collapse multiple spaces or tabs into one)
    3. Strip leading/trailing whitespace
    4. Convert to lowercase
    """
    # Remove punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))

    # Normalize whitespace
    text = re.sub(r"\s+", " ", text)

    # Strip and lowercase
    return text.strip().lower()

raw = "  Hello,   World!!!   This   is		a   TEST...  "
print(clean_text(raw))
# Output: hello world this is a test

# Advanced version: preserve sentence structure
def clean_text_advanced(text, lowercase=True, remove_punct=True):
    """Configurable text cleaner."""
    if remove_punct:
        # Keep periods and question marks for sentence boundaries
        text = re.sub(r"[^\w\s.?]", "", text)

    text = re.sub(r"\s+", " ", text).strip()

    if lowercase:
        text = text.lower()
    return text

raw = "Hello, World!!! How are you??? I'm doing GREAT..."
print(clean_text_advanced(raw))
# Output: hello world how are you??? im doing great...

Password Strength Checker

import re

def check_password_strength(password):
    """
    Check password strength and return a score with feedback.

    Criteria:
    - Length >= 8 characters
    - Contains uppercase letter
    - Contains lowercase letter
    - Contains digit
    - Contains special character
    - No common patterns
    """
    score = 0
    feedback = []

    # Length check
    if len(password) >= 8:
        score += 1
    else:
        feedback.append("Must be at least 8 characters")

    if len(password) >= 12:
        score += 1  # Bonus for longer passwords

    # Character type checks
    if re.search(r"[A-Z]", password):
        score += 1
    else:
        feedback.append("Add an uppercase letter")

    if re.search(r"[a-z]", password):
        score += 1
    else:
        feedback.append("Add a lowercase letter")

    if re.search(r"\d", password):
        score += 1
    else:
        feedback.append("Add a digit")

    if re.search(r'[!@#$%^&*(),.?":{}|<>]', password):
        score += 1
    else:
        feedback.append("Add a special character")

    # Common pattern check
    common_patterns = ["password", "123456", "qwerty", "abc123"]
    if password.lower() in common_patterns:
        score = 0
        feedback = ["This is a commonly used password. Choose something unique."]

    # Rating
    if score <= 2:
        strength = "Weak"
    elif score <= 4:
        strength = "Moderate"
    else:
        strength = "Strong"

    return {
        "score": score,
        "max_score": 6,
        "strength": strength,
        "feedback": feedback,
    }

# Test it
passwords = ["abc", "password", "Hello123", "C0mpl3x!Pass", "Str0ng#Pass!2024"]
for pwd in passwords:
    result = check_password_strength(pwd)
    print(f"'{pwd}' => {result['strength']} ({result['score']}/{result['max_score']})")
    if result["feedback"]:
        for tip in result["feedback"]:
            print(f"    - {tip}")

Simple Template Engine

import re

def render_template(template, context):
    """
    A simple template engine that replaces {{ variable }} placeholders
    with values from the context dictionary.

    Supports:
    - {{ variable }} -- simple substitution
    - {{ variable | upper }} -- with filter
    - {{ variable | default: 'fallback' }} -- default values
    """
    def replace_placeholder(match):
        expression = match.group(1).strip()

        # Check for filter (pipe)
        if "|" in expression:
            var_name, filter_expr = expression.split("|", 1)
            var_name = var_name.strip()
            filter_expr = filter_expr.strip()

            value = context.get(var_name, "")

            # Apply filters
            if filter_expr == "upper":
                return str(value).upper()
            elif filter_expr == "lower":
                return str(value).lower()
            elif filter_expr == "title":
                return str(value).title()
            elif filter_expr.startswith("default:"):
                if not value:
                    default_val = filter_expr.split(":", 1)[1].strip().strip("'\"")
                    return default_val
                return str(value)
        else:
            var_name = expression
            value = context.get(var_name, "")
            return str(value)

        return str(value)

    # Match {{ ... }} patterns
    pattern = r"\{\{\s*(.*?)\s*\}\}"
    return re.sub(pattern, replace_placeholder, template)

# Usage
template_text = """
Hello, {{ name | title }}!

Your role: {{ role | upper }}
Company: {{ company | default: 'Freelance' }}
Email: {{ email }}
"""

context = {
    "name": "folau kaveinga",
    "role": "senior developer",
    "email": "folau@example.com",
}

print(render_template(template_text, context))
# Hello, Folau Kaveinga!
#
# Your role: SENIOR DEVELOPER
# Company: Freelance
# Email: folau@example.com

 

Common Pitfalls

1. Forgetting that strings are immutable

# WRONG -- this does nothing useful
name = "folau"
name.upper()       # Returns "FOLAU" but you never captured it
print(name)        # folau -- unchanged!

# RIGHT -- assign the result
name = "folau"
name = name.upper()
print(name)        # FOLAU

2. Concatenation in loops (performance killer)

# BAD -- O(n squared) time, creates n intermediate string objects
result = ""
for word in large_list:
    result += word + " "

# GOOD -- O(n) time, one allocation
result = " ".join(large_list)

3. Encoding issues with non-ASCII text

# Python 3 strings are Unicode by default, but issues arise at boundaries

# Reading a file with unknown encoding
try:
    with open("data.txt", "r", encoding="utf-8") as f:
        content = f.read()
except UnicodeDecodeError:
    # Fallback: try a different encoding or use errors parameter
    with open("data.txt", "r", encoding="latin-1") as f:
        content = f.read()

# Or handle errors gracefully
with open("data.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()  # Replaces bad bytes with ?

4. Using is instead of == for string comparison

# 'is' checks identity (same object in memory), not equality
a = "hello"
b = "hello"
print(a is b)   # True -- but only due to Python's string interning optimization

a = "hello world"
b = "hello world"
print(a is b)   # Might be False! Not guaranteed for longer strings

# ALWAYS use == for string comparison
print(a == b)   # True -- correct and reliable

5. Not using raw strings for regex

import re

# BAD -- "\b" in a plain string literal is interpreted as a backspace character
pattern = "\bword\b"

# GOOD -- in a raw string, \b is a regex word boundary
pattern = r"\bword\b"
print(re.findall(pattern, "a word in a sentence"))  # ['word']

 

Best Practices

  • Use f-strings for string formatting. They are the most readable, performant, and Pythonic option (Python 3.6+).
  • Use join() when combining many strings. Never concatenate in a loop with +.
  • Use raw strings (r"...") for regex patterns to avoid backslash confusion.
  • Use in for substring checks instead of find() != -1. It reads better and is more Pythonic.
  • Use startswith() and endswith() with tuples when checking multiple options.
  • Specify encoding explicitly when reading/writing files: open("file.txt", encoding="utf-8").
  • Use str.translate() for bulk character removal or replacement — it is significantly faster than chained replace() calls.
  • Use casefold() instead of lower() for case-insensitive comparisons, especially with international text.
  • Pre-compile regex patterns with re.compile() when using the same pattern multiple times.
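To illustrate the str.translate() recommendation above: you build a translation table once with str.maketrans(), then apply it in a single pass over the string.

```python
# Third argument to maketrans(): characters to delete
remove_digits = str.maketrans("", "", "0123456789")
print("a1b2c3".translate(remove_digits))  # abc

# maketrans() also accepts a dict mapping characters to replacements
swap = str.maketrans({"a": "4", "e": "3"})
print("release".translate(swap))  # r3l34s3
```

Unlike chained replace() calls, this scans the string only once no matter how many characters you are removing or swapping.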

 

Key Takeaways

  1. Strings are immutable. Every “modification” creates a new string. Assign the result or you lose it.
  2. f-strings are the modern standard for string formatting. Use them unless you have a specific reason not to.
  3. Slicing is powerful. Master string[start:stop:step] — it handles extraction, reversal, and sampling.
  4. Built-in methods handle 90% of use cases. Know split(), join(), strip(), replace(), find(), startswith(), and endswith() cold.
  5. join() beats + in loops. The performance difference is real and grows with data size — O(n) vs O(n²).
  6. Use regex when built-in methods are not enough, but do not reach for it first. Simple string methods are faster and more readable.
  7. Always validate and sanitize user-provided strings before processing them.
  8. Handle encoding explicitly. Specify utf-8 when reading/writing files to avoid surprises across platforms.
March 12, 2021

Python – Dictionaries & Sets

Dictionaries and sets are two of the most powerful and frequently used data structures in Python. Dictionaries give you a fast, flexible way to associate keys with values — think of them as a lookup table where you can instantly retrieve data by its label. Sets give you an unordered collection of unique elements with blazing-fast membership testing. Together, they solve a huge range of real-world programming problems: configuration management, deduplication, counting, caching, grouping, and more. If you are writing Python professionally, you will reach for dicts and sets daily.

In this tutorial, we will cover both data structures thoroughly — from creation and basic operations through advanced patterns like defaultdict, Counter, set algebra, and frozensets. By the end, you will understand not just the syntax, but when and why to choose each structure.

Part 1: Dictionaries

Introduction to Dictionaries

A dictionary (dict) is a mutable collection of key-value pairs that preserves insertion order (guaranteed since Python 3.7). Each key must be unique and hashable (strings, numbers, tuples of immutables), and each key maps to exactly one value. Dictionaries are implemented as hash tables, which means lookups, insertions, and deletions all run in O(1) average time, regardless of how many entries the dictionary contains.

Use a dictionary when you need to:

  • Map identifiers to data (user ID to user record, config key to value)
  • Count occurrences of items
  • Group data by category
  • Build lookup tables for fast retrieval
  • Represent structured data (similar to JSON objects)

Creating Dictionaries

There are several ways to create a dictionary in Python. Choose the one that best fits your situation.

Literal syntax (most common)

# Curly braces with key: value pairs
user = {
    "name": "Folau",
    "age": 30,
    "city": "Salt Lake City",
    "is_active": True
}
print(user)
# {'name': 'Folau', 'age': 30, 'city': 'Salt Lake City', 'is_active': True}

The dict() constructor

# From keyword arguments (keys must be valid identifiers)
user = dict(name="Folau", age=30, city="Salt Lake City")
print(user)
# {'name': 'Folau', 'age': 30, 'city': 'Salt Lake City'}

# From a list of tuples
pairs = [("host", "localhost"), ("port", 5432), ("db", "myapp")]
config = dict(pairs)
print(config)
# {'host': 'localhost', 'port': 5432, 'db': 'myapp'}

# From two parallel lists using zip
keys = ["name", "language", "level"]
values = ["Folau", "Python", "Senior"]
profile = dict(zip(keys, values))
print(profile)
# {'name': 'Folau', 'language': 'Python', 'level': 'Senior'}

Dictionary comprehension

# Create a dict of squares
squares = {x: x ** 2 for x in range(1, 6)}
print(squares)
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

dict.fromkeys() — initialize with a default value

# All keys get the same default value
statuses = dict.fromkeys(["Alice", "Bob", "Charlie"], "pending")
print(statuses)
# {'Alice': 'pending', 'Bob': 'pending', 'Charlie': 'pending'}

# Without a default, values are None
placeholders = dict.fromkeys(["name", "email", "phone"])
print(placeholders)
# {'name': None, 'email': None, 'phone': None}
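One caution with fromkeys(): every key receives the same object, not a copy. With a mutable default like a list, all keys end up sharing it.

```python
# PITFALL: every key points at the SAME list object
buckets = dict.fromkeys(["a", "b"], [])
buckets["a"].append(1)
print(buckets)  # {'a': [1], 'b': [1]} -- both keys changed!

# Use a dict comprehension to give each key its own list
buckets = {key: [] for key in ["a", "b"]}
buckets["a"].append(1)
print(buckets)  # {'a': [1], 'b': []}
```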

Accessing Values

You can retrieve values from a dictionary using bracket notation or the get() method. The key difference: brackets raise a KeyError if the key does not exist, while get() returns a default value (defaulting to None).

user = {"name": "Folau", "age": 30, "city": "Salt Lake City"}

# Bracket notation
print(user["name"])    # Folau
# print(user["email"])  # KeyError: 'email'

# get() with default - safer for uncertain keys
print(user.get("name"))           # Folau
print(user.get("email"))          # None (no KeyError)
print(user.get("email", "N/A"))   # N/A (custom default)

You can also retrieve all keys, values, or key-value pairs as view objects. These views are dynamic — they reflect changes to the dictionary in real time.

user = {"name": "Folau", "age": 30, "city": "Salt Lake City"}

# Keys
print(user.keys())    # dict_keys(['name', 'age', 'city'])

# Values
print(user.values())  # dict_values(['Folau', 30, 'Salt Lake City'])

# Key-value pairs as tuples
print(user.items())   # dict_items([('name', 'Folau'), ('age', 30), ('city', 'Salt Lake City')])

# Check if a key exists
print("name" in user)    # True
print("email" in user)   # False

Modifying Dictionaries

Dictionaries are mutable. You can add new key-value pairs, update existing ones, and merge dictionaries together.

Add or update a single key

user = {"name": "Folau", "age": 30}

# Add a new key
user["email"] = "folau@example.com"

# Update an existing key
user["age"] = 31

print(user)
# {'name': 'Folau', 'age': 31, 'email': 'folau@example.com'}

update() — merge another dictionary or key-value pairs

config = {"host": "localhost", "port": 5432}

# Merge from another dict (existing keys are overwritten)
config.update({"port": 3306, "database": "myapp"})
print(config)
# {'host': 'localhost', 'port': 3306, 'database': 'myapp'}

# Merge from keyword arguments
config.update(user="admin", password="secret")
print(config)
# {'host': 'localhost', 'port': 3306, 'database': 'myapp', 'user': 'admin', 'password': 'secret'}

setdefault() — set a key only if it does not exist

user = {"name": "Folau", "age": 30}

# Key does not exist - sets it and returns the value
email = user.setdefault("email", "folau@example.com")
print(email)  # folau@example.com
print(user)   # {'name': 'Folau', 'age': 30, 'email': 'folau@example.com'}

# Key already exists - does nothing, returns existing value
name = user.setdefault("name", "Unknown")
print(name)   # Folau (not overwritten)

Merge operators | and |= (Python 3.9+)

# The | operator creates a new merged dictionary
defaults = {"theme": "dark", "language": "en", "page_size": 25}
overrides = {"theme": "light", "page_size": 50}

final = defaults | overrides
print(final)
# {'theme': 'light', 'language': 'en', 'page_size': 50}

# The |= operator updates in place
defaults |= overrides
print(defaults)
# {'theme': 'light', 'language': 'en', 'page_size': 50}

Removing Items

Python provides several ways to remove entries from a dictionary, each with different behavior.

user = {"name": "Folau", "age": 30, "city": "Salt Lake City", "email": "folau@example.com"}

# del - remove a specific key (raises KeyError if missing)
del user["email"]
print(user)
# {'name': 'Folau', 'age': 30, 'city': 'Salt Lake City'}

# pop() - remove and return the value (with optional default)
age = user.pop("age")
print(age)    # 30
print(user)   # {'name': 'Folau', 'city': 'Salt Lake City'}

# pop() with default avoids KeyError
missing = user.pop("phone", "not found")
print(missing)  # not found

# popitem() - remove and return the last inserted key-value pair
user["role"] = "developer"
user["level"] = "senior"
last = user.popitem()
print(last)   # ('level', 'senior')
print(user)   # {'name': 'Folau', 'city': 'Salt Lake City', 'role': 'developer'}

# clear() - remove all entries
user.clear()
print(user)   # {}

Iterating Over Dictionaries

Dictionaries support several iteration patterns. The default behavior iterates over keys.

user = {"name": "Folau", "age": 30, "city": "Salt Lake City"}

# Iterate over keys (default)
for key in user:
    print(key)
# name
# age
# city

# Iterate over values
for value in user.values():
    print(value)
# Folau
# 30
# Salt Lake City

# Iterate over key-value pairs (most common)
for key, value in user.items():
    print(f"{key}: {value}")
# name: Folau
# age: 30
# city: Salt Lake City

# With enumerate (when you also need an index)
for index, (key, value) in enumerate(user.items()):
    print(f"{index}. {key} = {value}")
# 0. name = Folau
# 1. age = 30
# 2. city = Salt Lake City
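Dictionaries iterate in insertion order, so when you need a different order, wrap the keys or items in sorted(). A common pattern is sorting by value with a key function:

```python
scores = {"Alice": 92, "Bob": 67, "Charlie": 85}

# Iterate keys alphabetically
for name in sorted(scores):
    print(name, scores[name])

# Iterate by value, highest first
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score}")
# Alice: 92
# Charlie: 85
# Bob: 67
```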

Dictionary Comprehensions

Dictionary comprehensions let you build dictionaries in a single expression, similar to list comprehensions. They are concise, readable, and often more performant than building a dict with a loop.

Basic comprehension

# Square numbers
squares = {n: n ** 2 for n in range(1, 8)}
print(squares)
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49}

# Swap keys and values
original = {"a": 1, "b": 2, "c": 3}
flipped = {v: k for k, v in original.items()}
print(flipped)
# {1: 'a', 2: 'b', 3: 'c'}

With conditions

scores = {"Alice": 92, "Bob": 67, "Charlie": 85, "Diana": 45, "Eve": 78}

# Only passing scores (>= 70)
passing = {name: score for name, score in scores.items() if score >= 70}
print(passing)
# {'Alice': 92, 'Charlie': 85, 'Eve': 78}

# Categorize scores
grades = {
    name: ("A" if score >= 90 else "B" if score >= 80 else "C" if score >= 70 else "F")
    for name, score in scores.items()
}
print(grades)
# {'Alice': 'A', 'Bob': 'F', 'Charlie': 'B', 'Diana': 'F', 'Eve': 'C'}

Nested comprehension

# Multiplication table as a nested dict
table = {
    i: {j: i * j for j in range(1, 4)}
    for i in range(1, 4)
}
print(table)
# {1: {1: 1, 2: 2, 3: 3}, 2: {1: 2, 2: 4, 3: 6}, 3: {1: 3, 2: 6, 3: 9}}
print(table[2][3])  # 6

Nested Dictionaries

Dictionaries can contain other dictionaries as values, creating a tree-like structure. This is the natural way to represent JSON data, configuration files, and hierarchical records in Python.

# A nested structure representing company data (JSON-like)
company = {
    "name": "Tech Corp",
    "founded": 2015,
    "departments": {
        "engineering": {
            "head": "Folau",
            "team_size": 25,
            "technologies": ["Python", "Java", "AWS"]
        },
        "marketing": {
            "head": "Sarah",
            "team_size": 10,
            "budget": 500000
        }
    },
    "locations": [
        {"city": "Salt Lake City", "is_hq": True},
        {"city": "San Francisco", "is_hq": False}
    ]
}

# Accessing nested data
print(company["departments"]["engineering"]["head"])          # Folau
print(company["departments"]["engineering"]["technologies"])  # ['Python', 'Java', 'AWS']
print(company["locations"][0]["city"])                        # Salt Lake City

# Safe nested access with get()
budget = company.get("departments", {}).get("sales", {}).get("budget", 0)
print(budget)  # 0 (no KeyError even though 'sales' does not exist)

For deeply nested structures, chaining get() calls is a common defensive pattern. Each call returns an empty dict if the key is missing, so the next get() still works without raising an error.

defaultdict, OrderedDict, and Counter

The collections module provides specialized dictionary subclasses that handle common patterns more elegantly than a plain dict.

defaultdict — auto-initialize missing keys

from collections import defaultdict

# With a regular dict, you must check if a key exists before appending
groups = {}
words = ["apple", "banana", "avocado", "blueberry", "cherry", "apricot"]
for word in words:
    first_letter = word[0]
    if first_letter not in groups:
        groups[first_letter] = []
    groups[first_letter].append(word)

# With defaultdict, the factory function handles initialization
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

print(dict(groups))
# {'a': ['apple', 'avocado', 'apricot'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# defaultdict with int (perfect for counting)
word_count = defaultdict(int)
for word in ["apple", "banana", "apple", "cherry", "banana", "apple"]:
    word_count[word] += 1

print(dict(word_count))
# {'apple': 3, 'banana': 2, 'cherry': 1}

OrderedDict — dictionary with guaranteed order

from collections import OrderedDict

# Since Python 3.7, regular dicts preserve insertion order.
# OrderedDict is still useful for two reasons:
# 1. It supports move_to_end() and popitem(last=False)
# 2. Order matters in equality comparison

od = OrderedDict()
od["first"] = 1
od["second"] = 2
od["third"] = 3

# Move an item to the end
od.move_to_end("first")
print(list(od.keys()))  # ['second', 'third', 'first']

# Move to the beginning
od.move_to_end("third", last=False)
print(list(od.keys()))  # ['third', 'second', 'first']

# Pop from the front (FIFO behavior)
od.popitem(last=False)  # Removes 'third'
print(list(od.keys()))  # ['second', 'first']

# Equality comparison considers order
dict1 = OrderedDict(a=1, b=2)
dict2 = OrderedDict(b=2, a=1)
print(dict1 == dict2)  # False (order differs)

# Regular dicts ignore order in comparison
print({"a": 1, "b": 2} == {"b": 2, "a": 1})  # True

Counter — count occurrences effortlessly

from collections import Counter

# Count elements in a list
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple", "date"]
fruit_count = Counter(fruits)
print(fruit_count)
# Counter({'apple': 3, 'banana': 2, 'cherry': 1, 'date': 1})

# Most common elements
print(fruit_count.most_common(2))
# [('apple', 3), ('banana', 2)]

# Count characters in a string
char_count = Counter("mississippi")
print(char_count)
# Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})

# Arithmetic with Counters
inventory = Counter(apples=5, oranges=3, bananas=2)
sold = Counter(apples=2, oranges=1)
remaining = inventory - sold
print(remaining)
# Counter({'apples': 3, 'oranges': 2, 'bananas': 2})

# Combine inventories
new_stock = Counter(apples=10, grapes=5)
total = remaining + new_stock
print(total)
# Counter({'apples': 13, 'grapes': 5, 'oranges': 2, 'bananas': 2})

Practical Dictionary Examples

Word frequency counter

from collections import Counter

def word_frequency(text):
    """
    Count the frequency of each word in a text.
    Returns a dictionary sorted by frequency (descending).
    """
    # Normalize: lowercase and split on whitespace
    words = text.lower().split()
    # Strip surrounding punctuation from each word
    cleaned = [word.strip(".,!?;:\"'()") for word in words]
    # Count occurrences with Counter, then sort by frequency
    return dict(Counter(cleaned).most_common())

sample = """Python is great. Python is powerful.
Python is used by developers who love Python."""

result = word_frequency(sample)
for word, count in result.items():
    print(f"  {word}: {count}")
# python: 4
# is: 3
# great: 1
# powerful: 1
# used: 1
# by: 1
# developers: 1
# who: 1
# love: 1

Configuration manager

class ConfigManager:
    """
    A simple configuration manager that supports defaults,
    environment-specific overrides, and dot-notation-style access.
    """

    def __init__(self, defaults=None):
        self._config = defaults.copy() if defaults else {}

    def load_env(self, env_name, overrides):
        """Apply environment-specific overrides."""
        self._config["environment"] = env_name
        self._config.update(overrides)

    def get(self, key, default=None):
        """Retrieve a config value with an optional default."""
        keys = key.split(".")
        value = self._config
        for k in keys:
            if isinstance(value, dict):
                value = value.get(k)
            else:
                return default
            if value is None:
                return default
        return value

    def set(self, key, value):
        """Set a config value."""
        self._config[key] = value

    def to_dict(self):
        return self._config.copy()


# Usage
defaults = {
    "app_name": "MyApp",
    "debug": False,
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "myapp_db"
    },
    "cache_ttl": 300
}

config = ConfigManager(defaults)
config.load_env("production", {
    "debug": False,
    "database": {
        "host": "db.production.com",
        "port": 5432,
        "name": "myapp_prod"
    },
    "cache_ttl": 3600
})

print(config.get("app_name"))          # MyApp
print(config.get("database.host"))     # db.production.com
print(config.get("missing_key", 42))   # 42
print(config.get("environment"))       # production

Caching with memoization

import time

def memoize(func):
    """
    A simple memoization decorator using a dictionary cache.
    Caches results of expensive function calls.
    """
    cache = {}

    def wrapper(*args):
        if args in cache:
            print(f"  Cache hit for {args}")
            return cache[args]
        print(f"  Computing result for {args}")
        result = func(*args)
        cache[args] = result
        return result

    wrapper.cache = cache  # Expose cache for inspection
    return wrapper


@memoize
def expensive_computation(n):
    """Simulate an expensive operation."""
    time.sleep(0.1)  # Simulate delay
    return n ** 3 + n ** 2 + n + 1


# First call - computes and caches
result1 = expensive_computation(10)
print(f"Result: {result1}")  # Result: 1111

# Second call - returns from cache instantly
result2 = expensive_computation(10)
print(f"Result: {result2}")  # Result: 1111

# Different argument - computes and caches
result3 = expensive_computation(5)
print(f"Result: {result3}")  # Result: 156

# Inspect the cache
print(f"Cache contents: {expensive_computation.cache}")
# Cache contents: {(10,): 1111, (5,): 156}

For production code, Python provides functools.lru_cache, which handles this pattern with additional features such as a maximum cache size and thread safety.
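As a minimal sketch, the same caching behavior using the standard-library decorator:

```python
from functools import lru_cache

@lru_cache(maxsize=128)  # bounded cache with least-recently-used eviction
def expensive_computation(n):
    return n ** 3 + n ** 2 + n + 1

print(expensive_computation(10))  # 1111 (computed)
print(expensive_computation(10))  # 1111 (served from cache)
print(expensive_computation.cache_info())
# CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```

Unlike the hand-rolled decorator above, lru_cache also supports keyword arguments and exposes cache_clear() to reset the cache.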

Part 2: Sets

Introduction to Sets

A set is an unordered collection of unique, hashable elements. Sets are implemented as hash tables (like dictionary keys without values), which gives them O(1) average time for membership testing, insertion, and deletion. If you need to check whether something is “in” a collection, a set is almost always the right choice — it is dramatically faster than scanning a list.

Sets are ideal when you need to:

  • Eliminate duplicate entries from a collection
  • Perform mathematical set operations (union, intersection, difference)
  • Test membership efficiently
  • Find common or unique elements between collections

Creating Sets

Sets can be created using curly braces or the set() constructor.

# Literal syntax with curly braces
fruits = {"apple", "banana", "cherry"}
print(fruits)       # {'cherry', 'apple', 'banana'} (order may vary)
print(type(fruits)) # <class 'set'>

# Using the set() constructor
numbers = set([1, 2, 3, 4, 5])
print(numbers)  # {1, 2, 3, 4, 5}

# From a string (each character becomes an element)
letters = set("hello")
print(letters)  # {'h', 'e', 'l', 'o'} (duplicates removed, order may vary)

# IMPORTANT: empty set must use set(), not {}
empty_set = set()     # Correct: empty set
empty_dict = {}       # This is an empty DICTIONARY, not a set!
print(type(empty_set))   # <class 'set'>
print(type(empty_dict))  # <class 'dict'>

Deduplication — removing duplicates from a list

# The simplest way to remove duplicates
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
unique = list(set(numbers))
print(unique)  # [1, 2, 3, 4, 5, 6, 9] (order not preserved)

# To preserve original order (Python 3.7+)
def deduplicate(items):
    """Remove duplicates while preserving insertion order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(deduplicate(numbers))  # [3, 1, 4, 5, 9, 2, 6]

# Or use dict.fromkeys() (preserves order, Python 3.7+)
print(list(dict.fromkeys(numbers)))  # [3, 1, 4, 5, 9, 2, 6]

Set comprehension

# Create a set using comprehension syntax
even_squares = {x ** 2 for x in range(1, 11) if x % 2 == 0}
print(even_squares)  # {4, 16, 36, 64, 100}

Set Operations

Sets support all the standard mathematical set operations. Each operation is available as both a method and an operator.

python_devs = {"Alice", "Bob", "Charlie", "Diana"}
java_devs = {"Bob", "Diana", "Eve", "Frank"}

# UNION - all elements from both sets
# Method: .union() | Operator: |
all_devs = python_devs.union(java_devs)
print(all_devs)
# {'Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'}

all_devs = python_devs | java_devs  # Same result

# INTERSECTION - elements in both sets
# Method: .intersection() | Operator: &
both = python_devs.intersection(java_devs)
print(both)  # {'Bob', 'Diana'}

both = python_devs & java_devs  # Same result

# DIFFERENCE - elements in first set but not in second
# Method: .difference() | Operator: -
python_only = python_devs.difference(java_devs)
print(python_only)  # {'Alice', 'Charlie'}

python_only = python_devs - java_devs  # Same result

java_only = java_devs - python_devs
print(java_only)  # {'Eve', 'Frank'}

# SYMMETRIC DIFFERENCE - elements in either set but not both
# Method: .symmetric_difference() | Operator: ^
exclusive = python_devs.symmetric_difference(java_devs)
print(exclusive)  # {'Alice', 'Charlie', 'Eve', 'Frank'}

exclusive = python_devs ^ java_devs  # Same result

The operator forms (|, &, -, ^) require both operands to be sets. The method forms accept any iterable as the argument, which can be more flexible.

# Method accepts any iterable
my_set = {1, 2, 3}
result = my_set.union([4, 5, 6])        # Works with a list
print(result)  # {1, 2, 3, 4, 5, 6}

# Operator requires a set
# my_set | [4, 5, 6]  # TypeError: unsupported operand type(s)
result = my_set | set([4, 5, 6])         # Must convert to set first

Set Methods

Beyond set operations, sets provide methods for adding, removing, and testing relationships between sets.

skills = {"Python", "Java", "SQL"}

# add() - add a single element
skills.add("Docker")
print(skills)  # {'Python', 'Java', 'SQL', 'Docker'}

# Adding a duplicate has no effect
skills.add("Python")
print(skills)  # {'Python', 'Java', 'SQL', 'Docker'}

# remove() - remove an element (raises KeyError if missing)
skills.remove("Java")
print(skills)  # {'Python', 'SQL', 'Docker'}
# skills.remove("Go")  # KeyError: 'Go'

# discard() - remove an element (NO error if missing)
skills.discard("Go")      # No error
skills.discard("Docker")  # Removes Docker
print(skills)  # {'Python', 'SQL'}

# pop() - remove and return an arbitrary element
skills = {"Python", "Java", "SQL", "Docker"}
removed = skills.pop()
print(f"Removed: {removed}")  # Removed: (arbitrary element)

# clear() - remove all elements
skills.clear()
print(skills)  # set()

Subset and superset testing

backend_skills = {"Python", "Java", "SQL", "Docker", "AWS"}
my_skills = {"Python", "SQL"}

# issubset() - is every element of my_skills in backend_skills?
print(my_skills.issubset(backend_skills))    # True
print(my_skills <= backend_skills)            # True (operator form)

# issuperset() - does backend_skills contain all of my_skills?
print(backend_skills.issuperset(my_skills))  # True
print(backend_skills >= my_skills)            # True (operator form)

# Proper subset (subset but not equal)
print(my_skills < backend_skills)  # True
print(backend_skills < backend_skills)  # False (equal, not proper subset)

# isdisjoint() - do the sets share NO elements?
frontend = {"React", "CSS", "JavaScript"}
print(frontend.isdisjoint(backend_skills))  # True (no overlap)
print(my_skills.isdisjoint(backend_skills)) # False (overlap exists)

Frozen Sets

A frozenset is an immutable version of a set. Once created, you cannot add or remove elements. Because frozensets are immutable and hashable, they can be used as dictionary keys or as elements of another set — something regular sets cannot do.

# Create a frozenset
immutable_skills = frozenset(["Python", "Java", "SQL"])
print(immutable_skills)  # frozenset({'Python', 'Java', 'SQL'})

# All read operations work
print("Python" in immutable_skills)  # True
print(len(immutable_skills))          # 3

# Set operations return new frozensets
more_skills = frozenset(["Docker", "Python"])
combined = immutable_skills | more_skills
print(combined)  # frozenset({'Python', 'Java', 'SQL', 'Docker'})

# Mutation is not allowed
# immutable_skills.add("Go")     # AttributeError
# immutable_skills.remove("SQL") # AttributeError

# Use as dictionary keys (regular sets cannot do this)
permissions = {
    frozenset(["read"]): "viewer",
    frozenset(["read", "write"]): "editor",
    frozenset(["read", "write", "admin"]): "admin"
}

user_perms = frozenset(["read", "write"])
print(permissions[user_perms])  # editor

# Use as elements of another set
set_of_sets = {frozenset([1, 2]), frozenset([3, 4])}
print(set_of_sets)  # {frozenset({1, 2}), frozenset({3, 4})}

Use frozenset when you need a set that should never change after creation — for example, representing a fixed set of permissions, a cache key based on a combination of values, or a constant lookup table.

Practical Set Examples

Remove duplicates while tracking what was removed

def find_duplicates(items):
    """
    Find and return duplicate items from a list.
    Returns a tuple of (unique_items, duplicates).
    """
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return list(seen), list(duplicates)

names = ["Alice", "Bob", "Charlie", "Alice", "Diana", "Bob", "Eve", "Alice"]
unique, dupes = find_duplicates(names)
print(f"Unique: {unique}")
print(f"Duplicates: {dupes}")
# Unique: ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'] (set order may vary)
# Duplicates: ['Alice', 'Bob'] (set order may vary)

Find common elements across multiple collections

def common_elements(*collections):
    """
    Find elements common to all provided collections.
    Accepts any number of iterables.
    """
    if not collections:
        return set()
    result = set(collections[0])
    for collection in collections[1:]:
        result &= set(collection)
    return result

team_a_skills = ["Python", "Java", "SQL", "Docker"]
team_b_skills = ["Python", "Go", "SQL", "Kubernetes"]
team_c_skills = ["Python", "Rust", "SQL", "AWS"]

shared = common_elements(team_a_skills, team_b_skills, team_c_skills)
print(f"Skills all teams share: {shared}")
# Skills all teams share: {'Python', 'SQL'}

Membership testing performance

import time

# Build a large dataset
data_list = list(range(1_000_000))
data_set = set(data_list)

target = 999_999  # Worst case for a list (last element)

# List lookup - O(n)
start = time.perf_counter()
for _ in range(1000):
    _ = target in data_list
list_time = time.perf_counter() - start

# Set lookup - O(1)
start = time.perf_counter()
for _ in range(1000):
    _ = target in data_set
set_time = time.perf_counter() - start

print(f"List lookup (1000x): {list_time:.4f}s")
print(f"Set lookup  (1000x): {set_time:.6f}s")
print(f"Set is ~{list_time / set_time:.0f}x faster")
# Typical output:
# List lookup (1000x): 8.1234s
# Set lookup  (1000x): 0.000045s
# Set is ~180000x faster

This performance difference is exactly why you should convert a list to a set when you need to check membership repeatedly. The conversion itself is O(n), but each lookup after that is O(1).
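A common shape for this pattern — the word lists here are illustrative:

```python
# Convert once outside the loop, then test membership repeatedly
banned = set(["spam", "scam", "fraud"])  # O(n) conversion, paid once

messages = ["hello team", "big scam alert", "meeting at 3"]
flagged = [m for m in messages
           if any(word in banned for word in m.lower().split())]
print(flagged)  # ['big scam alert']
```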

Shared Topics

Common Pitfalls

1. Empty dict vs empty set — the {} trap

# This is a dict, NOT a set!
empty = {}
print(type(empty))  # <class 'dict'>

# To create an empty set, you must use set()
empty_set = set()
print(type(empty_set))  # <class 'set'>

2. Unhashable types as dictionary keys or set elements

# Lists and dicts are mutable, so they are NOT hashable
# my_dict = {[1, 2, 3]: "value"}   # TypeError: unhashable type: 'list'
# my_set = {[1, 2], [3, 4]}         # TypeError: unhashable type: 'list'

# Use tuples instead (they are immutable and hashable)
my_dict = {(1, 2, 3): "value"}     # Works
my_set = {(1, 2), (3, 4)}          # Works
print(my_dict[(1, 2, 3)])  # value

# But tuples containing mutable objects are also unhashable
# bad = {([1, 2], [3, 4]): "value"}  # TypeError: unhashable type: 'list'

3. Dictionary ordering assumptions

# Since Python 3.7, dicts preserve INSERTION order.
# But do not assume dicts are sorted by key or value.
d = {"banana": 2, "apple": 1, "cherry": 3}
print(list(d.keys()))  # ['banana', 'apple', 'cherry'] (insertion order)

# If you need sorted keys, sort explicitly
for key in sorted(d.keys()):
    print(f"{key}: {d[key]}")
# apple: 1
# banana: 2
# cherry: 3

4. Modifying a dict while iterating over it

scores = {"Alice": 90, "Bob": 45, "Charlie": 72, "Diana": 38}

# BAD: modifying during iteration causes RuntimeError
# for name, score in scores.items():
#     if score < 50:
#         del scores[name]  # RuntimeError: dictionary changed size during iteration

# GOOD: collect keys to remove, then delete
to_remove = [name for name, score in scores.items() if score < 50]
for name in to_remove:
    del scores[name]
print(scores)  # {'Alice': 90, 'Charlie': 72}

# Or use a dict comprehension to create a new dict
scores = {"Alice": 90, "Bob": 45, "Charlie": 72, "Diana": 38}
passing = {k: v for k, v in scores.items() if v >= 50}
print(passing)  # {'Alice': 90, 'Charlie': 72}

Best Practices

1. Use get() with defaults instead of bracket notation

When a missing key is a normal possibility (not an error), use get() to avoid try/except blocks or pre-checks with in.

# Instead of this:
if "email" in user:
    email = user["email"]
else:
    email = "not provided"

# Do this:
email = user.get("email", "not provided")

2. Use dict comprehensions over manual loops

# Instead of this:
result = {}
for key, value in data.items():
    if value > 0:
        result[key] = value * 2

# Do this:
result = {k: v * 2 for k, v in data.items() if v > 0}

3. Use sets for membership testing

If you test x in collection inside a loop, convert the collection to a set first. The speedup can be orders of magnitude on large datasets.

# Instead of searching a list:
valid_codes = ["US", "CA", "MX", "UK", "DE", "FR", "JP"]  # O(n) per lookup

# Use a set:
valid_codes = {"US", "CA", "MX", "UK", "DE", "FR", "JP"}  # O(1) per lookup

if country_code in valid_codes:
    process(country_code)

4. Use defaultdict or setdefault to avoid key-existence checks

from collections import defaultdict

# Instead of:
groups = {}
for item in items:
    key = item["category"]
    if key not in groups:
        groups[key] = []
    groups[key].append(item)

# Do this:
groups = defaultdict(list)
for item in items:
    groups[item["category"]].append(item)
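For one-off grouping where you would rather keep a plain dict, setdefault() achieves the same effect without importing collections:

```python
# setdefault() returns the existing value for a key,
# inserting the default first if the key is missing
groups = {}
for word in ["apple", "banana", "avocado", "blueberry"]:
    groups.setdefault(word[0], []).append(word)

print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry']}
```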

5. Use the merge operator for combining dicts (Python 3.9+)

# Clean, readable dict merging
defaults = {"timeout": 30, "retries": 3, "verbose": False}
user_config = {"timeout": 60, "verbose": True}

final = defaults | user_config  # user_config wins on conflicts
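The in-place variant |= (also Python 3.9+) updates an existing dict, mirroring update():

```python
config = {"timeout": 30, "retries": 3, "verbose": False}
config |= {"timeout": 60, "verbose": True}  # right-hand side wins on conflicts
print(config)  # {'timeout': 60, 'retries': 3, 'verbose': True}
```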

6. Prefer frozenset when immutability is needed

If a set should not change after creation, use frozenset. This communicates intent, prevents accidental modification, and enables use as a dict key or set element.

Key Takeaways

  • Dictionaries store key-value pairs with O(1) average lookup, insertion, and deletion. They are the go-to structure for mappings, lookups, and structured data.
  • Create dicts with literals {}, dict(), comprehensions, or fromkeys(). Use whichever is most readable for your situation.
  • Always prefer get() with a default over bracket notation when a missing key is a normal case, not an error.
  • update() and the |= operator (Python 3.9+) merge dictionaries. The right-hand side wins on key conflicts.
  • defaultdict, Counter, and OrderedDict from the collections module handle specialized patterns more cleanly than a plain dict.
  • Dictionary comprehensions are concise, readable, and usually faster than manual loops.
  • Sets store unique, hashable elements with O(1) membership testing. Use them when duplicates are not allowed or when you need fast "is this in the collection?" checks.
  • Set operations (|, &, -, ^) correspond to union, intersection, difference, and symmetric difference. Both operator and method forms are available.
  • frozenset is an immutable set that can be used as a dictionary key or an element of another set.
  • Remember: {} creates an empty dict, not a set. Use set() for an empty set.
  • Never use mutable objects (lists, dicts, sets) as dictionary keys or set elements — use their immutable counterparts (tuples, frozensets) instead.
  • Convert lists to sets when you need repeated membership checks — the performance difference is dramatic.
  • Never modify a dictionary or set while iterating over it — collect changes first, then apply them.
March 11, 2021