Python – File I/O (Read and Write)

Every meaningful application eventually needs to interact with files. Whether you are reading configuration, writing logs, processing CSV reports, or persisting user data as JSON, file I/O is one of the fundamental skills that separates hobby scripts from production-grade software. Python makes file operations straightforward, but doing them correctly—handling encodings, closing resources, and dealing with platform differences—requires a disciplined approach.

In this guide we will work through Python’s file I/O capabilities from the ground up: opening and closing files, reading and writing text, working with structured formats like CSV and JSON, managing file paths properly, and building practical utilities you can use in real projects.


How Files Work

At the operating-system level, a file is a named sequence of bytes. Those bytes are organized according to a format specification—plain text, CSV, JSON, PNG, and so on. Three things are conceptually associated with every file:

  • Metadata – the file name, size, type, and permissions, kept by the filesystem rather than inside the file itself
  • Data – the actual content: the bytes you read and write
  • End of File (EOF) – the position where the content ends; when a read operation returns no data, you have reached EOF

What the bytes represent depends on the format. A .txt file stores human-readable characters; a .png stores pixel data in a binary layout. Python gives us a unified API to handle both text and binary files.


Opening Files: The open() Function

All file operations begin with open(). Its signature is:

file_object = open(file_path, mode="r", encoding=None)

The mode string tells Python what you intend to do with the file:

Mode   Description
"r"    Read (default). File must exist.
"w"    Write. Creates the file if it does not exist; truncates (empties) it if it does.
"a"    Append. Creates the file if it does not exist; writes are added to the end.
"r+"   Read and write. File must exist.
"x"    Exclusive creation. Fails if the file already exists.
"b"    Binary mode (combine with others, e.g., "rb", "wb").
"t"    Text mode (default; combine with others, e.g., "rt").

You can combine modes: "rb" reads a binary file, "w+" opens for both writing and reading (truncating first), and so on.

# Open for reading (default mode)
f = open("data.txt")

# Open for writing with explicit UTF-8 encoding
f = open("output.txt", "w", encoding="utf-8")

# Open a binary file for reading
f = open("image.png", "rb")

The with Statement (Context Managers)

This is the correct way to work with files in Python. The with statement guarantees that the file is closed when the block exits, even if an exception is raised. Forgetting to close a file can cause data loss (buffered writes never flushed), resource leaks, and locking issues on Windows.

# Always prefer this pattern
with open("data.txt", "r", encoding="utf-8") as f:
    contents = f.read()
    # f is automatically closed when we leave this block

# The manual alternative (avoid this)
f = open("data.txt", "r", encoding="utf-8")
try:
    contents = f.read()
finally:
    f.close()

Every file example in the rest of this tutorial uses the with statement. If you take away only one thing from this post, let it be this: always use with when opening files.


Reading Files

Python provides several methods for reading file content. Which one you choose depends on the file size and how you want to process the data.

read() – Read Entire File

with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()       # entire file as a single string
    print(content)

Use this for small files where loading everything into memory is acceptable.

read(size) – Read a Fixed Number of Characters

with open("large_file.txt", "r", encoding="utf-8") as f:
    chunk = f.read(1024)     # read first 1024 characters
    print(chunk)

readline() – Read One Line at a Time

with open("notes.txt", "r", encoding="utf-8") as f:
    first_line = f.readline()    # includes the trailing newline
    second_line = f.readline()
    print(first_line.strip())
    print(second_line.strip())

readlines() – Read All Lines into a List

with open("notes.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()    # list of strings, each ending with \n
    for line in lines:
        print(line.strip())

Iterating Line by Line (Memory-Efficient)

For large files, iterating directly over the file object is the best approach. Python reads one line at a time, keeping memory usage low.

with open("server.log", "r", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        print(f"{line_number}: {line.strip()}")

Writing Files

write() – Write a String

with open("output.txt", "w", encoding="utf-8") as f:
    f.write("Line one\n")
    f.write("Line two\n")
    f.write("Line three\n")

Remember: "w" mode truncates the file first. Every time you open with "w", you are starting fresh.

writelines() – Write a List of Strings

lines = ["First\n", "Second\n", "Third\n"]

with open("output.txt", "w", encoding="utf-8") as f:
    f.writelines(lines)     # does NOT add newlines automatically

Note that writelines() does not insert line separators for you. Each string must already include \n if you want separate lines.
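If adding "\n" to every string feels error-prone, two common alternatives are joining the strings yourself or letting print() supply the newline. A small sketch (the file name and items are made up for illustration):

```python
import os
import tempfile

items = ["First", "Second", "Third"]
path = os.path.join(tempfile.mkdtemp(), "output.txt")

# Option 1: join the items with newlines yourself
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(items) + "\n")

# Option 2: print() appends the newline for each item
with open(path, "w", encoding="utf-8") as f:
    for item in items:
        print(item, file=f)

with open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()
```

Both options produce the same three-line file.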


Appending to Files

Appending adds content to the end of an existing file without erasing what is already there.

with open("log.txt", "a", encoding="utf-8") as f:
    f.write("2024-01-15 10:30:00 INFO Application started\n")
    f.write("2024-01-15 10:30:05 INFO Connected to database\n")

If the file does not exist, "a" mode creates it.


Working with File Paths

Hard-coding file paths as strings is fragile. Paths differ between operating systems (forward slashes on Linux/macOS, backslashes on Windows). Python offers two approaches, but pathlib is the modern, recommended choice.

pathlib.Path (Recommended)

from pathlib import Path

# Build paths in a cross-platform way
base = Path.home() / "projects" / "myapp"
config_path = base / "config" / "settings.json"

print(config_path)              # e.g. /home/user/projects/myapp/config/settings.json (platform-dependent)
print(config_path.exists())     # True or False
print(config_path.suffix)       # .json
print(config_path.stem)         # settings
print(config_path.parent)       # /home/user/projects/myapp/config

# Read and write shortcuts
text = config_path.read_text(encoding="utf-8")
config_path.write_text("new content", encoding="utf-8")

# Iterate over files in a directory
for py_file in base.glob("**/*.py"):
    print(py_file)

os.path (Legacy Approach)

import os

# Join paths
config_path = os.path.join(os.path.expanduser("~"), "projects", "myapp", "settings.json")

# Common checks
print(os.path.exists(config_path))
print(os.path.isfile(config_path))
print(os.path.isdir(os.path.dirname(config_path)))
print(os.path.splitext(config_path))   # ('/path/to/settings', '.json')

Prefer pathlib for new code. It provides an object-oriented API that is easier to read, compose, and debug.


Reading and Writing CSV Files

CSV (Comma-Separated Values) is one of the most common data exchange formats. Python’s built-in csv module handles quoting, escaping, and different delimiters correctly.

Writing CSV

import csv

employees = [
    ["Name", "Department", "Salary"],
    ["Alice", "Engineering", 95000],
    ["Bob", "Marketing", 72000],
    ["Carol", "Engineering", 110000],
]

with open("employees.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(employees)

The newline="" parameter prevents the csv module from adding extra blank lines on Windows.

Reading CSV

import csv

with open("employees.csv", "r", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)          # skip header row
    print(f"Columns: {header}")

    for row in reader:
        name, department, salary = row
        print(f"{name} works in {department}, earns ${salary}")

Using DictReader and DictWriter

For more readable code, use dictionary-based readers and writers:

import csv

# Writing with DictWriter
employees = [
    {"name": "Alice", "department": "Engineering", "salary": 95000},
    {"name": "Bob", "department": "Marketing", "salary": 72000},
]

with open("employees.csv", "w", newline="", encoding="utf-8") as f:
    fieldnames = ["name", "department", "salary"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(employees)

# Reading with DictReader
with open("employees.csv", "r", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['name']}: ${row['salary']}")

Reading and Writing JSON Files

JSON is the standard format for configuration files, API responses, and structured data storage. Python’s json module handles serialization and deserialization.

Writing JSON

import json

config = {
    "app_name": "DataProcessor",
    "version": "2.1.0",
    "database": {
        "host": "localhost",
        "port": 5432,
        "name": "analytics"
    },
    "features": ["logging", "caching", "retry"],
    "debug": False
}

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4, ensure_ascii=False)
    # indent=4 makes the output human-readable

Reading JSON

import json

with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

print(config["app_name"])            # DataProcessor
print(config["database"]["host"])    # localhost
print(config["features"])            # ['logging', 'caching', 'retry']

Tip: Use json.dumps() (with an “s”) to serialize to a string instead of writing directly to a file. Use json.loads() to parse a JSON string.
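A quick round-trip illustrates the string-based variants (the settings dictionary here is just an example):

```python
import json

settings = {"theme": "dark", "retries": 3}

as_text = json.dumps(settings, indent=2)   # serialize to a string, no file involved
restored = json.loads(as_text)             # parse the string back into Python objects
```

This is handy when the JSON comes from (or goes to) something other than a file, such as an HTTP response body.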


Binary Files

When working with images, audio, PDFs, or any non-text file, open in binary mode by adding "b" to the mode string. In binary mode, you read and write bytes objects instead of strings.

# Copy a binary file
with open("photo.jpg", "rb") as source:
    data = source.read()

with open("photo_backup.jpg", "wb") as dest:
    dest.write(data)

print(f"Copied {len(data)} bytes")

For large binary files, read and write in chunks to avoid loading the entire file into memory:

def copy_file_chunked(source_path, dest_path, chunk_size=8192):
    """Copy a file in chunks to handle large files efficiently."""
    with open(source_path, "rb") as src, open(dest_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)

copy_file_chunked("large_video.mp4", "large_video_backup.mp4")

File and Directory Operations

The os and pathlib modules provide functions for managing files and directories beyond reading and writing.

import os
from pathlib import Path

# --- Check existence ---
print(os.path.exists("data.txt"))         # True/False
print(Path("data.txt").exists())          # True/False

# --- Create directories ---
os.makedirs("output/reports/2024", exist_ok=True)    # creates parent dirs too
Path("output/logs").mkdir(parents=True, exist_ok=True)

# --- List directory contents ---
for item in os.listdir("output"):
    print(item)

# pathlib approach (more powerful)
for item in Path("output").iterdir():
    print(f"{item.name}  (is_file={item.is_file()}, is_dir={item.is_dir()})")

# --- Delete files and directories ---
os.remove("temp_file.txt")               # delete a file
os.rmdir("empty_directory")              # delete an empty directory

Path("temp_file.txt").unlink(missing_ok=True)   # delete, ignore if missing

# --- Rename / move ---
os.rename("old_name.txt", "new_name.txt")
Path("draft.txt").rename("final.txt")

# --- Get file information ---
stats = os.stat("data.txt")
print(f"Size: {stats.st_size} bytes")
print(f"Last modified: {stats.st_mtime}")

Character Encoding

Text files are stored as bytes, and an encoding defines how those bytes map to characters. The two encodings you will meet most often are ASCII (128 characters) and UTF-8, which can represent every Unicode code point (more than a million possible). ASCII is a subset of UTF-8, so any valid ASCII file is also valid UTF-8.
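You can see the bytes-to-characters mapping in action: the same string produces different byte sequences under different encodings.

```python
text = "café"

utf8_bytes = text.encode("utf-8")      # "é" takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # "é" is a single byte in Latin-1

decoded = utf8_bytes.decode("utf-8")   # back to the original string
```

Decoding UTF-8 bytes with the wrong encoding (or vice versa) is exactly how mojibake and UnicodeDecodeError happen.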

Always specify the encoding explicitly. If you omit it, Python uses the platform default, which varies across systems and can cause subtle bugs.

# Explicit encoding prevents cross-platform surprises
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Handle files with unknown encoding gracefully
with open("legacy_data.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()    # replaces undecodable bytes with the Unicode replacement character (�)

Practical Examples

1. Log File Processor

This script reads a log file, parses each line, and produces a summary of log levels.

from collections import Counter
from pathlib import Path


def process_log_file(log_path):
    """Read a log file and summarize entries by log level."""
    level_counts = Counter()
    error_messages = []

    with open(log_path, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue

            # Expected format: "2024-01-15 10:30:00 INFO Some message"
            parts = line.split(maxsplit=3)
            if len(parts) < 4:
                continue

            date, time, level, message = parts
            level_counts[level] += 1

            if level == "ERROR":
                error_messages.append(f"  Line {line_num}: {message}")

    # Print summary
    print("=== Log Summary ===")
    total = sum(level_counts.values())
    for level, count in level_counts.most_common():
        percentage = (count / total) * 100
        print(f"  {level:8s}: {count:5d} ({percentage:.1f}%)")

    if error_messages:
        print(f"\n=== Errors ({len(error_messages)}) ===")
        for msg in error_messages[:10]:    # show first 10
            print(msg)

    return level_counts


# Usage
# process_log_file("application.log")

2. Config File Reader/Writer (JSON-Based)

A reusable configuration manager that loads defaults, merges user overrides, and persists changes.

import json
from pathlib import Path


class ConfigManager:
    """Manage application configuration stored as JSON."""

    DEFAULT_CONFIG = {
        "app_name": "MyApp",
        "log_level": "INFO",
        "max_retries": 3,
        "database": {
            "host": "localhost",
            "port": 5432,
        },
    }

    def __init__(self, config_path):
        self.path = Path(config_path)
        self.config = {}
        self.load()

    def load(self):
        """Load config from file, falling back to defaults."""
        if self.path.exists():
            with open(self.path, "r", encoding="utf-8") as f:
                user_config = json.load(f)
            # Merge: defaults first, then user overrides
            self.config = {**self.DEFAULT_CONFIG, **user_config}
        else:
            self.config = dict(self.DEFAULT_CONFIG)

    def save(self):
        """Persist current configuration to disk."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.config, f, indent=4)

    def get(self, key, default=None):
        return self.config.get(key, default)

    def set(self, key, value):
        self.config[key] = value
        self.save()


# Usage
# cfg = ConfigManager("config/app_settings.json")
# print(cfg.get("log_level"))
# cfg.set("log_level", "DEBUG")

3. CSV Data Analyzer

Reads a CSV file, computes basic statistics, and writes a summary report.

import csv
from pathlib import Path


def analyze_csv(input_path, output_path):
    """Analyze a CSV file with a 'salary' column and write a report."""
    departments = {}

    with open(input_path, "r", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            dept = row["department"]
            salary = float(row["salary"])
            departments.setdefault(dept, []).append(salary)

    # Build report lines
    report_lines = ["Department Analysis Report", "=" * 40, ""]

    for dept, salaries in sorted(departments.items()):
        avg = sum(salaries) / len(salaries)
        report_lines.append(f"Department: {dept}")
        report_lines.append(f"  Employees : {len(salaries)}")
        report_lines.append(f"  Avg Salary: ${avg:,.2f}")
        report_lines.append(f"  Min Salary: ${min(salaries):,.2f}")
        report_lines.append(f"  Max Salary: ${max(salaries):,.2f}")
        report_lines.append("")

    report_text = "\n".join(report_lines)
    print(report_text)

    # Write report to file
    Path(output_path).write_text(report_text, encoding="utf-8")
    print(f"\nReport saved to {output_path}")


# Usage
# analyze_csv("employees.csv", "salary_report.txt")

4. Simple File Backup Script

Copies files from a source directory to a timestamped backup folder.

import shutil
from datetime import datetime
from pathlib import Path


def backup_directory(source_dir, backup_root="backups"):
    """Create a timestamped backup of a directory."""
    source = Path(source_dir)
    if not source.is_dir():
        print(f"Error: {source_dir} is not a valid directory")
        return

    # Create timestamped backup folder
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_dir = Path(backup_root) / f"{source.name}_{timestamp}"
    backup_dir.mkdir(parents=True, exist_ok=True)

    file_count = 0
    total_bytes = 0

    for file_path in source.rglob("*"):
        if file_path.is_file():
            # Preserve directory structure
            relative = file_path.relative_to(source)
            dest = backup_dir / relative
            dest.parent.mkdir(parents=True, exist_ok=True)

            shutil.copy2(file_path, dest)      # copy2 preserves metadata
            file_count += 1
            total_bytes += file_path.stat().st_size

    size_mb = total_bytes / (1024 * 1024)
    print(f"Backup complete: {file_count} files ({size_mb:.2f} MB)")
    print(f"Location: {backup_dir}")
    return backup_dir


# Usage
# backup_directory("my_project")

Common Pitfalls

  1. Forgetting to close files. Without with, an unclosed file can leak resources and leave buffered data unwritten. On Windows, open file handles can also prevent other processes from accessing the file. Always use the with statement.
  2. Encoding mismatches. Opening a UTF-8 file without specifying encoding="utf-8" works on some systems but fails on others where the default encoding differs. Always pass the encoding parameter explicitly.
  3. FileNotFoundError. Attempting to read a file that does not exist crashes your program. Validate paths with Path.exists() or wrap operations in try/except.
  4. Using "w" when you meant "a". Write mode truncates the file. If you want to add content to an existing file, use append mode.
  5. Platform-specific path separators. Hard-coding / or \ in paths breaks on other operating systems. Use pathlib.Path or os.path.join() instead.
  6. Not using newline="" with the csv module. On Windows, omitting this causes extra blank lines in the output file.
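Several of these pitfalls can be defended against in one place. The helper below is a hypothetical wrapper (read_text_safely is not a standard function) showing one way to handle missing files, permission problems, and encoding mismatches:

```python
import os
import tempfile
from pathlib import Path


def read_text_safely(path):
    """Return file contents, or None if the file is missing or unreadable."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return None   # pitfall 3: the file does not exist
    except PermissionError:
        return None   # no access rights
    except UnicodeDecodeError:
        # pitfall 2: wrong encoding; retry, replacing undecodable bytes
        return Path(path).read_text(encoding="utf-8", errors="replace")


# A path that is guaranteed not to exist, for demonstration
missing = os.path.join(tempfile.mkdtemp(), "no_such_file.txt")
result = read_text_safely(missing)   # None rather than a crash
```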

Best Practices

  • Always use the with statement for file operations. It is safer, cleaner, and eliminates an entire class of resource-leak bugs.
  • Specify encoding explicitly. Use encoding="utf-8" unless you have a specific reason to use something else.
  • Use pathlib for file paths. It produces clearer code than string manipulation and handles cross-platform differences automatically.
  • Handle exceptions. Wrap file operations in try/except blocks to handle FileNotFoundError, PermissionError, and IOError gracefully.
  • Process large files in chunks or line by line. Avoid read() on files that could be gigabytes in size.
  • Use the right mode. Understand the difference between "w" (truncate), "a" (append), and "r+" (read/write) to avoid accidental data loss.
  • Validate before writing. Check that the target directory exists (or create it) before writing files.

Key Takeaways

  1. File I/O is essential for data persistence—logs, configuration, reports, and data exchange all depend on it.
  2. The with statement is non-negotiable. It guarantees files are properly closed and is the standard pattern in professional Python code.
  3. pathlib.Path is the modern way to work with file paths. It is expressive, cross-platform, and part of the standard library.
  4. Python's csv and json modules handle the two most common structured file formats. Use them instead of parsing these formats manually.
  5. Always specify encoding="utf-8" when opening text files to avoid cross-platform surprises.
  6. For binary files, use "rb" / "wb" modes and process large files in chunks.
  7. Combine file I/O with exception handling and path validation to build robust, production-quality scripts.

 

Source code on Github



