2.2.3 File Operations and Serialization

File reading/writing and serialization flowchart

Where this section fits

This section shows how program data can be saved and loaded again later. File reading and writing, CSV, JSON, and serialization are the foundation for dataset processing, training logs, configuration files, and saving model results. They are also a key step from temporary code in memory to real projects.

Learning objectives

Master basic file reading and writing operations (open, read, write)
Understand the role and benefits of the with statement
Learn how to handle common data formats such as CSV and JSON
Understand the concepts of serialization and deserialization

Why do we need file operations?

So far, the data in your programs has lived in memory — once the program closes, the data is gone. But in real-world scenarios:

Trained AI models need to be saved to a file so they can be loaded later
Datasets are stored in CSV files and need to be read into the program
Training logs need to be written to files for later analysis
Configuration parameters are stored in JSON files and need to be loaded at startup

File operations let your program persist data.

File I/O basics

Open a file: `open()`

# Basic syntax
file = open("file_path", "mode", encoding="encoding")

Common modes:

Mode	Meaning	When the file does not exist
`"r"`	Read (default)	Error
`"w"`	Write (overwrite)	Create automatically
`"a"`	Append (add to the end)	Create automatically
`"x"`	Create (error if file already exists)	Create automatically
`"rb"`	Read binary file	Error
`"wb"`	Write binary file	Create automatically

Write to a file

# Method 1: Manually open and close (not recommended)
file = open("hello.txt", "w", encoding="utf-8")
file.write("Hello, world!\n")
file.write("I am learning Python file operations.\n")
file.close()  # Don't forget to close the file!

# Method 2: Use the with statement (recommended!)
with open("hello.txt", "w", encoding="utf-8") as file:
    file.write("Hello, world!\n")
    file.write("I am learning Python file operations.\n")
# When you leave the with block, the file is closed automatically, so no manual close() is needed

Read a file

# Read the entire content
with open("hello.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)

# Read line by line
with open("hello.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.strip())  # strip() removes the newline at the end of the line

# Read all lines into a list
with open("hello.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()
    print(lines)  # ['Hello, world!\n', 'I am learning Python file operations.\n']

Append content

# "a" mode: append to the end of the file without overwriting existing content
with open("log.txt", "a", encoding="utf-8") as file:
    file.write("2026-02-09: Started learning\n")
    file.write("2026-02-09: Finished Chapter 1\n")

Write multiple lines

lines = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open("output.txt", "w", encoding="utf-8") as file:
    file.writelines(lines)  # Write a list of strings

# Or use print to write to a file
with open("output.txt", "w", encoding="utf-8") as file:
    print("Line 1", file=file)  # print can direct output to a file
    print("Line 2", file=file)
    print("Line 3", file=file)

Real-world examples: working with different file formats

CSV files

CSV (Comma-Separated Values) is one of the most common data file formats:

import csv

# Write CSV
tasks = [
    ["Feature", "Owner", "Hours"],
    ["Login API", "Mina", 8],
    ["RAG demo", "Kai", 12],
    ["Chart view", "Noah", 5],
]

with open("tasks.csv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerows(tasks)

# Read CSV
with open("tasks.csv", "r", encoding="utf-8") as file:
    reader = csv.reader(file)
    header = next(reader)  # Read the header row
    print(f"Column names: {header}")

    for row in reader:
        feature, owner, hours = row
        print(f"{feature}, owner: {owner}, estimate: {hours} hours")

# Read as dictionaries (more convenient)
with open("tasks.csv", "r", encoding="utf-8") as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(f"{row['Feature']} is owned by {row['Owner']}")

JSON files

JSON is the most common data format in web development and APIs:

import json

# Write JSON
config = {
    "model": "ResNet-50",
    "learning_rate": 0.001,
    "epochs": 100,
    "batch_size": 32,
    "classes": ["cat", "dog", "bird"],
    "use_gpu": True
}

with open("config.json", "w", encoding="utf-8") as file:
    json.dump(config, file, ensure_ascii=False, indent=2)

# Read JSON
with open("config.json", "r", encoding="utf-8") as file:
    loaded_config = json.load(file)

print(f"Model: {loaded_config['model']}")
print(f"Learning rate: {loaded_config['learning_rate']}")
print(f"Classes: {loaded_config['classes']}")

Generated config.json content:

{
  "model": "ResNet-50",
  "learning_rate": 0.001,
  "epochs": 100,
  "batch_size": 32,
  "classes": ["cat", "dog", "bird"],
  "use_gpu": true
}

Text log files

from datetime import datetime

def log(message, filename="app.log"):
    """Write a log entry"""
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(filename, "a", encoding="utf-8") as file:
        file.write(f"[{timestamp}] {message}\n")

# Use it
log("Program started")
log("Loaded dataset: train.csv")
log("Started training model")
log("Training complete, accuracy: 92.5%")

Generated log file:

[2026-02-09 14:30:01] Program started
[2026-02-09 14:30:02] Loaded dataset: train.csv
[2026-02-09 14:30:03] Started training model
[2026-02-09 14:35:15] Training complete, accuracy: 92.5%

Path handling: `pathlib`

pathlib is the recommended way to handle paths in Python 3. It is more modern and easier to use than os.path:

from pathlib import Path

# Create Path objects
data_dir = Path("data")
train_file = data_dir / "train" / "data.csv"  # Use / to join paths!
print(train_file)  # data/train/data.csv

# Check paths
print(train_file.exists())    # Whether the file exists
print(train_file.is_file())   # Whether it is a file
print(data_dir.is_dir())      # Whether it is a directory

# Get file information
path = Path("model.pth")
print(path.name)       # model.pth (file name)
print(path.stem)       # model (without extension)
print(path.suffix)     # .pth (extension)
print(path.parent)     # . (parent directory)

# Create directories
Path("output/results").mkdir(parents=True, exist_ok=True)

# List files in a directory
for file in Path(".").glob("*.py"):
    print(file)

# Recursively find all CSV files
for csv_file in Path("data").rglob("*.csv"):
    print(csv_file)

# Convenient file read/write methods
Path("note.txt").write_text("Hello!", encoding="utf-8")
content = Path("note.txt").read_text(encoding="utf-8")
print(content)  # Hello!

Serialization: saving Python objects

What is serialization?

Serialization means converting Python objects (lists, dictionaries, class instances, and so on) into a format that can be saved to a file. Deserialization means doing the reverse: restoring Python objects from a file.

Choose the format by what you want to save:

Need	Recommended format
Configuration, API response, small structured data	JSON with the `json` module
Rows and columns that can open in a spreadsheet	CSV with the `csv` module
A Python-only object that you fully trust	pickle with the `pickle` module

The important trade-off is safety. JSON and CSV are readable and safe for normal learning projects. pickle is fast and convenient, but it is binary and unsafe when the file comes from someone else.

`pickle`: save any Python object

import pickle

# Save Python object
data = {
    "hours": [2, 5, 1, 3],
    "features": ["Login API", "RAG demo", "Chart view", "Deploy script"],
    "metadata": {"module": "portfolio backend", "year": 2026}
}

with open("data.pkl", "wb") as file:  # Note: "wb" (binary write)
    pickle.dump(data, file)

# Load Python object
with open("data.pkl", "rb") as file:  # Note: "rb" (binary read)
    loaded_data = pickle.load(file)

print(loaded_data["features"])  # ['Login API', 'RAG demo', 'Chart view', 'Deploy script']

Comprehensive example: task log persistence system

import json
from pathlib import Path

class TaskLog:
    """Task work log with file persistence"""

    def __init__(self, filename="task_log.json"):
        self.filename = Path(filename)
        self.tasks = {}
        self.load()  # Load data at startup

    def load(self):
        """Load data from a file"""
        if self.filename.exists():
            with open(self.filename, "r", encoding="utf-8") as f:
                self.tasks = json.load(f)
            print(f"✅ Loaded data for {len(self.tasks)} tasks")
        else:
            print("📝 Creating a new task log")

    def save(self):
        """Save data to a file"""
        with open(self.filename, "w", encoding="utf-8") as f:
            json.dump(self.tasks, f, ensure_ascii=False, indent=2)

    def add_work(self, task_name, stage, hours):
        """Add a work-hour entry"""
        if task_name not in self.tasks:
            self.tasks[task_name] = {}
        self.tasks[task_name][stage] = hours
        self.save()
        print(f"✅ Saved {task_name}'s {stage} hours ({hours})")

    def get_report(self, task_name):
        """Get a task report"""
        if task_name not in self.tasks:
            print(f"❌ Task not found: {task_name}")
            return

        stages = self.tasks[task_name]
        print(f"\n{'='*30}")
        print(f"  {task_name} Work Report")
        print(f"{'='*30}")
        for stage, hours in stages.items():
            print(f"  {stage}: {hours} hours")
        total = sum(stages.values())
        print(f"{'─'*30}")
        print(f"  Total hours: {total:.1f}")
        print(f"{'='*30}")

    def export_csv(self, filename="task_hours.csv"):
        """Export as CSV"""
        import csv
        stages = set()
        for task_stages in self.tasks.values():
            stages.update(task_stages.keys())
        stages = sorted(stages)

        with open(filename, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["Task"] + stages)
            for task_name, task_stages in self.tasks.items():
                row = [task_name] + [task_stages.get(s, "") for s in stages]
                writer.writerow(row)
        print(f"✅ Exported to {filename}")

# Use it
log = TaskLog()
log.add_work("Login API", "design", 2)
log.add_work("Login API", "implementation", 5)
log.add_work("Login API", "tests", 1)
log.add_work("RAG demo", "implementation", 7)
log.add_work("RAG demo", "docs", 2)
log.get_report("Login API")
log.export_csv()

Hands-on exercises

Exercise 1: File statistics tool

from pathlib import Path

def file_stats(filename):
    """Return line, character, word, and longest-line statistics."""
    path = Path(filename)
    lines = path.read_text(encoding="utf-8").splitlines()
    longest_index, longest_line = max(
        enumerate(lines, start=1),
        key=lambda item: len(item[1]),
        default=(0, ""),
    )
    return {
        "lines": len(lines),
        "characters": sum(len(line) for line in lines),
        "words": sum(len(line.split()) for line in lines),
        "longest_line_number": longest_index,
        "longest_line": longest_line,
    }

Path("sample.txt").write_text("hello world\nthis is Python\n", encoding="utf-8")
print(file_stats("sample.txt"))

Exercise 2: Diary app

Write a simple diary app:

Support writing new diary entries (automatically add a timestamp)
Support viewing all diary entries
Store diary entries in a text file so data is not lost after the program closes

Exercise 3: Configuration file manager

import json
from pathlib import Path

DEFAULT_CONFIG = {"theme": "light", "language": "en", "page_size": 20}

def load_config(filename="config.json"):
    """Load a configuration file, or create a default config if it does not exist."""
    path = Path(filename)
    if not path.exists():
        save_config(DEFAULT_CONFIG.copy(), filename)
    return json.loads(path.read_text(encoding="utf-8"))

def save_config(config, filename="config.json"):
    """Save the config to a file."""
    Path(filename).write_text(json.dumps(config, indent=2), encoding="utf-8")

def update_config(key, value, filename="config.json"):
    """Update a configuration item."""
    config = load_config(filename)
    config[key] = value
    save_config(config, filename)
    return config

print(update_config("theme", "dark"))

Reference implementation and walkthrough

file_stats should report line count, character count, word count, and the longest line. The default=(0, "") in max() is important because it keeps empty files from crashing the code.
The diary app should append timestamped entries to a text file, list them back in order, and keep the storage format simple enough to inspect by hand.
The config manager should load JSON, create a default file if none exists, update one key, and save the result back with pretty-printed JSON. Using Path keeps the file paths portable.

Evidence to Keep

Keep this page’s proof of learning as a small evidence card:

Pattern: class, exception, file IO, functional pipeline, generator, or type hint
Code Artifact: minimal runnable example and one realistic use case
Output: printed object state, caught error, saved file, yielded values, or type-check note
Failure Check: hidden mutation, swallowed exception, file path issue, lazy iterator confusion, or misleading annotation
Expected Output: small advanced-Python example with a debugging note

Summary

Operation	Code	Notes
Write file	`with open("f.txt", "w") as f:`	`"w"` overwrites, `"a"` appends
Read file	`with open("f.txt", "r") as f:`	`.read()`, `.readlines()`
Write JSON	`json.dump(data, file)`	Dictionary → JSON file
Read JSON	`json.load(file)`	JSON file → Dictionary
Write CSV	`csv.writer(file).writerow()`	List → CSV row
Read CSV	`csv.reader(file)`	CSV row → List
Path handling	`Path("data") / "file.txt"`	`pathlib` is recommended